By: Jing Wen, Manasi Shrotri and Sharma VNL, RIG Intern Researchers


Electronic Health Record (EHR) System

Electronic health record (EHR) systems hold a highly dynamic dataset containing sensitive personal information. They store a wide range of patients' health and personal data, yet that data is often poorly integrated across systems. Models that validate data integration and protect data safety are therefore needed.

Health Level Seven (HL7) Messages

As Orion Health states, “HL7 standards act as a bridge between modern healthcare services and advancing information technology” [1].

By implementing HL7 standards, patients’ previous healthcare records can be easily accessed and transferred among healthcare providers, including clinics that are not integrated with the local hospital system.

HL7 messages are used to transfer data electronically between healthcare providers. HL7 version 2 messages follow a standard, human-readable ASCII format. A message consists of one or more segments, each on its own line and terminated by a carriage return. Each segment begins with a three-letter identifier and is divided into fields (composites) separated by the pipe (|) character. If a field contains further components, a caret (^) character is used as the separator. Each message type has a specific three-letter code and is paired with a trigger event. For instance, MSH is the message header segment, and MFN stands for Master File Notification, a message type that supports the distribution of changes to master files between systems. An example of a typical HL7 message can be found below.
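The structure described above can be illustrated with a minimal parsing sketch. It assumes the usual HL7 version 2 delimiters (a carriage return between segments, | between fields, ^ between components) and ignores repetition, escaping and sub-components. The sample message and the helper names parse_hl7 and components are hypothetical and exist only for illustration; every value in the message is made up.

```python
# Hypothetical sample message: all identifiers and values are invented.
SAMPLE = "\r".join([
    "MSH|^~\\&|SENDER|HOSP|RECEIVER|CLINIC|202401011200||ADT^A01|MSG0001|P|2.5",
    "PID|1||123456^^^HOSP^MR||DOE^JANE||19800101|F",
])


def parse_hl7(message):
    """Split a message into {segment_id: [fields, ...]}.

    Each segment becomes a list of strings; index 0 is the segment
    identifier. A segment identifier may occur more than once, so every
    entry maps to a list of segments.
    """
    segments = {}
    for line in message.strip().split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def components(field):
    """Split one field into its components on the caret separator."""
    return field.split("^")


parsed = parse_hl7(SAMPLE)
msh = parsed["MSH"][0]
pid = parsed["PID"][0]

# In MSH the field separator itself counts as MSH-1, so MSH-9
# (message type) sits at list index 8 after splitting on "|".
message_type = components(msh[8])
# For other segments, list index n corresponds to field n (PID-5 is the name).
patient_name = components(pid[5])
print(message_type, patient_name)
```

A production system would normally rely on a dedicated HL7 library rather than manual string splitting; the sketch only makes the segment, field and component hierarchy concrete.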


Data Integration and Validation

Despite its standard formats, HL7 poses many data integration challenges. Most integration problems stem from HL7 customization, and different vendors may use different versions and formats. Because of this variance across systems and standards, a shared data format alone cannot ensure successful data integration; further validation is required.

Each validation method brings its own problems. Full validation, including terminology checks, slows the response times of production systems, while individual data errors can cause related data to go missing later. No widely used validation method has been established yet. Current HL7 message validation challenges include:

  • The most common errors must be detected and corrected manually.
  • Data related to erroneous data is deleted without its accuracy being verified. (Accuracy cannot easily be evaluated manually.)

A potential solution is to apply functional validation to data integration, protecting personal information while validating the correctness of the data. Addressing the problems above with functional validation can reduce the number of error messages and the manual effort needed to correct them. A detailed solution is described in the following method section.


In HL7 validation, the corresponding segments in the source and destination systems should match exactly. If any field is blank in the source or destination system, integration fails and the message is saved in an error message queue. An analyst must then fill in the blank fields and manually resend the message. Functional validation can overcome this problem.
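The failure path described here can be sketched as follows. This is an illustration under stated assumptions, not the behavior of any particular integration engine: the REQUIRED table, the function names and the in-memory error_queue are all hypothetical, and a segment is represented as a plain list of field strings with the segment identifier at index 0.

```python
from collections import deque

# Hypothetical rule table: segment id -> field numbers that must not be blank.
# Here PID-3 (patient identifier) and PID-5 (patient name) are required.
REQUIRED = {"PID": [3, 5]}

# Stand-in for the error message queue that analysts work through.
error_queue = deque()


def blank_required_fields(segments, required=REQUIRED):
    """Return (segment_id, field_number) pairs that are blank or absent."""
    missing = []
    for seg_id, field_numbers in required.items():
        fields = segments.get(seg_id)
        for n in field_numbers:
            if fields is None or n >= len(fields) or not fields[n].strip():
                missing.append((seg_id, n))
    return missing


def integrate(segments):
    """Accept the message, or park it in the error queue if fields are blank."""
    missing = blank_required_fields(segments)
    if missing:
        error_queue.append((segments, missing))
        return False
    return True


# Made-up example segments: the second one has a blank PID-3.
complete = {"PID": ["PID", "1", "", "123456", "", "DOE^JANE"]}
incomplete = {"PID": ["PID", "1", "", "", "", "DOE^JANE"]}
```

Calling integrate(complete) accepts the message, while integrate(incomplete) returns False and leaves the message in error_queue together with the list of blank fields, which is the point at which manual correction is needed today.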

We create a function that predicts the value of each segment in the destination system and assigns trust to the segments by matching them against the segments in the source system. For example, suppose the source system has a Patient Identification (PID) segment containing the patient's last name, first name and Medical Record Number, together with Admit, Discharge, Transfer (ADT) information holding the admission details of each patient. If the Medical Record Number is missing from the PID segment in the destination system, the integration message fails. Instead, we can assign trust and create a function that computes the missing field.

We can apply the RIG Dynamic Trust model, using a high trust level for data integration and controlling the weights assigned to the segments in order to predict the missing value. We first assign trust-based weights to each field that matches between the source and destination segments. The missing value in the destination system is then updated with the value from the source system, which reduces the number of error messages during integration.

In our example, we can assign equal weights to Last Name, First Name and the admission details, and set a threshold on those weights for computing the missing Medical Record Number. Thresholds and weights can be assigned in the same way for each segment to compute its missing values, based on how well the segments match and on the trust level between the source and destination systems.
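The weighting and threshold idea in the example can be sketched as below. This is not the RIG Dynamic Trust model itself, whose details are not given here; a simple weighted match score stands in for it. The field names, the unit weights, the threshold and the functions match_score and fill_missing are all assumptions made for illustration, and the records are invented.

```python
def match_score(source, dest, weights):
    """Sum the weights of fields that are non-blank and equal in both records."""
    return sum(
        weight
        for field, weight in weights.items()
        if source.get(field) and source.get(field) == dest.get(field)
    )


def fill_missing(source, dest, target, weights, threshold):
    """Return a copy of dest with a blank target field filled from source.

    The value is copied only when the weighted match score of the other
    fields reaches the threshold; otherwise dest is returned unchanged.
    """
    result = dict(dest)
    if result.get(target) or not source.get(target):
        return result  # nothing to fill, or nothing to fill it with
    if match_score(source, dest, weights) >= threshold:
        result[target] = source[target]
    return result


# Equal weights for last name, first name and admission details (assumed).
WEIGHTS = {"last_name": 1, "first_name": 1, "admit_time": 1}
THRESHOLD = 3  # assumed: all three fields must agree

# Made-up records: the destination lacks the Medical Record Number.
source = {"last_name": "DOE", "first_name": "JANE",
          "admit_time": "202401011200", "mrn": "123456"}
dest = {"last_name": "DOE", "first_name": "JANE",
        "admit_time": "202401011200", "mrn": ""}
other = {"last_name": "DOE", "first_name": "JOHN",
         "admit_time": "202401011200", "mrn": ""}

filled = fill_missing(source, dest, "mrn", WEIGHTS, THRESHOLD)
unfilled = fill_missing(source, other, "mrn", WEIGHTS, THRESHOLD)
```

With these assumed settings, filled carries the source Medical Record Number because all three weighted fields agree, while unfilled keeps its blank value because the first names differ and the score stays below the threshold. Lowering the threshold or changing the weights trades fewer error messages against a higher risk of copying a value into the wrong record.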

From four years of experience working in the healthcare industry, we have observed that millions of records need to be integrated with different systems every day, and that most of them fail because of blank fields. As a result, analysts must manually correct these messages and resend them. By applying functional validation to HL7 integration to predict the missing value in the destination system, we can reduce the effort integration analysts spend manually correcting errors when an integration message fails.

To learn more please visit