# Sequence Alignment in EHR

##### Posted on Nov 12, 2020 (Update: Nov 13, 2020)
Tags: Sequence Alignment, EHR

Background: Sequence alignment is a way of arranging two or more sequences to identify regions of similarity and measure how related they are. For Electronic Health Records (EHR) data, sequence alignment helps identify patients with similar disease trajectories, supporting more relevant and precise prognosis, diagnosis, and treatment.

Methods: two global sequence alignment methods, each paired with its local modification:

• dynamic time warping (DTW), and DTW for local alignment (DTWL)
• Needleman-Wunsch algorithm (NWA), and its local counterpart, the Smith-Waterman algorithm (SWA)
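The methods can be compared on a toy pair of patient sequences. The sketch below treats each medical event as a set of diagnosis codes and, as an assumption, scores a pair of events with the Jaccard index J: DTW uses 1 - J as its local cost, while NWA and SWA use 2J - 1 as the substitution score with a linear gap penalty. These scoring choices and the function names are hypothetical, not necessarily the paper's, and the paper's DTWL variant is not reproduced here.

```python
def jaccard(x, y):
    """Jaccard index of two code sets; 1.0 for two empty sets (assumed convention)."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)

def dtw_distance(a, b):
    """Classic DTW distance with local cost 1 - Jaccard (lower = more similar)."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - jaccard(a[i - 1], b[j - 1])
            # An event may be matched to several consecutive events in the other sequence.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def nw_score(a, b, gap=-1.0):
    """Needleman-Wunsch global alignment score (higher = more similar).

    Assumed scoring: substitution = 2 * Jaccard - 1 (range [-1, 1]), linear gap penalty.
    """
    n, m = len(a), len(b)
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 2.0 * jaccard(a[i - 1], b[j - 1]) - 1.0
            s[i][j] = max(s[i - 1][j - 1] + sub, s[i - 1][j] + gap, s[i][j - 1] + gap)
    return s[n][m]

def sw_score(a, b, gap=-1.0):
    """Smith-Waterman local alignment: same recurrence as NW, but cells are
    floored at 0 and the best cell anywhere in the matrix is returned."""
    n, m = len(a), len(b)
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 2.0 * jaccard(a[i - 1], b[j - 1]) - 1.0
            s[i][j] = max(0.0, s[i - 1][j - 1] + sub, s[i - 1][j] + gap, s[i][j - 1] + gap)
            best = max(best, s[i][j])
    return best

# Two toy patients; ICD-10-style codes are illustrative placeholders.
p = [{"E11"}, {"E11", "I10"}, {"N18"}]
q = [{"E11"}, {"I10"}, {"N18"}]
print(dtw_distance(p, q))  # 0.5
print(nw_score(p, q))      # 2.0
print(sw_score(p, q))      # 2.0
```

One design difference shows up directly in the code: DTW returns a distance and never inserts gaps, instead stretching one sequence by matching an event to several consecutive events in the other, whereas NWA and SWA return similarity scores and pay an explicit penalty for each gap.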

## Background

The EHR of a patient can be viewed as a temporal sequence of medical events.
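A minimal sketch of this view, assuming raw records arrive as hypothetical `(patient_id, visit_date, diagnosis_code)` tuples: codes recorded on the same date are grouped into one event, and events are ordered chronologically. The helper name `build_sequence` and the ICD-10-style codes are illustrative placeholders, not REP data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (patient_id, visit_date, diagnosis_code).
records = [
    ("p1", date(2010, 3, 1), "E11"),
    ("p1", date(2010, 3, 1), "I10"),
    ("p1", date(2012, 7, 15), "N18"),
    ("p1", date(2011, 1, 20), "I10"),
]

def build_sequence(records, patient_id):
    """Group one patient's diagnosis codes by date and order the events in time."""
    events = defaultdict(set)
    for pid, day, code in records:
        if pid == patient_id:
            events[day].add(code)
    # Each event is a (date, frozenset of codes) pair, sorted chronologically.
    return [(day, frozenset(codes)) for day, codes in sorted(events.items())]

for day, codes in build_sequence(records, "p1"):
    print(day, sorted(codes))
```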

Question: which type of sequence alignment method works best for EHR data?

Goal: compare the strengths and limitations of both global and local sequence alignment methods and evaluate their impact on patient similarity calculation.

This question is challenging for several reasons:

• patient medical records are complex
• thousands of diagnosis codes
• diagnosis codes carry semantic meaning (different codes can describe related conditions)
• varied data quality
• no gold standard data is available for evaluating sequence alignment algorithms
• it can be very subjective and expensive to ask experts, such as physicians, to evaluate and rank the results from different sequence alignment methods

The Rochester Epidemiology Project (REP), established in the mid-1960s by Dr. Leonard T. Kurland, contains complete patient medical records.

The paper considered only diagnosis information.

### Synthesis of patient medical records

For each of the 4 seed patients, 20 new patient medical records were synthesized by applying one or more deletion, update, and switch operations.
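A minimal sketch of this synthesis step, assuming a patient record is a list of code sets. The operation semantics below (delete drops an event, update swaps one code for another from a code pool, switch swaps two adjacent events) are our hypothetical reading, not the paper's exact definitions, and `synthesize` is an illustrative name.

```python
import random

def synthesize(seed_seq, code_pool, n_ops, rng):
    """Return an edited copy of seed_seq after n_ops random operations.

    Hypothetical operation semantics (not the paper's exact definitions):
      delete: drop one event (skipped if only one event is left)
      update: replace one code of an event with a code drawn from code_pool
      switch: swap two adjacent events (skipped if only one event is left)
    """
    if not seed_seq or not all(seed_seq):
        raise ValueError("seed_seq must be a non-empty list of non-empty code sets")
    seq = [set(event) for event in seed_seq]  # copy so the seed is not mutated
    for _ in range(n_ops):
        op = rng.choice(["delete", "update", "switch"])
        if op == "delete" and len(seq) > 1:
            del seq[rng.randrange(len(seq))]
        elif op == "update":
            event = seq[rng.randrange(len(seq))]
            event.discard(rng.choice(sorted(event)))
            event.add(rng.choice(code_pool))
        elif op == "switch" and len(seq) > 1:
            i = rng.randrange(len(seq) - 1)
            seq[i], seq[i + 1] = seq[i + 1], seq[i]
    return seq

rng = random.Random(42)  # fixed seed for reproducibility
seed_patient = [{"E11"}, {"E11", "I10"}, {"N18"}]
code_pool = ["E11", "I10", "N18", "E78"]
# 20 synthetic records for this seed patient, each with 1 to 3 random operations.
synthetic = [synthesize(seed_patient, code_pool, rng.randint(1, 3), rng) for _ in range(20)]
```

Because the ground truth (which seed each synthetic record came from) is known by construction, records like these can stand in for the missing gold standard when comparing alignment methods.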

### Metrics for patient similarity

For events with multiple codes, the Jaccard index $J(X, Y) = \frac{|X \cap Y|}{|X \cup Y|}$ measures the similarity between the two code sets $X$ and $Y$.
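A minimal sketch of the Jaccard index for two sets of diagnosis codes, treating each event's codes as a Python `set`. Returning 1.0 for two empty sets is a convention assumed here, not something the post specifies.

```python
def jaccard(x: set, y: set) -> float:
    """Jaccard index |X ∩ Y| / |X ∪ Y|; 1.0 for two empty sets (assumed convention)."""
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)

# One shared code ("I10") out of three distinct codes overall.
print(jaccard({"E11", "I10"}, {"I10", "N18"}))  # 0.3333333333333333
```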

## Discussion

### Limitations

• only used diagnosis codes in the experiments
• only used a limited number of operations to create synthetic patient records, and only 4 seed patients and 20 synthesized patient medical records.
• used a self-defined scoring system to quantitatively evaluate sequence alignment.

## Conclusions

DTW (or DTWL) seemed to align better and identify more similarities between patient medical records than NWA (or SWA).

Published in categories Note