Sequence Alignment in EHR
This note is for Huang, M., Shah, N. D., & Yao, L. (2019). Evaluating global and local sequence alignment methods for comparing patient medical records. BMC Medical Informatics and Decision Making, 19(6), 263.
Background: Sequence alignment is a way of arranging two or more sequences to identify how closely they are related and where their regions of similarity lie. For Electronic Health Records (EHR) data, sequence alignment helps to identify patients with similar disease trajectories, which supports more relevant and precise prognosis, diagnosis and treatment.
Methods: two cutting-edge global sequence alignment methods together with their local modifications:
- dynamic time warping (DTW), and DTW for local alignment (DTWL)
- Needleman-Wunsch algorithm (NWA) and Smith-Waterman algorithm (SWA)
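To make the global versus local distinction concrete, here is a minimal sketch of the scoring recurrences behind NWA and SWA, applied to sequences with one diagnosis code per event. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`), the function names and the example codes are assumptions chosen for illustration; they are not the scoring system used in the paper.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score: both sequences are aligned end to end."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[n][m]


def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment score: best-scoring pair of contiguous regions."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere,
            # which is what makes the method local.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            best = max(best, score[i][j])
    return best


# Hypothetical diagnosis-code sequences for two patients.
patient_a = ["J45", "I10", "E11", "K21"]
patient_b = ["M54", "F32", "I10", "E11"]
print(needleman_wunsch(patient_a, patient_b))
print(smith_waterman(patient_a, patient_b))
```

With these example sequences the global score is -1, because the unmatched events at both ends are penalized, while the local score is 2, because only the shared region `I10, E11` is counted.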
The EHR of a patient can be viewed as a temporal sequence of medical events.
Question: which type of sequence alignment method works best for EHR data?
Goal: compare the strengths and limitations of both global and local sequence alignment methods and evaluate their impact on patient similarity calculation.
Answering this question is challenging for several reasons:
- patient medical records are complex
- thousands of diagnosis codes
- semantic meaning
- varied data quality
- no gold standard data is available for evaluating sequence alignment algorithms
- it can be very subjective and expensive to ask experts, such as physicians, to evaluate and rank the results from different sequence alignment methods
The Rochester Epidemiology Project (REP), established in the mid-1960s by Dr. Leonard T. Kurland, contains complete patient medical records.
The paper only considered diagnosis information.
Synthesis of patient medical records
synthesize 20 new patient medical records by applying one or more deleting, updating and switching operations, for each of the 4 seed patients.
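The synthesis step can be sketched roughly as follows. The exact semantics of the three operations are not spelled out in this note, so the code assumes that deleting removes one event, updating replaces one event's code with a different code, and switching swaps two adjacent events. The function names, the `max_ops` parameter, the code vocabulary and the number of records generated are all hypothetical.

```python
import random


def delete_event(seq, rng):
    """Remove one randomly chosen event (assumed meaning of 'deleting')."""
    i = rng.randrange(len(seq))
    return seq[:i] + seq[i + 1:]


def update_event(seq, rng, vocabulary):
    """Replace one event with a different code (assumed meaning of 'updating')."""
    i = rng.randrange(len(seq))
    choices = [code for code in vocabulary if code != seq[i]]
    return seq[:i] + [rng.choice(choices)] + seq[i + 1:]


def switch_events(seq, rng):
    """Swap two adjacent events (assumed meaning of 'switching')."""
    out = list(seq)
    if len(out) < 2:
        return out
    i = rng.randrange(len(out) - 1)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out


def synthesize(seed_seq, n_records, rng, vocabulary, max_ops=3):
    """Create n_records variants of seed_seq, each with 1..max_ops operations."""
    records = []
    for _ in range(n_records):
        seq = list(seed_seq)
        for _ in range(rng.randint(1, max_ops)):
            op = rng.choice(["delete", "update", "switch"])
            if op == "delete" and len(seq) > 1:  # never empty a record
                seq = delete_event(seq, rng)
            elif op == "update":
                seq = update_event(seq, rng, vocabulary)
            else:
                seq = switch_events(seq, rng)
        records.append(seq)
    return records


# Hypothetical vocabulary and seed patient; a fixed seed keeps runs repeatable.
rng = random.Random(0)
vocabulary = ["I10", "E11", "N18", "I50", "J45", "K21"]
seed_patient = ["I10", "E11", "N18", "I50"]
synthetic = synthesize(seed_patient, 5, rng, vocabulary)
for record in synthetic:
    print(record)
```

Because each synthetic record is derived from a known seed by known operations, the expected alignment is known in advance, which offers a way to work around the lack of gold standard data.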
Metrics for patient similarity
for events with multiple codes, use the Jaccard index $J(X, Y) = |X \cap Y| / |X \cup Y|$ to measure the similarity between the code sets $X$ and $Y$
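As a small illustration, the Jaccard index between two code sets is the size of their intersection divided by the size of their union. The sketch below also shows one way such a similarity could feed into DTW, by using $1 - J(X, Y)$ as the local cost between two events. That pairing, the handling of two empty sets, and the function names are assumptions made for this example, not necessarily the paper's formulation.

```python
def jaccard(x, y):
    """Jaccard index |X & Y| / |X | Y| between two collections of codes."""
    x, y = set(x), set(y)
    if not x and not y:
        return 1.0  # assumption: two empty events count as identical
    return len(x & y) / len(x | y)


def dtw_distance(p, q):
    """DTW distance between two sequences of code sets.

    Assumption: the local cost between two events is 1 - Jaccard index.
    """
    n, m = len(p), len(q)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning p[:i] with q[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 1.0 - jaccard(p[i - 1], q[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # q[j-1] is matched to several events of p
                cost[i][j - 1],      # p[i-1] is matched to several events of q
            )
    return cost[n][m]


# Hypothetical events, each holding the set of codes recorded at one visit.
print(jaccard({"I10", "E11"}, {"I10", "N18"}))
patient_p = [{"I10"}, {"E11"}]
patient_q = [{"I10"}, {"I10"}, {"E11"}]
print(dtw_distance(patient_p, patient_q))
```

In the second example the DTW distance is 0 because warping lets the repeated `I10` visit in `patient_q` map onto the single `I10` visit in `patient_p`, something a gap-penalizing method such as NWA would score as a difference.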
Pairwise global sequence alignment results
Pairwise local sequence alignment results
Limitations:
- only used diagnosis codes in the experiments
- only used a limited number of operations to create synthetic patient records, and only 4 seed patients and 20 synthesized patient medical records
- used a self-defined scoring system to quantitatively evaluate sequence alignment
Conclusion: DTW (or DTWL) seemed to align better and identify more similarities between patient medical records than NWA (or SWA).