Identifying Myocardial Infarction Patients Using Automated Text Mining in Family Practice Electronic Medical Records: A Validation Study of Emerging Methods

Mitiku, Tezeta
Primary Care, Myocardial Infarction
Background: As electronic medical records (EMRs) are increasingly used in primary care physicians' offices, the potential exists to collect a vast amount of clinical information for research purposes. However, variation in how clinicians document diagnoses makes it challenging to identify diseases accurately within the EMR. We set out to develop and validate a text-mining tool to identify myocardial infarction (MI) diagnoses within the EMR.
Methods: We selected a random 5% sample from the 19,376 active adult patients in our EMR database. This sample of 1293 charts was reviewed by trained abstractors and used as the gold standard for evaluating the validity and reliability of the automated text-mining tool in identifying MI patients. We also compared the results of the manual EMR abstraction with the hospitalization records of each patient to evaluate the validity and reliability of MI diagnoses in the EMR. We manually reviewed all discordant records to investigate and categorize the reasons for discordance. Lastly, we compared the results of using administrative data versus EMR data for the measurement of selected MI quality indicators.
Results: When compared with the gold standard of manual chart review, the text-mining tool had a sensitivity of 97.4% (95% confidence interval [CI] 94.8%–99.2%), a specificity of 96.2% (CI 94.9%–97.4%), a positive predictive value (PPV) of 88.9% (CI 85.5%–92.7%) and a negative predictive value (NPV) of 99.1% (CI 98.6%–99.3%). When compared with the current standard of hospital discharge abstracts, the EMR manual chart review had a sensitivity of 94.9% (CI 92.0%–97.7%), a specificity of 91.7% (CI 90.0%–93.5%), a PPV of 71.6% (CI 66.6%–79.2%) and an NPV of 98.8% (CI 98.1%–99.0%). The assessment of MI quality indicators was the same whether measured using the EMR or administrative data, with the exception of the proportion of patients on ASA (p<0.001).
Conclusion: The text-mining tool identified myocardial infarction diagnoses in the EMR with a high level of accuracy. In addition, EMRs may represent an important data source for the comprehensive identification of MI patients and the evaluation of quality of care.
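The four accuracy measures reported in the Results all derive from a 2x2 table that cross-classifies the tool's output against the reference standard. The sketch below shows that arithmetic in Python. The class and function names are illustrative assumptions, and the counts are hypothetical placeholders, not data from this study; the abstract does not state how its confidence intervals were computed, so none are calculated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 table: index test vs. reference standard."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    fn: int  # test negative, reference positive
    tn: int  # test negative, reference negative


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # detected among true cases
        "specificity": c.tn / (c.tn + c.fp),  # cleared among true non-cases
        "ppv": c.tp / (c.tp + c.fp),          # true cases among test positives
        "npv": c.tn / (c.tn + c.fn),          # true non-cases among test negatives
    }


# Hypothetical counts for illustration only (NOT taken from the study).
example = ConfusionCounts(tp=90, fp=10, fn=5, tn=895)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity depend only on how the tool behaves within the true-case and true-non-case groups, whereas PPV and NPV also depend on how common MI is in the sample.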