Automated Data Extraction for Clinical Databases using Natural Language Processing

Abstract

Electronic health records (EHRs) are rich sources of multimodal clinical data, yet they remain underutilized because they are largely unstructured and extracting structured information from them requires manual effort. This work focuses on automating the population of the Society of Thoracic Surgeons Adult Cardiac Surgery Database (STS ACSD), a widely used clinical registry with over 1,000 structured variables. Currently, almost all US cardiac surgery programs populate the STS database regularly, a process that requires extensive manual work and resources. We present an AI-assisted pipeline, trained and validated on Mass General Brigham (MGB) data, that can automatically populate 49.5% of the registry with accuracy exceeding 99%. External validation at Hartford HealthCare shows similar accuracy with a 43.2% completion rate. Our results highlight the potential of AI to automate EHR data abstraction at scale, enhancing the utility, scope, and availability of EHR-derived data for a variety of downstream applications.

Date
Oct 16, 2023 10:00 AM
Location
INFORMS Annual Meeting 2023
Phoenix, Arizona
Georgios Margaritis
PhD Candidate | Open to Work

My research interests lie in the intersection of Machine Learning, Mathematical Optimization and Software.
