Investigation of asymptomatic carotid stenosis treatment is hindered by the lack of a contemporary population-based disease cohort. We describe the use of natural language processing (NLP) to identify stenosis in patients undergoing carotid imaging. Adult patients with carotid imaging between 2008 and 2012 in a large integrated health care system were identified and followed through 2017. An NLP process was developed to characterize carotid stenosis according to Society of Radiologists in Ultrasound guidelines for ultrasound studies and North American Symptomatic Carotid Endarterectomy Trial (NASCET) guidelines for axial imaging. The resulting algorithm assessed text descriptors to categorize findings as normal/non-hemodynamically significant stenosis, moderate stenosis, severe stenosis, or occlusion in both carotid ultrasound (US) and axial imaging (computed tomography and magnetic resonance angiography [CTA/MRA]) reports. For US reports, internal carotid artery systolic and diastolic velocities and velocity ratios were also assessed and matched for laterality to improve accuracy. To validate the NLP algorithm, positive predictive value (PPV or precision) and sensitivity (recall) were calculated from simple random samples drawn from the population of all imaging studies. Lastly, all non-normal studies were manually reviewed to confirm the findings used for prevalence estimates and disease cohort assembly. A total of 95,896 qualifying index studies (76,276 US and 19,620 CTA/MRA) were identified among 94,822 patients, including 1059 patients who underwent multiple studies on the same day. For studies showing normal arteries or non-hemodynamically significant stenosis, the NLP algorithm showed excellent performance, with a PPV of 99% for US and 96.5% for CTA/MRA. The PPV and sensitivity for identifying a non-normal artery with correct laterality were 76.9% (95% confidence interval [CI], 74.1%-79.5%) and 93.1% (95% CI, 91.1%-94.8%), respectively, in the CTA/MRA sample and 74.7% (95% CI, 69.3%-79.5%) and 94% (95% CI, 90.2%-96.7%), respectively, in the US sample.
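The validation metrics reported above can be illustrated with a minimal sketch. It computes PPV and sensitivity from manually adjudicated counts and attaches a 95% Wilson score interval to each. The abstract does not state which interval method the authors used, so the Wilson interval is an assumption, and the function names and the counts in the usage note are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def ppv_and_sensitivity(tp: int, fp: int, fn: int) -> dict[str, tuple[float, float, float]]:
    """Return (estimate, lower, upper) for PPV and sensitivity.

    tp: arteries flagged non-normal by the algorithm and confirmed on review
    fp: arteries flagged non-normal by the algorithm but normal on review
    fn: arteries non-normal on review that the algorithm missed
    """
    ppv = tp / (tp + fp)            # precision
    sensitivity = tp / (tp + fn)    # recall
    return {
        "ppv": (ppv, *wilson_interval(tp, tp + fp)),
        "sensitivity": (sensitivity, *wilson_interval(tp, tp + fn)),
    }
```

For example, `ppv_and_sensitivity(tp=80, fp=20, fn=10)` uses made-up counts, not study data, and returns a point estimate with lower and upper bounds for each metric.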
For cohort assembly, 15,522 patients were identified with a diseased carotid artery, including 2674 with equal bilateral disease. This resulted in a laterality-specific cohort of 12,828 moderate, 5283 severe, and 1895 occluded arteries, plus 326 diseased arteries with an unknown degree of stenosis. During follow-up, 30.1% of these patients underwent 61,107 additional studies. Use of NLP to detect carotid stenosis or occlusion can accurately exclude normal/non-hemodynamically significant stenosis disease states, with more moderate precision for lesion identification, which can substantially reduce the need for manual review. The resulting cohort allows for efficient research and holds promise for similar reporting in other vascular diseases.
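As a rough illustration of how laterality-matched velocities might supplement text descriptors in US reports, the sketch below extracts an internal carotid artery peak systolic velocity (PSV) per side and maps it to a stenosis category. It is not the authors' algorithm. The report phrasing, the regular expression, and the function names are hypothetical. The PSV cut points (below 125 cm/s, 125 to 230 cm/s, above 230 cm/s) follow the commonly cited Society of Radiologists in Ultrasound consensus criteria, and mapping them to "moderate" and "severe" is an assumption. End-diastolic velocity, velocity ratios, plaque, and occlusion are deliberately left out.

```python
import re
from typing import Optional

# Hypothetical report phrasing, e.g. "Right ICA PSV 250 cm/s" or "Left ICA PSV: 90 cm/s".
PSV_PATTERN = re.compile(
    r"\b(?P<side>left|right)\s+ICA\s+PSV\s*(?:of|:|is)?\s*(?P<psv>\d+(?:\.\d+)?)\s*cm/s",
    re.IGNORECASE,
)


def grade_from_psv(psv_cm_s: Optional[float]) -> str:
    """Map ICA PSV to a category using PSV alone (a simplification).

    Occlusion cannot be graded here: it is defined by absent flow,
    not by a velocity threshold.
    """
    if psv_cm_s is None:
        return "unknown"
    if psv_cm_s < 125:
        return "normal/non-hemodynamically significant"
    if psv_cm_s <= 230:
        return "moderate"  # assumed to correspond to 50%-69% stenosis
    return "severe"        # assumed to correspond to >=70% stenosis


def extract_grades(report_text: str) -> dict[str, str]:
    """Return a {side: category} mapping for every PSV mention found in the text."""
    grades: dict[str, str] = {}
    for match in PSV_PATTERN.finditer(report_text):
        side = match.group("side").lower()
        grades[side] = grade_from_psv(float(match.group("psv")))
    return grades
```

Keying the result by side keeps each velocity attached to the artery it describes, which is the point of laterality matching: a severe reading on the right should never be attributed to the left.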
Establishing a Carotid Artery Stenosis Disease Cohort for Comparative Effectiveness Research Using Natural Language Processing
Authors: Chang, Robert W; Tucker, Lue-Yen; Rothenberg, Kara A; Lancaster, Elizabeth M; Avins, Andrew L; Kuang, Hui C; Faruqi, Rishad M; Nguyen-Huynh, Mai N
J Vasc Surg. 2021 Dec;74(6):1937-1947.e3. Epub 2021-06-25.