By Landon Gray
Machine learning techniques might be able to better identify people who have long COVID (Lancet Digital Health 2022 May 16. doi:https://doi.org/10.1016/S2589-7500(22)00048-6).
A research team combed through a very large number of electronic health records (EHRs) in the National COVID Cohort Collaborative’s secure N3C Data Enclave, which is a national, centralized public database.
They defined their population base (N=1,793,604) as any living person 18 years of age or older who had either an International Classification of Diseases-10-Clinical Modification COVID-19 diagnosis code (U07.1) from a hospital or emergency room visit, or a positive SARS-CoV-2 polymerase chain reaction or antigen test result, and for whom at least 90 days had passed since the earliest positive indication of COVID-19.
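The inclusion rule described above can be sketched as a simple filter. This is an illustration only, not the study's code: the `Patient` record, its field names and the `as_of` reference date are hypothetical, since the article does not describe how the N3C data are structured.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout, invented for this sketch.
    age: int
    alive: bool
    u071_dx_date: Optional[date]        # earliest U07.1 code from a hospital or ER visit
    positive_test_date: Optional[date]  # earliest positive PCR or antigen result


def index_date(p: Patient) -> Optional[date]:
    """Earliest positive indication of COVID-19, or None if there is none."""
    dates = [d for d in (p.u071_dx_date, p.positive_test_date) if d is not None]
    return min(dates) if dates else None


def in_base_population(p: Patient, as_of: date) -> bool:
    """Living adult with a COVID-19 indication at least 90 days before `as_of`."""
    idx = index_date(p)
    if idx is None or not p.alive or p.age < 18:
        return False
    return as_of - idx >= timedelta(days=90)
```

A patient whose earliest indication is a diagnosis code dated Jan. 1, 2021, would qualify on April 1, 2021 (day 90), but not on March 31.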
Characteristics such as demographics, healthcare use, diagnoses and medications from nearly 98,000 adults with COVID-19 and almost 600 patients from three long COVID clinics were used to develop and train three XGBoost machine learning models to identify potential long COVID cases among all COVID-19 patients, including patients who were hospitalized and those who were not.
Based on the data provided and search criteria, the machine learning models identified with high accuracy those patients who potentially have long COVID and were similar to patients treated at long COVID clinics. Researchers achieved areas under the receiver operating characteristic curve of 0.92 for all patients, 0.90 for hospitalized patients and 0.85 for outpatients.
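For readers unfamiliar with the metric, the area under the receiver operating characteristic curve can be read as the probability that a model gives a randomly chosen true case a higher score than a randomly chosen non-case. The sketch below computes it directly from that definition. The labels and scores are toy values invented for illustration, not study data, and the function is a plain-Python stand-in rather than the tooling the researchers used.

```python
def auroc(labels, scores):
    """Share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. Labels are 1 for a case and 0 for a non-case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three cases and four non-cases with made-up model scores.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 11 of 12 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.85 to 0.92 in context.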
By October 2021, the researchers had used the data to identify more than 100,000 long COVID cases. That number had doubled to 200,000 as of May 2022.
As ranked by Shapley values, the most important features included healthcare use, age, dyspnea, and additional diagnosis and medication information included in the EHRs. Researchers found nonhospitalized long COVID clinic patients were mostly female, and patients who were hospitalized with acute COVID-19 were more likely to be Black and to report a pre-COVID-19 comorbidity, such as diabetes, kidney disease and congestive heart failure.
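A Shapley value assigns each feature its average marginal contribution to a model's output across all possible combinations of the other features. The sketch below computes exact Shapley values for a tiny, hypothetical scoring function. The feature names echo the article, but the weights are invented, and this brute-force approach is shown only to illustrate the idea; it is not the method or software the study used.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a set-valued scoring function."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # size of the coalition that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        result[f] = total
    return result


def toy_score(present):
    """Hypothetical risk score; all weights are made up for illustration."""
    score = 0.0
    if "healthcare_use" in present:
        score += 0.30
    if "age" in present:
        score += 0.10
    if "dyspnea" in present:
        score += 0.20
    if "healthcare_use" in present and "dyspnea" in present:
        score += 0.10  # interaction term, split evenly between the two features
    return score


toy_features = ["healthcare_use", "age", "dyspnea"]
toy_shap = shapley_values(toy_score, toy_features)
```

In this toy case the values come out to roughly 0.35 for healthcare use, 0.25 for dyspnea and 0.10 for age, and they sum to the full score of 0.70, which is the property that makes Shapley values useful for ranking features.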
Not surprisingly, post-COVID-19 respiratory symptoms were shown to be common, but researchers also found nonrespiratory symptoms, such as dyssomnia, chest pain and malaise, as well as treatment with lorazepam, melatonin and polyethylene glycol 3350.
The team concluded that long COVID may be better defined as “a set of related conditions with their own symptoms, trajectories and treatments,” but more and larger studies will need to be done to explore long COVID clusters.