Machine Learning Model Predicts Clinical Deterioration in Patients With COVID-19, Preserves Data Privacy

Researchers conducted an external validation study to determine whether a machine learning model accurately predicts the risk for clinical deterioration in patients hospitalized with COVID-19 infection.

An open-source machine learning model designed to preserve data privacy effectively predicted the risk for clinical deterioration in patients hospitalized with COVID-19 infection, according to results of a study published in BMJ.

In this retrospective cohort study, researchers assessed the use of a machine learning model for predicting the risk for clinical deterioration among patients hospitalized with either respiratory distress or COVID-19 infection. The researchers enrolled patients into 2 cohorts to internally and externally validate the model. The internal validation cohort comprised patients who were admitted to 1 hospital in Michigan, and the external validation cohort comprised patients hospitalized at 12 medical centers in the United States. Patients included in both cohorts were aged 18 years and older and received a diagnosis of COVID-19 infection between March 2020 and February 2021; all patients required supplemental oxygen. The model was designed to preserve data privacy and used patient data captured from electronic health records (EHRs). The primary outcome was clinical deterioration within 5 days of hospital admission. The researchers defined clinical deterioration as mortality or the need for invasive mechanical ventilation, supplemental oxygen via heated high-flow nasal cannula, or intravenous vasopressors. To assess the performance and generalizability of the model, the researchers measured the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve among patient subgroups stratified by sex, age, race, and ethnicity.
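As a rough illustration of the two discrimination metrics named above, the following sketch computes AUROC and a step-wise area under the precision-recall curve (often called average precision) in plain Python. The function names and the toy labels and scores are hypothetical and are not taken from the study, whose own code may compute these metrics differently. The same functions could be applied separately to each patient subgroup.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a randomly chosen patient with the outcome receives a
    higher risk score than a randomly chosen patient without it.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: the mean of the
    precision observed at the rank of each positive case.
    Assumes no tied scores (ties would make the ranking order arbitrary)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Hypothetical toy data: 1 = deteriorated, 0 = did not deteriorate.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]

print(round(auroc(labels, scores), 4))              # 0.75
print(round(average_precision(labels, scores), 4))  # 0.8333
```

An AUROC of 0.5 corresponds to chance-level ranking. The precision-recall baseline instead equals the outcome prevalence, which is why precision-recall scores are usually read alongside the event rate of each cohort.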

A total of 887 patients were included in the internal validation cohort, of whom 206 (21.6%) were considered to be at increased risk for clinical deterioration. The external validation cohort was a combination of 7 smaller cohorts and included a total of 7813 patients, 1304 (15.6%) of whom were at increased risk for clinical deterioration. The researchers compared patient characteristics between the cohorts and found that the percentage of patients who were Hispanic or Latino was significantly higher in the external validation cohort (range, 13.5%-29.0%) than in the internal validation cohort (3.6%).

The researchers assessed the model’s performance among patients in both cohorts. In the internal validation cohort, the model achieved an AUROC score of 0.80 (95% CI, 0.77-0.84) and an area under the precision-recall curve score of 0.55 (95% CI, 0.48-0.63), with a calibration error of 0.01 (95% CI, 0.00-0.02). Similar results were observed among patients in the external validation cohort, with AUROC and area under the precision-recall curve scores ranging between 0.77 and 0.84 and between 0.34 and 0.57, respectively; calibration errors ranged between 0.02 and 0.04. The mean AUROC score across both patient cohorts was 0.81.
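The article does not state how calibration error was calculated. One common definition is a binned expected calibration error, which compares the mean predicted risk with the observed event rate within each risk bin. The sketch below assumes that definition; the function name, the bin count, and the toy data are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> float:
    """Weighted average, over equal-width probability bins, of the gap
    between mean predicted risk and observed event rate in each bin."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        # A probability of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        event_rate = sum(y for y, _ in members) / len(members)
        ece += (len(members) / total) * abs(mean_pred - event_rate)
    return ece


# Hypothetical toy data: two low-risk patients without the outcome and
# two high-risk patients with it.
toy_labels = [0, 0, 1, 1]
toy_probs = [0.15, 0.15, 0.85, 0.85]
print(round(expected_calibration_error(toy_labels, toy_probs), 4))  # 0.15
```

Under this definition, a value of 0.01 would mean that predicted risks differ from observed event rates by about 1 percentage point on average.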

After stratification by patient sex, age, race, and ethnicity, the model performed significantly better in predicting the primary outcome among patients who self-reported their race as Asian than among those who self-reported their race as White. Further analysis showed that the model accurately identified patients in both cohorts who were at decreased risk for clinical deterioration within 48 hours of hospitalization. Of patients who were determined to be at decreased risk for the primary outcome, 5% or fewer experienced clinical deterioration, indicating that the model had a negative predictive value of at least 95%.
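The negative predictive value follows directly from the share of low-risk patients who deteriorated: it is the number of low-risk patients who did not deteriorate divided by all patients flagged as low risk. The sketch below works through that arithmetic with hypothetical counts; the threshold and the patient numbers are illustrative assumptions, not figures from the study.

```python
from typing import Sequence


def negative_predictive_value(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """Among patients whose risk score falls below the threshold,
    the fraction who did not experience the outcome."""
    low_risk = [y for y, s in zip(labels, scores) if s < threshold]
    if not low_risk:
        raise ValueError("no patients were flagged as low risk")
    true_negatives = sum(1 for y in low_risk if y == 0)
    return true_negatives / len(low_risk)


# Hypothetical cohort: 100 patients scored as low risk (5 of whom
# deteriorated) and 20 patients scored as high risk.
cohort_labels = [0] * 95 + [1] * 5 + [1] * 12 + [0] * 8
cohort_scores = [0.05] * 100 + [0.60] * 20

npv = negative_predictive_value(cohort_labels, cohort_scores, threshold=0.10)
print(npv)  # 0.95
```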

Limitations of this study included disruptions caused by the COVID-19 pandemic that may have decreased the predictive performance of the model and the use of only 1 type of EHR software for the collection of patient data.

The researchers noted that their “…method for external validation alleviates potential concerns surrounding patient privacy by foregoing the need for data sharing while still allowing for realistic and accurate evaluations of [the machine learning] model within different patient settings.” They concluded, “[these findings] can help develop models to predict patient deterioration within a single institution and… promote external validation and multicenter collaborations without the need for data sharing agreements.”
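The article does not describe the mechanics of this validation approach. One way to evaluate a model without sharing data is for each site to run the model on its own records and report only summary statistics. The sketch below illustrates that general idea under stated assumptions: the toy model, the site data, and all function names are hypothetical and do not represent the researchers' actual workflow.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# (features, outcome) pairs; in this sketch they never leave the site.
Record = Tuple[List[float], int]


def local_auroc(labels: List[int], scores: List[float]) -> float:
    """Pairwise AUROC; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def evaluate_at_site(
    model: Callable[[List[float]], float], records: List[Record]
) -> Dict[str, float]:
    """Runs inside one institution and returns only aggregate numbers."""
    labels = [outcome for _, outcome in records]
    scores = [model(features) for features, _ in records]
    return {
        "n": len(records),
        "events": sum(labels),
        "auroc": local_auroc(labels, scores),
    }


def toy_model(features: List[float]) -> float:
    """Hypothetical stand-in for a shared, pre-trained risk model."""
    return min(1.0, max(0.0, features[0]))


# Hypothetical patient-level data held separately by two sites.
site_a = [([0.10], 0), ([0.40], 0), ([0.35], 1), ([0.80], 1)]
site_b = [([0.20], 0), ([0.90], 1), ([0.70], 1), ([0.30], 0)]

# Only these summaries, not the records, reach the coordinating team.
summaries = [evaluate_at_site(toy_model, site) for site in (site_a, site_b)]
mean_auroc = mean(s["auroc"] for s in summaries)
print(summaries)
print(mean_auroc)  # 0.875
```

In this arrangement the coordinating team sees cohort sizes, event counts, and performance scores, while patient-level records stay within each institution.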

Disclosure: Some authors declared affiliations with biotech, pharmaceutical, and/or device companies. Please see the original reference for a full list of disclosures.


Kamran F, Tang S, Otles E, et al. Early identification of patients admitted to hospital for COVID-19 at risk of clinical deterioration: model development and multisite external validation study. BMJ. Published online February 17, 2022. doi:10.1136/bmj-2021-068576