Interdisciplinary Research in Medical Sciences Specialty

2022 Volume 2 Issue 1

Interpretable Machine Learning Prediction of Clostridioides difficile Infection Using Three-Year Longitudinal EHR Data


J. Peterson, M. Reynolds, K. Brooks
  1. Department of Health Systems Science, School of Medicine, University of Michigan, Ann Arbor, USA.
Abstract

Clostridioides difficile infection (CDI) poses major clinical and operational challenges, and hospitals have both quality and economic motivations to manage it effectively. Universal admission screening is rarely recommended, and prior modeling efforts often relied on limited samples, overly complex feature sets, or black-box techniques. Our goal was to create models that use patient information to estimate the likelihood of a positive test with strong discrimination, clear interpretability, and a practical set of long-term health indicators.

We used records from 157,493 UC San Diego Health patients seen between January 1, 2016, and July 3, 2019, who had at least 6 months of medication history. Pregnant individuals, patients under 18, and incarcerated persons were excluded. We trained Logistic Regression, Random Forest, and Ensemble models using hyperparameters tuned through 10-fold cross-validation. Performance was evaluated by the area under the receiver operating characteristic curve (AUROC). Logistic Regression coefficients were examined via odds ratios and p-values; Random Forest feature contributions were assessed using Gini importance. We also compared false-positive and false-negative predictions at selected thresholds.
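As a minimal illustration of the evaluation step described above, the following Python sketch computes AUROC from predicted risk scores and counts false positives and false negatives at chosen thresholds. It is not the study's code: the function names `auroc` and `errors_at_threshold` are hypothetical, and the labels and scores are invented toy values, not patient data.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def errors_at_threshold(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int]:
    """Return (false_positives, false_negatives), predicting positive when score >= threshold."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return fp, fn


# Toy example (assumed values for illustration only): 1 = positive test, 0 = negative test.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.40, 0.45, 0.60, 0.80, 0.20, 0.30]

print(f"AUROC = {auroc(labels, scores):.4f}")
for t in (0.3, 0.5):
    fp, fn = errors_at_threshold(labels, scores, t)
    print(f"threshold {t}: false positives = {fp}, false negatives = {fn}")
```

Lowering the threshold trades false negatives for false positives, which is the comparison the abstract refers to. The pairwise loop is quadratic in sample size, so a cohort of this scale would call for a rank-based implementation instead.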

The Logistic Regression, Random Forest, and Ensemble models produced AUROCs of 0.839, 0.851, and 0.866, respectively. Variables associated with elevated risk included age, use of immunosuppressive therapies, previous antibiotic exposure, and certain gastrointestinal medications. All models demonstrated strong discrimination (AUROC >0.83). Across analytic methods, similar predictors emerged as influential, many of which are consistent with established clinical risk factors for Clostridioides difficile. These human-readable models help identify factors shaping a patient’s likelihood of a positive test and the associated infection risk.
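To show how a Logistic Regression coefficient becomes a human-readable odds ratio, the sketch below exponentiates a coefficient and attaches a Wald 95% confidence interval and a two-sided p-value. The feature names, coefficients, and standard errors are hypothetical placeholders, not values reported by the study, and the Wald test is an assumed choice, since the abstract does not say how its p-values were derived.

```python
import math
from typing import NamedTuple


class OddsRatio(NamedTuple):
    odds_ratio: float
    ci_low: float
    ci_high: float
    p_value: float


def summarize_coefficient(beta: float, std_err: float) -> OddsRatio:
    """Convert a log-odds coefficient and its standard error into an odds-ratio summary."""
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    z = beta / std_err
    # Two-sided p-value under a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    half_width = 1.96 * std_err  # approximate 95% interval on the log-odds scale
    return OddsRatio(
        odds_ratio=math.exp(beta),
        ci_low=math.exp(beta - half_width),
        ci_high=math.exp(beta + half_width),
        p_value=p_value,
    )


# Hypothetical (coefficient, standard error) pairs, for illustration only.
example_coefficients = {
    "prior_antibiotic_exposure": (math.log(2.0), 0.10),
    "uninformative_feature": (0.0, 0.10),
}

for name, (beta, se) in example_coefficients.items():
    r = summarize_coefficient(beta, se)
    print(
        f"{name}: OR = {r.odds_ratio:.2f} "
        f"(95% CI {r.ci_low:.2f} to {r.ci_high:.2f}), p = {r.p_value:.3g}"
    )
```

An odds ratio above 1 means the feature is associated with higher odds of a positive test, holding the other features fixed. An odds ratio of exactly 1 corresponds to a zero coefficient and no association.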


How to cite this article
Vancouver
Peterson J, Reynolds M, Brooks K. Interpretable Machine Learning Prediction of Clostridioides difficile Infection Using Three-Year Longitudinal EHR Data. Interdiscip Res Med Sci Spec. 2022;2(1):85-96. https://doi.org/10.51847/do0gNijk3T
APA
Peterson, J., Reynolds, M., & Brooks, K. (2022). Interpretable Machine Learning Prediction of Clostridioides difficile Infection Using Three-Year Longitudinal EHR Data. Interdisciplinary Research in Medical Sciences Specialty, 2(1), 85-96. https://doi.org/10.51847/do0gNijk3T
