TEMPORAL STABILITY AND DISTRIBUTION DRIFT IN ICU MORTALITY MODELS: MIMIC-IV VALIDATION STUDY
Keywords:
machine learning, intensive care, temporal drift, distribution shift, LightGBM, PSI, model calibration, SHAP analysis

Abstract
Machine learning models for clinical decision support can degrade over time because of temporal drift and changes in patient populations, yet long-term temporal evaluations of ICU mortality prediction models remain scarce. This study assesses the temporal stability, calibration, and distribution drift of a LightGBM model trained on MIMIC-IV data. The analysis included 65,355 hospitalizations of adult patients from 2008 to 2019 and used a three-level validation scheme: standard, temporal, and pseudo-external. The model showed excellent discrimination (AUC-ROC up to 0.998), and its calibration improved when it was tested on data from later years. Population Stability Index (PSI) analysis revealed no significant feature drift, while SHAP (SHapley Additive exPlanations) analysis highlighted the dominant role of physiological laboratory parameters. These results suggest that models built on fundamental physiological features and subjected to rigorous validation can remain stable over many years without frequent retraining.
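To make the drift claim concrete, PSI compares the binned distribution of one feature in a reference window (for example, a laboratory value in the training years) with its distribution in a later window: PSI = Σ_i (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the proportions of reference and later values in bin i. The sketch below is illustrative only. The function name `population_stability_index`, the decile binning on the reference sample, and the small floor for empty bins are all assumptions, because the abstract does not state the study's exact procedure. A commonly cited rule of thumb, which the abstract does not mention, reads PSI < 0.1 as no significant shift and PSI > 0.25 as a major shift.

```python
import numpy as np


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """Hypothetical helper: PSI of `actual` against the reference sample `expected`.

    Bins are quantiles of the reference sample (deciles by default); this
    binning choice is an assumption, not the study's documented procedure.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior cut points at the reference-sample quantiles; np.unique drops
    # duplicate cut points that appear for discrete or heavily tied features.
    cuts = np.unique(np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)[1:-1]))

    # Map every value to a bin index 0..len(cuts); values outside the reference
    # range fall into the first or last bin instead of being dropped.
    n_out = len(cuts) + 1
    exp_counts = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=n_out)
    act_counts = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=n_out)

    # Convert counts to proportions and floor empty bins so the log stays finite.
    exp_pct = np.clip(exp_counts / exp_counts.sum(), eps, None)
    act_pct = np.clip(act_counts / act_counts.sum(), eps, None)

    # PSI = sum_i (a_i - e_i) * ln(a_i / e_i); every term is non-negative.
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_period = rng.normal(0.0, 1.0, 10_000)     # synthetic reference values
    stable_period = rng.normal(0.05, 1.0, 10_000)   # synthetic, small mean shift
    shifted_period = rng.normal(1.0, 1.0, 10_000)   # synthetic, 1-SD mean shift
    print(round(population_stability_index(train_period, stable_period), 3))
    print(round(population_stability_index(train_period, shifted_period), 3))
```

On synthetic data like this, a 0.05-SD mean shift stays well below 0.1, while a 1-SD shift lands well above 0.25. In a study like this one, the function would be applied feature by feature, comparing each later period with the training period.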
