26 High-Dimensional Prediction Models
A recent article proposed and evaluated high-dimensional prediction models (hdPMs) that use linked health administrative data to predict long-term mortality risk (Hossain et al. 2025). The key objective was to assess whether hdPMs could compensate for the absence of important clinical predictors (e.g., age, smoking, BMI) by drawing on a large number of routinely collected health-care variables (e.g., ICD-9/10 diagnostic codes).
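As a rough illustration of how "a large number of routinely collected health-care variables" can be built from diagnostic codes, the sketch below turns per-patient code lists into a binary indicator matrix with one column per distinct code. It is a minimal sketch under our own assumptions: the function `build_code_indicators`, its `min_patients` prevalence filter, and the patient records are all hypothetical, and the study's actual variable construction may differ.

```python
from typing import Dict, List, Tuple


def build_code_indicators(
    records: Dict[str, List[str]], min_patients: int = 1
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (patient_ids, codes, matrix), where matrix[i][j] is 1 if
    patient i has at least one record of code j, else 0.

    Codes seen in fewer than `min_patients` distinct patients are dropped
    (a hypothetical prevalence filter, for illustration only)."""
    patient_ids = sorted(records)
    # Count in how many distinct patients each code appears.
    counts: Dict[str, int] = {}
    for pid in patient_ids:
        for code in set(records[pid]):
            counts[code] = counts.get(code, 0) + 1
    codes = sorted(c for c, n in counts.items() if n >= min_patients)
    column = {c: j for j, c in enumerate(codes)}
    matrix = []
    for pid in patient_ids:
        row = [0] * len(codes)
        for code in records[pid]:
            if code in column:
                row[column[code]] = 1  # presence, not count
        matrix.append(row)
    return patient_ids, codes, matrix


# Hypothetical ICD-10 codes for three patients.
records = {
    "p1": ["E11.9", "I10", "I10"],
    "p2": ["I10", "J44.9"],
    "p3": ["E11.9"],
}
ids, codes, X = build_code_indicators(records)
print(codes)  # ['E11.9', 'I10', 'J44.9']
print(X)      # [[1, 1, 0], [0, 1, 1], [1, 0, 0]]
```

With real claims data this matrix easily reaches thousands of columns, which is the setting in which regularization becomes necessary.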
In simulations, Cox-LASSO hdPMs consistently outperformed conventional models in both discrimination (time-dependent c-statistic) and calibration, especially when strong clinical predictors were missing. For example, the c-statistic improved from 0.78 for the conventional model to 0.90 for the LASSO-based hdPM.
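The study reports a time-dependent c-statistic. To illustrate what a concordance measure rewards, the sketch below computes Harrell's C instead: a simpler, non-time-dependent concordance index that we swap in here, not the estimator used in the study. Among comparable pairs (the patient with the shorter follow-up died), it counts the fraction in which that patient received the higher predicted risk, so 0.5 corresponds to random ranking and 1.0 to perfect ranking. The function name and example data are hypothetical.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event and
    times[i] < times[j], so i is known to have died first. The pair is
    concordant when risks[i] > risks[j]; tied predicted risks count 0.5.
    Pairs with tied observed times are skipped for simplicity."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (years), death indicators, and predicted risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.1]))  # 1.0: every comparable pair ranked correctly
print(harrell_c(times, events, [0.5, 0.7, 0.5, 0.1]))  # 0.7
```

Read loosely, a rise from 0.78 to 0.90 means the hdPM orders a noticeably larger share of patient pairs correctly by risk.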
The table below contrasts hdPMs with the high-dimensional propensity score (hdPS) and high-dimensional disease risk score (hdDRS) approaches.
Feature | hdPM | hdPS / hdDRS |
---|---|---|
Goal | Risk prediction (e.g., mortality stratification) | Confounding adjustment in causal inference |
Target model | Outcome model (e.g., Cox model for time to death) | Exposure model (PS) or outcome model (DRS) |
Use of empirical variables | Yes, extensive sets of ICD-9/10 code-based variables | Yes, but fewer (typically the 500 top-ranked) |
Shrinkage | Regularization (LASSO) crucial for performance | Often none, or simple score summaries |
Interpretability | Less interpretable; not designed for clinical use | Often more interpretable in the PS/DRS context |
Main application | Risk stratification and targeting at the population level | Adjustment in comparative effectiveness studies |
This study indicates that hdPMs are promising tools for population-level risk prediction, especially when clinical data are sparse. Whereas hdPS/hdDRS target confounding control in causal inference, hdPMs aim to optimize outcome prediction even when important variables are missing. LASSO-based regularization is a key differentiator: by shrinking many coefficients to exactly zero, it limits overfitting when the number of candidate variables is large.
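To see how an L1 penalty produces this shrinkage, the sketch below fits a Cox model with a LASSO penalty by proximal gradient descent (ISTA) in plain Python. It is a swapped-in, from-scratch illustration, not the study's software; established implementations such as R's glmnet with `family = "cox"` use coordinate descent instead. Assumptions: ties are handled with Breslow's approximation, the step size and iteration count are fixed values chosen for this toy example, and the function names and data are hypothetical.

```python
import math
from typing import List, Sequence


def soft_threshold(z: float, a: float) -> float:
    """Proximal operator of a*|z|: shrink z toward 0 by a; return exactly 0 if |z| <= a."""
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0


def cox_gradient(X: Sequence[Sequence[float]], times: Sequence[float],
                 events: Sequence[int], beta: Sequence[float]) -> List[float]:
    """Gradient of the negative log partial likelihood divided by n.
    Tied event times use Breslow's approximation (each tied event sees the full risk set)."""
    n, p = len(X), len(beta)
    # Relative hazard exp(x_j . beta) for every patient.
    w = [math.exp(sum(X[j][k] * beta[k] for k in range(p))) for j in range(n)]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: patients still under observation at times[i].
        risk = [j for j in range(n) if times[j] >= times[i]]
        denom = sum(w[j] for j in risk)
        for k in range(p):
            weighted_mean = sum(w[j] * X[j][k] for j in risk) / denom
            grad[k] -= (X[i][k] - weighted_mean) / n
    return grad


def cox_lasso(X, times, events, lam: float, step: float = 0.5,
              iters: int = 500) -> List[float]:
    """Minimise (1/n) * negative log partial likelihood + lam * sum(|beta_k|) with ISTA:
    a gradient step on the smooth part, then soft-thresholding each coefficient."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        g = cox_gradient(X, times, events, beta)
        beta = [soft_threshold(b - step * gk, step * lam) for b, gk in zip(beta, g)]
    return beta


# Hypothetical toy data: feature 0 is 1 for the four earliest deaths, feature 1 alternates.
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1] * 8

print(cox_lasso(X, times, events, lam=2.0))  # [0.0, 0.0]: penalty too strong, nothing enters
print(cox_lasso(X, times, events, lam=0.2))  # feature 0 receives a positive coefficient
```

With a large penalty every coefficient stays at exactly zero; as the penalty is lowered, the feature most strongly associated with early death (feature 0) is the first to enter the model. The soft-thresholding step is what sets small coefficients to exactly zero, which is how LASSO retains only a subset of many candidate code indicators.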