26  High-Dimensional Prediction Models

A recent article proposed and evaluated high-dimensional prediction models (hdPMs) that use linked health administrative data to predict long-term mortality risk (Hossain et al. 2025). The key objective was to assess whether hdPMs could compensate for the absence of important clinical predictors (e.g., age, smoking, BMI) by using a large number of routinely collected health-care variables (e.g., ICD-9/10 codes).

In simulations, Cox-LASSO hdPMs consistently outperformed conventional models in both discrimination (time-dependent c-statistic) and calibration, especially when strong clinical predictors were missing. For example, the c-statistic improved from 0.78 for the conventional model to 0.90 for the LASSO-based hdPM.
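To make the discrimination metric concrete, the following is a minimal sketch of a concordance (c-) statistic for right-censored data. It implements Harrell's C, which is a simpler stand-in for the time-dependent c-statistic reported in the article, not the authors' exact estimator. The function name `harrell_c` and the toy inputs are assumptions made for illustration only.

```python
def harrell_c(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    time  : observed follow-up times
    event : 1/True if the event (e.g., death) was observed, 0/False if censored
    risk  : predicted risk scores (higher score = earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if time[j] > time[i]:  # subject j was still at risk when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # model ranked the pair correctly
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; c-statistic is undefined")
    return concordant / comparable


# Toy illustration (made-up values): third subject is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.6, 0.4, 0.1]
print(harrell_c(times, events, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported improvement from 0.78 to 0.90 should be read.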

| Feature | hdPM | hdPS / hdDRS |
|---|---|---|
| Goal | Risk prediction (e.g., mortality stratification) | Confounding adjustment in causal inference |
| Target | Outcome model (e.g., Cox model for time to death) | Exposure model (PS) or outcome model (DRS) |
| Use of empirical variables | Yes, extensive ICD-9/10 code-based variables | Yes, but fewer (typically 500 top-ranking) |
| Shrinkage used | Regularization (LASSO) crucial for performance | Often none; or simple score summaries |
| Interpretability | Less interpretable, not designed for clinical use | Often more interpretable in PS/DRS context |
| Main application | Stratification, risk targeting at population level | Adjustment in comparative effectiveness studies |

This study shows that hdPMs are promising tools for population-level risk prediction, especially when clinical data are sparse. Whereas hdPS and hdDRS target confounding control in causal inference, hdPMs aim to optimize outcome prediction, even when important variables are missing. LASSO-based regularization is a key differentiator: by shrinking many coefficients to exactly zero, it helps hdPMs limit overfitting in high-dimensional spaces.
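The role of the LASSO penalty can be illustrated with a small, self-contained sketch. It fits a Cox model by proximal gradient descent on the average negative partial log-likelihood plus an L1 penalty, using simulated toy data. This is an illustration under stated assumptions rather than the authors' implementation: the function names (`soft_threshold`, `cox_gradient`, `cox_lasso`, `simulate`), the step size, the penalty value, and the data-generating mechanism are all hypothetical, and a real analysis would normally rely on an established package (for example, `glmnet` in R with the Cox family) and choose the penalty by cross-validation.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero; exactly zero if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def cox_gradient(beta, X, time, event):
    """Gradient of the average negative Cox partial log-likelihood (Breslow ties)."""
    n, p = len(X), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    for i in range(n):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        risk_set = [j for j in range(n) if time[j] >= time[i]]
        denom = sum(w[j] for j in risk_set)
        for k in range(p):
            weighted_mean = sum(w[j] * X[j][k] for j in risk_set) / denom
            grad[k] -= (X[i][k] - weighted_mean) / n
    return grad


def cox_lasso(X, time, event, lam, step=0.3, n_iter=100):
    """Proximal gradient descent for the Cox partial likelihood + lam * ||beta||_1."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = cox_gradient(beta, X, time, event)
        # Gradient step on the smooth part, then soft-threshold for the L1 part.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


def simulate(n=100, p=4, seed=0, censor_at=2.0):
    """Toy data (assumed): only the first covariate affects the hazard."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    latent = [rng.expovariate(math.exp(row[0])) for row in X]
    time = [min(t, censor_at) for t in latent]      # administrative censoring
    event = [t <= censor_at for t in latent]
    return X, time, event


X, time, event = simulate()
print(cox_lasso(X, time, event, lam=0.05))
```

The soft-thresholding step is what sets weak coefficients to exactly zero, so the fitted model keeps only a subset of the candidate variables. The explicit risk-set loop is quadratic in the number of subjects and is written for readability, not for the scale of real administrative data.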