26 High-Dimensional Prediction Models

A recent article proposed and evaluated high-dimensional prediction models (hdPMs) that use linked health administrative data to predict long-term mortality risk (Hossain et al. 2025). The authors' key objective was to assess whether hdPMs could compensate for the absence of important clinical predictors (e.g., age, smoking, BMI) by drawing on a large number of routinely collected health-care variables (e.g., ICD-9/10 codes).

Tip

(Hossain et al. 2025)

In the authors' simulations, Cox-LASSO hdPMs consistently outperformed conventional models in both discrimination (time-dependent c-statistic) and calibration, especially when strong clinical predictors were missing. For example, the c-statistic improved from 0.78 for the conventional model to 0.90 for the LASSO-based hdPM.
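To make the discrimination measure concrete, the following is a minimal sketch of Harrell's c-statistic for right-censored data, written in plain Python. It is the simpler, time-independent relative of the time-dependent c-statistic reported in the article, not the authors' implementation. The function name `harrell_c` and the toy data are hypothetical, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time for each subject
    events:      1 if the event (e.g., death) was observed, 0 if censored
    risk_scores: model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event.
        # Tied times are skipped here (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five subjects, two of them censored.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.7, 0.3, 0.1]

print(harrell_c(times, events, risk))  # 0.875 (7 of 8 comparable pairs concordant)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported improvement from 0.78 to 0.90 should be read.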

Feature	hdPM	hdPS / hdDRS
Goal	Risk prediction (e.g., mortality stratification)	Confounding adjustment in causal inference
Target	Outcome model (e.g., Cox model for time to death)	Exposure model (PS) or outcome model (DRS)
Use of empirical variables	Yes, extensive ICD-9/10 code-based variables	Yes, but fewer (typically the 500 top-ranked)
Shrinkage used	Regularization (LASSO) crucial for performance	Often none; or simple score summaries
Interpretability	Less interpretable, not designed for clinical use	Often more interpretable in PS/DRS context
Main application	Stratification, risk targeting at population level	Adjustment in comparative effectiveness studies

The study suggests that hdPMs are promising tools for population-level risk prediction, especially when clinical data are sparse. Whereas hdPS/hdDRS target confounding control in causal inference, hdPMs aim to optimize outcome prediction, even when important variables are missing. LASSO-based regularization is a key differentiator: by shrinking most coefficients to exactly zero, it helps hdPMs limit overfitting in high-dimensional spaces.
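The role of LASSO regularization can be illustrated with a small, self-contained sketch. As a simplifying assumption, it fits a linear model by coordinate descent rather than the Cox partial likelihood used in the article, so it shows only the shrinkage mechanism, not the authors' method. The function names, the penalty value `lam`, and the simulated data are all hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of feature j with the partial residual (feature j removed).
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta
                beta[j] = new
    return beta


# Hypothetical simulation: 30 candidate variables, only the first two matter.
random.seed(0)
n, p = 200, 30
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]

beta = lasso_cd(X, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected variables:", selected)
```

In a setup like this, the penalty is expected to zero out most of the noise variables while keeping the true predictors with somewhat shrunken coefficients, which is the behavior that lets a penalized model cope with a large number of candidate variables.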