Challenges

Issues with hdPS

Univariate selection of many proxies
  • Recurrent proxy covariates are screened separately, one at a time (univariately).
  • Because they are derived from the same underlying codes for the same patients, they can be strongly correlated and cause multicollinearity (illustrated in the sketch below).
  • This multicollinearity may inflate the variance of the estimates.
  • There is also a general overfitting concern: are too many adjustment variables being selected?
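As a minimal illustration of where the correlation comes from, the sketch below simulates counts of a single claims code and derives the three recurrence-based indicators that hdPS-style algorithms typically construct (at least once, at least the median, at least the 75th percentile). The data, variable names, and exact cut-offs are assumptions for illustration, not taken from a specific hdPS implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)

# Simulated number of times one claims code is recorded per patient
counts = rng.poisson(lam=1.5, size=5000)

# Recurrence indicators derived from the SAME counts
# (the "once / median / 75th percentile" cut-offs are illustrative)
proxies = pd.DataFrame({
    "code_once":     (counts >= 1).astype(int),
    "code_sporadic": (counts >= max(1, np.median(counts[counts > 0]))).astype(int),
    "code_frequent": (counts >= max(1, np.quantile(counts[counts > 0], 0.75))).astype(int),
})

# All three are deterministic functions of the same counts, so they are
# highly correlated -- the source of the multicollinearity concern
print(proxies.corr().round(2))
```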

Potential ways to improve

  • Multiple recurrent covariates can carry essentially the same information, so a covariate that looks useful on its own may add little once the others are included. Considering their multivariate structure jointly in a single model could help.
  • Machine learning variable selection methods (e.g., penalized regression) could be useful to combat multicollinearity; see the sketch after this list.
  • Sample splitting methods, such as cross-validation, could be useful in combating overfitting in high dimensions.
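One way to combine the last two ideas is penalized regression for the exposure model, with the penalty strength chosen by cross-validation (a form of sample splitting). The sketch below is a minimal example using scikit-learn; the simulated data, variable names, and the choice of an L1-penalized (lasso) logistic model are assumptions for illustration, not a prescribed hdPS workflow.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(1)

# Simulated high-dimensional proxy matrix and a binary exposure
n, p = 2000, 200
X = rng.binomial(1, 0.2, size=(n, p)).astype(float)
true_coef = np.zeros(p)
true_coef[:10] = 0.8                      # only a few proxies truly matter
logit = -1.0 + X @ true_coef
A = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# L1-penalized (lasso) propensity score model; the penalty is tuned by
# 5-fold cross-validation, i.e., repeated sample splitting
ps_model = LogisticRegressionCV(
    Cs=10, cv=5, penalty="l1", solver="saga", max_iter=5000
).fit(X, A)

selected = np.flatnonzero(ps_model.coef_.ravel() != 0)
print(f"proxies retained by the lasso: {selected.size} of {p}")

# Estimated propensity scores from the selected (shrunken) model
ps = ps_model.predict_proba(X)[:, 1]
print("PS range:", ps.min().round(3), "-", ps.max().round(3))
```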

Cross-validation is embedded within super (ensemble) learning.
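For context, a stacked ensemble makes this explicit: the meta-learner is trained on out-of-fold predictions from each candidate learner. The sketch below is a rough scikit-learn analogue of super learning for an exposure (propensity score) model, not a specific super learner implementation; the candidate learners and toy data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Toy exposure-prediction problem standing in for a PS model
X, A = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=7)

# Candidate learners (the "library"); these choices are illustrative
candidates = [
    ("logit", LogisticRegression(max_iter=2000)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=7)),
    ("nb", GaussianNB()),
]

# cv=5 means each candidate's out-of-fold predictions feed the meta-learner,
# so cross-validation is built into how the ensemble weights are learned
super_learner = StackingClassifier(
    estimators=candidates,
    final_estimator=LogisticRegression(max_iter=2000),
    cv=5,
    stack_method="predict_proba",
)
super_learner.fit(X, A)
print("ensemble PS for first 5 subjects:",
      super_learner.predict_proba(X[:5])[:, 1].round(3))
```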

Controversy

Some researchers argue that a conventional PS model, in which covariates are specified from subject-matter knowledge rather than selected in a data-driven way, is a more principled approach to adjusting for confounding in observational studies and avoids introducing additional bias into the analysis.

Other researchers argue that the hdPS approach can improve effect estimates by empirically including additional variables that are associated with both the exposure and the outcome, which may reduce residual confounding.

Machine learning alternatives face the same criticism, since some of them also rely on associations with the outcome to select variables.

Tip

hdPS can only control for observed confounding, and cannot guarantee the direction or magnitude of residual confounding that may still exist. This is why sensitivity analyses and model diagnostics are important in assessing the robustness of hdPS results.
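As one concrete example of such a sensitivity analysis, the E-value of VanderWeele and Ding quantifies how strongly an unmeasured confounder would have to be associated with both the exposure and the outcome (on the risk-ratio scale) to explain away an observed association. The sketch below computes it for an assumed, purely illustrative risk ratio; nothing here comes from a real hdPS analysis.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale
    (VanderWeele & Ding, 2017): RR + sqrt(RR * (RR - 1))."""
    rr = 1.0 / rr if rr < 1.0 else rr   # for protective effects, use the reciprocal
    return rr + math.sqrt(rr * (rr - 1.0))

# Illustrative hdPS-adjusted estimate and lower confidence limit (assumed values)
rr_hat, rr_lower = 1.8, 1.2
print(f"E-value (point estimate): {e_value(rr_hat):.2f}")
print(f"E-value (CI limit closer to the null): {e_value(rr_lower):.2f}")
```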