High-Dimensional Propensity Scores

Author

Ehsan Karim, School of Population and Public Health, UBC

Published

February 16, 2026

Background

The use of retrospective health care claims datasets is frequently criticized for lacking complete information on potential confounders. Ultimately, the treatment effects estimated utilizing such data sources may be subject to residual confounding. Digital electronic administrative records routinely collect a large volume of health-related information; and many of whom are usually not considered in conventional pharmacoepidemiological studies.

Proposal to reduce residual confounding bias

In 2009, a high-dimensional propensity score (hdPS) algorithm was proposed that utilizes such information as surrogates or proxies for mismeasured and unobserved confounders in an effort to reduce residual confounding bias. Since then, many machine learning and semi-parametric extensions of this algorithm have been proposed to exploit the wealth of high-dimensional proxy information properly.

Schneeweiss et al. (2009)

Purpose of the workshop

The primary goal of this workshop is to provide a practical foundation in hdPS. Specifically, we will:

  1. Contrast Methodologies: Clarify the distinctions between standard Propensity Score analysis and the high-dimensional framework.

  2. Systematize the hdPS Workflow: Walk through the underlying logic, sequential steps, and implementation guidelines using reproducible R code applied to an open-source dataset.

  3. Algorithmic Extensions: The rationale for integrating Machine Learning with hdPS.

Extended Learning Resources

To provide a comprehensive view beyond our live session, this website includes supplementary modules designed for independent study. These materials offer deeper dives into:

  1. Complex Data Structures: The adaptation and implementation of hdPS within survival analysis and longitudinal data frameworks to handle time-varying complexities.

  2. Critical Appraisal & Publication: A discussion on current controversies, methodological advantages, and best practices for reporting hdPS results in peer-reviewed manuscripts.

Workshop prerequisite

Attendees should have prerequisite knowledge of multiple regression analysis and working knowledge in R (e.g., basic data manipulation and regression fitting).

R Codes

R Codes can be downloaded here:

Version history

Infographics and visual syntheses in this presentation were prepared using NotebookLM, while the statistical findings and R-based visualizations were generated by the presenter.

Different versions and updates of the materials were presented in the following sessions

Additional relevant talks (selected):

Citation

How to cite

Karim, M. E. (2025). High-dimensional propensity score and its machine learning extensions in residual confounding control. The American Statistician, 79(1), 72-90. DOI: 10.1080/00031305.2024.2368794.

Comments

For any comments regarding this document, reach out to me.