Propensity score

Background

This chapter provides a comprehensive set of tutorials that guide readers through various methodologies of Propensity Score Matching (PSM) and Multiple Imputation (MI) using R, with practical applications using datasets like the Canadian Community Health Survey (CCHS) and the National Health and Nutrition Examination Survey (NHANES). The tutorials explore different scenarios and methodologies in handling and analyzing data, particularly focusing on estimating treatment effects and managing missing data. They delve into specific examples, such as exploring the relationship between Osteoarthritis (OA) and Cardiovascular Disease (CVD), and between Body Mass Index (BMI) and diabetes, while emphasizing the importance of accurate data handling, variable management, and robust analysis through PSM and MI. The tutorials are meticulously structured, providing step-by-step guides, code snippets, and thorough explanations, ensuring that readers can comprehend and replicate the processes in their research, thereby enhancing the reliability and robustness of their analyses, especially in the presence of missing data.

Stepping into this chapter, we are diving deeper into the world of survey data analysis, exploring how to combine propensity score matching (PSM) and strategies for handling missing data. PSM helps us balance our data, making sure our study groups are comparable, while managing missing data ensures our results are as accurate as possible. In the upcoming tutorials, we will weave through the steps of using PSM while also dealing with the gaps in our data, ensuring our analyses are solid and dependable. So, this chapter is not just a next step, but a leap into a more advanced exploration, blending matching methods with careful data handling strategies.

Note

Should you find yourself seeking a refresher on PSM, we invite you to revisit our dedicated/external tutorial, which elucidates PSM within a non-survey data analysis context. This resource not only provides a foundational understanding but also serves as a comprehensive guide through the nuanced steps of PSM. Additionally, our external discussion page offers a succinct summary of the tutorial and thoughtfully extends the conversation into more intricate directions, exploring the complexities and advanced applications of PSM (propensity score weighting, categorical and continuous exposure). Both resources are crafted to enhance your understanding and application of PSM, ensuring a robust and informed approach to your data analysis journey

Important

Datasets:

All of the datasets used in this tutorial can be accessed from this GitHub repository folder

Overview of tutorials

Covariate matching using CCHS: example of OA-CVD

The tutorial illustrate a comprehensive data analysis workflow using R, focusing on matching methods to estimate treatment effects with the CCHS data. Initially, we conduct data pre-processing steps to handle categorical variables and missing data. Subsequent sections delve into setting up design objects for survey-weighted analyses and conducting preliminary analyses to explore variable distributions and treatment effects. The core of the analysis involves implementing matching techniques, starting with a single variable and progressively including more variables to refine the matching. Various matching scenarios are explored, each followed by logistic regression models to estimate treatment effects.

Propensity score matching using CCHS: revisiting example of OA-CVD

The tutorial provides a thorough walkthrough of implementing Propensity Score Matching (PSM) in R, specifically in the context of an OA - CVD health study from the CCHS. PSM is utilized to mitigate bias from confounding variables in observational studies by pairing treated and control units with analogous propensity scores. The guide underscores that PSM is iterative, often requiring refinement of the matching strategy to achieve satisfactory covariate balance in the matched sample. Various strategies for estimating treatment effects in the matched sample are explored, each with distinct assumptions and implications. The tutorial also delves into different matching strategies, such as nearest-neighbor matching with and without calipers, matching with different ratios, and matching with replacement, all while emphasizing the importance of assessing and re-assessing covariate balance at each step using both graphical and numerical methods.

Propensity score matching using NHANES: example of OA - CVD

The provided text outlines methodologies for conducting PSM using the NHANES dataset, with a particular emphasis on handling survey design and weights in the analysis. Three distinct approaches, attributed to Zanutto (2006), DuGoff et al. (2014), and Austin et al. (2018), are delineated, each with a structured four-step process: 1) specifying the propensity score model, 2) matching treated and untreated subjects based on estimated propensity scores, 3) comparing baseline characteristics between matched groups, and 4) estimating treatment effects using the matched sample. The procedures utilize various R packages and functions to manipulate data, visualize missing data patterns, format variables, and perform analyses, ensuring that survey weights and design are appropriately considered to avoid bias in population-level effect estimates. The text underscores the importance of incorporating survey design into at least propensity score outcome analysis (e.g., during step 4: treatment effect estimation), as neglecting survey weights can significantly impact the estimates of population-level effects.

Propensity score matching using NHANES: example of BMI - diabetes

The tutorial provides a comprehensive guide on implementing PSM in R, utilizing the NHANES dataset, with a specific focus on diabetes as an outcome and body mass index (BMI) as an exposure variable. The methodology encompasses ensuring accurate and reproducible results in PSM. The tutorial, again, meticulously follows three distinct approaches for PSM, as recommended by Zanutto (2006), DuGoff et al. (2014), and Austin et al. (2018), each providing a unique perspective on handling and analyzing variables within the propensity score model. Notably, the tutorial introduces a nuanced approach to variable handling, model specifications, and matching steps, ensuring a thorough understanding of implementing PSM with varied methodologies. Furthermore, the tutorial introduces a “double adjustment” step in each approach, providing a robust estimate of the treatment effect while adjusting for covariates, thereby offering readers a holistic view on conducting PSM with a different set of variables and methodologies in the analysis steps.

Propensity score matching using NHANES when some variables have missing observations

This tutorial offers a clear and straightforward guide on how to use Propensity Score Matching (PSM) and Multiple Imputation (MI) in R, using the NHANES dataset for practical illustration. The main goal is to explore the relationship between “diabetes” (outcome) and being “born in the US” (exposure), while effectively managing missing data through MI. The first part of the tutorial, focusing on logistic regression, explains how to perform multiple imputations, fit a logistic regression model to all imputed datasets, and then obtain pooled Odds Ratios (OR) and 95% confidence intervals. Following this, the PSM analysis section carefully applies the PSM method, following Zanutto E. L. (2006), to all imputed datasets, and presents the pooled OR estimates and 95% confidence intervals. The tutorial emphasizes the crucial role of managing missing data through multiple imputation and provides a detailed, step-by-step guide, including code and thorough explanations, to ensure a deep understanding and ability to replicate the PSM with MI process in epidemiological research. This resource is invaluable for researchers and data analysts looking to strengthen their analyses when dealing with missing data.

Propensity score weighting

In this section, propensity score weighting with different methods is employed to estimate treatment effects in a modified NHANES dataset. Three different propensity score weighting approaches are applied: Zanutto’s (2006), DuGoff et al.’s (2014), and Austin et al.’s (2018). Each approach involves multiple steps, including the estimation of propensity scores, the calculation of weights (both unstabilized and stabilized), balance checking to ensure covariate balancing, and fitting outcome models using the weighted data. The double adjustment technique is also considered in each approach.

Propensity score matching with multiple imputation in subpopulations

In this section, the goal is to use propensity score matching (PSM) with multiple imputation (MI) to analyze a modified dataset from NHANES 2017-2018. The analysis focuses on specific subpopulations defined by eligibility criteria. This analysis provides insights into how to handle complex survey data with missing values and perform PSM with MI for subpopulations defined by eligibility criteria.

Propensity score weighting with multiple imputation in subpopulations

In this chapter, we employ propensity score (PS) weighting with MI to analyze a modified dataset from NHANES 2017-2018, focusing on specific subpopulations defined by eligibility criteria. Here we demonstrate how to perform PS weighting with MI for subpopulations defined by eligibility criteria.

Propensity score weighting for multiple treatment categories

In this chapter, we use propensity score weighting for multiple treatment categories using CCHS data.

Tip

Optional Content:

You’ll find that some sections conclude with an optional video walkthrough that demonstrates the code. Keep in mind that the content might have been updated since these videos were recorded. Watching these videos is optional.

Warning

Bug Report:

Fill out this form to report any issues with the tutorial.