20  Deep Learning

Recent work extends traditional high-dimensional propensity score (hdPS) analyses by introducing neural representation learning methods for causal inference in observational studies. Using NHANES data (2013–2018), it shows how recent innovations in machine learning can address residual confounding and model misspecification, two challenges commonly encountered in high-dimensional data settings.

Based on recent work by Karim & Wang (2025).

20.1 Plasmode Simulation

| Simulation Element | Description |
|---|---|
| Source Dataset | NHANES 2013–2018 |
| Simulation Framework | Plasmode simulation preserving empirical covariate and exposure distributions |
| Simulated Sample Size | 3,000 participants per iteration |
| Iterations | 500 replicates |
| Prevalence Scenarios | (1) frequent exposure & frequent outcome; (2) rare exposure & frequent outcome; (3) frequent exposure & rare outcome |
| True Effect | OR = 1 (null); RD = 0 |
| Outcome Generation | Logistic regression model with nonlinear transformations (log, polynomial), interactions, and a proxy-derived comorbidity index |
| Confounding Simulation | Unmeasured confounding mimicked using high-dimensional proxy variables |
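
To make the design concrete, below is a minimal Python sketch of one plasmode replicate: covariate and exposure rows are resampled from the observed data (preserving their empirical distributions), and outcomes are generated from a known logistic model with nonlinear and interaction terms under a true null effect (OR = 1). The column names (`age`, `bmi`, `comorbidity_index`, `exposure`) and coefficient values are illustrative assumptions, not the specification used by Karim & Wang (2025).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2025)

def simulate_plasmode(nhanes: pd.DataFrame, n: int = 3000, true_or: float = 1.0) -> pd.DataFrame:
    """One plasmode replicate: resample observed covariate/exposure rows,
    then generate outcomes from a known logistic model (null effect by default)."""
    # Resample rows with replacement so the empirical covariate and
    # exposure distributions are preserved.
    boot = nhanes.sample(n=n, replace=True, random_state=int(rng.integers(0, 10**9)))

    # Hypothetical covariates and a binary exposure column.
    age = boot["age"].to_numpy()
    bmi = boot["bmi"].to_numpy()
    comorb = boot["comorbidity_index"].to_numpy()  # proxy-derived index
    a = boot["exposure"].to_numpy()

    # Outcome model with nonlinear terms and an interaction;
    # log(true_or) multiplies exposure, so true_or = 1 encodes the null.
    lin = (-2.0
           + 0.8 * np.log(age)
           + 0.02 * (bmi - 25) ** 2
           + 0.5 * comorb
           + 0.3 * comorb * (bmi > 30)
           + np.log(true_or) * a)
    p = 1 / (1 + np.exp(-lin))
    boot["outcome"] = rng.binomial(1, p)
    return boot

# Example: 500 replicates from a prepared NHANES data frame `nhanes_df`
# replicates = [simulate_plasmode(nhanes_df) for _ in range(500)]
```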

20.2 Estimators Compared

| Method | Core Idea | Key Features | Use of Propensity Score | Optimization & Regularization |
|---|---|---|---|---|
| PSW (hdPS) | Baseline method using logistic regression on investigator-specified and proxy covariates | High-dimensional covariates selected via the hdPS algorithm | Explicitly modeled via logistic regression | None |
| TMLE (SL Smooth) (Balzer and Westling 2021) | Semiparametric estimator using the Super Learner | Combines outcome and treatment models; uses smooth learners (logistic regression, LASSO, MARS) | Explicitly modeled and used for targeting | Super Learner with Donsker-compliant learners |
| TMLE (SL Unsmooth) | More flexible TMLE with XGBoost in the Super Learner | Allows complex nonlinearities; variance estimation is less reliable in small samples | Explicitly modeled and used for targeting | Super Learner including unsmooth learners (e.g., XGBoost) |
| DCTMLE (Zivich and Breskin 2021) | TMLE with double cross-fitting | Reduces overfitting when flexible learners are used in TMLE | Explicitly modeled and used for targeting | Double cross-fitting for robustness |
| TARNET (Shalit, Johansson, and Sontag 2017) | Neural network with a treatment-agnostic shared representation | Two heads for outcomes under treatment and control; most precise when exposure and outcome are frequent | Not used explicitly | Targeted regularization; Adam + SGD with early stopping |
| Dragonnet (Shi, Blei, and Veitch 2019) | Neural network that jointly models the outcomes and the propensity score | Adds a third head for the propensity score; enforces balance and semiparametric alignment | Modeled as an explicit third output | Targeted regularization; multitask learning |
| NEDnet (Shi, Blei, and Veitch 2019) | Sequential neural network for treatment, then outcome | Stage 1: predict treatment; Stage 2: freeze the representation and predict outcomes | Modeled separately in Stage 1 | Targeted regularization; two-stage optimization |
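
The exact architectures, hyperparameters, and targeted-regularization settings used in the paper are not reproduced here. As a rough illustration of the shared-representation idea behind TARNET, Dragonnet, and NEDnet, the following is a minimal PyTorch sketch of a Dragonnet-style network; the layer sizes and loss weight `alpha` are arbitrary choices, and the targeted-regularization term of Shi, Blei, and Veitch (2019) is omitted.

```python
import torch
import torch.nn as nn

class Dragonnet(nn.Module):
    """Minimal Dragonnet-style network: a shared representation feeding two
    outcome heads (under control and treatment) and a propensity score head."""

    def __init__(self, n_features: int, hidden: int = 200):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
        )
        self.head_y0 = nn.Sequential(nn.Linear(hidden, 100), nn.ELU(), nn.Linear(100, 1))
        self.head_y1 = nn.Sequential(nn.Linear(hidden, 100), nn.ELU(), nn.Linear(100, 1))
        self.head_ps = nn.Linear(hidden, 1)  # third head: propensity score logit

    def forward(self, x):
        z = self.shared(x)
        return self.head_y0(z), self.head_y1(z), self.head_ps(z)

def dragonnet_loss(y0_logit, y1_logit, ps_logit, y, t, alpha=1.0):
    """Outcome loss on the factual arm plus a treatment (propensity) loss;
    the targeted-regularization term is omitted in this sketch."""
    bce = nn.functional.binary_cross_entropy_with_logits
    y_logit = torch.where(t.bool(), y1_logit, y0_logit)  # pick the observed arm
    return bce(y_logit, y) + alpha * bce(ps_logit, t)
```

In this framing, TARNET corresponds to the same architecture without the propensity head, while NEDnet first fits the treatment head and then freezes the shared representation before fitting the outcome heads.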

20.3 Simulation Results

Figure 1. Bias across Methods in NHANES Plasmode Simulation

Figure 2. Relative error across Methods in NHANES Plasmode Simulation

Results are fully accessible via a Shiny app:

👉 Interactive Causal Benchmark App

Explore bias, standard errors (SEs), and coverage metrics across methods and simulation conditions.
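
For readers reproducing the benchmark, these summaries can be computed from replicate-level estimates. The sketch below is a minimal example assuming a data frame with hypothetical columns `method`, `estimate`, `ci_lower`, and `ci_upper`, and uses the true risk difference of 0 from the simulation design.

```python
import pandas as pd

def summarize(replicates: pd.DataFrame, true_rd: float = 0.0) -> pd.DataFrame:
    """Per-method performance across simulation replicates: bias,
    empirical SE, and 95% CI coverage of the true (null) effect."""
    covered = (replicates["ci_lower"] <= true_rd) & (true_rd <= replicates["ci_upper"])
    g = replicates.groupby("method")
    return pd.DataFrame({
        "bias": g["estimate"].mean() - true_rd,
        "empirical_se": g["estimate"].std(),
        "coverage": covered.groupby(replicates["method"]).mean(),
    })
```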

20.4 Conclusion

  • PSW remains an interpretable benchmark
  • TMLE and neural methods extend this framework by improving bias-variance trade-offs and enabling better performance in complex settings
  • Among the deep learning methods, Dragonnet offers the best average trade-off; NEDnet excels in coverage but is computationally heavy; TARNET is the most precise, particularly when exposure and outcome are frequent
  • These methods are particularly useful when dealing with residual confounding, nonlinear effects, and proxy variable structures