class: center, middle, inverse, title-slide

.title[
# Can We Train Machine Learning Methods to Outperform the High-Dimensional Propensity Score Algorithm?
]
.subtitle[
## ⚔
]
.author[
### M. Ehsan Karim; UBC
]
.institute[
### 2022 Sentinel Innovation and Methods Seminar Series
]
.date[
### May 11, 2022
]

---
class: full

<img src="images/top0.png" width="100%" style="display: block; margin: auto;" />

<img src="images/top.png" width="100%" style="display: block; margin: auto;" />

<img src="images/top0.png" width="100%" style="display: block; margin: auto;" />

---

# Outline

.center[Slides at [tinyurl.com/hdps2022](https://tinyurl.com/hdps2022)]

### 1. hdPS

- Basic terminology

### 2. Machine learning-based hdPS

- [Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology
  - Joint work with
    - .red[Menglan Pang] and .red[Robert W Platt]
    - McGill, CNODES Methods; Fund CIHR, Grant #DSE – 146021
- General idea

### 3. Related research

- Not exhaustive

---
class: inverse, center, middle

# hdPS

---

## Motivating Example

[Basham et al. 2021](https://doi.org/10.1016/j.eclinm.2021.100752) EClinicalMedicine: [CC BY license](http://creativecommons.org/licenses/by/4.0/)

<img src="images/tbdag.png" width="80%" style="display: block; margin: auto;" />

.red[Healthcare claims data] for immigrants to British Columbia, Canada, 1985–2015

---

## Health care database: Advantages vs Disadvantages

.pull-left[
1. Larger .red[sample size];
1. Diverse population;
1. .red[Longitudinal records] over many years;
1. .red[Detailed] health encounters, comorbidity, drug exposure history;
1. Possibility to .red[link] other databases.
]

.pull-right[
1. Not specifically designed for answering a particular .red[research question];
1. .red[Data sparsity]: relies on visits and encounters;
1. No control over which factors were measured.
]

.footnote[TLDR: .red[**May not have all confounders**].]

---

## How to select adjustment variables?
**Modified disjunctive cause criterion**

.red[Adjust] for variables that are

- causes of exposure, outcome, or both,
- excluding known instruments,
- including .red[good proxies] for unmeasured common causes

[VanderWeele et al. 2019](https://doi.org/10.1007/s10654-019-00494-6) European Journal of Epidemiology: [CC BY license](http://creativecommons.org/licenses/by/4.0/)

.left-column[
- `\(U\)` = Smoking
- `\(C_1\)` = Tobacco use
]

.right-column[
<img src="images/proxy.png" width="85%" style="display: block; margin: auto;" />
]

---

## Proxy information in Admin data

[Schneeweiss et al. 2018](https://dx.doi.org/10.2147/CLEP.S166545) Clinical Epidemiology: [CC BY NC license](http://creativecommons.org/licenses/by-nc/3.0/)

Regular epidemiological studies vs. .red[Proxies of underlying confounders]

<img src="images/hdpscov.png" width="90%" style="display: block; margin: auto;" />

---

## High-dimensional proxy information

- Adjusting for something that **may not be interpretable** directly within the context of the research question.
- .red[**Logic**]: measures from the same subject should be **correlated** = relevant proxy information

<img src="index_files/figure-html/cite-1.png" width="50%" style="display: block; margin: auto;" />

---

## hdPS: General Idea

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology: Clinical Practice Research Datalink (1998–2012)

<img src="images/proxy330.png" width="90%" style="display: block; margin: auto;" />

<img src="images/fu.png" width="80%" style="display: block; margin: auto;" />

---

## hdPS: General Idea

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology: Clinical Practice Research Datalink (1998–2012)

<img src="images/proxy330a.png" width="90%" style="display: block; margin: auto;" />

<img src="images/fu.png" width="80%" style="display: block; margin: auto;" />

---

## hdPS: General Idea

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology: Clinical Practice Research Datalink (1998–2012)

<img src="images/proxy33.png" width="90%" style="display: block; margin: auto;" />

<img src="images/fu.png" width="80%" style="display: block; margin: auto;" />

---

## hdPS: General Idea

List of additional proxy variables (.red[empirical covariates] / .red[EC]):

| Practice (Dimension 1) | Diagnostic (Dimension 2) | Procedure (Dimension 3) | Medication (Dimension 4) |
|----------------------|----------------------|---------------------|----------------------|
| EC-dim1-1-once | EC-dim2-1-once | EC-dim3-1-once | EC-dim4-1-once |
| EC-dim1-1-sporadic | EC-dim2-1-sporadic | EC-dim3-1-sporadic | EC-dim4-1-sporadic |
| EC-dim1-1-frequent | EC-dim2-1-frequent | EC-dim3-1-frequent | EC-dim4-1-frequent |
| `\(\ldots\)` | `\(\ldots\)` | `\(\ldots\)` | `\(\ldots\)` |
| EC-dim1-.red[686]-frequent | EC-dim2-.red[328]-frequent | EC-dim3-.red[76]-frequent | EC-dim4-.red[284]-frequent |

- Total (686+328+76+284)*3 = .red[4,122 ECs]
- 4 dimensions `\(\times\)` 3 intensities `\(\times\)` 200 .red[most prevalent codes] [*] = .red[**2,400 ECs**]
- [*] [Schuster et al. (2015)](https://doi.org/10.1002/pds.3773) PDS recommended omitting prevalence-based selection

---

## hdPS: General Idea

<img src="images/psmodelks.png" width="100%" style="display: block; margin: auto;" />

.pull-left[
PS from only baseline confounders
]

.pull-right[
PS from kitchen sink model!
]

---

## hdPS mechanism: find useful ECs

<img src="images/eq1.png" width="95%" style="display: block; margin: auto auto auto 0;" />

.pull-left[
In our example,

$$
`\begin{aligned}
U &=& \text{smoking status}
\end{aligned}`
$$
]

.pull-right[
- [Bross (1966)](https://pubmed.ncbi.nlm.nih.gov/5966011/) formula requires
  - binary U
  - binary Y
  - binary A
]

---

## hdPS mechanism: find useful ECs

<img src="images/eq2.png" width="100%" style="display: block; margin: auto auto auto 0;" />

.pull-left[
In our example,

$$
`\begin{aligned}
EC &=& \text{EC-dim1-21-once} \\
   &=& \text{EC-dim2-95-once} \\
   && \ldots\\
   &=& \text{EC-dim4-64-once}
\end{aligned}`
$$
]

.pull-right[
- [Bross (1966)](https://pubmed.ncbi.nlm.nih.gov/5966011/) formula requires
  - binary EC
  - binary Y
  - binary A
]

---

## hdPS mechanism: find useful ECs

Rank (descending) each EC by the magnitude of log-bias: Absolute log `\(Bias_M\)`

| Rank by bias | Absolute log `\(Bias_M\)` | EC |
|--------------|-------------------------|----------------------|
| 1 | 0.42 | EC-dim1-21-once |
| 2 | 0.32 | EC-dim2-95-once |
| 3 | 0.25 | EC-dim4-289-once |
| `\(\ldots\)` | `\(\ldots\)` | `\(\ldots\)` |
| 2,400 | 0.01 | EC-dim4-64-frequent |

Take the top .red[100] or .red[500] of these ECs. These are the hdPS variables.

<img src="images/psmodelhd.png" width="100%" style="display: block; margin: auto;" />

---

## hdPS: Assumption

.large[
- **The selected ECs collectively serve as .red[proxies of all unmeasured or residual confounding]**
- **Implication**: an hdPS analysis may adjust for the unmeasured or residual confounding
- This assumption is strong and often not verifiable.
- Helpful in practice?
]

---

## hdPS: Balance

.pull-left[
PS from kitchen sink model!
]

.pull-right[
PS from 500-hdPS!
]

---

## hdPS: estimate treatment effect

- [Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology
- Previous research: [Pang et al. (2016)](https://dx.doi.org/10.1097%2FEDE.0000000000000487): Epidemiology

<img src="images/AllORs.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---

## hdPS: Ways to improve

| Rank by bias | Absolute log `\(Bias_M\)` | EC |
|--------------|-------------------------|----------------------|
| 1 | 0.42 | EC-dim1-21-once |
| 2 | 0.32 | EC-dim2-95-once |
| 3 | 0.25 | EC-dim4-289-once |
| `\(\ldots\)` | `\(\ldots\)` | `\(\ldots\)` |
| 500 | 0.03 | EC-dim4-63-frequent |

- ECs selected separately / .red[univariately] [VanderWeele et al. 2019](https://doi.org/10.1007/s10654-019-00494-6) EJE
  - can be **correlated** (coming from the same patient),
  - providing the same information
  - **may not be useful anymore** in the presence of others
- .red[Multivariate] structure is good to consider
- Model-specification

---
class: inverse, center, middle

# Machine learning-based hdPS

---

## Variable selection in PS context

.pull-left[
.red[**Literature**]

- [Brookhart et al. (2006)](https://doi.org/10.1093/aje/kwj149) AJE
- [Myers et al. (2011)](https://doi.org/10.1093/aje/kwr364) AJE
- [Pearl (2011)](https://doi.org/10.1093/aje/kwr352) AJE
- [Schuster et al (2016)](https://doi.org/10.1016/j.jclinepi.2016.05.017) JCE
]

.pull-right[
- bias amplification
- inflated variance
- overfitting
]

<img src="images/vartype.png" width="65%" style="display: block; margin: auto;" />

---

## Variable selection in PS context

<img src="images/vartype2.png" width="90%" style="display: block; margin: auto;" />

- How to select variables to adjust?
- Same idea for the proxies.
- .red[Pre-exposure] measurements (no mediator, collider, effect).
- .red[Associated with Y] (irrespective of association with A)

---

## Variable selection via ML

- .red[Jointly] consider in 1 model:
- Perform variable selection based on .red[association with outcome]

| Approach | Advantage | Limitations |
|--------------|-------------------------|----------------------|
| LASSO [Franklin et al. (2015)](https://doi.org/10.1093/aje/kwv108) AJE | Variable selection by dropping .red[collinear] variables | Tends to select one variable from a group, ignoring the rest |
| Elastic net | More .red[stable] than LASSO | Non-linear and non-additive terms need to be specified |
| Random forest [Low et al. (2016)](https://doi.org/10.2217/cer.15.53) J. Comp. Eff. Res. | Automatically detects non-linearity and .red[non-additivity] | Only provides **variable importance**, but no cut-points |

<img src="images/varimp.png" width="40%" style="display: block; margin: auto;" />

---

## Machine learning-based hdPS

### Pure ML approach

Start with all ECs

<img src="images/mlpure.png" width="100%" style="display: block; margin: auto;" />

Say, 100 ECs (associated with Y) were selected by the Elastic net approach

<img src="images/psmodelrefined.png" width="100%" style="display: block; margin: auto;" />

---

## Machine learning-based hdPS

### Hybrid approach (hdPS, then ML)

Start with the top 500 ECs selected by Bross formula / prioritization

<img src="images/mlhybrid.png" width="100%" style="display: block; margin: auto;" />

Say, 100 ECs (associated with Y) were selected by the Elastic net approach

<img src="images/psmodelrefined.png" width="100%" style="display: block; margin: auto;" />

This approach is different from [Schneeweiss et al. (2017)](https://doi.org/10.1097/EDE.0000000000000581) Epidemiology, where prioritization was used after applying LASSO.

---

## hdPS vs. ML: estimate treatment effect

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology

<img src="images/AllOR2.png" width="100%" style="display: block; margin: auto auto auto 0;" />

.center[.red[**Only ~ 30% of the selected ECs were common.**]]

---

## hdPS: estimate treatment effect

[Schneeweiss et al.
2018](https://dx.doi.org/10.2147/CLEP.S166545) Clinical Epidemiology: [CC BY NC license](http://creativecommons.org/licenses/by-nc/3.0/)

.right-column[
<img src="images/hdpsonly.png" width="100%" style="display: block; margin: auto auto auto 0;" />
]

.left-column[
*"This strongly suggests that even .red[without the investigator-specifying covariates] for adjustment, the .red[algorithm alone] optimizes confounding adjustment."*
]

---

## hdPS vs. ML: estimate treatment effect

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology

<img src="images/AllORu.png" width="100%" style="display: block; margin: auto auto auto 0;" />

.center[.red[**Quality of proxy information matters.**]]

---

## Plasmode Simulation

[Franklin et al. (2014)](https://doi.org/10.1016/j.csda.2013.10.018) CSDA

<img src="images/plasmode.png" width="100%" style="display: block; margin: auto;" />

Another baseline set with .red[no unmeasured confounding] (1-A to 9-A).

---

## Plasmode Simulation: Leaderboard

.red[Answer] to the question in the title of this talk (**bold** = pure ML):

<img src="images/resultplasmode.png" width="75%" style="display: block; margin: auto;" />

---

## Plasmode Simulation

Comparable if .red[adequate proxies] are incorporated (RD estimates)

<img src="images/res.png" width="60%" style="display: block; margin: auto;" />

---

## Shared Limitations

.left-column[
- M-bias [Liu et al (2012)](https://doi.org/10.1093/aje/kws165) AJE
- Z-bias [Myers et al. (2011)](https://doi.org/10.1093/aje/kwr364) AJE
]

.right-column[
- .red[EC interpretation] unclear vs. causal inference
  - not collected for research purposes
- EC used in PS
  - Primarily to deal with .red[residual confounding]
  - Not a straightforward extension to PS analysis
  - .red[Motivations of PS and hdPS are different to begin with]
- .red[No separation] of design and analysis stages in bias-based prioritization
  - exposure-based prioritization is OK, but has its own issues
- post-selection bias [Taylor and Tibshirani (2015)](https://doi.org/10.1073/pnas.1507583112)
]

---

## Advantages and Limitations

.pull-left[
- Alternative ways to prioritize / rank
  - Automatic .red[cut-off] of how many variables
  - .red[Ranking]
- Pure ML methods can be used for .red[non-binary] outcomes and proxies
  - binary
  - categorized
  - continuous
  - survival
]

.pull-right[
- .red[Coverage] not assessed [Morris et al. (2019)](https://doi.org/10.1002/sim.8086)
- Only a few ML methods assessed
- DR methods not covered
]

---

## Motivating Example

Basham et al. 2021 [EClinicalMedicine](https://doi.org/10.1016/j.eclinm.2021.100752): [CC BY license](http://creativecommons.org/licenses/by/4.0/)

<img src="images/tbanalysis.png" width="100%" style="display: block; margin: auto;" />

- Prefer to use hdPS / ML with ECs as a .red[**secondary analysis**]
- Proxy adjustment method (methods vs. subject area journals).

---

## JAMA Example

[Brown et al. (2017)](https://doi.org/10.1001/jama.2017.3415)

| Method | HR | `\(95\%\)` CI |
|---|---|---|
| Unadjusted | 2.16 | 1.64-2.86 |
| Regression | 1.59 | 1.17-2.17 |
| IPTW hdPS | .red[1.61] | .red[0.997-2.59] |
| 1-1 hdPS matching | 1.64 | 1.07-2.53 |
| Pre-pregnancy | 1.85 | 1.37-2.51 |

- Conclusion: **not associated**
- More discussion: [Amrhein, Trafimow, Greenland, 2019](https://doi.org/10.1080/00031305.2018.1543137) The American Statistician

---
class: inverse, center, middle

# Related research directions

---

## Related research directions

### AI: Autoencoders

[Weberpals et al. (2021)](https://doi.org/10.1097/ede.0000000000001338) used autoencoders (3, 5, 7 layers) to reduce EC dimensions.
<img src="images/ae.png" width="30%" style="display: block; margin: auto;" />

- Autoencoder-based hdPS is useful.
- Shallow learning (fewer layers) had better MSE.
- .red[Did not perform better than LASSO].

---

## Related research directions

### TMLE

Targeted learning approach [Pang et al. (2016)](https://dx.doi.org/10.1097%2FEDE.0000000000000487): Epidemiology

| Model | Max SW weight |
|---|---|
| Only important 5 confounders | 1.78 |
| 29 confounders | 69.67 |
| 29 confounders + 400 ECs | .red[390.77] |

- better covariate balance vs. overfitting
- Varying number of covariates selected [Tazare et al. 2022](https://doi.org/10.1002/pds.5412)

[Haris and Platt (2021)](https://doi.org/10.48550/arXiv.2112.08495) arXiv

- group importance score
- extension of the hdPS (.red[hdCS]) to non-binary outcome and confounders

---

## Related research directions

### Sample splitting

[Naimi et al. (2021)](https://doi.org/10.1093/aje/kwab201) AJE

SL, TMLE, AIPW and usefulness of sample splitting

- ML-based .red[singly robust methods] should be avoided
- Use .red[sample splitting]
- .red[rich SL library] of flexible regression as well as higher-order interactions

---

## Related research directions

### Cross-fitting

[Zivich and Breskin (2021)](https://doi.org/10.1097/ede.0000000000001332) Epidemiology

- Cross-fitting together with doubly robust approaches

<img src="images/cf.png" width="70%" style="display: block; margin: auto;" />

---

## Related research directions

### SL library

[Balzer and Westling (2021)](https://doi.org/10.1093/aje/kwab200) AJE

- TMLE without sample-splitting with a carefully chosen SL library

[Meng and Huang (2021)](https://doi.org/10.48550/arXiv.2105.13148) arXiv

- SL libraries with .red[smooth] (differentiable: LASSO, spline) learners outperformed those that included non-smooth learners

---

## Take home message

.large[
- hdPS and ML alternatives generally reduce .red[residual confounding]
  - [*] if .red[good proxies] are available.
- hdPS: dependent on .red[Bross-formula] (all binary)
- Non-binary outcome: consider ML methods.
- .red[Hybrid-methods] performed better (MSE).
- Active area of research
]

---
class: center, middle

# Thanks!

### http://ehsank.com/
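
---

## Appendix: Bross-based EC ranking (sketch)

A minimal Python sketch of the ranking step from the "hdPS mechanism: find useful ECs" slides. The slides show the bias formula only as an image, so the multiplicative form used here, `\(Bias_M = \frac{P_{C1}(RR_{CD}-1)+1}{P_{C0}(RR_{CD}-1)+1}\)`, is an assumption based on how the Bross formula is commonly written in the hdPS literature. The function names `abs_log_bias` and `rank_ecs`, the EC labels, and the toy data are all hypothetical and are not taken from any of the cited analyses.

```python
import math

import numpy as np


def abs_log_bias(ec, a, y):
    """Absolute log of the (assumed) Bross multiplicative bias for one EC.

    ec, a, y: equal-length binary sequences (EC indicator, exposure, outcome).
    Returns NaN when a prevalence or the EC-outcome relative risk is undefined.
    """
    ec, a, y = (np.asarray(v, dtype=bool) for v in (ec, a, y))
    if ec.all() or not ec.any() or a.all() or not a.any():
        return math.nan
    p_c1 = ec[a].mean()        # EC prevalence among the exposed
    p_c0 = ec[~a].mean()       # EC prevalence among the unexposed
    risk_ec1 = y[ec].mean()    # outcome risk when the EC is present
    risk_ec0 = y[~ec].mean()   # outcome risk when the EC is absent
    if risk_ec1 == 0 or risk_ec0 == 0:
        return math.nan
    rr_cd = risk_ec1 / risk_ec0
    bias = (p_c1 * (rr_cd - 1) + 1) / (p_c0 * (rr_cd - 1) + 1)
    return abs(math.log(bias))


def rank_ecs(ecs, a, y, top_k=None):
    """Return (name, abs log bias) pairs, largest first, truncated to top_k."""
    scored = [(name, abs_log_bias(col, a, y)) for name, col in ecs.items()]
    scored = [(name, s) for name, s in scored if not math.isnan(s)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data (hypothetical): 8 subjects, binary exposure A and outcome Y.
a = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 0, 0, 0, 1, 0, 0]
ecs = {
    "EC-toy-balanced": [1, 0, 1, 0, 1, 0, 1, 0],    # same prevalence in both arms
    "EC-toy-imbalanced": [1, 1, 1, 0, 1, 0, 0, 0],  # more common among exposed
}
ranking = rank_ecs(ecs, a, y, top_k=1)
```

With `top_k=100` or `top_k=500` this mirrors the "take top 100 or 500" step. Each EC is still scored one at a time, which is the univariate limitation noted under "Ways to improve".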