Can We Train Machine Learning Methods to Outperform the High-Dimensional Propensity Score Algorithm?

.title[
# Can We Train Machine Learning Methods to Outperform the High-Dimensional Propensity Score Algorithm?
]
.subtitle[
## ⚔
]
.author[
### M. Ehsan Karim; UBC
]
.institute[
### 2022 Sentinel Innovation and Methods Seminar Series
]
.date[
### May 11, 2022
]

---

<img src="images/top0.png" width="100%" style="display: block; margin: auto;" />
<img src="images/top.png" width="100%" style="display: block; margin: auto;" />
<img src="images/top0.png" width="100%" style="display: block; margin: auto;" />
---

# Outline

### 1. hdPS

- Basic terminology

### 2. Machine learning-based hdPS

- [Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology
- Joint work with  
  - .red[Menglan Pang] and .red[Robert W Platt]
  - McGill, CNODES Methods; Funding: CIHR, Grant #DSE – 146021
- General idea

### 3. Related research

- Not exhaustive

---

# hdPS

---

## Motivating Example

[Basham et al. 2021](https://doi.org/10.1016/j.eclinm.2021.100752) EClinicalMedicine: [CC BY license](http://creativecommons.org/licenses/by/4.0/)

---

## Health care database: Advantages vs Disadvantages

.pull-left[
**Advantages**

1. Diverse population;

1. .red[Longitudinal records] spanning many years;

1. .red[Detailed] health encounters, comorbidity, and drug exposure history;

1. Possibility to .red[link] to other databases.
]

.pull-right[
**Disadvantages**

1. .red[Data sparsity]: relies on visits and encounters;

1. No control over which factors were measured.
]

---

## How to select adjustment variables?

**Modified disjunctive cause criterion**

.red[Adjust] for variables that are
- causes of the exposure, the outcome, or both,
- excluding any known instrument,
- including .red[good proxies] for unmeasured common causes

[VanderWeele et al. 2019](https://doi.org/10.1007/s10654-019-00494-6) European Journal of Epidemiology: [CC BY license](http://creativecommons.org/licenses/by/4.0/)

- `$C_1$` = Tobacco use
---

## Proxy information in Admin data

[Schneeweiss et al. 2018](https://dx.doi.org/10.2147/CLEP.S166545) Clinical Epidemiology: [CC BY NC license](http://creativecommons.org/licenses/by-nc/3.0/)

Regular epidemiological studies vs. .red[Proxies of underlying confounders]

---

## High-dimensional proxy information

- Adjusting for something that **may not be directly interpretable** in the context of the research question.

- .red[**Logic**]: measures from the same subject should be **correlated**, and so carry relevant proxy information

---

## hdPS: General Idea

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology: Clinical Practice Research Datalink (1998–2012)

---

## hdPS: General Idea

List of additional proxy variables (.red[empirical covariates] / .red[EC]):

| Practice (Dimension 1)             | Diagnostic (Dimension 2)           | Procedure (Dimension 3)           | Medication (Dimension 4)           |
|----------------------|----------------------|---------------------|----------------------|
| EC-dim1-1-once       | EC-dim2-1-once       | EC-dim3-1-once      | EC-dim4-1-once       |
| EC-dim1-1-sporadic   | EC-dim2-1-sporadic   | EC-dim3-1-sporadic  | EC-dim4-1-sporadic   |
| EC-dim1-1-frequent   | EC-dim2-1-frequent   | EC-dim3-1-frequent  | EC-dim4-1-frequent   |
| `$\ldots$`             | `$\ldots$`             | `$\ldots$`            | `$\ldots$`            |
| EC-dim1-.red[686]-frequent | EC-dim2-.red[328]-frequent | EC-dim3-.red[76]-frequent | EC-dim4-.red[284]-frequent |

- Total: (686 + 328 + 76 + 284) `$\times$` 3 = .red[4,122 ECs]
- 4 dimensions `$\times$` 3 intensity levels `$\times$` 200 .red[most prevalent codes] [*] = .red[**2,400 ECs**]

- [*] [Schuster et al. (2015)](https://doi.org/10.1002/pds.3773) PDS recommended omitting prevalence-based selection
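
The expansion of one code into three ECs can be sketched as below. This is only an illustration: the cut-points (at least once, at least the median, at least the 75th percentile among patients who have the code) are the ones commonly used in hdPS implementations and are an assumption here, as are the function name and the toy counts.

```python
from statistics import median, quantiles

def recurrence_indicators(counts):
    """Turn per-patient counts of one code into three binary ECs.

    Assumed cut-points (not stated on the slide): 'once' = count >= 1,
    'sporadic' = count >= median, 'frequent' = count >= 75th percentile,
    with both quantiles taken among patients who have the code at all.
    """
    positive = [c for c in counts if c > 0]
    if not positive:
        zeros = [0] * len(counts)
        return {"once": zeros, "sporadic": zeros[:], "frequent": zeros[:]}
    med = median(positive)
    # quantiles() needs at least two points; fall back to the single value.
    q75 = quantiles(positive, n=4)[2] if len(positive) > 1 else positive[0]
    return {
        "once": [int(c >= 1) for c in counts],
        "sporadic": [int(c >= med) for c in counts],
        "frequent": [int(c >= q75) for c in counts],
    }

# Hypothetical counts of one diagnostic code for eight patients.
counts = [0, 1, 1, 2, 3, 0, 5, 8]
ecs = recurrence_indicators(counts)

# Counting ECs as on the slide: codes per dimension x 3 intensity levels.
codes_per_dimension = [686, 328, 76, 284]
total_ecs = sum(codes_per_dimension) * 3   # all codes
capped_ecs = 4 * 3 * 200                   # 200 most prevalent codes per dimension
```

The three indicators are nested by construction: every patient flagged as frequent is also flagged as sporadic and once.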

---

## hdPS: General Idea

.pull-left[
PS from only baseline confounders 
![](images/balancePS.png)
]
.pull-right[
PS from kitchen sink model!
![](images/balanceallH.png)
]

---
## hdPS mechanism: find useful ECs

.pull-left[
In our example, 
$$
`\begin{aligned}
U &=& \text{smoking status}
\end{aligned}`
$$
]
.pull-right[
- [Bross (1966)](https://pubmed.ncbi.nlm.nih.gov/5966011/) formula requires
  - binary U
  - binary Y
  - binary A
]  
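
For reference, the Bross bias multiplier is usually written as follows in the hdPS literature. The notation is an assumption here, since the formula itself does not appear on this slide.

$$
`\begin{aligned}
Bias_M &=& \frac{P_{C1}(RR_{CD} - 1) + 1}{P_{C0}(RR_{CD} - 1) + 1}
\end{aligned}`
$$

Here `$P_{C1}$` and `$P_{C0}$` are the prevalences of the binary confounder among the exposed (`$A = 1$`) and the unexposed (`$A = 0$`), and `$RR_{CD}$` is the relative risk of the outcome `$Y$` comparing those with and without the confounder.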
---

## hdPS mechanism: find useful ECs

.pull-left[In our example, 
$$
`\begin{aligned}
EC &=& \text{EC-dim1-21-once} \\
&=& \text{EC-dim2-95-once} \\
&&  \ldots\\
&=& \text{EC-dim4-64-once}
\end{aligned}`
$$
]

.pull-right[
- [Bross (1966)](https://pubmed.ncbi.nlm.nih.gov/5966011/) formula requires
  - binary EC
  - binary Y
  - binary A
] 
---

## hdPS mechanism: find useful ECs

Rank the ECs in descending order of the magnitude of the log bias: absolute log `$Bias_M$`

| Rank by bias | Absolute log `$Bias_M$` | EC |
|--------------|-------------------------|----------------------|
| 1            | 0.42                    | EC-dim1-21-once      |
| 2            | 0.32                    | EC-dim2-95-once      |
| 3            | 0.25                    | EC-dim4-289-once     |
| `$\ldots$`     | `$\ldots$`                | `$\ldots$`             |
| 2,400        | 0.01                    | EC-dim4-64-frequent  |

Take the top .red[100] or .red[500] of these ECs; these are the hdPS variables.
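
The ranking step can be sketched roughly as follows. The code assumes binary EC, exposure and outcome vectors, uses crude proportions, and inverts relative risks below 1 (a convention found in common hdPS implementations, assumed here). The function names, EC names and data are hypothetical.

```python
from math import log

def bross_abs_log_bias(ec, a, y):
    """Absolute log bias multiplier for one binary EC.

    ec, a, y are equal-length lists of 0/1 values (EC, exposure, outcome).
    Returns 0.0 when a required proportion is undefined or zero.
    """
    n_exposed = sum(a)
    n_unexposed = len(a) - n_exposed
    n_ec1 = sum(ec)
    n_ec0 = len(ec) - n_ec1
    if min(n_exposed, n_unexposed, n_ec1, n_ec0) == 0:
        return 0.0
    # Prevalence of the EC among the exposed and the unexposed.
    p_c1 = sum(e for e, t in zip(ec, a) if t == 1) / n_exposed
    p_c0 = sum(e for e, t in zip(ec, a) if t == 0) / n_unexposed
    # Crude relative risk of the outcome, EC present vs. absent.
    risk1 = sum(o for e, o in zip(ec, y) if e == 1) / n_ec1
    risk0 = sum(o for e, o in zip(ec, y) if e == 0) / n_ec0
    if risk1 == 0 or risk0 == 0:
        return 0.0
    rr = risk1 / risk0
    if rr < 1:  # assumed convention: invert protective associations
        rr = 1 / rr
    bias_m = (p_c1 * (rr - 1) + 1) / (p_c0 * (rr - 1) + 1)
    return abs(log(bias_m))

def top_k_ecs(ec_columns, a, y, k):
    """Rank ECs by absolute log bias (descending) and keep the first k names."""
    scores = {name: bross_abs_log_bias(col, a, y) for name, col in ec_columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical data: eight patients, three candidate ECs.
a = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]
ec_columns = {
    "EC-a": [1, 1, 1, 0, 0, 0, 0, 0],  # imbalanced across exposure, predicts outcome
    "EC-b": [1, 0, 1, 0, 1, 0, 1, 0],  # balanced across exposure groups
    "EC-c": [1, 1, 0, 0, 1, 0, 0, 0],  # mildly imbalanced
}
top, scores = top_k_ecs(ec_columns, a, y, k=2)
```

An EC that is perfectly balanced across exposure groups gets a score of zero however strongly it predicts the outcome, because each EC is scored on its own.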

---

## hdPS: Assumption

.large[
- **The selected ECs collectively serve as .red[proxies of all unmeasured or residual confounding]**

- **Implication**: an hdPS analysis may adjust for the unmeasured or residual confounding

- This assumption is strong and often not verifiable.

- Helpful in practice?

]

---

## hdPS: Balance

.pull-left[
PS from kitchen sink model!
![](images/balanceallH.png)
]
.pull-right[
PS from 500-hdPS!
![](images/balance500hdps.png)
]

---

## hdPS: estimate treatment effect

- [Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology
- Previous research: [Pang et al. (2016)](https://dx.doi.org/10.1097%2FEDE.0000000000000487): Epidemiology

---

## hdPS: Ways to improve

| Rank by bias | Absolute log `$Bias_M$` | EC |
|--------------|-------------------------|----------------------|
| 1            | 0.42                    | EC-dim1-21-once      |
| 2            | 0.32                    | EC-dim2-95-once      |
| 3            | 0.25                    | EC-dim4-289-once     |
| `$\ldots$`     | `$\ldots$`                | `$\ldots$`             |
| 500        | 0.03                    | EC-dim4-63-frequent  |

- ECs are selected separately / .red[univariately] [VanderWeele et al. 2019](https://doi.org/10.1007/s10654-019-00494-6) EJE
    - can be **correlated** (coming from the same patient), 
      - providing same information
      - **may not be useful anymore** in the presence of others

- .red[Multivariate] structure is good to consider
  - Model-specification

---

# Machine learning-based hdPS

---

## Variable selection in PS context

.pull-left[
- [Brookhart et al. (2006)](https://doi.org/10.1093/aje/kwj149) AJE

- [Myers et al. (2011)](https://doi.org/10.1093/aje/kwr364) AJE

- [Pearl (2011)](https://doi.org/10.1093/aje/kwr352) AJE

- [Schuster et al. (2016)](https://doi.org/10.1016/j.jclinepi.2016.05.017) JCE
]

.pull-right[
- inflated variance

- overfitting
]

---

## Variable selection in PS context

- How to select variables to adjust?
- Same idea for the proxies. 
- .red[Pre-exposure] measurements (no mediator, collider, effect).
- .red[Associated with Y] (irrespective of association with A)

---

## Variable selection via ML

- Consider all ECs .red[jointly] in one model: 
  - Perform variable selection based on .red[association with outcome]

| Approach | Advantage | Limitations |
|--------------|-------------------------|----------------------|
| LASSO [Franklin et al. (2015)](https://doi.org/10.1093/aje/kwv108) AJE          | Variable selection by dropping .red[collinear] variables                    |   Tends to select one variable from a group, ignoring the rest     |
| Elastic net            |  More .red[stable] than LASSO                    | Non-linear and non-additive terms need to be specified      |
| Random forest [Low et al. (2016)](https://doi.org/10.2217/cer.15.53) J. Comp. Eff. Res.          |  Automatically detect non-linearity and .red[non-additivity]                    | Only provides **variable importance**, but no cut-points     |
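
To make the joint selection idea concrete, here is a minimal elastic net fitted by coordinate descent. It is a sketch under simplifying assumptions: squared-error loss (a binary outcome would normally call for a penalized logistic model), no tuning of the penalty, and hypothetical names and data. It is not the implementation used in the cited papers.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(columns, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*||b||^2).

    columns: list of feature columns (lists of numbers); alpha=1 gives the LASSO.
    Features and outcome are centered, so no intercept is returned.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    centered = []
    for col in columns:
        m = sum(col) / n
        centered.append([v - m for v in col])
    scale = [sum(v * v for v in col) / n for col in centered]
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(centered):
            if scale[j] == 0:
                continue  # constant column carries no information
            # Covariance of feature j with the partial residual.
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, resid)) / n
            new_beta = soft_threshold(rho, lam * alpha) / (scale[j] + lam * (1 - alpha))
            if new_beta != beta[j]:
                delta = new_beta - beta[j]
                resid = [r - x * delta for x, r in zip(col, resid)]
                beta[j] = new_beta
    return beta

# Hypothetical data: EC-1 tracks the outcome, EC-2 is unrelated to it.
names = ["EC-1", "EC-2"]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]
beta = elastic_net(columns, y, lam=0.1, alpha=0.5)
selected = [name for name, b in zip(names, beta) if b != 0.0]
```

The ECs with non-zero coefficients form the selected set, so the penalty itself supplies the cut-off that the bias-based ranking leaves to the analyst.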

---

## Machine learning-based hdPS

### Pure ML approach

Start with all ECs

Say, 100 ECs (associated with Y) were selected by the elastic net approach

---

## Machine learning-based hdPS

### Hybrid approach (hdPS, then ML)

Start with top 500 ECs selected by Bross formula / prioritization

Say, 100 ECs (associated with Y) were selected by the elastic net approach

This approach differs from [Schneeweiss et al. (2017)](https://doi.org/10.1097/EDE.0000000000000581) Epidemiology, where prioritization was used after applying LASSO.
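
The order of the two stages can be sketched as a small composition. Both stages are passed in as functions; the stand-ins used below (a fixed score table and a fixed set of retained ECs) are placeholders for a real Bross ranking and a real elastic net fit, and all names are hypothetical.

```python
def hybrid_select(ec_names, bias_score, outcome_selector, k):
    """Hybrid selection in the order described on this slide.

    Stage 1: keep the k ECs with the largest bias score (prioritization).
    Stage 2: pass only those ECs to an outcome-based ML selector.
    """
    prioritized = sorted(ec_names, key=bias_score, reverse=True)[:k]
    return outcome_selector(prioritized)

# Placeholder stage 1: precomputed absolute log bias values.
scores = {"EC-1": 0.42, "EC-2": 0.32, "EC-3": 0.25, "EC-4": 0.01}

# Placeholder stage 2: pretend an elastic net kept these ECs.
kept_by_ml = {"EC-1", "EC-3", "EC-4"}

def outcome_selector(candidates):
    return [name for name in candidates if name in kept_by_ml]

final = hybrid_select(list(scores), scores.get, outcome_selector, k=3)
```

In this toy run EC-4 would have survived the ML stage but never reaches it, which is the practical difference between the hybrid and the pure ML approach.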

---

## hdPS vs. ML: estimate treatment effect

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology

---

## hdPS: estimate treatment effect

[Schneeweiss et al. 2018](https://dx.doi.org/10.2147/CLEP.S166545) Clinical Epidemiology: [CC BY NC license](http://creativecommons.org/licenses/by-nc/3.0/)

.right-column[
<img src="images/hdpsonly.png" width="100%" style="display: block; margin: auto auto auto 0;" />
]

.left-column[
*"This strongly suggests that even .red[without the investigator-specifying covariates] for adjustment, the .red[algorithm alone] optimizes confounding adjustment."*
]
---

## hdPS vs. ML: estimate treatment effect

[Karim et al. 2018](https://doi.org/10.1097/ede.0000000000000787) Epidemiology

---

## Plasmode Simulation

[Franklin et al. (2014)](https://doi.org/10.1016/j.csda.2013.10.018) CSDA

Another baseline set with .red[no unmeasured confounding] (1-A to 9-A).

---

## Plasmode Simulation: Leaderboard

---

## Plasmode Simulation

Comparable if .red[adequate proxies] incorporated (RD estimates)

---

## Shared Limitations

.left-column[
- Z-bias [Myers et al. (2011)](https://doi.org/10.1093/aje/kwr364) AJE
]

.right-column[
- .red[EC interpretation] unclear vs. causal inference
  - not collected for research purposes
  - EC used in PS

- Primarily to deal with .red[residual confounding]
  - Not a straightforward extension to PS analysis
  - .red[The motivations of PS and hdPS differ to begin with]

- .red[No separation] of design and analysis stages in bias-based prioritization
  - exposure-based prioritization is OK, but has its own issues

- Post-selection bias [Taylor and Tibshirani (2015)](https://doi.org/10.1073/pnas.1507583112) PNAS
]

---

## Advantage and Limitations

.pull-left[
- Alternative ways to prioritize / rank 
  - Automatic .red[cut-off] of how many variables 
  - .red[Ranking]

- Pure ML methods can be used for .red[non-binary] outcomes and proxies
  - binary
  - categorized
  - continuous
  - survival
]

.pull-right[
- Only a few ML methods assessed

- Doubly robust (DR) methods not covered
]

---

## Motivating Example

Basham et al. 2021 [EClinicalMedicine](https://doi.org/10.1016/j.eclinm.2021.100752): [CC BY license](http://creativecommons.org/licenses/by/4.0/)

- Prefer to use hdPS / ML with ECs as a .red[**secondary analysis**] 
- Proxy adjustment method (methods vs. subject area journals).

---

## JAMA Example

[Brown et al. (2017)](https://doi.org/10.1001/jama.2017.3415)

| Method                 | HR                     | 95% CI                    |
|--|-|-|
| Unadjusted             | 2.16                   | 1.64-2.86                    |
| Regression | 1.59                   | 1.17-2.17                    |
| IPTW hdPS              | .red[1.61] | .red[0.997-2.59] |
| 1-1 hdPS matching      | 1.64                   | 1.07-2.53                    |
| Pre-pregnancy      | 1.85                   | 1.37-2.51                    |

- Conclusion: **not associated**

- More discussion: [Amrhein, Trafimow, Greenland, 2019](https://doi.org/10.1080/00031305.2018.1543137) The American Statistician
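
As a worked check on the IPTW hdPS row, the standard error and p-value can be back-calculated from the reported interval. The sketch assumes a symmetric Wald interval on the log scale and works from the rounded values in the table, so the result is approximate; the function name is hypothetical.

```python
from math import erf, log, sqrt

def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Back-calculate SE(log HR), z and the two-sided p-value from a 95% CI.

    Assumes the interval is symmetric on the log scale (Wald interval).
    """
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return se, z, p

# IPTW hdPS row: HR 1.61, 95% CI 0.997-2.59.
se, z, p = wald_from_ci(1.61, 0.997, 2.59)
```

Under these assumptions the p-value falls just above 0.05, even though the point estimate is close to those of the other adjusted analyses.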

---

# Related research directions

---

## Related research directions

### AI: Autoencoders

[Weberpals et al. (2021)](https://doi.org/10.1097/ede.0000000000001338) used autoencoders (3, 5, 7 layers) to reduce EC dimensions.

- Autoencoder-based hdPS is useful.

- Shallow learning (fewer layers) had better MSE.

- .red[Did not perform better than LASSO].

---

## Related research directions

### TMLE

Targeted learning approach
[Pang et al. (2016)](https://dx.doi.org/10.1097%2FEDE.0000000000000487): Epidemiology

| Model | Max stabilized weight (SW) |
|--|-|
| Only important 5 confounders | 1.78|
| 29 confounders | 69.67|
| 29 confounders + 400 ECs | .red[390.77]|

- better covariate balance vs. overfitting
  - Varying number of covariates selected [Tazare et al. 2022](https://doi.org/10.1002/pds.5412)
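
The link between a richer propensity model and extreme weights can be illustrated with a small stabilized-weight calculation. The propensity scores below are hypothetical and chosen only to show how one near-zero score among the treated dominates the maximum weight.

```python
def stabilized_weights(a, ps):
    """Stabilized IPT weights for binary treatment a and propensity scores ps.

    SW = P(A=1) / PS for the treated and P(A=0) / (1 - PS) for the untreated.
    """
    p_treated = sum(a) / len(a)
    return [
        p_treated / e if t == 1 else (1 - p_treated) / (1 - e)
        for t, e in zip(a, ps)
    ]

a = [1, 1, 0, 0]
ps_modest = [0.6, 0.5, 0.4, 0.5]     # hypothetical PS from a small model
ps_extreme = [0.6, 0.001, 0.4, 0.5]  # hypothetical PS from an overfitted model

max_modest = max(stabilized_weights(a, ps_modest))
max_extreme = max(stabilized_weights(a, ps_extreme))
```

A model that separates the treatment groups almost perfectly pushes some scores toward 0 or 1, and the corresponding weights grow without bound.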

[Haris and Platt (2021)](https://doi.org/10.48550/arXiv.2112.08495) arXiv

- group importance score
- extension of the hdPS (.red[hdCS]) to non-binary outcome and confounders

---
## Related research directions

### Sample splitting

[Naimi et al. (2021)](https://doi.org/10.1093/aje/kwab201) AJE

SL, TMLE, AIPW and usefulness of sample splitting

- ML-based .red[singly robust methods] should be avoided

- Use .red[sample splitting]

- Use a .red[rich SL library] of flexible regressions as well as higher-order interactions

---

## Related research directions

### Cross-fitting

[Zivich and Breskin (2021)](https://doi.org/10.1097/ede.0000000000001332) Epidemiology

- Cross-fitting together with doubly robust approaches

---
## Related research directions

### SL library

[Balzer and Westling (2021)](https://doi.org/10.1093/aje/kwab200) AJE

- TMLE without sample-splitting with a carefully chosen SL library

[Meng and Huang (2021)](https://doi.org/10.48550/arXiv.2105.13148) arXiv
- SL with .red[smooth] (differentiable: LASSO, spline) learners outperforms SL that includes non-smooth learners

---

## Take home message

.large[
- hdPS and ML alternatives generally reduce .red[residual confounding] 
  
  - [*] if .red[good proxies] available.

- hdPS: depends on the .red[Bross formula] (exposure, outcome, and ECs all binary)

- Non-binary outcome: consider ML methods.

- .red[Hybrid methods] performed better (MSE).

- Active area of research
]

---

# Thanks!

### http://ehsank.com/