15  Hybrid ML

Instead of all recurrence variables, you start with the hdPS variables chosen by the hdPS algorithm first.

15.1 Build model formula based on selected variables

From hdPS variables, we try to identify proxies that are empirically associated with the outcome based on a multivariate LASSO (outcome with all proxies in one model).

length(proxy.list.sel)
#> [1] 100
proxy.list <- names(out3$autoselected_covariate_df[,-1])
covarsTfull <- c(investigator.specified.covariates, proxy.list)
Y.form <- as.formula(paste0(c("outcome~ exposure", 
                              covarsTfull), collapse = "+") )
covar.mat <- model.matrix(Y.form, data = hdps.data)[,-1]
lasso.fit<-glmnet::cv.glmnet(y = hdps.data$outcome, 
                             x = covar.mat, 
                             type.measure='mse',
                             family="binomial",
                             alpha = 1, 
                             nfolds = 5)
coef.fit<-coef(lasso.fit,s='lambda.min',exact=TRUE)
sel.variables<-row.names(coef.fit)[which(as.numeric(coef.fit)!=0)]
proxy.list.sel.hybrid <- proxy.list[proxy.list %in% sel.variables]
length(proxy.list.sel.hybrid)
#> [1] 52
proxyform <- paste0(proxy.list.sel.hybrid, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))

Build propensity score model based on selected variables based on LASSO.

15.1.1 Fit the PS model

W.out <- weightit(ps.formula, 
                    data = hdps.data, 
                    estimand = "ATE",
                    method = "ps")

Propensity score model fit to be able to calculate the inverse probability weights.

15.1.2 Obtain log-OR from unadjusted outcome model

out.formula <- as.formula(paste0("outcome", "~", "exposure"))
fit <- glm(out.formula,
            data = hdps.data,
            weights = W.out$weights,
            family= binomial(link = "logit"))
fit.summary <- summary(fit)$coef["exposure",
                                 c("Estimate", 
                                   "Std. Error", 
                                   "Pr(>|z|)")]
fit.ci <- confint(fit, level = 0.95)["exposure", ]
fit.summary_with_ci.h <- c(fit.summary, fit.ci)
round(fit.summary_with_ci.h,2) 
#>   Estimate Std. Error   Pr(>|z|)      2.5 %     97.5 % 
#>       0.44       0.04       0.00       0.36       0.51

Summary of results (log-OR).

Alternative process

It is also possible to start with ML selection, and then applying Bross’s formula on top of it (Schneeweiss et al. 2017).