15 Hybrid ML
Instead of starting from all recurrence variables, the hybrid approach starts from the variables that the hdPS algorithm has already selected.
15.1 Build model formula based on selected variables
From the hdPS-selected variables, we identify the proxies that are empirically associated with the outcome using a multivariable LASSO (the outcome is regressed on the exposure, the investigator-specified covariates, and all proxies in one model).
length(proxy.list.sel)
#> [1] 100
proxy.list <- names(out3$autoselected_covariate_df[,-1])
covarsTfull <- c(investigator.specified.covariates, proxy.list)
Y.form <- as.formula(paste0(c("outcome~ exposure", covarsTfull),
                            collapse = "+") )
covar.mat <- model.matrix(Y.form, data = hdps.data)[,-1]
lasso.fit <- glmnet::cv.glmnet(y = hdps.data$outcome,
                               x = covar.mat,
                               type.measure='mse',
                               family="binomial",
                               alpha = 1,
                               nfolds = 5)
coef.fit <- coef(lasso.fit,s='lambda.min',exact=TRUE)
sel.variables <- row.names(coef.fit)[which(as.numeric(coef.fit)!=0)]
proxy.list.sel.hybrid <- proxy.list[proxy.list %in% sel.variables]
length(proxy.list.sel.hybrid)
#> [1] 52
proxyform <- paste0(proxy.list.sel.hybrid, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))
Build the propensity score model formula from the variables selected by LASSO.
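The formula-building step can be tried on a small scale. The sketch below uses made-up variable names (`age`, `sex`, `rec_dx_1`, and so on are hypothetical, not taken from the data) and shows that base R's `reformulate()` gives a formula with the same terms as the `paste0()` plus `as.formula()` route.

```r
# Hypothetical variable names, for illustration only
covariates <- c("age", "sex")
proxies <- c("rec_dx_1", "rec_dx_2", "rec_rx_5")

# Option 1: paste the right-hand side together, then convert to a formula
rhs <- paste0(c(covariates, proxies), collapse = "+")
ps.formula.paste <- as.formula(paste0("exposure", "~", rhs))

# Option 2: reformulate() builds the formula directly from term labels
ps.formula.ref <- reformulate(c(covariates, proxies), response = "exposure")

# Both formulas involve the same variables, in the same order
all.vars(ps.formula.paste)
all.vars(ps.formula.ref)
```

Either route works; `reformulate()` avoids manual string handling, which can help when variable names need backticks.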
15.1.1 Fit the PS model
W.out <- weightit(ps.formula,
                  data = hdps.data,
                  estimand = "ATE",
                  method = "ps")
The propensity score model is fitted in order to calculate the inverse probability weights.
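To see what inverse probability weights for the ATE look like, here is a minimal base R sketch on simulated data. The data set `toy` and the single covariate `x` are assumptions made for illustration. The weights follow the standard ATE formula, 1/PS for the exposed and 1/(1 - PS) for the unexposed, which is assumed (not verified here) to match what `weightit()` returns for a logistic propensity score model.

```r
set.seed(123)

# Simulated toy data: one confounder x and a binary exposure
n <- 500
x <- rnorm(n)
exposure <- rbinom(n, 1, plogis(0.5 * x))
toy <- data.frame(exposure = exposure, x = x)

# Logistic propensity score model
ps.fit <- glm(exposure ~ x, data = toy, family = binomial(link = "logit"))
ps <- fitted(ps.fit)

# ATE weights: 1/PS for the exposed, 1/(1 - PS) for the unexposed
w.ate <- ifelse(toy$exposure == 1, 1 / ps, 1 / (1 - ps))
summary(w.ate)

# Within each exposure group the weights sum to roughly n
tapply(w.ate, toy$exposure, sum)
```

Very large weights point to propensity scores close to 0 or 1, so it is worth inspecting the weight distribution before fitting the outcome model.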
15.1.2 Obtain log-OR from unadjusted outcome model
out.formula <- as.formula(paste0("outcome", "~", "exposure"))
fit <- glm(out.formula,
           data = hdps.data,
           weights = W.out$weights,
           family= binomial(link = "logit"))
fit.summary <- summary(fit)$coef["exposure",
                                 c("Estimate",
                                   "Std. Error",
                                   "Pr(>|z|)")]
fit.ci <- confint(fit, level = 0.95)["exposure", ]
fit.summary_with_ci.h <- c(fit.summary, fit.ci)
round(fit.summary_with_ci.h,2)
#> Estimate Std. Error Pr(>|z|) 2.5 % 97.5 %
#> 0.44 0.04 0.00 0.36 0.51
Summary of results (log-OR).
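Since the estimate is on the log-odds scale, exponentiating it gives the odds ratio. A short sketch using the rounded values from the output above (so the result is accurate only to that rounding):

```r
# Rounded log-OR estimate and 95% confidence limits from the output above
log.or <- c(estimate = 0.44, lower = 0.36, upper = 0.51)

# Exponentiate to move from the log-OR scale to the OR scale
or <- exp(log.or)
round(or, 2)
#> estimate    lower    upper
#>     1.55     1.43     1.67
```

In practice one would exponentiate the unrounded `fit.summary_with_ci.h` values rather than the rounded ones.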
Alternative process
It is also possible to start with ML selection and then apply Bross's formula on top of it (Schneeweiss et al. 2017).
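As a rough illustration of the second step, the sketch below ranks a few proxies by the Bross bias multiplier as it is used in hdPS prioritisation. The function name `bross.bias`, the data frame `ml.selected`, and all prevalences and relative risks in it are hypothetical values made up for this example. In a real analysis these quantities would be estimated from the data for the proxies that survived ML selection.

```r
# Bross bias multiplier for a binary proxy:
#   p.c1  = prevalence of the proxy among the exposed
#   p.c0  = prevalence of the proxy among the unexposed
#   rr.cd = relative risk for the proxy-outcome association
bross.bias <- function(p.c1, p.c0, rr.cd) {
  # For protective associations (RR < 1), use the reciprocal
  rr <- ifelse(rr.cd < 1, 1 / rr.cd, rr.cd)
  (p.c1 * (rr - 1) + 1) / (p.c0 * (rr - 1) + 1)
}

# Hypothetical proxies that survived an ML selection step
ml.selected <- data.frame(
  proxy = c("proxy_A", "proxy_B", "proxy_C"),
  p.c1  = c(0.3, 0.2, 0.1),
  p.c0  = c(0.1, 0.2, 0.4),
  rr.cd = c(2.0, 3.0, 0.5)
)

# Rank proxies by the absolute log of the bias multiplier (largest first)
ml.selected$abs.log.bias <- abs(log(with(ml.selected,
                                         bross.bias(p.c1, p.c0, rr.cd))))
ranked <- ml.selected[order(ml.selected$abs.log.bias, decreasing = TRUE), ]
ranked$proxy
```

Here `proxy_B` ranks last because its prevalence is identical in both exposure groups, so its multiplier is 1 regardless of how strongly it is associated with the outcome.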