14  Pure ML

Tip

We show an example using the LASSO approach.

15 Pure ML approach (LASSO)

Start with all recurrence variables (EC in the following equation)

Suppose 100 proxies (those associated with the outcome) were selected by the LASSO approach (ML-hdPS).

15.1 Choose variables associated with outcome

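# Candidate proxies: all columns of the autoselected covariate data frame except the first
proxy.list <- names(out3$autoselected_covariate_df[,-1])
covarsTfull <- c(investigator.specified.covariates, proxy.list)
# Outcome model: exposure + investigator-specified covariates + all proxies
Y.form <- as.formula(paste0(c("outcome~ exposure", 
                              covarsTfull), collapse = "+") )
# Design matrix without the intercept column
covar.mat <- model.matrix(Y.form, data = hdps.data)[,-1]
# 5-fold cross-validated LASSO (alpha = 1) for the binary outcome
lasso.fit <- glmnet::cv.glmnet(y = hdps.data$outcome, 
                               x = covar.mat, 
                               type.measure = 'mse',
                               family = "binomial",
                               alpha = 1, 
                               nfolds = 5)
# Coefficients at the lambda that minimizes the cross-validated error
coef.fit <- coef(lasso.fit, s = 'lambda.min', exact = TRUE)
sel.variables <- row.names(coef.fit)[which(as.numeric(coef.fit) != 0)]
# Keep only the proxies with non-zero coefficients
proxy.list.sel.ml <- proxy.list[proxy.list %in% sel.variables]
length(proxy.list.sel.ml)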
#> [1] 61
  • From all proxies, we try to identify those that are empirically associated with the outcome, based on a multivariable LASSO (the outcome regressed on all proxies in one model).
  • Note that the LASSO model chooses variables based on their association with the outcome conditional on the `exposure`.
  • Variable selection happens only for the proxy variables.
  • Investigator-specified variables are not subject to variable selection.
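
The point that only proxies are subject to selection can also be enforced inside the LASSO fit itself. The following is a minimal sketch on simulated data, not the code used in this chapter: it uses the `penalty.factor` argument of `glmnet::cv.glmnet` so that the exposure and the investigator-specified covariates are never shrunk, while the proxies receive the usual LASSO penalty. All variable names and the simulated data (`age`, `sex`, `proxy1`, ...) are hypothetical.

```r
library(glmnet)

set.seed(123)
n <- 500
# Hypothetical data: 2 investigator-specified covariates and 20 binary proxies
X.inv   <- matrix(rnorm(n * 2), n, 2,
                  dimnames = list(NULL, c("age", "sex")))
X.proxy <- matrix(rbinom(n * 20, 1, 0.3), n, 20,
                  dimnames = list(NULL, paste0("proxy", 1:20)))
exposure <- rbinom(n, 1, 0.5)
lin.pred <- -1 + 0.5 * exposure + 0.4 * X.inv[, "age"] +
  1.0 * X.proxy[, "proxy1"] - 1.0 * X.proxy[, "proxy2"]
outcome <- rbinom(n, 1, plogis(lin.pred))

x <- cbind(exposure = exposure, X.inv, X.proxy)
# Penalty factor 0 = never shrunk (always kept); 1 = usual LASSO penalty
pf <- ifelse(colnames(x) %in% colnames(X.proxy), 1, 0)

fit <- cv.glmnet(x = x, y = outcome, family = "binomial",
                 alpha = 1, nfolds = 5, penalty.factor = pf)
cf <- coef(fit, s = "lambda.min")[, 1]   # named numeric vector
selected <- names(cf)[cf != 0]
# Proxies that survived the penalty
selected.proxies <- intersect(colnames(X.proxy), selected)
```

With this setup, `exposure`, `age` and `sex` always stay in the model, and `selected.proxies` holds the proxies with non-zero coefficients. Which proxies are selected depends on the random cross-validation folds.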

15.2 Build model formula based on selected variables

covform <- paste0(investigator.specified.covariates, collapse = "+")
proxyform <- paste0(proxy.list.sel.ml, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))

Build the propensity score model formula from the investigator-specified covariates and the proxies selected by LASSO.
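
As an alternative way to assemble such a formula, base R's `reformulate()` builds it directly from a character vector of term labels. This is a small sketch with hypothetical variable names (`covs.demo`, `proxies.demo` and their contents are made up for illustration). One practical difference: if no proxy is selected, the `paste0()` construction leaves a trailing `+` that `as.formula()` cannot parse, whereas `reformulate()` simply uses the remaining terms.

```r
# Hypothetical covariate and proxy names, for illustration only
covs.demo    <- c("age", "sex")
proxies.demo <- c("rec_dx_1", "rec_dx_2")

# exposure ~ age + sex + rec_dx_1 + rec_dx_2
ps.demo <- reformulate(termlabels = c(covs.demo, proxies.demo),
                       response = "exposure")

# Still valid when the proxy list is empty
ps.demo.empty <- reformulate(termlabels = c(covs.demo, character(0)),
                             response = "exposure")
```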

15.3 Fit the PS model

library(WeightIt)
# Estimate propensity scores and the corresponding ATE weights
W.out <- weightit(ps.formula, 
                  data = hdps.data, 
                  estimand = "ATE",
                  method = "ps")

The propensity score model is fitted so that the inverse probability weights can be calculated.
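
To make the weighting step concrete, here is a minimal sketch of how inverse probability weights for the ATE can be computed by hand from a logistic propensity score model. It uses simulated data with hypothetical variables (`a`, `x1`, `x2`) and is meant only to illustrate the formula: 1/PS for the exposed and 1/(1 - PS) for the unexposed. It is not a replacement for the `weightit()` call.

```r
set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.4)
# Hypothetical binary exposure that depends on x1 and x2
a  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.6 * x2))
demo <- data.frame(a, x1, x2)

# Propensity score: estimated probability of exposure given covariates
ps.fit <- glm(a ~ x1 + x2, data = demo, family = binomial())
ps <- fitted(ps.fit)

# ATE weights: 1/PS for the exposed, 1/(1 - PS) for the unexposed
ipw <- ifelse(demo$a == 1, 1 / ps, 1 / (1 - ps))
summary(ipw)
```

Very large weights in `summary(ipw)` would indicate propensity scores close to 0 or 1, which is worth checking before fitting the outcome model.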

15.4 Obtain log-OR from unadjusted outcome model

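# Outcome model with exposure only; confounding is handled through the weights
out.formula <- as.formula(paste0("outcome", "~", "exposure"))
# Note: glm() may warn about non-integer #successes because the weights are not integers
fit <- glm(out.formula,
           data = hdps.data,
           weights = W.out$weights,
           family = binomial(link = "logit"))
# Extract the log-OR, its standard error and p-value for the exposure
fit.summary <- summary(fit)$coef["exposure",
                                 c("Estimate", 
                                   "Std. Error", 
                                   "Pr(>|z|)")]
fit.ci <- confint(fit, level = 0.95)["exposure", ]
fit.summary_with_ci <- c(fit.summary, fit.ci)
round(fit.summary_with_ci, 2)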
#>   Estimate Std. Error   Pr(>|z|)      2.5 %     97.5 % 
#>       0.41       0.04       0.00       0.34       0.49

Summary of results (log-OR).
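
Because the estimate is on the log-odds scale, it can be exponentiated to obtain an odds ratio. The sketch below starts from the rounded values printed above, so the result is approximate. In practice one would exponentiate the unrounded estimate and confidence limits.

```r
# Rounded log-OR and 95% confidence limits copied from the output above
log.or <- c(Estimate = 0.41, `2.5 %` = 0.34, `97.5 %` = 0.49)

# Exponentiate to move from the log-OR scale to the OR scale
or <- exp(log.or)
round(or, 2)
# Estimate 1.51, lower limit 1.40, upper limit 1.63
```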