Q1. Suppose we are exploring the association between diabetes (exposure) and CVD (outcome) using a dataset collected by simple random sampling. Age, sex, race, income, diet, smoking, and hypertension are the confounders. The analytic dataset is named dat.analytic. How could you fit a cross-validated Super Learner to estimate the propensity scores?

A. fit.ps <- glm(cvd ~ age + sex + race + income + diet + smoking + hypertension, data = dat.analytic, family = binomial)
B. fit.ps <- SuperLearner(Y = dat.analytic$cvd, X = dat.analytic[, c('age', 'sex', 'race', 'income', 'diet', 'smoking', 'hypertension')], family = 'binomial', SL.library = c('SL.glm', 'SL.glmnet', 'SL.rpart'), verbose = FALSE, method = 'method.NNLS')
C. fit.ps <- CV.SuperLearner(Y = dat.analytic$cvd, X = dat.analytic$diabetes, family = 'binomial', SL.library = c('SL.glm', 'SL.glmnet', 'SL.rpart'), verbose = FALSE, V = 10, method = 'method.NNLS')
D. fit.ps <- SuperLearner(Y = dat.analytic$diabetes, X = dat.analytic[, c('age', 'sex', 'race', 'income', 'diet', 'smoking', 'hypertension')], family = 'binomial', SL.library = c('SL.glm', 'SL.glmnet', 'SL.rpart'), verbose = FALSE, method = 'method.NNLS')

Q2. How could you extract the cross-validated propensity scores from the model fitted in Q1?

A. pscore <- cbind(fit.ps$SL.predict, fit.ps$library.predict)[,1]
B. pscore <- cbind(fit.ps$SL.predict, fit.ps$library.predict)[1,]
C. pscore <- cbind(fit.ps$SL.predict, fit.ps$library.predict)[,3]
D. pscore <- cbind(fit.ps$SL.predict, fit.ps$library.predict)[,4]

Q3. Using the propensity scores calculated in Q2, we match exposed subjects (with diabetes) to controls (without diabetes). The matched dataset is named 'dat.match'. What could be the outcome model for estimating the effect of diabetes on CVD?

A. fit <- glm(cvd ~ diabetes, data = dat.match)
B. fit <- glm(cvd ~ diabetes, data = dat.match, family = binomial)
C. fit <- glm(diabetes ~ cvd, design = dat.match, family = binomial)
D. fit <- glm(cvd ~ diabetes, design = dat.analytic, family = binomial)

Q4. Reconsider Q1, but suppose the data were collected using a complex survey design, e.g., NHANES 2017-18. The survey features are named psu, strata, and survey.weight. How could you fit your model from Q1 using the DuGoff et al. (2014) approach?

A. fit.ps <- SuperLearner(Y = dat.analytic$diabetes, X = dat.analytic[, c('age', 'sex', 'race', 'income', 'diet', 'smoking', 'hypertension')], family = 'binomial', SL.library = c('SL.glm', 'SL.glmnet', 'SL.rpart'), verbose = FALSE, method = 'method.NNLS')
B. fit.ps <- CV.SuperLearner(Y = dat.analytic$cvd, X = dat.analytic$diabetes, family = 'binomial', SL.library = c('SL.glm', 'SL.glmnet', 'SL.rpart'), verbose = FALSE, V = 10, method = 'method.NNLS')
C. fit.ps <- SuperLearner(Y = dat.analytic$diabetes, X = dat.analytic[, c('age', 'sex', 'race', 'income', 'diet', 'smoking', 'hypertension', 'psu', 'strata', 'survey.weight')], family = 'binomial', SL.library = c('SL.glm', 'SL.glmnet', 'SL.rpart'), verbose = FALSE, method = 'method.NNLS')
D. fit.ps <- SuperLearner(Y = dat.analytic$cvd, X = dat.analytic[, c('age', 'sex', 'race', 'income', 'diet', 'smoking', 'hypertension', 'psu', 'strata', 'survey.weight')], family = 'binomial', SL.library = c('SL.glm', 'SL.glmnet', 'SL.rpart'), verbose = FALSE, method = 'method.NNLS')