Q1. We are exploring the bivariate association between treatment (1: received treatment, 0: otherwise) and diabetes (1: yes, 0: otherwise) in a complex survey dataset. The analytic dataset is named `dat.analytic`, and the design object `survey.design` was created with the svydesign() function. Which R code could be used to explore the association? Note: survey features should be used to answer this and the following questions.
- A. table(dat.analytic$treatment, dat.analytic$diabetes)
- B. prop.table(table(dat.analytic$treatment, dat.analytic$diabetes))
- C. svyby(~treatment, ~diabetes, design = survey.design, svymean, deff = TRUE)
- D. svychisq(~diabetes + treatment, design = survey.design, statistic = 'F')
Q2. Which code could be used to explore the relationship between treatment and diabetes from Q1 using regression analysis?
- A. fit2 <- glm(diabetes ~ treatment, data = dat.analytic); summary(fit2)
- B. fit2 <- svyglm(diabetes ~ treatment, design = survey.design); summary(fit2)
- C. fit2 <- svyglm(diabetes ~ treatment, design = survey.design, family = binomial); summary(fit2)
- D. fit2 <- svyglm(treatment ~ diabetes, design = survey.design, family = binomial); summary(fit2)
Q3. From the literature, you know that sex and race need to be adjusted for in the model, but you are not sure about income and diet. How could you run an AIC-based backward selection process to figure out whether you
- A. i) Fit a model with treatment, sex and race; ii) Fit the second model with treatment, income and diet; iii) Compare models i and ii using the AIC() function and choose the model with the lowest AIC; iv) Report the odds ratio with 95% CI.
- B. i) Fit a model with treatment, sex, race, income and diet; ii) Fit the second model with treatment, income and diet; iii) Compare models i and ii using the AIC() function and choose the model with the lowest AIC; iv) Report the odds ratio with 95% CI.
- C. i) Fit a model with treatment, sex, race, income and diet; ii) Define the range of models examined in the stepwise search; iii) Use the step() function with the initial model, scope and direction arguments to automate the variable selection; iv) Report the odds ratio with 95% CI.
- D. i) Fit a model with treatment, sex, race, income and diet; ii) Use the stepAIC() function to automate the variable selection; iii) Report the odds ratio with 95% CI.
Q4. How could you test whether to add an interaction term between sex and race to the final model from Q3 (fit3)?
- A. fit4 <- update(fit3, sex + race + income + diet + sex:race); anova(fit4, fit3)$p
- B. fit4 <- update(fit3, .~. + interaction(sex*race)); anova(fit4, fit3)$p
- C. fit4 <- update(fit3, .~. + sex:race); anova(fit4, fit3)$p
- D. fit4 <- update(fit3, sex + race + income + diet + interaction(sex, race)); anova(fit4, fit3)$p
Note: the update() function updates and refits a model. Make sure to specify an appropriate formula.
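As a minimal sketch of how update() and anova() can fit together for survey models, the example below simulates a small stand-in for `dat.analytic`. The variables psu, strata and wt, the object names, and all simulated values are hypothetical and not taken from the course data. It assumes the survey package is installed, names the `formula` argument explicitly so that update() can replace it, and uses quasibinomial() instead of binomial only to avoid non-integer-count warnings when weights are present.

```r
library(survey)

set.seed(1)
n <- 400

# Hypothetical stand-in for dat.analytic: 4 strata, 10 PSUs each, 10 people per PSU
dat <- data.frame(
  psu       = rep(1:40, each = 10),
  strata    = rep(1:4, each = 100),
  wt        = runif(n, 1, 5),
  treatment = rbinom(n, 1, 0.5),
  sex       = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  race      = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
dat$diabetes <- rbinom(n, 1, plogis(-1 + 0.5 * dat$treatment))

# Survey design object built on the full (simulated) dataset
des <- svydesign(ids = ~psu, strata = ~strata, weights = ~wt,
                 data = dat, nest = TRUE)

# Main-effects model; the formula argument is named so update() can swap it
fit_main <- svyglm(formula = diabetes ~ treatment + sex + race,
                   design = des, family = quasibinomial())

# ". ~ ." keeps the existing response and predictors; only the new term is added
fit_int <- update(fit_main, . ~ . + sex:race)

# Design-based comparison of the two nested models
cmp <- anova(fit_main, fit_int)
p_int <- cmp$p
print(p_int)
```

In the formula passed to update(), `. ~ .` stands for the response and predictors already in the model, so only the added term has to be written out.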
Q5. What is the appropriate way to fit a regression model on a sub-population, e.g., only on females?
- A. i) Subset the analytic dataset for females using the subset() function; ii) Set up the design using the svydesign() function; iii) Fit the regression model using the glm() function
- B. i) Set up the design using the svydesign() function on the full dataset; ii) Subset the design for females using the subset() function; iii) Fit the regression model using the glm() function
- C. Both are wrong
- D. Both are correct
To correctly estimate the variance, we must consider the full complex survey design structure.
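One way to keep the full design structure when analysing a sub-population is sketched below. The data are simulated, and the variables psu, strata and wt, as well as the object names, are hypothetical placeholders rather than the course dataset. The sketch assumes the survey package is installed and uses quasibinomial() only to avoid non-integer-count warnings with weights.

```r
library(survey)

set.seed(2)
n <- 400

# Hypothetical stand-in for dat.analytic: 4 strata, 10 PSUs each, 10 people per PSU
dat <- data.frame(
  psu       = rep(1:40, each = 10),
  strata    = rep(1:4, each = 100),
  wt        = runif(n, 1, 5),
  treatment = rbinom(n, 1, 0.5),
  sex       = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
dat$diabetes <- rbinom(n, 1, plogis(-1 + 0.5 * dat$treatment))

# Step 1: define the design on the FULL dataset
des_full <- svydesign(ids = ~psu, strata = ~strata, weights = ~wt,
                      data = dat, nest = TRUE)

# Step 2: subset the design object, not the raw data frame
des_female <- subset(des_full, sex == "Female")

# Step 3: fit a design-based model on the subsetted design
fit_female <- svyglm(formula = diabetes ~ treatment,
                     design = des_female, family = quasibinomial())

# Odds ratios with 95% confidence intervals
or_table <- exp(cbind(OR = coef(fit_female), confint(fit_female)))
print(or_table)
```

Subsetting the design object rather than the data frame lets the survey package keep the stratum and cluster information from the full sample when it computes standard errors.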