Say we aim to develop a prediction model for diabetes (a binary variable) based on some sociodemographic and clinical risk factors.

We fit a logistic regression model as follows: `mod <- glm(diabetes ~ age + sex + race + education + triglycerides + protein + bilirubin + phosphorus + sodium + potassium + globulin + calcium, data = dat.train, family = binomial)`.

The predicted probabilities of diabetes are calculated as: `pred.diabetes <- predict(mod, type = 'response', newdata = dat.test)`. How would you calculate the area under the curve (AUC) value on the test data (`dat.test`)?
- A. pROC::roc(dat.train$diabetes, pred.diabetes)
- B. pROC::roc(mod)
- C. pROC::roc(dat.test$diabetes, pred.diabetes)
- D. pROC::roc(dat.test$diabetes)
- E. pROC::roc(dat.test$pred.diabetes)
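For reference, here is a minimal sketch of the full workflow, assuming `dat.train` and `dat.test` contain the variables named in the question; the test-set AUC is obtained by pairing the test outcomes with the test predictions:

```r
library(pROC)

# Fit the model on the training data (as in the question)
mod <- glm(diabetes ~ age + sex + race + education + triglycerides +
             protein + bilirubin + phosphorus + sodium + potassium +
             globulin + calcium,
           data = dat.train, family = binomial)

# Predicted probabilities on the held-out test set
pred.diabetes <- predict(mod, type = "response", newdata = dat.test)

# The test-set AUC pairs the TEST outcomes with the TEST predictions
roc.obj <- pROC::roc(dat.test$diabetes, pred.diabetes)
pROC::auc(roc.obj)
```

Note that pairing the training outcomes (`dat.train$diabetes`) with test-set predictions would mismatch observations and would not measure test-set performance.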
Say you aim to build a prediction model for CVD among Canadian adults using logistic regression. Which methods could be used to deal with model overfitting? (Select ALL that apply.)
- A. Model fitting on full dataset
- B. Selecting 20% of data
- C. Splitting the dataset into training and testing sets
- D. Leave-one-out cross-validation
- E. Increasing the number of predictors in the model
- F. 10-fold cross-validation
- G. Bootstrapping
Data splitting is one technique for obtaining optimism-corrected estimates of model performance, but other methods can also be used to deal with model overfitting, such as k-fold cross-validation and bootstrapping.
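As an illustration of one of these, here is a minimal sketch of 10-fold cross-validation for the AUC, assuming a single data frame `dat` with the binary outcome `diabetes` and a subset of the predictors above (the variable names are illustrative):

```r
library(pROC)

set.seed(123)
k <- 10
# Randomly assign each row of dat to one of k folds
folds <- sample(rep(1:k, length.out = nrow(dat)))

cv.auc <- sapply(1:k, function(i) {
  train <- dat[folds != i, ]  # fit on the other k - 1 folds
  test  <- dat[folds == i, ]  # evaluate on the held-out fold
  fit   <- glm(diabetes ~ age + sex + race + education,
               data = train, family = binomial)
  pred  <- predict(fit, newdata = test, type = "response")
  as.numeric(pROC::auc(test$diabetes, pred))  # fold-specific AUC
})

mean(cv.auc)  # cross-validated (optimism-corrected) AUC
```

A bootstrap optimism correction follows a similar resample-refit-evaluate logic; for models fit with `rms::lrm(..., x = TRUE, y = TRUE)`, `rms::validate(fit, method = 'boot')` automates it.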