CCHS: Regression
This tutorial is for a complex data analysis, specifically using regression techniques to analyze survey data.
Let us load necessary R packages for the analysis:
Load data
Loads a dataset and provides some quick data checks, like the dimensions and summary of weights.
Logistic for complex survey
Performs a simple logistic regression using the complex survey data, focusing on the relationship between cardiovascular disease and osteoarthritis.
Multivariable analysis
Runs a more complex logistic regression model, adding multiple covariates to better understand the relationship.
formula1 <- as.formula(I(CVD=="event") ~ OA + age + sex + married + race + 
              edu + income + bmi + phyact + doctor + stress + 
              smoke + drink + fruit + bp + diab + province + immigrate)
fit3 <- svyglm(formula1, 
              design = w.design, 
              family = binomial(logit))
publish(fit3)
#>   Variable             Units OddsRatio         CI.95     p-value 
#>         OA           Control       Ref                           
#>                           OA      1.52   [1.40;1.66]     < 1e-04 
#>        age       20-29 years       Ref                           
#>                  30-39 years      1.29   [0.98;1.69]   0.0707636 
#>                  40-49 years      2.74   [2.17;3.47]     < 1e-04 
#>                  50-59 years      6.24   [4.97;7.83]     < 1e-04 
#>                  60-64 years      9.71  [7.68;12.29]     < 1e-04 
#>            65 years and over     15.85 [12.57;20.00]     < 1e-04 
#>                         teen      0.46   [0.20;1.07]   0.0707295 
#>        sex            Female       Ref                           
#>                         Male      1.73   [1.60;1.88]     < 1e-04 
#>    married        not single       Ref                           
#>                       single      0.97   [0.90;1.05]   0.5209444 
#>       race         Non-white       Ref                           
#>                        White      1.44   [1.19;1.75]   0.0002219 
#>        edu          < 2ndary       Ref                           
#>                    2nd grad.      0.90   [0.80;1.00]   0.0512330 
#>              Other 2nd grad.      0.97   [0.83;1.13]   0.6737665 
#>               Post-2nd grad.      0.93   [0.85;1.02]   0.1016562 
#>     income   $29,999 or less       Ref                           
#>              $30,000-$49,999      0.74   [0.67;0.81]     < 1e-04 
#>              $50,000-$79,999      0.64   [0.58;0.72]     < 1e-04 
#>              $80,000 or more      0.58   [0.51;0.66]     < 1e-04 
#>        bmi       Underweight       Ref                           
#>               healthy weight      0.86   [0.67;1.10]   0.2350526 
#>                   Overweight      0.88   [0.69;1.12]   0.3033213 
#>     phyact            Active       Ref                           
#>                     Inactive      1.21   [1.10;1.34]   0.0001345 
#>                     Moderate      1.08   [0.97;1.21]   0.1771985 
#>     doctor                No       Ref                           
#>                          Yes      1.75   [1.49;2.06]     < 1e-04 
#>     stress  Not too stressed       Ref                           
#>                     stressed      1.30   [1.18;1.42]     < 1e-04 
#>      smoke      Never smoker       Ref                           
#>               Current smoker      1.18   [1.05;1.32]   0.0050518 
#>                Former smoker      1.21   [1.11;1.33]     < 1e-04 
#>      drink       Never drank       Ref                           
#>              Current drinker      0.82   [0.68;0.98]   0.0290605 
#>                Former driker      1.13   [0.93;1.36]   0.2133779 
#>      fruit 0-3 daily serving       Ref                           
#>            4-6 daily serving      0.94   [0.86;1.03]   0.1758214 
#>             6+ daily serving      1.09   [0.97;1.23]   0.1311029 
#>         bp                No       Ref                           
#>                          Yes      2.35   [2.16;2.55]     < 1e-04 
#>       diab                No       Ref                           
#>                          Yes      1.86   [1.66;2.07]     < 1e-04 
#>   province             South       Ref                           
#>                        North      1.21   [0.90;1.62]   0.2103030 
#>  immigrate     not immigrant       Ref                           
#>                   > 10 years      1.02   [0.89;1.16]   0.8057243 
#>                       recent      1.06   [0.73;1.53]   0.7651069Model fit assessment
Variability explained
Pseudo-R-square values indicate how much of the total variability in the outcomes is explainable by the fitted model (analogous to R-square). For a continuous outcome, we can compute R-square, while R-square cannot be calculated when the outcome variable is categorical, nominal or ordinal. We use pseudo-R-squared measures when the dependent variable is not continuous but a likelihood function is used to fit a model. Popular Pseudo-R-square measures are:
- Cox/Snell (never reaches max 1)
 - Nagelkerke R-square (scaled to max 1)
 
- The larger Cox & Snell estimate is the better the model.
 - These Pseudo-R-square values should be interpreted with caution (if not ignored).
 - They offer little confidence in interpreting the model fit.
 - Survey weighted version of them are available.
 - Not trivial to decide which statistic to use under complex surveys.
 
Evaluates the model fit using Akaike Information Criterion (AIC) and pseudo R-squared metrics.
fit3 <- svyglm(formula1, 
              design = w.design, 
              family = quasibinomial(logit)) # publish does not work
AIC(fit3) 
#>        eff.p          AIC     deltabar 
#>    67.752064 45362.706032     2.053093
# AIC for survey weighted regressions
psrsq(fit3, method = "Cox-Snell")
#> [1] 0.06091896
psrsq(fit3, method = "Nagelkerke")
#> [1] 0.2307586
# Nagelkerke and Cox-Snell pseudo-rsquared statisticsBackward Elimination
- Model comparisons
- LRT-aprroximation
 - Wald-based
 
 
Checking one by one
Checks the significance of each variable one by one and removes those that are not statistically significant.
round(sort(summary(fit3)$coef[,"Pr(>|t|)"]),2)
#>            (Intercept)   age65 years and over                  bpYes 
#>                   0.00                   0.00                   0.00 
#>         age60-64 years         age50-59 years                sexMale 
#>                   0.00                   0.00                   0.00 
#>                diabYes                   OAOA  income$80,000 or more 
#>                   0.00                   0.00                   0.00 
#>         age40-49 years  income$50,000-$79,999              doctorYes 
#>                   0.00                   0.00                   0.00 
#>  income$30,000-$49,999         stressstressed     smokeFormer smoker 
#>                   0.00                   0.00                   0.00 
#>         phyactInactive              raceWhite    smokeCurrent smoker 
#>                   0.00                   0.00                   0.01 
#>   drinkCurrent drinker           edu2nd grad.                ageteen 
#>                   0.03                   0.05                   0.07 
#>         age30-39 years      eduPost-2nd grad.  fruit6+ daily serving 
#>                   0.07                   0.10                   0.13 
#> fruit4-6 daily serving         phyactModerate          provinceNorth 
#>                   0.18                   0.18                   0.21 
#>     drinkFormer driker      bmihealthy weight          bmiOverweight 
#>                   0.21                   0.24                   0.30 
#>          marriedsingle     eduOther 2nd grad.        immigraterecent 
#>                   0.52                   0.67                   0.77 
#>    immigrate> 10 years 
#>                   0.81
# bmiOverweight is associated with largest p-value
# but what about other categories?
regTermTest(fit3,~bmi) # coef of all bmi cat = 0
#> Wald test for bmi
#>  in svyglm(formula = formula1, design = w.design, family = quasibinomial(logit))
#> F =  0.7591291  on  2  and  185579  df: p= 0.46808
fit4 <- update(fit3, .~. -bmi) 
anova(fit3, fit4)
#> Working (Rao-Scott+F) LRT for bmi
#>  in svyglm(formula = formula1, design = w.design, family = quasibinomial(logit))
#> Working 2logLR =  1.424634 p= 0.49071 
#> (scale factors:  1.1 0.93 );  denominator df= 185579
# high p-value (in both wald and Anova) makes it more likely that you should exclude bmi
AIC(fit3,fit4) 
#>         eff.p      AIC deltabar
#> [1,] 67.75206 45362.71 2.053093
#> [2,] 64.30460 45358.26 2.074342
round(sort(summary(fit4)$coef[,"Pr(>|t|)"]),2)
#>            (Intercept)   age65 years and over                  bpYes 
#>                   0.00                   0.00                   0.00 
#>         age60-64 years         age50-59 years                sexMale 
#>                   0.00                   0.00                   0.00 
#>                diabYes                   OAOA  income$80,000 or more 
#>                   0.00                   0.00                   0.00 
#>         age40-49 years  income$50,000-$79,999              doctorYes 
#>                   0.00                   0.00                   0.00 
#>  income$30,000-$49,999         stressstressed     smokeFormer smoker 
#>                   0.00                   0.00                   0.00 
#>         phyactInactive              raceWhite    smokeCurrent smoker 
#>                   0.00                   0.00                   0.00 
#>   drinkCurrent drinker           edu2nd grad.         age30-39 years 
#>                   0.03                   0.05                   0.07 
#>                ageteen      eduPost-2nd grad.  fruit6+ daily serving 
#>                   0.07                   0.10                   0.13 
#> fruit4-6 daily serving         phyactModerate          provinceNorth 
#>                   0.17                   0.17                   0.21 
#>     drinkFormer driker          marriedsingle     eduOther 2nd grad. 
#>                   0.21                   0.53                   0.67 
#>        immigraterecent    immigrate> 10 years 
#>                   0.76                   0.81Using AIC to automate
Uses stepwise regression guided by AIC to automatically select the most important variables.
require(MASS)
formula1b <- as.formula(I(CVD=="event") ~ OA + age + sex)
fit1b <- svyglm(formula1b, 
              design = w.design, 
              family = binomial(logit))
fit5 <- stepAIC(fit1b, direction = "backward")
#> Start:  AIC=47384.51
#> I(CVD == "event") ~ OA + age + sex
#>        Df Deviance   AIC
#> <none>       47353 47385
#> - sex   1    47634 47658
#> - OA    1    47679 47702
#> - age   6    54414 54291publish(fit5)
#>  Variable             Units OddsRatio         CI.95   p-value 
#>        OA           Control       Ref                         
#>                          OA      1.81   [1.66;1.97]   < 1e-04 
#>       age       20-29 years       Ref                         
#>                 30-39 years      1.40   [1.07;1.84]   0.01374 
#>                 40-49 years      3.39   [2.69;4.27]   < 1e-04 
#>                 50-59 years      9.42  [7.55;11.76]   < 1e-04 
#>                 60-64 years     17.78 [14.22;22.23]   < 1e-04 
#>           65 years and over     33.82 [27.26;41.97]   < 1e-04 
#>                        teen      0.45   [0.20;1.02]   0.05462 
#>       sex            Female       Ref                         
#>                        Male      1.57   [1.46;1.69]   < 1e-04
round(sort(summary(fit5)$coef[,"Pr(>|t|)"]),2)
#>          (Intercept) age65 years and over       age60-64 years 
#>                 0.00                 0.00                 0.00 
#>       age50-59 years                 OAOA              sexMale 
#>                 0.00                 0.00                 0.00 
#>       age40-49 years       age30-39 years              ageteen 
#>                 0.00                 0.01                 0.05Using AIC, but keeping importants
Similar to the previous step but ensures certain important variables remain in the model.
formula1c <- as.formula(I(CVD=="event") ~ OA + age + sex + married + race +
                         edu + income + bmi + phyact + fruit + bp + diab + 
                         doctor + stress + smoke + drink + province + immigrate)
scope <- list(upper = ~ OA + age + sex + married + race +
                         edu + income + bmi + phyact + fruit + bp + diab + 
                         doctor + stress + smoke + drink + province + immigrate,
              lower = ~ OA + age + sex + married + race +
                         edu + income + bmi + phyact + fruit + bp + diab)
fit1c <- svyglm(formula1c, design = w.design, family = binomial(logit))
fitstep <- step(fit1c, scope = scope, trace = FALSE, k = 2, direction = "backward")
# k = 2 gives the genuine AICpublish(fitstep)
#>  Variable             Units OddsRatio         CI.95     p-value 
#>        OA           Control       Ref                           
#>                          OA      1.52   [1.40;1.66]     < 1e-04 
#>       age       20-29 years       Ref                           
#>                 30-39 years      1.29   [0.98;1.70]   0.0697699 
#>                 40-49 years      2.74   [2.17;3.47]     < 1e-04 
#>                 50-59 years      6.24   [4.97;7.83]     < 1e-04 
#>                 60-64 years      9.71  [7.68;12.28]     < 1e-04 
#>           65 years and over     15.85 [12.57;19.98]     < 1e-04 
#>                        teen      0.46   [0.20;1.07]   0.0705660 
#>       sex            Female       Ref                           
#>                        Male      1.73   [1.60;1.88]     < 1e-04 
#>   married        not single       Ref                           
#>                      single      0.97   [0.90;1.05]   0.5042496 
#>      race         Non-white       Ref                           
#>                       White      1.41   [1.18;1.70]   0.0001986 
#>       edu          < 2ndary       Ref                           
#>                   2nd grad.      0.90   [0.80;1.00]   0.0524940 
#>             Other 2nd grad.      0.97   [0.83;1.13]   0.6732446 
#>              Post-2nd grad.      0.93   [0.85;1.02]   0.1045975 
#>    income   $29,999 or less       Ref                           
#>             $30,000-$49,999      0.74   [0.67;0.81]     < 1e-04 
#>             $50,000-$79,999      0.64   [0.58;0.72]     < 1e-04 
#>             $80,000 or more      0.58   [0.51;0.66]     < 1e-04 
#>       bmi       Underweight       Ref                           
#>              healthy weight      0.86   [0.67;1.10]   0.2316915 
#>                  Overweight      0.88   [0.69;1.12]   0.2982852 
#>    phyact            Active       Ref                           
#>                    Inactive      1.22   [1.10;1.34]   0.0001227 
#>                    Moderate      1.08   [0.97;1.21]   0.1754422 
#>     fruit 0-3 daily serving       Ref                           
#>           4-6 daily serving      0.94   [0.86;1.03]   0.1807666 
#>            6+ daily serving      1.09   [0.97;1.23]   0.1295281 
#>        bp                No       Ref                           
#>                         Yes      2.35   [2.16;2.55]     < 1e-04 
#>      diab                No       Ref                           
#>                         Yes      1.85   [1.66;2.07]     < 1e-04 
#>    doctor                No       Ref                           
#>                         Yes      1.75   [1.49;2.05]     < 1e-04 
#>    stress  Not too stressed       Ref                           
#>                    stressed      1.30   [1.18;1.42]     < 1e-04 
#>     smoke      Never smoker       Ref                           
#>              Current smoker      1.17   [1.05;1.31]   0.0053412 
#>               Former smoker      1.21   [1.10;1.33]     < 1e-04 
#>     drink       Never drank       Ref                           
#>             Current drinker      0.82   [0.68;0.98]   0.0254942 
#>               Former driker      1.12   [0.93;1.36]   0.2205315
round(sort(summary(fitstep)$coef[,"Pr(>|t|)"]),2)
#>            (Intercept)   age65 years and over                  bpYes 
#>                   0.00                   0.00                   0.00 
#>         age60-64 years         age50-59 years                sexMale 
#>                   0.00                   0.00                   0.00 
#>                diabYes                   OAOA  income$80,000 or more 
#>                   0.00                   0.00                   0.00 
#>         age40-49 years  income$50,000-$79,999              doctorYes 
#>                   0.00                   0.00                   0.00 
#>  income$30,000-$49,999         stressstressed     smokeFormer smoker 
#>                   0.00                   0.00                   0.00 
#>         phyactInactive              raceWhite    smokeCurrent smoker 
#>                   0.00                   0.00                   0.01 
#>   drinkCurrent drinker           edu2nd grad.         age30-39 years 
#>                   0.03                   0.05                   0.07 
#>                ageteen      eduPost-2nd grad.  fruit6+ daily serving 
#>                   0.07                   0.10                   0.13 
#>         phyactModerate fruit4-6 daily serving     drinkFormer driker 
#>                   0.18                   0.18                   0.22 
#>      bmihealthy weight          bmiOverweight          marriedsingle 
#>                   0.23                   0.30                   0.50 
#>     eduOther 2nd grad. 
#>                   0.67Assess interactions
Check biologically interesting ones.
Check one by one
Checks if there is a significant interaction effect between ‘age’ and ‘sex’.
fit8a <- update(fitstep, .~. + interaction(age,sex))
anova(fitstep, fit8a) # keep interaction
#> Working (Rao-Scott+F) LRT for interaction(age, sex)
#>  in svyglm(formula = I(CVD == "event") ~ OA + age + sex + married + 
#>     race + edu + income + bmi + phyact + fruit + bp + diab + 
#>     doctor + stress + smoke + drink + interaction(age, sex), 
#>     design = w.design, family = binomial(logit))
#> Working 2logLR =  40.16528 p= 1.2167e-06 
#> (scale factors:  1.3 1.2 1.2 0.93 0.78 0.71 );  denominator df= 185576Checks if there is a significant interaction effect between ‘sex’ and ‘diabetes’.
fit8b <- update(fitstep, .~. + interaction(sex,diab))
anova(fitstep, fit8b) # Do not keep this interaction
#> Working (Rao-Scott+F) LRT for interaction(sex, diab)
#>  in svyglm(formula = I(CVD == "event") ~ OA + age + sex + married + 
#>     race + edu + income + bmi + phyact + fruit + bp + diab + 
#>     doctor + stress + smoke + drink + interaction(sex, diab), 
#>     design = w.design, family = binomial(logit))
#> Working 2logLR =  0.4591456 p= 0.49597 
#> df=1;  denominator df= 185581Checks if there is a significant interaction effect between ‘BMI’ and ‘diabetes’.
fit8c <- update(fitstep, .~. + interaction(bmi,diab))
anova(fitstep, fit8c) # keep this interaction
#> Working (Rao-Scott+F) LRT for interaction(bmi, diab)
#>  in svyglm(formula = I(CVD == "event") ~ OA + age + sex + married + 
#>     race + edu + income + bmi + phyact + fruit + bp + diab + 
#>     doctor + stress + smoke + drink + interaction(bmi, diab), 
#>     design = w.design, family = binomial(logit))
#> Working 2logLR =  7.92727 p= 0.02533 
#> (scale factors:  1.4 0.6 );  denominator df= 185580Add all significant interactions in 1 model
Updates the model to include significant interaction terms.
Note that we have 0 effect modifier, 2 interactions
fit9 <- update(fitstep, .~. + interaction(age,sex) + interaction(bmi,diab))
require(jtools)
summ(fit9, confint = TRUE, digits = 3)| Observations | 185613 | 
| Dependent variable | I(CVD == "event") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.233 | 
| Pseudo-R² (McFadden) | 0.207 | 
| AIC | 41548.160 | 
| Est. | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | -5.590 | -6.046 | -5.135 | -24.069 | 0.000 | 
| OAOA | 0.438 | 0.352 | 0.523 | 10.043 | 0.000 | 
| age30-39 years | 0.299 | -0.118 | 0.717 | 1.405 | 0.160 | 
| age40-49 years | 1.158 | 0.794 | 1.522 | 6.236 | 0.000 | 
| age50-59 years | 2.181 | 1.831 | 2.531 | 12.212 | 0.000 | 
| age60-64 years | 2.619 | 2.263 | 2.976 | 14.395 | 0.000 | 
| age65 years and over | 3.033 | 2.680 | 3.387 | 16.833 | 0.000 | 
| ageteen | -0.695 | -1.704 | 0.314 | -1.350 | 0.177 | 
| sexMale | -0.028 | -0.447 | 0.390 | -0.133 | 0.895 | 
| marriedsingle | -0.016 | -0.096 | 0.064 | -0.382 | 0.703 | 
| raceWhite | 0.348 | 0.165 | 0.531 | 3.724 | 0.000 | 
| edu2nd grad. | -0.107 | -0.217 | 0.003 | -1.906 | 0.057 | 
| eduOther 2nd grad. | -0.039 | -0.192 | 0.114 | -0.498 | 0.618 | 
| eduPost-2nd grad. | -0.081 | -0.173 | 0.010 | -1.746 | 0.081 | 
| income$30,000-$49,999 | -0.304 | -0.400 | -0.208 | -6.215 | 0.000 | 
| income$50,000-$79,999 | -0.444 | -0.552 | -0.336 | -8.034 | 0.000 | 
| income$80,000 or more | -0.555 | -0.684 | -0.426 | -8.424 | 0.000 | 
| bmihealthy weight | -1.406 | -2.677 | -0.135 | -2.167 | 0.030 | 
| bmiOverweight | -1.110 | -2.375 | 0.154 | -1.721 | 0.085 | 
| phyactInactive | 0.194 | 0.094 | 0.293 | 3.817 | 0.000 | 
| phyactModerate | 0.077 | -0.034 | 0.187 | 1.353 | 0.176 | 
| fruit4-6 daily serving | -0.062 | -0.155 | 0.031 | -1.313 | 0.189 | 
| fruit6+ daily serving | 0.094 | -0.023 | 0.211 | 1.579 | 0.114 | 
| bpYes | 0.861 | 0.778 | 0.944 | 20.354 | 0.000 | 
| diabYes | 1.745 | 0.462 | 3.027 | 2.667 | 0.008 | 
| doctorYes | 0.535 | 0.372 | 0.697 | 6.447 | 0.000 | 
| stressstressed | 0.256 | 0.164 | 0.347 | 5.483 | 0.000 | 
| smokeCurrent smoker | 0.152 | 0.039 | 0.266 | 2.628 | 0.009 | 
| smokeFormer smoker | 0.177 | 0.082 | 0.272 | 3.654 | 0.000 | 
| drinkCurrent drinker | -0.205 | -0.381 | -0.029 | -2.277 | 0.023 | 
| drinkFormer driker | 0.116 | -0.072 | 0.303 | 1.210 | 0.226 | 
| interaction(age, sex)30-39 years.Female | -0.083 | -0.622 | 0.457 | -0.300 | 0.764 | 
| interaction(age, sex)40-49 years.Female | -0.302 | -0.766 | 0.161 | -1.278 | 0.201 | 
| interaction(age, sex)50-59 years.Female | -0.810 | -1.258 | -0.362 | -3.543 | 0.000 | 
| interaction(age, sex)60-64 years.Female | -0.787 | -1.237 | -0.337 | -3.428 | 0.001 | 
| interaction(age, sex)65 years and over.Female | -0.603 | -1.035 | -0.170 | -2.733 | 0.006 | 
| interaction(age, sex)teen.Female | -0.129 | -1.806 | 1.548 | -0.151 | 0.880 | 
| interaction(bmi, diab)healthy weight.No | 1.380 | 0.085 | 2.675 | 2.089 | 0.037 | 
| interaction(bmi, diab)Overweight.No | 1.085 | -0.204 | 2.373 | 1.650 | 0.099 | 
| Standard errors: Robust | 
fit9 <- update(fitstep, .~. + age:sex + bmi:diab)
publish(fit9)
#>                                            Variable             Units OddsRatio         CI.95     p-value 
#>                                                  OA           Control       Ref                           
#>                                                                    OA      1.55   [1.42;1.69]     < 1e-04 
#>                                             married        not single       Ref                           
#>                                                                single      0.98   [0.91;1.07]   0.7026897 
#>                                                race         Non-white       Ref                           
#>                                                                 White      1.42   [1.18;1.70]   0.0001960 
#>                                                 edu          < 2ndary       Ref                           
#>                                                             2nd grad.      0.90   [0.81;1.00]   0.0566536 
#>                                                       Other 2nd grad.      0.96   [0.83;1.12]   0.6183383 
#>                                                        Post-2nd grad.      0.92   [0.84;1.01]   0.0807393 
#>                                              income   $29,999 or less       Ref                           
#>                                                       $30,000-$49,999      0.74   [0.67;0.81]     < 1e-04 
#>                                                       $50,000-$79,999      0.64   [0.58;0.71]     < 1e-04 
#>                                                       $80,000 or more      0.57   [0.50;0.65]     < 1e-04 
#>                                              phyact            Active       Ref                           
#>                                                              Inactive      1.21   [1.10;1.34]   0.0001349 
#>                                                              Moderate      1.08   [0.97;1.21]   0.1760803 
#>                                               fruit 0-3 daily serving       Ref                           
#>                                                     4-6 daily serving      0.94   [0.86;1.03]   0.1892549 
#>                                                      6+ daily serving      1.10   [0.98;1.23]   0.1142759 
#>                                                  bp                No       Ref                           
#>                                                                   Yes      2.37   [2.18;2.57]     < 1e-04 
#>                                              doctor                No       Ref                           
#>                                                                   Yes      1.71   [1.45;2.01]     < 1e-04 
#>                                              stress  Not too stressed       Ref                           
#>                                                              stressed      1.29   [1.18;1.42]     < 1e-04 
#>                                               smoke      Never smoker       Ref                           
#>                                                        Current smoker      1.16   [1.04;1.30]   0.0085960 
#>                                                         Former smoker      1.19   [1.09;1.31]   0.0002579 
#>                                               drink       Never drank       Ref                           
#>                                                       Current drinker      0.81   [0.68;0.97]   0.0227934 
#>                                                         Former driker      1.12   [0.93;1.35]   0.2264702 
#>               age(20-29 years): sex(Male vs Female)                        0.97   [0.64;1.48]   0.8945322 
#>               age(30-39 years): sex(Male vs Female)                        1.06   [0.75;1.49]   0.7583565 
#>               age(40-49 years): sex(Male vs Female)                        1.32   [1.07;1.62]   0.0091527 
#>               age(50-59 years): sex(Male vs Female)                        2.18   [1.85;2.58]     < 1e-04 
#>               age(60-64 years): sex(Male vs Female)                        2.14   [1.80;2.53]     < 1e-04 
#>         age(65 years and over): sex(Male vs Female)                        1.78   [1.58;1.99]     < 1e-04 
#>                      age(teen): sex(Male vs Female)                        1.11   [0.22;5.62]   0.9033924 
#>        sex(Female): age(30-39 years vs 20-29 years)                        1.24   [0.87;1.77]   0.2276609 
#>        sex(Female): age(40-49 years vs 20-29 years)                        2.35   [1.75;3.16]     < 1e-04 
#>        sex(Female): age(50-59 years vs 20-29 years)                        3.94   [2.95;5.27]     < 1e-04 
#>        sex(Female): age(60-64 years vs 20-29 years)                        6.25   [4.66;8.39]     < 1e-04 
#>  sex(Female): age(65 years and over vs 20-29 years)                       11.37  [8.60;15.02]     < 1e-04 
#>               sex(Female): age(teen vs 20-29 years)                        0.44   [0.11;1.69]   0.2320840 
#>          sex(Male): age(30-39 years vs 20-29 years)                        1.35   [0.89;2.05]   0.1599938 
#>          sex(Male): age(40-49 years vs 20-29 years)                        3.18   [2.21;4.58]     < 1e-04 
#>          sex(Male): age(50-59 years vs 20-29 years)                        8.85  [6.24;12.56]     < 1e-04 
#>          sex(Male): age(60-64 years vs 20-29 years)                       13.73  [9.61;19.61]     < 1e-04 
#>    sex(Male): age(65 years and over vs 20-29 years)                       20.77 [14.59;29.57]     < 1e-04 
#>                 sex(Male): age(teen vs 20-29 years)                        0.50   [0.18;1.37]   0.1769155 
#>                   bmi(Underweight): diab(Yes vs No)                        5.72  [1.59;20.63]   0.0076561 
#>                bmi(healthy weight): diab(Yes vs No)                        1.44   [1.19;1.75]   0.0002221 
#>                    bmi(Overweight): diab(Yes vs No)                        1.93   [1.70;2.20]     < 1e-04 
#>        diab(No): bmi(healthy weight vs Underweight)                        0.97   [0.76;1.25]   0.8394972 
#>            diab(No): bmi(Overweight vs Underweight)                        0.97   [0.76;1.25]   0.8400722 
#>       diab(Yes): bmi(healthy weight vs Underweight)                        0.25   [0.07;0.87]   0.0302066 
#>           diab(Yes): bmi(Overweight vs Underweight)                        0.33   [0.09;1.17]   0.0852377basic.model <- eval(fit5$call[[2]])
basic.model
#> I(CVD == "event") ~ OA + age + sex
#> attr(,"variables")
#> list(I(CVD == "event"), OA, age, sex)
#> attr(,"factors")
#>                   OA age sex
#> I(CVD == "event")  0   0   0
#> OA                 1   0   0
#> age                0   1   0
#> sex                0   0   1
#> attr(,"term.labels")
#> [1] "OA"  "age" "sex"
#> attr(,"order")
#> [1] 1 1 1
#> attr(,"intercept")
#> [1] 1
#> attr(,"response")
#> [1] 1
#> attr(,".Environment")
#> <environment: R_GlobalEnv>
#> attr(,"predvars")
#> list(I(CVD == "event"), OA, age, sex)
#> attr(,"dataClasses")
#> I(CVD == "event")                OA               age               sex 
#>         "logical"          "factor"          "factor"          "factor" 
#>         (weights) 
#>         "numeric"
aic.int.model <- eval(fit9$call[[2]])
aic.int.model
#> I(CVD == "event") ~ OA + age + sex + married + race + edu + income + 
#>     bmi + phyact + fruit + bp + diab + doctor + stress + smoke + 
#>     drink + age:sex + bmi:diabSaving data
Saves the final regression models for future use.
Video content (optional)
For those who prefer a video walkthrough, feel free to watch the video below, which offers a description of an earlier version of the above content.