Survival analysis: NHANES

The tutorial demonstrates analyzing complex survey data (NHANES) with mortality as a survival outcome. See the tutorial in the previous chapter on linking public-use US mortality data with the NHANES.

We will explore the relationship between caffeine consumption and mortality in adults with diabetes using NHANES 1999-2010 datasets. We will follow the following article by Neves et al. (2018).

Load required packages

# Load required packages
library(tableone)
library(survival)
library(Publish)
library(survey)
library(DataExplorer)

Load datasets

Let us load NHANES 1999-2000, 2001-2002, 2003-2004, 2005-2006, 2007-2008, and 2009-2010 datasets and merged all cycles.

# Load
load("Data/nonbinary/coffee.RData")
ls()
#> [1] "dat.analytic1999" "dat.analytic2001" "dat.analytic2003" "dat.analytic2005"
#> [5] "dat.analytic2007" "dat.analytic2009"

# Merge all cycles
dat.full <- rbind(dat.analytic1999, dat.analytic2001, dat.analytic2003, 
                  dat.analytic2005, dat.analytic2007, dat.analytic2009)
head(dat.full)

dim(dat.full)
#> [1] 62160    26

The merged dataset contains 62,160 subjects with 26 relevant variables:

id: Respondent sequence/ID number
survey.weight: Full sample 2 year weights
psu: Masked pseudo-PSU
strata: Masked pseudo-stratum
caff: Caffeine (exposure variable)
stime: Follow-up time (time from interview date to death or censoring)
status: Mortality status
sex: Sex
age: Age in years
race: Race/ethnicity
income: Annual household income
smoking: Smoking status
diabetic.nephropathy: Diabetic nephropathy (no data)
bmi.cat: BMI - categorical
education: Education level
carbohyd: Carbohydrate in gm
alcohol: Alcohol consumption
diab.years: Years since diabetes
htn: Hypertension
daib.retino: Diabetes retinopathy
macrovascular: Macrovascular complications
insulin: Insulin
survey.cycle: Survey cycle
physical.activity: Physical activity
diabetes: Diabetes status
cal.total: Total calories in kcal

Data pre-processing

Eligibility criteria

The authors considered adults aged 18 years or more, only diabetic, and total calories between 500 and 3500.

# Total samples in the merged dataset
nrow(dat.full) # N = 62,160
#> [1] 62160

# Age >= 18 years
dat2 <- subset(dat.full, age >= 18) 
nrow(dat2) # N = 35,379
#> [1] 35379

# With diabetes
dat3 <- subset(dat2, diabetes == "Yes") 
nrow(dat3) # N = 4,687 - numbers don't match with the paper (N = 4,544)
#> [1] 4687

# Implausible alimentary reports
dat4 <- subset(dat3, cal.total >= 500 & cal.total <= 3500) 
nrow(dat4) # N = 4,083 - numbers don't match with the paper (N = 3,948)
#> [1] 4083

Complete case data

Let us drop missing values in the exposure and outcome:

# Drop missing exposure and outcome
dat <- dat4[complete.cases(dat4$id),]
dat <- dat[complete.cases(dat$caff),]
dat <- dat[complete.cases(dat$status),]
dat <- dat[complete.cases(dat$stime),] # N = 4,080
dim(dat)
#> [1] 4080   26

# Missing plot
plot_missing(dat)

Now, let us drop variables with high missingness for this exercise. As explained in the Missing data analysis chapter, a better approach could be imputing missing values under the missing at random assumption.

# Drop variables with high missingness
dat$diabetic.nephropathy <- dat$diab.years <- dat$daib.retino <- dat$income <- NULL

# Complete case data
dat <- na.omit(dat)
dim(dat) # N = 3,780
#> [1] 3780   22

Table 1

Now, let us create Table 1 stratified by coffee consumption (exposure), separately for males and females, as done in the article.

vars <- c("age", "race", "education", "smoking", "alcohol", "carbohyd", "physical.activity",
          "bmi.cat", "htn", "macrovascular", "insulin", "survey.cycle")

tab1a <- CreateTableOne(vars = vars, strata = "caff", data = dat[dat$sex=="Female",], 
                        test = F)
tab1b <- CreateTableOne(vars = vars, strata = "caff", data = dat[dat$sex=="Male",], 
                        test = F)

tab1 <- list(Female = tab1a, Male = tab1b)
print(tab1, showAllLevels = T, smd = T)
#> $Female
#>                        Stratified by caff
#>                         level                     No consumption <100 mg/day   
#>   n                                                  203            916        
#>   age (mean (SD))                                  59.33 (15.14)  62.92 (14.02)
#>   race (%)              Non-Hispanic White            38 (18.7)     286 (31.2) 
#>                         Non-Hispanic Black            96 (47.3)     284 (31.0) 
#>                         Mexican American              47 (23.2)     228 (24.9) 
#>                         Other Hispanic                10 ( 4.9)      80 ( 8.7) 
#>                         Other race                    12 ( 5.9)      38 ( 4.1) 
#>   education (%)         Less than 9th grade           58 (28.6)     238 (26.0) 
#>                         9-11th grade                  48 (23.6)     182 (19.9) 
#>                         High school grade             43 (21.2)     224 (24.5) 
#>                         Some college                  39 (19.2)     209 (22.8) 
#>                         College graduate or above     15 ( 7.4)      63 ( 6.9) 
#>   smoking (%)           Never smoker                 143 (70.4)     598 (65.3) 
#>                         Current smoker                24 (11.8)     127 (13.9) 
#>                         Former smoker                 36 (17.7)     191 (20.9) 
#>   alcohol (%)           No consumption               192 (94.6)     826 (90.2) 
#>                         <20 grams/day                  4 ( 2.0)      69 ( 7.5) 
#>                         20+ grams/day                  7 ( 3.4)      21 ( 2.3) 
#>   carbohyd (mean (SD))                            176.17 (73.88) 190.64 (73.34)
#>   physical.activity (%) Low                           95 (46.8)     393 (42.9) 
#>                         Intermediate                  59 (29.1)     309 (33.7) 
#>                         High                          49 (24.1)     214 (23.4) 
#>   bmi.cat (%)           <20.0                          4 ( 2.0)       6 ( 0.7) 
#>                         20.0 to <25.0                 24 (11.8)     107 (11.7) 
#>                         25.0 to <30.0                 35 (17.2)     254 (27.7) 
#>                         30.0 to <35.0                 45 (22.2)     263 (28.7) 
#>                         35.0 to <40.0                 47 (23.2)     138 (15.1) 
#>                         40.0+                         48 (23.6)     148 (16.2) 
#>   htn (%)               No                            58 (28.6)     273 (29.8) 
#>                         Yes                          145 (71.4)     643 (70.2) 
#>   macrovascular (%)     No                           166 (81.8)     758 (82.8) 
#>                         Yes                           37 (18.2)     158 (17.2) 
#>   insulin (%)           No                           156 (76.8)     755 (82.4) 
#>                         Yes                           47 (23.2)     161 (17.6) 
#>   survey.cycle (%)      1999-00                       45 (22.2)     111 (12.1) 
#>                         2001-02                       45 (22.2)      98 (10.7) 
#>                         2003-04                       26 (12.8)     127 (13.9) 
#>                         2005-06                       20 ( 9.9)     154 (16.8) 
#>                         2007-08                       37 (18.2)     198 (21.6) 
#>                         2009-10                       30 (14.8)     228 (24.9) 
#>                        Stratified by caff
#>                         100-200 mg/day 200+ mg/day    SMD   
#>   n                        424            332               
#>   age (mean (SD))        62.04 (13.75)  60.19 (13.10)  0.150
#>   race (%)                 154 (36.3)     189 (56.9)   0.527
#>                            108 (25.5)      40 (12.0)        
#>                            111 (26.2)      69 (20.8)        
#>                             30 ( 7.1)      20 ( 6.0)        
#>                             21 ( 5.0)      14 ( 4.2)        
#>   education (%)             79 (18.6)      56 (16.9)   0.203
#>                             89 (21.0)      76 (22.9)        
#>                            109 (25.7)      81 (24.4)        
#>                            109 (25.7)      89 (26.8)        
#>                             38 ( 9.0)      30 ( 9.0)        
#>   smoking (%)              238 (56.1)     136 (41.0)   0.377
#>                             81 (19.1)     119 (35.8)        
#>                            105 (24.8)      77 (23.2)        
#>   alcohol (%)              370 (87.3)     283 (85.2)   0.211
#>                             38 ( 9.0)      35 (10.5)        
#>                             16 ( 3.8)      14 ( 4.2)        
#>   carbohyd (mean (SD))  193.79 (72.18) 203.36 (76.05)  0.190
#>   physical.activity (%)    179 (42.2)     148 (44.6)   0.093
#>                            151 (35.6)     117 (35.2)        
#>                             94 (22.2)      67 (20.2)        
#>   bmi.cat (%)                7 ( 1.7)       4 ( 1.2)   0.250
#>                             40 ( 9.4)      44 (13.3)        
#>                            109 (25.7)      80 (24.1)        
#>                            133 (31.4)      82 (24.7)        
#>                             73 (17.2)      63 (19.0)        
#>                             62 (14.6)      59 (17.8)        
#>   htn (%)                  123 (29.0)     110 (33.1)   0.052
#>                            301 (71.0)     222 (66.9)        
#>   macrovascular (%)        352 (83.0)     266 (80.1)   0.042
#>                             72 (17.0)      66 (19.9)        
#>   insulin (%)              340 (80.2)     252 (75.9)   0.094
#>                             84 (19.8)      80 (24.1)        
#>   survey.cycle (%)          42 ( 9.9)      46 (13.9)   0.305
#>                             53 (12.5)      48 (14.5)        
#>                             63 (14.9)      46 (13.9)        
#>                             59 (13.9)      43 (13.0)        
#>                            111 (26.2)      82 (24.7)        
#>                             96 (22.6)      67 (20.2)        
#> 
#> $Male
#>                        Stratified by caff
#>                         level                     No consumption <100 mg/day   
#>   n                                                  168            707        
#>   age (mean (SD))                                  60.95 (12.68)  62.86 (12.88)
#>   race (%)              Non-Hispanic White            35 (20.8)     238 (33.7) 
#>                         Non-Hispanic Black            78 (46.4)     209 (29.6) 
#>                         Mexican American              37 (22.0)     190 (26.9) 
#>                         Other Hispanic                13 ( 7.7)      48 ( 6.8) 
#>                         Other race                     5 ( 3.0)      22 ( 3.1) 
#>   education (%)         Less than 9th grade           50 (29.8)     194 (27.4) 
#>                         9-11th grade                  41 (24.4)     135 (19.1) 
#>                         High school grade             31 (18.5)     145 (20.5) 
#>                         Some college                  30 (17.9)     136 (19.2) 
#>                         College graduate or above     16 ( 9.5)      97 (13.7) 
#>   smoking (%)           Never smoker                  72 (42.9)     281 (39.7) 
#>                         Current smoker                34 (20.2)     155 (21.9) 
#>                         Former smoker                 62 (36.9)     271 (38.3) 
#>   alcohol (%)           No consumption               131 (78.0)     539 (76.2) 
#>                         <20 grams/day                 16 ( 9.5)      96 (13.6) 
#>                         20+ grams/day                 21 (12.5)      72 (10.2) 
#>   carbohyd (mean (SD))                            195.01 (85.93) 213.00 (80.40)
#>   physical.activity (%) Low                           63 (37.5)     230 (32.5) 
#>                         Intermediate                  54 (32.1)     284 (40.2) 
#>                         High                          51 (30.4)     193 (27.3) 
#>   bmi.cat (%)           <20.0                          5 ( 3.0)       4 ( 0.6) 
#>                         20.0 to <25.0                 26 (15.5)      95 (13.4) 
#>                         25.0 to <30.0                 53 (31.5)     276 (39.0) 
#>                         30.0 to <35.0                 46 (27.4)     185 (26.2) 
#>                         35.0 to <40.0                 21 (12.5)      95 (13.4) 
#>                         40.0+                         17 (10.1)      52 ( 7.4) 
#>   htn (%)               No                            50 (29.8)     279 (39.5) 
#>                         Yes                          118 (70.2)     428 (60.5) 
#>   macrovascular (%)     No                           120 (71.4)     526 (74.4) 
#>                         Yes                           48 (28.6)     181 (25.6) 
#>   insulin (%)           No                           135 (80.4)     571 (80.8) 
#>                         Yes                           33 (19.6)     136 (19.2) 
#>   survey.cycle (%)      1999-00                       36 (21.4)      72 (10.2) 
#>                         2001-02                       28 (16.7)      92 (13.0) 
#>                         2003-04                       21 (12.5)     104 (14.7) 
#>                         2005-06                       18 (10.7)     115 (16.3) 
#>                         2007-08                       36 (21.4)     168 (23.8) 
#>                         2009-10                       29 (17.3)     156 (22.1) 
#>                        Stratified by caff
#>                         100-200 mg/day 200+ mg/day    SMD   
#>   n                        461            569               
#>   age (mean (SD))        61.67 (13.66)  61.70 (12.40)  0.074
#>   race (%)                 214 (46.4)     344 (60.5)   0.575
#>                             92 (20.0)      67 (11.8)        
#>                             90 (19.5)     115 (20.2)        
#>                             54 (11.7)      26 ( 4.6)        
#>                             11 ( 2.4)      17 ( 3.0)        
#>   education (%)             98 (21.3)      97 (17.0)   0.269
#>                             79 (17.1)      95 (16.7)        
#>                            104 (22.6)     125 (22.0)        
#>                            100 (21.7)     161 (28.3)        
#>                             80 (17.4)      91 (16.0)        
#>   smoking (%)              162 (35.1)     153 (26.9)   0.216
#>                            109 (23.6)     196 (34.4)        
#>                            190 (41.2)     220 (38.7)        
#>   alcohol (%)              349 (75.7)     427 (75.0)   0.080
#>                             63 (13.7)      80 (14.1)        
#>                             49 (10.6)      62 (10.9)        
#>   carbohyd (mean (SD))  225.58 (87.51) 237.26 (85.11)  0.273
#>   physical.activity (%)    153 (33.2)     195 (34.3)   0.099
#>                            177 (38.4)     199 (35.0)        
#>                            131 (28.4)     175 (30.8)        
#>   bmi.cat (%)                4 ( 0.9)       4 ( 0.7)   0.181
#>                             61 (13.2)      80 (14.1)        
#>                            169 (36.7)     187 (32.9)        
#>                            132 (28.6)     146 (25.7)        
#>                             64 (13.9)      95 (16.7)        
#>                             31 ( 6.7)      57 (10.0)        
#>   htn (%)                  190 (41.2)     234 (41.1)   0.126
#>                            271 (58.8)     335 (58.9)        
#>   macrovascular (%)        353 (76.6)     412 (72.4)   0.066
#>                            108 (23.4)     157 (27.6)        
#>   insulin (%)              375 (81.3)     449 (78.9)   0.032
#>                             86 (18.7)     120 (21.1)        
#>   survey.cycle (%)          48 (10.4)      64 (11.2)   0.284
#>                             59 (12.8)      69 (12.1)        
#>                             48 (10.4)     106 (18.6)        
#>                             70 (15.2)      65 (11.4)        
#>                            112 (24.3)     130 (22.8)        
#>                            124 (26.9)     135 (23.7)

Survey design

The paper analyzed the data separately for males and females. Let us create the survey design:

# Revised weight - weight divided by 6 cycles
dat.full$svy.weight <- dat.full$survey.weight/6 
dat$svy.weight <- dat$survey.weight/6 
summary(dat$svy.weight)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   223.2  1760.2  3349.3  4555.9  5817.8 25347.4

# Create an indicator variable
dat.full$miss <- 1
dat.full$miss[dat.full$id %in% dat$id] <- 0

# Set up the design
w.design0 <- svydesign(strata = ~strata, id = ~psu, weights = ~svy.weight, 
                       data = dat.full, nest = TRUE)

# Subset the design
w.design1 <- subset(w.design0, miss == 0)

# Subset the design for females
w.design.f <- subset(w.design1, sex == "Female")
dim(w.design.f)
#> [1] 1875   28

# Subset the design for males
w.design.m <- subset(w.design1, sex == "Male")
dim(w.design.m)
#> [1] 1905   28

Kaplan-Meier plot

Let us create the Kaplan-Meier plot for males and females:

# KM for females
fit0.f <- svykm(Surv(stime, status) ~ caff, design = w.design.f)
plot(fit0.f)


# KM for males
fit0.m <- svykm(Surv(stime, status) ~ caff, design = w.design.m)
plot(fit0.m)

Cox PH

Now, we will fit the Cox proportional hazards model, separately for males and females. Let us run the unadjusted model first.

# Unadjusted Cox PH for females
fit1.f <- svycoxph(Surv(stime, status) ~ caff, design = w.design.f)
publish(fit1.f)
#> Stratified 1 - level Cluster Sampling design (with replacement)
#> With (180) clusters.
#> subset(w.design1, sex == "Female")
#>  Variable          Units HazardRatio       CI.95   p-value 
#>      caff No consumption         Ref                       
#>              <100 mg/day        0.80 [0.62;1.03]   0.08543 
#>           100-200 mg/day        0.58 [0.42;0.80]   < 0.001 
#>              200+ mg/day        0.61 [0.43;0.86]   0.00507

# Unadjusted Cox PH for males
fit1.m <- svycoxph(Surv(stime, status) ~ caff, design = w.design.m)
publish(fit1.m)
#> Stratified 1 - level Cluster Sampling design (with replacement)
#> With (180) clusters.
#> subset(w.design1, sex == "Male")
#>  Variable          Units HazardRatio       CI.95 p-value 
#>      caff No consumption         Ref                     
#>              <100 mg/day        1.20 [0.88;1.63]   0.246 
#>           100-200 mg/day        1.10 [0.75;1.59]   0.632 
#>              200+ mg/day        1.10 [0.76;1.58]   0.607

Now, we will fit the Cox PH model, adjusting for covariates.

# Covariate adjusted Cox PH for females
fit2.f <- svycoxph(Surv(stime, status) ~ caff + age + race + education + smoking + 
                     alcohol +  carbohyd + physical.activity + bmi.cat + htn +
                     macrovascular + insulin + survey.cycle, design = w.design.f)
publish(fit2.f)
#> Stratified 1 - level Cluster Sampling design (with replacement)
#> With (180) clusters.
#> subset(w.design1, sex == "Female")
#>           Variable                     Units HazardRatio       CI.95   p-value 
#>               caff            No consumption         Ref                       
#>                                  <100 mg/day        0.62 [0.46;0.84]   0.00166 
#>                               100-200 mg/day        0.44 [0.30;0.64]   < 0.001 
#>                                  200+ mg/day        0.43 [0.28;0.66]   < 0.001 
#>                age                                  1.08 [1.07;1.09]   < 0.001 
#>               race        Non-Hispanic White         Ref                       
#>                           Non-Hispanic Black        0.81 [0.65;1.01]   0.06322 
#>                             Mexican American        0.79 [0.61;1.03]   0.07976 
#>                               Other Hispanic        0.65 [0.41;1.01]   0.05634 
#>                                   Other race        0.64 [0.38;1.08]   0.09394 
#>          education       Less than 9th grade         Ref                       
#>                                 9-11th grade        1.15 [0.84;1.57]   0.37396 
#>                            High school grade        1.03 [0.75;1.41]   0.84759 
#>                                 Some college        1.06 [0.79;1.44]   0.69484 
#>                    College graduate or above        0.61 [0.40;0.93]   0.02192 
#>            smoking              Never smoker         Ref                       
#>                               Current smoker        1.84 [1.42;2.38]   < 0.001 
#>                                Former smoker        1.17 [0.95;1.46]   0.14379 
#>            alcohol            No consumption         Ref                       
#>                                <20 grams/day        0.94 [0.65;1.36]   0.75056 
#>                                20+ grams/day        1.05 [0.58;1.93]   0.86786 
#>           carbohyd                                  1.00 [1.00;1.00]   0.69680 
#>  physical.activity                       Low         Ref                       
#>                                 Intermediate        0.73 [0.57;0.92]   0.00834 
#>                                         High        0.66 [0.49;0.89]   0.00638 
#>            bmi.cat                     <20.0         Ref                       
#>                                20.0 to <25.0        0.39 [0.21;0.73]   0.00301 
#>                                25.0 to <30.0        0.33 [0.19;0.57]   < 0.001 
#>                                30.0 to <35.0        0.30 [0.17;0.54]   < 0.001 
#>                                35.0 to <40.0        0.26 [0.13;0.49]   < 0.001 
#>                                        40.0+        0.31 [0.16;0.62]   < 0.001 
#>                htn                        No         Ref                       
#>                                          Yes        1.00 [0.80;1.24]   0.98084 
#>      macrovascular                        No         Ref                       
#>                                          Yes        1.85 [1.48;2.32]   < 0.001 
#>            insulin                        No         Ref                       
#>                                          Yes        1.87 [1.50;2.32]   < 0.001 
#>       survey.cycle                   1999-00         Ref                       
#>                                      2001-02        0.58 [0.43;0.78]   < 0.001 
#>                                      2003-04        0.75 [0.54;1.04]   0.08384 
#>                                      2005-06        0.81 [0.57;1.15]   0.24147 
#>                                      2007-08        0.88 [0.60;1.28]   0.49860 
#>                                      2009-10        0.76 [0.56;1.04]   0.08292

# Covariate adjusted Cox PH for males
fit2.m <- svycoxph(Surv(stime, status) ~ caff + age + race + education + smoking + 
                     alcohol +  carbohyd + physical.activity + bmi.cat + htn +
                     macrovascular + insulin + survey.cycle, design = w.design.m)
publish(fit2.m)
#> Stratified 1 - level Cluster Sampling design (with replacement)
#> With (180) clusters.
#> subset(w.design1, sex == "Male")
#>           Variable                     Units HazardRatio       CI.95   p-value 
#>               caff            No consumption         Ref                       
#>                                  <100 mg/day        1.06 [0.75;1.49]   0.74199 
#>                               100-200 mg/day        1.04 [0.69;1.57]   0.84938 
#>                                  200+ mg/day        1.05 [0.71;1.54]   0.81249 
#>                age                                  1.08 [1.06;1.09]   < 0.001 
#>               race        Non-Hispanic White         Ref                       
#>                           Non-Hispanic Black        0.75 [0.61;0.91]   0.00423 
#>                             Mexican American        0.68 [0.52;0.88]   0.00348 
#>                               Other Hispanic        0.83 [0.51;1.36]   0.46367 
#>                                   Other race        1.03 [0.60;1.76]   0.91073 
#>          education       Less than 9th grade         Ref                       
#>                                 9-11th grade        1.47 [1.10;1.96]   0.00846 
#>                            High school grade        1.03 [0.80;1.33]   0.81768 
#>                                 Some college        1.02 [0.76;1.36]   0.91606 
#>                    College graduate or above        0.68 [0.47;0.98]   0.03833 
#>            smoking              Never smoker         Ref                       
#>                               Current smoker        1.90 [1.38;2.62]   < 0.001 
#>                                Former smoker        1.03 [0.85;1.26]   0.74101 
#>            alcohol            No consumption         Ref                       
#>                                <20 grams/day        0.95 [0.74;1.21]   0.67612 
#>                                20+ grams/day        1.14 [0.83;1.58]   0.42116 
#>           carbohyd                                  1.00 [1.00;1.00]   0.37461 
#>  physical.activity                       Low         Ref                       
#>                                 Intermediate        0.64 [0.52;0.80]   < 0.001 
#>                                         High        0.69 [0.56;0.87]   0.00137 
#>            bmi.cat                     <20.0         Ref                       
#>                                20.0 to <25.0        0.25 [0.11;0.58]   0.00134 
#>                                25.0 to <30.0        0.20 [0.09;0.48]   < 0.001 
#>                                30.0 to <35.0        0.19 [0.08;0.45]   < 0.001 
#>                                35.0 to <40.0        0.25 [0.10;0.64]   0.00380 
#>                                        40.0+        0.26 [0.11;0.60]   0.00160 
#>                htn                        No         Ref                       
#>                                          Yes        1.18 [0.98;1.42]   0.08442 
#>      macrovascular                        No         Ref                       
#>                                          Yes        1.40 [1.17;1.66]   < 0.001 
#>            insulin                        No         Ref                       
#>                                          Yes        1.48 [1.21;1.81]   < 0.001 
#>       survey.cycle                   1999-00         Ref                       
#>                                      2001-02        1.37 [1.04;1.80]   0.02569 
#>                                      2003-04        0.98 [0.68;1.43]   0.93245 
#>                                      2005-06        1.07 [0.77;1.48]   0.67918 
#>                                      2007-08        1.05 [0.78;1.41]   0.75981 
#>                                      2009-10        0.83 [0.58;1.18]   0.29625

The adjusted results are approximately the same as in Table 2 of the article. Caffeine consumption was associated with mortality among women but not among men.

PH assumption

Now, we will check the proportional hazard assumption.

# PH assumption among females
cox.zph(fit2.f)
#>                      chisq df    p
#> caff              3.65e-03  3 1.00
#> age               2.15e-06  1 1.00
#> race              2.87e-03  4 1.00
#> education         2.42e-03  4 1.00
#> smoking           2.44e-03  2 1.00
#> alcohol           1.14e-03  2 1.00
#> carbohyd          7.38e-04  1 0.98
#> physical.activity 3.34e-03  2 1.00
#> bmi.cat           2.21e-03  5 1.00
#> htn               1.33e-03  1 0.97
#> macrovascular     4.24e-04  1 0.98
#> insulin           2.26e-04  1 0.99
#> survey.cycle      3.49e-03  5 1.00
#> GLOBAL            2.55e-02 32 1.00

# PH assumption among males
cox.zph(fit2.m)
#>                      chisq df    p
#> caff              2.70e-03  3 1.00
#> age               2.45e-03  1 0.96
#> race              2.40e-03  4 1.00
#> education         1.13e-03  4 1.00
#> smoking           1.67e-03  2 1.00
#> alcohol           1.50e-03  2 1.00
#> carbohyd          1.33e-03  1 0.97
#> physical.activity 7.73e-03  2 1.00
#> bmi.cat           3.60e-03  5 1.00
#> htn               4.45e-06  1 1.00
#> macrovascular     4.32e-06  1 1.00
#> insulin           9.24e-05  1 0.99
#> survey.cycle      2.21e-03  5 1.00
#> GLOBAL            2.78e-02 32 1.00

The large p-values indicate that the proportional hazard assumption was met for both models.