Exercise 1 Solution (R)
Assignment Setup
The following lab builds upon the RHC data set used in Lab Assignment #1 and explores the marginal and conditional effects of right-heart-catheteriziation treatment (swang1) on mortality (death).
We begin by loading the RHC data set from Lab #1…
We then apply the following data-wrangling procedures, which we implemented in Lab #1:
- Generate a variable 
age.catthat convertsageinto a factor with levels[0, 50),[50, 60),[60, 70),[70, 80), and[80, Inf). - Convert 
raceinto a factor with levelswhite,black, andother. - Convert 
sexinto a factor with levelsMaleandFemale. - Convert 
cat1into a factor with levelsARF,CHF,MOSF, andOther. - Convert 
cainto a factor with levelsNone,Localized (Yes), andMetastatic. - Generate a variable called 
n.comorbiditiesthat counts the total number of the following comorbidities per person:cardiohx,chfhx,dementhx,psychhx,chrpulhx,renalhx,liverhx,gibledhx,malighx,immunhx,transhx,amihx. - Convert 
swang1into a factor with levelsNo RHCandRHC. - Convert 
deathinto a logical vector with FALSE/TRUE values. 
# Generate age.cat
df$age.cat <- cut(df$age, breaks = c(0, 50, 60, 70, 80, Inf), 
                  include.lowest = TRUE, right = FALSE)
# Factor `race`
df$race <- factor(x = df$race, levels = c('white', 'black', 'other'))
# Factor `sex`
df$sex <- factor(x = df$sex, levels = c('Male', 'Female'))
# Factor `cat1`
df$cat1 <- factor(df$cat1)
levels(df$cat1) = list(ARF = 'ARF',
                       CHF = 'CHF',
                       MOSF = c('MOSF w/Malignancy', 'MOSF w/Sepsis'),
                       Other = c('Cirrhosis', 'Colon Cancer', 'Coma', 
                                 'COPD', 'Lung Cancer'))
# Factoring `ca`
df$ca <- factor(x = df$ca, 
                levels = c('No', 'Yes', 'Metastatic'),
                labels = c('None', 'Localized (Yes)', 'Metastatic'))
# Generate n.comorbiditis
df$n.comorbidities <- rowSums(x = subset(df, select = cardiohx:amihx), na.rm = TRUE)
# Factor `swang1`
df$swang1 <- factor(x = df$swang1, levels = c('No RHC', 'RHC'))
# Convert `death` into a logical vector
df$death <- factor(df$death, levels = c('No', 'Yes'), labels = c('FALSE', 'TRUE'))
df$death <- as.logical(df$death)Problem 1: Summarizing Analytic Dataset
1(a): Generating the analytic dataset
Generate an analytic dataset called df.analytic according to the following parameters:
- Contains the variables 
age.cat,sex,race,cat1,ca,dnr1,aps1,surv2md1,n.comorbidities,adld3p,das2d3pc,temp1,hrt1,meanbp1,resp1,wblc1,pafi1,paco21,ph1,crea1,alb1,scoma1,swang1,death - Subset to complete case observations
 
NOTE. The resulting analytic dataset should have 24 columns and 1439 rows.
# Generate df.analytic
df.analytic <- 
  df |>
  subset(select = c(age.cat, sex, race, cat1, ca, dnr1, aps1, surv2md1, n.comorbidities,
                    adld3p, das2d3pc, temp1, hrt1, meanbp1, resp1, wblc1, pafi1, paco21,
                    ph1, crea1, alb1, scoma1, swang1, death)) |>
  na.omit()
# Verify the number of columns and rows within df.analytic
dim(df.analytic)
#> [1] 1439   241(b): Generating a descriptive Table 1.
Using df.analytic, produce a descriptive Table 1 that matches the following sample table:
| Overall | FALSE | TRUE | |
|---|---|---|---|
| n | 1439 | 733 | 706 | 
| adld3p (mean (SD)) | 1.18 (1.82) | 0.96 (1.68) | 1.41 (1.93) | 
| age.cat (%) | |||
| [0,50) | 377 (26.2) | 253 (34.5) | 124 (17.6) | 
| [50,60) | 245 (17.0) | 115 (15.7) | 130 (18.4) | 
| [60,70) | 360 (25.0) | 158 (21.6) | 202 (28.6) | 
| [70,80) | 308 (21.4) | 144 (19.6) | 164 (23.2) | 
| [80,Inf] | 149 (10.4) | 63 ( 8.6) | 86 (12.2) | 
| alb1 (mean (SD)) | 3.24 (0.64) | 3.22 (0.66) | 3.25 (0.62) | 
| aps1 (mean (SD)) | 48.63 (17.32) | 47.51 (17.24) | 49.80 (17.32) | 
| ca (%) | |||
| None | 1121 (77.9) | 629 (85.8) | 492 (69.7) | 
| Localized (Yes) | 217 (15.1) | 88 (12.0) | 129 (18.3) | 
| Metastatic | 101 ( 7.0) | 16 ( 2.2) | 85 (12.0) | 
| cat1 (%) | |||
| ARF | 556 (38.6) | 331 (45.2) | 225 (31.9) | 
| CHF | 303 (21.1) | 134 (18.3) | 169 (23.9) | 
| MOSF | 290 (20.2) | 136 (18.6) | 154 (21.8) | 
| Other | 290 (20.2) | 132 (18.0) | 158 (22.4) | 
| crea1 (mean (SD)) | 2.08 (2.21) | 1.96 (2.08) | 2.21 (2.34) | 
| das2d3pc (mean (SD)) | 20.36 (7.19) | 21.75 (7.63) | 18.91 (6.39) | 
| dnr1 = Yes (%) | 98 ( 6.8) | 26 ( 3.5) | 72 (10.2) | 
| hrt1 (mean (SD)) | 111.26 (38.50) | 112.14 (38.18) | 110.36 (38.84) | 
| meanbp1 (mean (SD)) | 82.90 (37.49) | 86.51 (39.21) | 79.15 (35.25) | 
| n.comorbidities (mean (SD)) | 1.75 (1.22) | 1.51 (1.20) | 2.00 (1.19) | 
| paco21 (mean (SD)) | 40.52 (13.60) | 40.72 (14.47) | 40.31 (12.64) | 
| pafi1 (mean (SD)) | 247.64 (110.40) | 237.35 (109.56) | 258.33 (110.34) | 
| ph1 (mean (SD)) | 7.39 (0.10) | 7.39 (0.10) | 7.39 (0.09) | 
| race (%) | |||
| white | 1110 (77.1) | 564 (76.9) | 546 (77.3) | 
| black | 243 (16.9) | 129 (17.6) | 114 (16.1) | 
| other | 86 ( 6.0) | 40 ( 5.5) | 46 ( 6.5) | 
| resp1 (mean (SD)) | 29.03 (12.17) | 29.08 (12.34) | 28.98 (11.99) | 
| scoma1 (mean (SD)) | 5.60 (16.22) | 6.30 (17.77) | 4.87 (14.41) | 
| sex = Female (%) | 617 (42.9) | 336 (45.8) | 281 (39.8) | 
| surv2md1 (mean (SD)) | 0.70 (0.16) | 0.73 (0.13) | 0.66 (0.17) | 
| swang1 = RHC (%) | 390 (27.1) | 196 (26.7) | 194 (27.5) | 
| temp1 (mean (SD)) | 37.32 (1.65) | 37.52 (1.67) | 37.11 (1.60) | 
| wblc1 (mean (SD)) | 14.54 (11.71) | 14.36 (8.45) | 14.72 (14.33) | 
NOTE. Ensure that the order of all variables and the levels of all factors matches the sample table.
# Load the `tableone` package into the library
library(tableone)
# Generate the Table 1
.vars <- sort(names(df.analytic))[-9]
CreateTableOne(vars = .vars,
               strata = 'death',
               data = df.analytic,
               includeNA = TRUE,
               test = FALSE,
               addOverall = TRUE)
#>                              Stratified by death
#>                               Overall         FALSE           TRUE           
#>   n                             1439             733             706         
#>   adld3p (mean (SD))            1.18 (1.82)     0.96 (1.68)     1.41 (1.93)  
#>   age.cat (%)                                                                
#>      [0,50)                      377 (26.2)      253 (34.5)      124 (17.6)  
#>      [50,60)                     245 (17.0)      115 (15.7)      130 (18.4)  
#>      [60,70)                     360 (25.0)      158 (21.6)      202 (28.6)  
#>      [70,80)                     308 (21.4)      144 (19.6)      164 (23.2)  
#>      [80,Inf]                    149 (10.4)       63 ( 8.6)       86 (12.2)  
#>   alb1 (mean (SD))              3.24 (0.64)     3.22 (0.66)     3.25 (0.62)  
#>   aps1 (mean (SD))             48.63 (17.32)   47.51 (17.24)   49.80 (17.32) 
#>   ca (%)                                                                     
#>      None                       1121 (77.9)      629 (85.8)      492 (69.7)  
#>      Localized (Yes)             217 (15.1)       88 (12.0)      129 (18.3)  
#>      Metastatic                  101 ( 7.0)       16 ( 2.2)       85 (12.0)  
#>   cat1 (%)                                                                   
#>      ARF                         556 (38.6)      331 (45.2)      225 (31.9)  
#>      CHF                         303 (21.1)      134 (18.3)      169 (23.9)  
#>      MOSF                        290 (20.2)      136 (18.6)      154 (21.8)  
#>      Other                       290 (20.2)      132 (18.0)      158 (22.4)  
#>   crea1 (mean (SD))             2.08 (2.21)     1.96 (2.08)     2.21 (2.34)  
#>   das2d3pc (mean (SD))         20.36 (7.19)    21.75 (7.63)    18.91 (6.39)  
#>   dnr1 = Yes (%)                  98 ( 6.8)       26 ( 3.5)       72 (10.2)  
#>   hrt1 (mean (SD))            111.26 (38.50)  112.14 (38.18)  110.36 (38.84) 
#>   meanbp1 (mean (SD))          82.90 (37.49)   86.51 (39.21)   79.15 (35.25) 
#>   n.comorbidities (mean (SD))   1.75 (1.22)     1.51 (1.20)     2.00 (1.19)  
#>   paco21 (mean (SD))           40.52 (13.60)   40.72 (14.47)   40.31 (12.64) 
#>   pafi1 (mean (SD))           247.64 (110.40) 237.35 (109.56) 258.33 (110.34)
#>   ph1 (mean (SD))               7.39 (0.10)     7.39 (0.10)     7.39 (0.09)  
#>   race (%)                                                                   
#>      white                      1110 (77.1)      564 (76.9)      546 (77.3)  
#>      black                       243 (16.9)      129 (17.6)      114 (16.1)  
#>      other                        86 ( 6.0)       40 ( 5.5)       46 ( 6.5)  
#>   resp1 (mean (SD))            29.03 (12.17)   29.08 (12.34)   28.98 (11.99) 
#>   scoma1 (mean (SD))            5.60 (16.22)    6.30 (17.77)    4.87 (14.41) 
#>   sex = Female (%)               617 (42.9)      336 (45.8)      281 (39.8)  
#>   surv2md1 (mean (SD))          0.70 (0.16)     0.73 (0.13)     0.66 (0.17)  
#>   swang1 = RHC (%)               390 (27.1)      196 (26.7)      194 (27.5)  
#>   temp1 (mean (SD))            37.32 (1.65)    37.52 (1.67)    37.11 (1.60)  
#>   wblc1 (mean (SD))            14.54 (11.71)   14.36 (8.45)    14.72 (14.33)Problem 2: Crude, Conditional and Marginal Regression Models
For this next section, refer to the following examples in the Advanced Epi Methods text. Additionally, you may find it useful to review Naimi & Whitcomb (2020) as a primer on how to estimate odds ratios, risk ratios, and risk differences via generalized linear models (see Table 2 of the article).
2(a): Crude Models
Using df.analytic, estimate the crude odds ratio, risk ratio, and risk difference for the effect of swang1 (exposure) on death (outcome). Please adhere to the following instructions, and round your estimates to 3 decimal places.
- When estimating crude ORs, use a logistic model.
 
# Estimate & print crude odds ratio & summarizing model
mod.2a.1 <-
glm(death ~ swang1, 
    family = binomial(link = 'logit'),
    data = df.analytic) |>
  Publish::publish(print = FALSE, digits = 3)
mod.2a.1$regressionTable[1:2,]- When estimating crude RRs, use a Poisson model with robust SEs.
 
# Estimate & print crude risk ratio
mod.2a.2 <-
glm(death ~ swang1, 
    family = poisson(link = 'log'),
    data = df.analytic) |>
  Publish::publish(print = FALSE, confint.method = 'robust', digits = 3)
mod.2a.2$regressionTable[1:2,]- When estimating crude RDs, use a Gaussian model with robust SEs.
 
2(b): Conditional Models
Using df.analytic, estimate the conditional odds ratio, risk ratio, and risk difference for the effect of swang1 (exposure) on death (outcome). Please adhere to the following instructions, and round your estimates to 3 decimal places. Adjust for all covariates found in df.analytic.
- When estimating conditional odds ratios, use a logistic model.
 
# Estimate & print conditional odds ratio
mod.2b.1 <-
glm(death ~ swang1 + ., 
    family = binomial(link = 'logit'),
    data = df.analytic) |>
  Publish::publish(print = FALSE, digits = 3)
mod.2b.1$regressionTable[1:2,]- When estimating conditional risk ratios, use a Poisson model with robust SEs (i.e., modified Poisson regression).
 
# Estimate & print conditional risk ratio
mod.2b.2 <-
glm(death ~ swang1 + ., 
    family = poisson(link = 'log'),
    data = df.analytic) |>
  Publish::publish(print = FALSE, confint.method = 'robust', digits = 3)
mod.2b.2$regressionTable[1:2,]- When estimating conditional risk differences, use a Gaussian model with an identity link and robust SEs (i.e., linear regression with robust SEs).
 
2(c): Marginal Models
Using df.analytic, estimate the marginal odds ratio, risk ratio, and risk difference for the effect of swang1 (exposure) on death (outcome). Please adhere to the following instructions: Round your estimates to 3 decimal places. Adjust for all covariates found in df.analytic. Bootstrap could be used to estimate confidence intervals, but we won’t be calculating confidence intervals for the marginal models.
- When estimating marginal odds ratios, use a logistic model.
 
# Logistic regression model
mod_marg_or <- glm(death ~ swang1 + ., family = binomial(link = "logit"), 
                   data = df.analytic)
# Create a new dataset where everyone receives the treatment (swang1 = 1)
df.analytic_rhc <- df.analytic
df.analytic_rhc$swang1 <- "RHC"
p1 <- mean(predict(mod_marg_or, newdata = df.analytic_rhc, type = "response"))
# Create a new dataset where no one receives the treatment (swang1 = 0)
df.analytic_rhc$swang1 <- "No RHC"
p0 <- mean(predict(mod_marg_or, newdata = df.analytic_rhc, type = "response"))
# Calculate the marginal odds ratio
ORm <- (p1 / (1 - p1)) / (p0 / (1 - p0))
ORm
#> [1] 1.057679
round(ORm, 3)
#> [1] 1.058- When estimating marginal risk ratios, use a Poisson model with robust SEs (i.e., modified Poisson regression).
 
# Poisson regression model for risk ratio
mod_marg_rr <- glm(death ~ swang1 + ., family = poisson(link = "log"), 
                   data = df.analytic)
# Set swang1 = 1 for everyone
df.analytic_rhc <- df.analytic
df.analytic_rhc$swang1 <- "RHC"
p1 <- mean(predict(mod_marg_rr, newdata = df.analytic_rhc, type = "response"))
# Set swang1 = 0 for everyone
df.analytic_rhc$swang1 <- "No RHC"
p0 <- mean(predict(mod_marg_rr, newdata = df.analytic_rhc, type = "response"))
# Calculate the marginal risk ratio
RRm <- p1 / p0
RRm
#> [1] 1.029778
round(RRm, 3)
#> [1] 1.03- When estimating marginal risk differences, use a Gaussian model with an identity link and robust SEs (i.e., linear regression with robust SEs).
 
# Gaussian model for risk difference
mod_marg_rd <- glm(death ~ swang1 + ., family = gaussian(link = "identity"), 
                   data = df.analytic)
# Set swang1 = 1 for everyone
df.analytic_rhc <- df.analytic
df.analytic_rhc$swang1 <- "RHC"
p1 <- mean(predict(mod_marg_rd, newdata = df.analytic_rhc, type = "response"))
# Set swang1 = 0 for everyone
df.analytic_rhc$swang1 <- "No RHC"
p0 <- mean(predict(mod_marg_rd, newdata = df.analytic_rhc, type = "response"))
# Calculate the marginal risk difference
RDm <- p1 - p0
RDm
#> [1] 0.01323736
round(RDm, 3)
#> [1] 0.0132(d): Summarizing Models
Based upon the results from 2(a) - 2(c), complete the following table by replacing 9.999 with estimates from the corresponding models:
| Modeling Strategy | OR (95% CI) | RR (95% CI) | RD (95% CI) | 
|---|---|---|---|
| Crude Est. | |||
| swang1 [RHC] | 1.038 (0.823, 1.310) | 1.019 (0.906, 1.146) | 0.009 (-0.049, 0.067) | 
| Conditional Est. | |||
| swang1 [RHC] | 1.072 (0.805, 1.427) | 1.030 (0.913, 1.161) | 0.013 (-0.044, 0.071) | 
| Marginal Est. | |||
| swang1 [RHC] | 1.058 | 1.03 | 0.013 |