Exercise 1 (R)

Tip

You can download all of the related files in a zip file confoundingEx.zip from Github folder, or just by clicking this link directly.

Navigate to the GitHub folder (above link) where the ZIP file is located.
Click on the file name (above zip file) to open its preview window.
Click on the Download button to download the file. If you can’t see the Download button, click on “Download Raw File” link that should appear on the page.

Problem Statement

The following lab builds upon the RHC data set used in a previous Lab Assignment and explores the marginal and conditional effects of right-heart-catheteriziation treatment (swang1) on mortality (death). We begin by loading the RHC data set that we have seen earlier.

# Import the rhc data set
df <- read.csv('Data/rhc.csv', header = TRUE)

We then apply the following data-wrangling procedures, which we implemented previously:

Generate a variable age.cat that converts age into a factor with levels [0, 50), [50, 60), [60, 70), [70, 80), and [80, Inf).
Convert race into a factor with levels white, black, and other.
Convert sex into a factor with levels Male and Female.
Convert cat1 into a factor with levels ARF, CHF, MOSF, and Other.
Convert ca into a factor with levels None, Localized (Yes), and Metastatic.
Generate a variable called n.comorbidities that counts the total number of the following comorbidities per person: cardiohx, chfhx, dementhx, psychhx, chrpulhx, renalhx, liverhx, gibledhx, malighx, immunhx, transhx, amihx.
Convert swang1 into a factor with levels No RHC and RHC.
Convert death into a logical vector with FALSE/TRUE values.

# Generate age.cat
df$age.cat <- cut(df$age, breaks = c(0, 50, 60, 70, 80, Inf), 
                  include.lowest = TRUE, right = FALSE)

# Factor `race`
df$race <- factor(x = df$race, levels = c('white', 'black', 'other'))

# Factor `sex`
df$sex <- factor(x = df$sex, levels = c('Male', 'Female'))

# Factor `cat1`
df$cat1 <- factor(df$cat1)
levels(df$cat1) = list(ARF = 'ARF',
                       CHF = 'CHF',
                       MOSF = c('MOSF w/Malignancy', 'MOSF w/Sepsis'),
                       Other = c('Cirrhosis', 'Colon Cancer', 'Coma', 
                                 'COPD', 'Lung Cancer'))

# Factoring `ca`
df$ca <- factor(x = df$ca, 
                levels = c('No', 'Yes', 'Metastatic'),
                labels = c('None', 'Localized (Yes)', 'Metastatic'))

# Generate n.comorbiditis
df$n.comorbidities <- rowSums(x = subset(df, select = cardiohx:amihx), na.rm = TRUE)

# Factor `swang1`
df$swang1 <- factor(x = df$swang1, levels = c('No RHC', 'RHC'))

# Convert `death` into a logical vector
df$death <- factor(df$death, levels = c('No', 'Yes'), labels = c('FALSE', 'TRUE'))
df$death <- as.logical(df$death)

Problem 1: Summarizing Analytic Dataset

1(a): Generating the analytic dataset

Generate an analytic dataset called df.analytic according to the following parameters:

Contains the variables age.cat, sex, race, cat1, ca, dnr1, aps1, surv2md1, n.comorbidities, adld3p, das2d3pc, temp1, hrt1, meanbp1, resp1, wblc1, pafi1, paco21, ph1, crea1, alb1, scoma1, swang1, death
Subset to complete case observations

NOTE. The resulting analytic dataset should have 24 columns and 1439 rows.

# Generate df.analytic
# df.analytic <- ... # Complete the code here

# Verify the number of columns and rows within df.analytic
#

1(b): Generating a descriptive Table 1. [20% grade]

Using df.analytic, produce a descriptive Table 1 that matches the following sample table:

	Overall	FALSE	TRUE
n	1439	733	706
adld3p (mean (SD))	1.18 (1.82)	0.96 (1.68)	1.41 (1.93)
age.cat (%)
[0,50)	377 (26.2)	253 (34.5)	124 (17.6)
[50,60)	245 (17.0)	115 (15.7)	130 (18.4)
[60,70)	360 (25.0)	158 (21.6)	202 (28.6)
[70,80)	308 (21.4)	144 (19.6)	164 (23.2)
[80,Inf]	149 (10.4)	63 ( 8.6)	86 (12.2)
alb1 (mean (SD))	3.24 (0.64)	3.22 (0.66)	3.25 (0.62)
aps1 (mean (SD))	48.63 (17.32)	47.51 (17.24)	49.80 (17.32)
ca (%)
None	1121 (77.9)	629 (85.8)	492 (69.7)
Localized (Yes)	217 (15.1)	88 (12.0)	129 (18.3)
Metastatic	101 ( 7.0)	16 ( 2.2)	85 (12.0)
cat1 (%)
ARF	556 (38.6)	331 (45.2)	225 (31.9)
CHF	303 (21.1)	134 (18.3)	169 (23.9)
MOSF	290 (20.2)	136 (18.6)	154 (21.8)
Other	290 (20.2)	132 (18.0)	158 (22.4)
crea1 (mean (SD))	2.08 (2.21)	1.96 (2.08)	2.21 (2.34)
das2d3pc (mean (SD))	20.36 (7.19)	21.75 (7.63)	18.91 (6.39)
dnr1 = Yes (%)	98 ( 6.8)	26 ( 3.5)	72 (10.2)
hrt1 (mean (SD))	111.26 (38.50)	112.14 (38.18)	110.36 (38.84)
meanbp1 (mean (SD))	82.90 (37.49)	86.51 (39.21)	79.15 (35.25)
n.comorbidities (mean (SD))	1.75 (1.22)	1.51 (1.20)	2.00 (1.19)
paco21 (mean (SD))	40.52 (13.60)	40.72 (14.47)	40.31 (12.64)
pafi1 (mean (SD))	247.64 (110.40)	237.35 (109.56)	258.33 (110.34)
ph1 (mean (SD))	7.39 (0.10)	7.39 (0.10)	7.39 (0.09)
race (%)
white	1110 (77.1)	564 (76.9)	546 (77.3)
black	243 (16.9)	129 (17.6)	114 (16.1)
other	86 ( 6.0)	40 ( 5.5)	46 ( 6.5)
resp1 (mean (SD))	29.03 (12.17)	29.08 (12.34)	28.98 (11.99)
scoma1 (mean (SD))	5.60 (16.22)	6.30 (17.77)	4.87 (14.41)
sex = Female (%)	617 (42.9)	336 (45.8)	281 (39.8)
surv2md1 (mean (SD))	0.70 (0.16)	0.73 (0.13)	0.66 (0.17)
swang1 = RHC (%)	390 (27.1)	196 (26.7)	194 (27.5)
temp1 (mean (SD))	37.32 (1.65)	37.52 (1.67)	37.11 (1.60)
wblc1 (mean (SD))	14.54 (11.71)	14.36 (8.45)	14.72 (14.33)

NOTE. Ensure that the order of all variables and the levels of all factors matches the sample table.

# Load the `tableone` package into the library
library(tableone)

# Generate the Table 1
#

Problem 2: Crude, Conditional and Marginal Regression Models

For this next section, refer to the following examples in the Advanced Epi Methods text. Additionally, you may find it useful to review Naimi & Whitcomb (2020) as a primer on how to estimate odds ratios, risk ratios, and risk differences via generalized linear models (see Table 2 of the article).

2(a): Crude Models [30% grade]

Using df.analytic, estimate the crude odds ratio, risk ratio, and risk difference for the effect of swang1 (exposure) on death (outcome). Please adhere to the following instructions, and round your estimates to 3 decimal places.

When estimating crude ORs, use a logistic model.

# Estimate & print crude odds ratio, 
# and associated confidence interval

When estimating crude RRs, use a Poisson model with robust SEs.

# Estimate & print crude risk ratio,
# and associated confidence interval

When estimating crude RDs, use a Gaussian model with robust SEs.

# Estimate & print crude risk difference, and 
# associated confidence interval

2(b): Conditional Models [20% grade]

Using df.analytic, estimate the conditional odds ratio, risk ratio, and risk difference for the effect of swang1 (exposure) on death (outcome). Please adhere to the following instructions, and round your estimates to 3 decimal places. Adjust for all covariates found in df.analytic.

When estimating conditional odds ratios, use a logistic model.

# Estimate & print conditional odds ratio, and 
# associated confidence interval

When estimating conditional risk ratios, use a Poisson model with robust SEs (i.e., modified Poisson regression).

# Estimate & print conditional risk ratio, 
# and associated confidence interval

When estimating conditional risk differences, use a Gaussian model with an identity link and robust SEs (i.e., linear regression with robust SEs).

# Estimate & print conditional risk difference, 
# and associated confidence interval

2(c): Marginal Models [30% grade]

Using df.analytic, estimate the marginal odds ratio, risk ratio, and risk difference for the effect of swang1 (exposure) on death (outcome). Please adhere to the following instructions: Round your estimates to 3 decimal places. Adjust for all covariates found in df.analytic. Bootstrap could be used to estimate confidence intervals, but we won’t be calculating confidence intervals for the marginal models.

When estimating marginal odds ratios, use a logistic model.

# Estimate & print marginal odds ratio

When estimating marginal risk ratios, use a Poisson model with robust SEs (i.e., modified Poisson regression).

# Estimate & print marginal risk ratio

When estimating marginal risk differences, use a Gaussian model with an identity link and robust SEs (i.e., linear regression with robust SEs).

# Estimate & print marginal risk difference

2(d): Summarizing Models

Based upon the results from 2(a) - 2(c), complete the following table by replacing 9.999 with estimates from the corresponding models (edit if needed/your results are different):

Modeling Strategy	OR (95% CI)	RR (95% CI)	RD (95% CI)
Crude Est.
swang1 [RHC]	1.038 (0.823, 1.310)	1.019 (0.906, 1.146)	0.009 (-0.049, 0.067)
Conditional Est.
swang1 [RHC]	1.072 (0.805, 1.427)	1.030 (0.913, 1.161)	0.013 (-0.044, 0.071)
Marginal Est.
swang1 [RHC]	1.058	1.03	0.013

2(e): Interpreting (optional)

Which of the three measure of effects are collapsible and why?

Conditional estimates are adjusted for covariates and show the effect of the exposure (RHC) accounting for confounders. Marginal estimates reflect the overall effect of the exposure across the entire population without considering stratification by covariates, providing an “average” effect.

The conditional OR (1.072) and the marginal OR (1.058) are different. This difference illustrates the non-collapsibility of the odds ratio. Even after adjusting for covariates (conditional), the OR changes when calculating the marginal estimate across the population, because the odds ratio is sensitive to how covariates interact with the exposure and outcome.
The conditional RR (1.030) and marginal RR (1.030) are identical. This indicates that the RR is collapsible because adjusting for covariates (conditional model) does not change the overall effect when you average it across the population (marginal model).
The conditional RD (0.013) and marginal RD (0.013) are also identical, further supporting that the RD is collapsible. Whether you adjust for covariates or average across the entire population, the absolute difference in risk remains the same.

Knit your file

Please knit your file once you finished and submit the knitted PDF or doc file. Please also fill-up the group member names.