2 Overview of the Study
This chapter provides an overview of the original study: Examining the Role of Race/Ethnicity and Sex in Modifying the Association Between Early Smoking Initiation and Mortality: A 20-Year NHANES Analysis. By outlining the study's motivation, key research questions, analytical approach, and main findings, it sets the stage for the detailed, step-by-step reproduction of the analysis in the chapters that follow.
2.1 How to Use This Guide
This Quarto book is designed to be a transparent and comprehensive guide for reproducing the original analysis.
- Intended Audience: This guide is for researchers, public health analysts, and students with a foundational understanding of R and statistical concepts like survival analysis.
- Required Software: To run the code in this walkthrough, you will need R (version 4.2.2 or later) and RStudio. The code relies on several key R packages, including nhanesA, survey, survival, and ggplot2, which are loaded at the beginning of each relevant chapter.
- A Note on Reproducibility: The primary goal is to allow other researchers to validate these findings and build upon them. By documenting the complete analytical pipeline, this guide supports the principles of open and reproducible science.
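The software requirements listed above can be checked before starting. The following is a minimal sketch, not part of the original analysis code; the object names (required_pkgs, missing_pkgs, r_ok) are illustrative assumptions.

```r
# Packages named in this guide; each chapter loads the ones it needs.
required_pkgs <- c("nhanesA", "survey", "survival", "ggplot2")

# Check which packages are available without attaching them.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before continuing: ", paste(missing_pkgs, collapse = ", "))
}

# Confirm the running R version meets the stated minimum (4.2.2).
r_ok <- getRversion() >= "4.2.2"
if (!r_ok) {
  message("R 4.2.2 or later is recommended; found ", as.character(getRversion()))
}
```

Any package reported as missing can be installed with install.packages() before running the later chapters.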
2.2 Motivation
This study was motivated by the need to better understand how the link between smoking initiation age and mortality varies across demographic groups. Cigarette smoking remains a leading preventable cause of premature death in the U.S., accounting for over 480,000 deaths annually. Clarifying how this risk differs across populations is essential for developing targeted public health interventions that address the specific needs and risks of different communities.
2.3 Research Objectives
This study addresses two primary objectives:
- To re-assess the relationship between the initial age of cigarette smoking and overall mortality.
- To examine how this relationship is modified by race/ethnicity and sex.
To investigate these objectives, the analysis uses data from U.S. adults aged 20-79 who participated in the 1999-2018 National Health and Nutrition Examination Survey (NHANES). Mortality data were provided by the National Center for Health Statistics (NCHS) through a linkage with public-use death records.
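As a rough illustration of what the 1999-2018 data span means in practice, the sketch below builds the NHANES table names for the ten two-year cycles and outlines an age restriction. It is a sketch under stated assumptions, not the study's actual data-preparation code: the helpers nhanes_table_names() and load_demo_adults() are hypothetical, and using the demographics variable RIDAGEYR for the 20-79 restriction is an assumption about how the restriction could be applied.

```r
# Two-year NHANES cycles from 1999-2000 through 2017-2018.
cycle_start <- seq(1999, 2017, by = 2)

# Public NHANES table names carry a letter suffix from the second cycle
# onward, e.g. "DEMO", "DEMO_B", ..., "DEMO_J".
nhanes_table_names <- function(prefix) {
  suffix <- c("", paste0("_", LETTERS[2:10]))
  setNames(paste0(prefix, suffix),
           paste(cycle_start, cycle_start + 1, sep = "-"))
}

demo_tables <- nhanes_table_names("DEMO")

# Hypothetical helper: download each demographics table and keep adults
# aged 20-79. Requires the nhanesA package and network access, so it is
# defined here but not called.
load_demo_adults <- function(tables = demo_tables) {
  pieces <- lapply(tables, function(tbl) {
    d <- nhanesA::nhanes(tbl)
    d[d$RIDAGEYR >= 20 & d$RIDAGEYR <= 79, c("SEQN", "RIDAGEYR")]
  })
  do.call(rbind, pieces)
}
```

Calling load_demo_adults() would return one row per participant with the respondent identifier (SEQN) and age in years; the later chapters document the variables and filters actually used.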
2.4 Analytical Approach
To address the research objectives, the paper employed multiple analytic approaches. The main survival analysis directly answers the two primary research questions, followed by additional analyses to explore potential mediators and validate the main conclusions.
This tutorial will replicate the following key analyses performed in the original paper:
- Descriptive Analysis: Reproduces the results in Appendix Tables 1 and 2.
- Main Survival Analysis: Establishes the primary link between smoking initiation age and mortality using Kaplan-Meier curves (Figure 1) and Cox proportional hazards models, whose results are incorporated in Figure 2 of the main paper.
- Effect Modification Analysis: Investigates how this relationship is modified by race/ethnicity and sex. This corresponds to Figure 2 in the main paper and Appendix Tables 4 and 5 in the supplementary material.
- Exploratory Analysis: Investigates the secondary relationship between the age of smoking initiation and the total duration of smoking using boxplots. This corresponds to Appendix Figures 1-3 in the supplementary material.
- Sensitivity Analysis 1: Adjusts for socioeconomic status (SES) proxies, such as family income and education, to check for confounding. This corresponds to Appendix Figure 5 in the supplementary material.
- Sensitivity Analysis 2: Repeats the analysis on data from the 2011-2018 cycles to include the "non-Hispanic Asian" category, which was introduced in the 2011 NHANES survey. This corresponds to Appendix Table 3 and Appendix Figure 4 in the supplementary material.
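To make the main survival and effect modification steps listed above more concrete, here is a minimal sketch using the survival package on simulated data. Everything in it is an assumption for illustration: the variable names (init_age, sex, age, time, died), the group labels, and the effect sizes are hypothetical and are not taken from the paper. The sketch also ignores the NHANES complex sampling design; since this guide lists the survey package among its requirements, the later chapters presumably handle survey weights, which this example does not attempt.

```r
library(survival)

set.seed(1)
n <- 2000

# Simulated stand-in data; names, groups, and effects are hypothetical.
groups <- c("never", "<15", "15-17", "18+")
sim <- data.frame(
  init_age = factor(sample(groups, n, replace = TRUE), levels = groups),
  sex      = factor(sample(c("female", "male"), n, replace = TRUE)),
  age      = runif(n, 20, 79)
)

# Hypothetical log-hazard: earlier initiation carries a higher hazard.
effect <- c("never" = 0, "<15" = 0.6, "15-17" = 0.4, "18+" = 0.2)
lp <- unname(effect[as.character(sim$init_age)]) + 0.05 * (sim$age - 50)

# Exponential event times with independent uniform censoring.
event_time  <- rexp(n, rate = 0.02 * exp(lp))
censor_time <- runif(n, 1, 20)
sim$time <- pmin(event_time, censor_time)
sim$died <- as.integer(event_time <= censor_time)

# Kaplan-Meier curves by initiation-age group.
km <- survfit(Surv(time, died) ~ init_age, data = sim)

# Cox model with main effects only.
fit_main <- coxph(Surv(time, died) ~ init_age + sex + age, data = sim)

# Effect modification by sex: add the interaction and compare the two
# nested models with a likelihood ratio test.
fit_int <- coxph(Surv(time, died) ~ init_age * sex + age, data = sim)
lrt <- anova(fit_main, fit_int)

# Hazard ratios relative to the reference group ("never").
hr_main <- exp(coef(fit_main))
```

Here plot(km) would draw the unadjusted survival curves, hr_main holds the adjusted hazard ratios, and lrt indicates whether the interaction terms improve the fit. Because the simulated data contain no true interaction, a non-significant test is the expected pattern in this toy example.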
2.5 Summary of Findings
The final statistical analysis included 50,549 participants. The study found that early smoking initiation was significantly associated with a higher risk of all-cause mortality across all age groups, with earlier starting ages showing higher hazard ratios. This association also differed across race/ethnicity and sex, and the interaction by sex was statistically significant.