Exercise 2 (A)
Exercise: Phased Multi-Year NHANES Data Wrangling
Instructions: Use the R programming language and functions from the tidyverse, nhanesA, tableone, and naniar packages to complete this exercise. We will build up our dataset in phases, starting with a single year and expanding to multiple survey cycles.
Please knit your final R Markdown file and submit the knitted HTML or PDF document ONLY.
Setup: Load Packages
First, run the following code block to ensure the required packages are installed and loaded into your R session.
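A minimal setup sketch, assuming the four packages named in the instructions. The install-if-missing pattern shown here is one common approach, not a requirement of the exercise.

```r
# Packages named in the instructions
pkgs <- c("tidyverse", "nhanesA", "tableone", "naniar")

# Install only the packages that are not already available (assumed pattern)
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach each package to the current session
invisible(lapply(pkgs, library, character.only = TRUE))
```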
Problem 1: Import and Translate Single-Year Data
Download the Demographic (DEMO) data for the 2013-2014 NHANES cycle, using translated = TRUE to automatically convert coded values into text labels.
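One possible approach, sketched under the assumption that the installed nhanesA version supports the translated argument of nhanes() and that the 2013-2014 demographics table is named DEMO_H (the H suffix for this cycle is given in Problem 3). The object name demo_h is illustrative, and the download needs an internet connection.

```r
library(nhanesA)

# "DEMO_H" = demographics table for the 2013-2014 cycle (suffix H)
demo_h <- nhanes("DEMO_H", translated = TRUE)

# Quick checks: dimensions and a few of the translated columns
dim(demo_h)
head(demo_h[, c("SEQN", "RIAGENDR", "RIDAGEYR", "RIDRETH3")])
```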
Problem 2: Add Body Measures Data for a Single Year
Download the Body Measures (BMX) data for the same 2013-2014 cycle and merge it with the demographic data from Problem 1.
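A sketch of the merge, assuming the two tables share the respondent identifier SEQN. The left join (keeping every participant in the demographic file, with or without body measures) is an assumption; an inner join would also be defensible. The names demo_h, bmx_h, and nhanes_h are illustrative.

```r
library(dplyr)
library(nhanesA)

# Download both 2013-2014 tables with coded values translated to labels
demo_h <- nhanes("DEMO_H", translated = TRUE)
bmx_h  <- nhanes("BMX_H", translated = TRUE)

# SEQN is the respondent sequence number used as the join key
nhanes_h <- left_join(demo_h, bmx_h, by = "SEQN")

# A left join on a unique key should not change the number of rows
nrow(nhanes_h) == nrow(demo_h)
```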
Problem 3: Import and Merge Multi-Cycle Data with Translation
Expand to multiple years, using translated = TRUE for all downloads. Merge both the Demographic (DEMO) and Body Measures (BMX) data for all three NHANES cycles: 2013-2014 (H), 2015-2016 (I), and 2017-2018 (J). Combine them into a single dataframe named nhanes_raw.
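One way to avoid repeating the download code three times is to wrap it in a small function and map over the cycle suffixes. The sketch below makes several assumptions: get_cycle() is a hypothetical helper, the Cycle column is an addition that the exercise does not ask for, and bind_rows() is assumed to find compatible column types across cycles. If it reports incompatible types for a column, that column would need to be harmonized (for example, converted to character) before stacking.

```r
library(dplyr)
library(purrr)
library(nhanesA)

# Cycle suffixes: H = 2013-2014, I = 2015-2016, J = 2017-2018
cycle_suffixes <- c("H", "I", "J")

# Hypothetical helper: download DEMO and BMX for one cycle and join on SEQN
get_cycle <- function(suffix) {
  demo <- nhanes(paste0("DEMO_", suffix), translated = TRUE)
  bmx  <- nhanes(paste0("BMX_", suffix), translated = TRUE)
  left_join(demo, bmx, by = "SEQN") %>%
    mutate(Cycle = suffix)  # record which cycle each row came from
}

# Stack the three merged cycles; columns absent from a cycle are filled with NA
nhanes_raw <- map(cycle_suffixes, get_cycle) %>%
  bind_rows()

# Row counts per cycle
count(nhanes_raw, Cycle)
```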
Problem 4: Data Cleaning and Filtering
Using the nhanes_raw dataset, we will now create our clean dataset. This involves filtering our population to adults and then creating our analysis variables.
- Filter for Adults: Keep only participants aged 20 years or older.
- Rename Variables: RIAGENDR to Sex, RIDAGEYR to Age, RIDRETH3 to RaceEthnicity, BMXBMI to BMI.
- Group RaceEthnicity: Combine “Mexican American” and “Other Hispanic” into a single “Hispanic” category.
- Create AgeGroup: Categorize Age into “20-39”, “40-59”, and “60+”.
- Create BMICat: Categorize BMI into “Underweight”, “Normal weight”, “Overweight”, and “Obese”.
# nhanes_clean <- ...
# Step 1: Filter the data to include only adults
# Step 2: Rename variables
# Step 3: Create new variables (group RaceEthnicity)
# Convert the new character RaceEthnicity to a factor with the desired level order
# Step 4: Create AgeGroup (now without NAs because we filtered)
# Step 5: Create BMICat
# Check the structure to confirm variables are correct
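The steps above can be sketched as a single dplyr pipeline. To keep the example runnable offline, nhanes_raw is replaced here by a small toy table with hypothetical values. The race/ethnicity labels, the factor level order, and the BMI cut-points (18.5, 25, 30) are assumptions that the exercise does not state, so check them against the translated labels in your own download.

```r
library(dplyr)

# Toy stand-in for nhanes_raw (hypothetical values, not real NHANES records)
nhanes_raw <- tibble(
  SEQN     = 1:6,
  RIAGENDR = c("Male", "Female", "Female", "Male", "Female", "Male"),
  RIDAGEYR = c(12, 25, 47, 63, 80, 34),
  RIDRETH3 = c("Mexican American", "Other Hispanic", "Non-Hispanic White",
               "Non-Hispanic Black", "Non-Hispanic Asian",
               "Other Race - Including Multi-Racial"),
  BMXBMI   = c(17.2, 18.1, 24.9, 27.3, 31.0, NA)
)

nhanes_clean <- nhanes_raw %>%
  # Step 1: keep adults only
  filter(RIDAGEYR >= 20) %>%
  # Step 2: rename, using new_name = old_name
  rename(Sex = RIAGENDR, Age = RIDAGEYR,
         RaceEthnicity = RIDRETH3, BMI = BMXBMI) %>%
  mutate(
    # Step 3: collapse the two Hispanic categories into one
    RaceEthnicity = case_when(
      RaceEthnicity %in% c("Mexican American", "Other Hispanic") ~ "Hispanic",
      TRUE ~ as.character(RaceEthnicity)
    ),
    # Convert to a factor; this level order is an assumption
    RaceEthnicity = factor(RaceEthnicity, levels = c(
      "Hispanic", "Non-Hispanic White", "Non-Hispanic Black",
      "Non-Hispanic Asian", "Other Race - Including Multi-Racial"
    )),
    # Step 4: age bands, left-closed: [20, 40), [40, 60), [60, Inf)
    AgeGroup = cut(Age, breaks = c(20, 40, 60, Inf), right = FALSE,
                   labels = c("20-39", "40-59", "60+")),
    # Step 5: BMI bands, left-closed, using assumed cut-points
    BMICat = cut(BMI, breaks = c(-Inf, 18.5, 25, 30, Inf), right = FALSE,
                 labels = c("Underweight", "Normal weight",
                            "Overweight", "Obese"))
  )

# Check the structure to confirm variables are correct
str(nhanes_clean)
```

Setting right = FALSE makes each interval include its lower bound, so a participant aged exactly 40 falls in “40-59” and a BMI of exactly 25 counts as “Overweight”.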
Problem 5: Create Final Analytic Dataset
Create a final, analysis-ready dataset named nhanes_analysis that includes only the key variables.
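A short sketch of the selection step. The exercise does not list the key variables, so the set chosen below (the identifier plus the variables created in Problem 4) is an assumption. The toy nhanes_clean table holds hypothetical values and exists only to make the example self-contained.

```r
library(dplyr)

# Toy stand-in for nhanes_clean (hypothetical values)
nhanes_clean <- tibble(
  SEQN          = c(101, 102, 103),
  Sex           = factor(c("Male", "Female", "Female")),
  Age           = c(25, 47, 63),
  AgeGroup      = factor(c("20-39", "40-59", "60+")),
  RaceEthnicity = factor(c("Hispanic", "Non-Hispanic White", "Non-Hispanic Black")),
  BMI           = c(22.4, 27.3, NA),
  BMICat        = factor(c("Normal weight", "Overweight", NA)),
  BMXWT         = c(70.1, 82.5, NA)  # example of a column not needed downstream
)

# Keep only the assumed key variables, in a convenient order
nhanes_analysis <- nhanes_clean %>%
  select(SEQN, Sex, Age, AgeGroup, RaceEthnicity, BMI, BMICat)

glimpse(nhanes_analysis)
```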
Problem 6: Investigate Missing Data
Now that our data is correctly filtered and processed, let’s re-examine the missing data patterns. The missingness should be much lower.
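A sketch of one way to inspect missingness with naniar, using a per-variable summary table and a summary plot. The toy nhanes_analysis table holds hypothetical values, and the name miss_summary is illustrative.

```r
library(naniar)

# Toy stand-in for nhanes_analysis (hypothetical values)
nhanes_analysis <- data.frame(
  SEQN   = 1:5,
  Sex    = c("Male", "Female", "Female", "Male", "Female"),
  Age    = c(25, 47, 63, 80, 34),
  BMI    = c(18.1, 24.9, 27.3, 31.0, NA),
  BMICat = c("Underweight", "Normal weight", "Overweight", "Obese", NA)
)

# Number and percentage of missing values for each variable
miss_summary <- miss_var_summary(nhanes_analysis)
print(miss_summary)

# Plot the percentage of missing values per variable
gg_miss_var(nhanes_analysis, show_pct = TRUE)
```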
Problem 7: Create a Descriptive Table
Finally, with the data correctly loaded and cleaned for our adult population, create the summary table of sample characteristics, stratified by Sex.
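A sketch of the stratified table using CreateTableOne() from tableone. The variable list, the choice of which variables are categorical, and test = FALSE (which skips group-comparison tests) are assumptions; drop test = FALSE if p-values are wanted. The toy nhanes_analysis table holds hypothetical values.

```r
library(tableone)

# Toy stand-in for nhanes_analysis (hypothetical values)
nhanes_analysis <- data.frame(
  Sex           = rep(c("Male", "Female"), each = 4),
  Age           = c(25, 47, 63, 80, 34, 52, 71, 29),
  AgeGroup      = c("20-39", "40-59", "60+", "60+",
                    "20-39", "40-59", "60+", "20-39"),
  RaceEthnicity = c("Hispanic", "Non-Hispanic White", "Non-Hispanic Black",
                    "Non-Hispanic Asian", "Hispanic", "Non-Hispanic White",
                    "Non-Hispanic Black", "Non-Hispanic White"),
  BMI           = c(18.1, 24.9, 27.3, 31.0, 22.5, 29.4, NA, 33.2)
)

# Variables to summarise, and which of them are categorical
table_vars <- c("Age", "AgeGroup", "RaceEthnicity", "BMI")
cat_vars   <- c("AgeGroup", "RaceEthnicity")

# Build the table stratified by Sex, without hypothesis tests
table1 <- CreateTableOne(
  vars       = table_vars,
  strata     = "Sex",
  data       = nhanes_analysis,
  factorVars = cat_vars,
  test       = FALSE
)

# showAllLevels prints every category rather than hiding a reference level
print(table1, showAllLevels = TRUE)
```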