Exercise 3 (A)
Exercise: Phased Multi-Year NHANES Data Wrangling
Instructions: Use the R programming language and functions from the tidyverse, nhanesA, tableone, and naniar packages to complete this exercise. We will build up our dataset in phases, starting with a single year and expanding to multiple survey cycles.
Please knit your final R Markdown file and submit the knitted HTML or PDF document ONLY.
Setup: Load Packages
First, run the following code block to ensure the required packages are installed and loaded into your R session.
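A minimal sketch of such a setup block, assuming all four packages are installed from CRAN and that the `https://cloud.r-project.org` mirror is reachable from your machine:

```r
# Packages named in the instructions
pkgs <- c("tidyverse", "nhanesA", "tableone", "naniar")

# Find packages that are not yet installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing (the mirror URL is an assumption)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach every package for the rest of the session
invisible(lapply(pkgs, library, character.only = TRUE))
```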
Problem 1: Import and Translate Single-Year Data
Download the Demographic (DEMO) data for the 2013-2014 NHANES cycle, using translated = TRUE to automatically convert coded values into text labels.
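One possible way to start, assuming an internet connection and a version of nhanesA whose `nhanes()` function accepts the `translated` argument. The object name `demo_h` is only a suggestion.

```r
library(nhanesA)

# "DEMO_H" is the NHANES table name for the 2013-2014 demographics file;
# the "_H" suffix identifies the 2013-2014 cycle
demo_h <- nhanes("DEMO_H", translated = TRUE)

# Coded columns such as RIAGENDR should now hold text labels, not numeric codes
str(demo_h[, c("SEQN", "RIAGENDR", "RIDAGEYR", "RIDRETH3")])
```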
Problem 2: Add Body Measures Data for a Single Year
Download the Body Measures (BMX) data for the same 2013-2014 cycle and merge it with the demographic data from Problem 1.
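A sketch of the merge, assuming the two files are linked by the respondent sequence number `SEQN`. Using `left_join()` (rather than `inner_join()`) is a design choice made here so that every DEMO participant is kept even without body measures. The name `nhanes_h` is hypothetical.

```r
library(nhanesA)
library(dplyr)

demo_h <- nhanes("DEMO_H", translated = TRUE)
bmx_h  <- nhanes("BMX_H",  translated = TRUE)

# Keep every DEMO row; BMX columns are NA for participants without an exam
nhanes_h <- left_join(demo_h, bmx_h, by = "SEQN")

# Sanity check: the join should not add or drop participants
nrow(nhanes_h) == nrow(demo_h)
```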
Problem 3: Import and Merge Multi-Cycle Data with Translation
Expand to multiple years, using translated = TRUE for all downloads. Merge both the Demographic (DEMO) and Body Measures (BMX) data for all three NHANES cycles: 2013-2014 (H), 2015-2016 (I), and 2017-2018 (J). Combine them into a single dataframe named nhanes_raw.
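One way to avoid repeating the download code three times is a small helper mapped over the cycle suffixes. This is a sketch only: `get_cycle()` and the `Cycle` column are hypothetical names, and the code assumes the same columns have compatible types in every cycle.

```r
library(nhanesA)
library(dplyr)
library(purrr)

# Cycle suffixes from the problem: H = 2013-2014, I = 2015-2016, J = 2017-2018
cycles <- c("H", "I", "J")

# Hypothetical helper: download DEMO and BMX for one cycle and join on SEQN
get_cycle <- function(suffix) {
  demo <- nhanes(paste0("DEMO_", suffix), translated = TRUE)
  bmx  <- nhanes(paste0("BMX_",  suffix), translated = TRUE)
  left_join(demo, bmx, by = "SEQN")
}

# Stack the cycles; columns absent from a cycle are filled with NA,
# and the list names become the Cycle column
nhanes_raw <- cycles %>%
  set_names() %>%
  map(get_cycle) %>%
  bind_rows(.id = "Cycle")

count(nhanes_raw, Cycle)
```

If `bind_rows()` reports incompatible column types between cycles, one workaround is to `select()` only the columns you need inside the helper before stacking.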
Problem 4: Data Cleaning and Filtering
Using the nhanes_raw dataset, we will now create our clean dataset. This involves filtering our population to adults and then creating our analysis variables.
- Filter for Adults: Keep only participants aged 20 years or older.
- Rename Variables: RIAGENDR to Sex, RIDAGEYR to Age, RIDRETH3 to RaceEthnicity, BMXBMI to BMI.
- Group RaceEthnicity: Combine “Mexican American” and “Other Hispanic” into a single “Hispanic” category.
- Create AgeGroup: Categorize Age into “20-39”, “40-59”, and “60+”.
- Create BMICat: Categorize BMI into “Underweight”, “Normal weight”, “Overweight”, and “Obese”.
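The steps above can be tried out on a small made-up data frame before applying them to nhanes_raw. This is a sketch under stated assumptions: the rows are invented, `toy_raw` and `toy_clean` are hypothetical names, and the BMI cut points (18.5, 25, 30) are the commonly used adult thresholds, which the exercise itself does not spell out.

```r
library(dplyr)

# Invented rows that mimic translated NHANES columns (not real data)
toy_raw <- tibble(
  SEQN     = 1:6,
  RIAGENDR = c("Male", "Female", "Female", "Male", "Female", "Male"),
  RIDAGEYR = c(12, 25, 45, 67, 80, 33),
  RIDRETH3 = c("Mexican American", "Other Hispanic", "Non-Hispanic White",
               "Non-Hispanic Black", "Non-Hispanic Asian", "Mexican American"),
  BMXBMI   = c(17.0, 18.0, 24.9, 27.3, 31.5, NA)
)

toy_clean <- toy_raw %>%
  # Step 1: adults only (rows with missing age are dropped too)
  filter(RIDAGEYR >= 20) %>%
  # Step 2: new_name = old_name
  rename(Sex = RIAGENDR, Age = RIDAGEYR,
         RaceEthnicity = RIDRETH3, BMI = BMXBMI) %>%
  mutate(
    # Step 3: collapse the two Hispanic groups, then convert back to a factor
    # (default alphabetical level order; pass levels = ... for another order)
    RaceEthnicity = if_else(
      RaceEthnicity %in% c("Mexican American", "Other Hispanic"),
      "Hispanic", as.character(RaceEthnicity)
    ),
    RaceEthnicity = factor(RaceEthnicity),
    # Step 4: age bands; conditions are checked top to bottom
    AgeGroup = case_when(
      Age < 40  ~ "20-39",
      Age < 60  ~ "40-59",
      Age >= 60 ~ "60+"
    ),
    AgeGroup = factor(AgeGroup, levels = c("20-39", "40-59", "60+")),
    # Step 5: BMI categories (assumed cut points); missing BMI stays NA
    BMICat = case_when(
      BMI < 18.5 ~ "Underweight",
      BMI < 25   ~ "Normal weight",
      BMI < 30   ~ "Overweight",
      BMI >= 30  ~ "Obese"
    ),
    BMICat = factor(BMICat, levels = c("Underweight", "Normal weight",
                                       "Overweight", "Obese"))
  )

str(toy_clean)
```

Because `case_when()` returns NA when no condition matches, participants with a missing BMI keep a missing BMICat instead of being placed in a category by accident.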
# nhanes_clean <- ..
  # Step 1: Filter the data to include only adults
   
  # Step 2: Rename variables
   
  # Step 3: Group RaceEthnicity
   
    # Convert the new character RaceEthnicity to a factor with the desired level order
     
    
    # Step 4: Create AgeGroup (now without NAs because we filtered)
     
                   
    # Step 5: Create BMICat
     
# Check the structure to confirm variables are correct
Problem 5: Create Final Analytic Dataset
Create a final, analysis-ready dataset named nhanes_analysis that includes only the key variables.
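A short illustration of the idea on invented data. Which variables count as "key" is an assumption here (the identifier plus the variables built in Problem 4), and `toy_clean`, `toy_analysis`, and the extra column `Extra` are hypothetical.

```r
library(dplyr)

# Invented stand-in for a cleaned dataset, with one column we do not need
toy_clean <- tibble(
  SEQN          = 1:3,
  Sex           = c("Male", "Female", "Female"),
  Age           = c(25, 45, 67),
  AgeGroup      = c("20-39", "40-59", "60+"),
  RaceEthnicity = c("Hispanic", "Non-Hispanic White", "Non-Hispanic Black"),
  BMI           = c(22.1, 27.3, 31.5),
  BMICat        = c("Normal weight", "Overweight", "Obese"),
  Extra         = c("a", "b", "c")
)

# Keep only the analysis variables, in the order listed
toy_analysis <- toy_clean %>%
  select(SEQN, Sex, Age, AgeGroup, RaceEthnicity, BMI, BMICat)

names(toy_analysis)
```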
Problem 6: Investigate Missing Data
Now that our data is correctly filtered and processed, let’s examine the missing data patterns in nhanes_analysis. The missingness should be much lower than in nhanes_raw.
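The naniar package offers both tabular and graphical summaries of missingness. A minimal sketch on an invented data frame (`toy_analysis` is hypothetical; swap in your own dataset):

```r
library(naniar)

# Invented data with two missing BMI values
toy_analysis <- data.frame(
  Sex = c("Male", "Female", "Female", "Male"),
  Age = c(25, 45, 67, 80),
  BMI = c(22.1, NA, 27.3, NA)
)

# Count and percentage of missing values per variable, most-missing first
miss_summary <- miss_var_summary(toy_analysis)
miss_summary

# Total number of missing cells in the whole data frame
n_miss(toy_analysis)

# Plot of the number of missing values per variable
gg_miss_var(toy_analysis)
```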
Problem 7: Create a Descriptive Table
Finally, with the data correctly loaded and cleaned for our adult population, create the summary table of sample characteristics, stratified by Sex.
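A sketch of a stratified table with `CreateTableOne()` from tableone, shown on invented data. The variable list is an assumption, and `test = FALSE` is set here only because significance tests are not meaningful on six made-up rows.

```r
library(tableone)

# Invented stand-in for the analysis dataset
toy_analysis <- data.frame(
  Sex      = factor(c("Male", "Female", "Female", "Male", "Female", "Male")),
  Age      = c(25, 45, 67, 80, 33, 51),
  AgeGroup = factor(c("20-39", "40-59", "60+", "60+", "20-39", "40-59")),
  BMI      = c(22.1, 27.3, 31.5, 24.0, 19.8, 29.2)
)

# Summarise the listed variables separately for each level of Sex
table1 <- CreateTableOne(
  vars       = c("Age", "AgeGroup", "BMI"),
  strata     = "Sex",
  data       = toy_analysis,
  factorVars = "AgeGroup",
  test       = FALSE
)

# showAllLevels = TRUE prints every category of each factor variable
print(table1, showAllLevels = TRUE)
```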