1  Data to Analyze

To answer the research question “Does obesity increase the risk of developing diabetes?” in the U.S. context, we do the following:

1.1 Choose a U.S. data source

1.2 Confounder identification

Directed acyclic graph (DAG)

flowchart TB
  A[Obesity A] --> Y(Diabetes Y)
  L[Confounders C] --> Y
  L --> A

Hypothesized directed acyclic graph, drawn from the analyst’s best understanding of the literature

Exposure: Being obese

Outcome: Developing diabetes

Confounders: Demographic and lab variables
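
As a rough illustration, the hypothesized DAG can be written as a list of directed edges and checked for variables that point into both the exposure and the outcome. The sketch below uses Python, which is an assumption because the source names no language, and the helper names `parents` and `common_causes` are hypothetical. It only finds direct common causes in this three-node graph; it is not a general backdoor-criterion check.

```python
# Edges of the hypothesized DAG, written as (cause, effect).
# Node labels follow the diagram: A = obesity, Y = diabetes, L = confounders.
EDGES = [("A", "Y"), ("L", "Y"), ("L", "A")]


def parents(node, edges):
    """Return the set of direct causes (parents) of `node`."""
    return {cause for cause, effect in edges if effect == node}


def common_causes(exposure, outcome, edges):
    """Return nodes with arrows into both the exposure and the outcome."""
    return parents(exposure, edges) & parents(outcome, edges)


confounders = common_causes("A", "Y", EDGES)
print(sorted(confounders))  # prints ['L']
```

In a larger graph with mediators or colliders, a dedicated causal-graph tool would be a safer choice than this simple set intersection.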

1.3 Identify measured and unmeasured variables in the data

Based on the hypothesized DAG, find variables in the data that capture the following concepts.

| Role | Data Component | Variables considered based on DAG |
|---|---|---|
| Outcome | DIQ | Have diabetes [1] |
| Exposure | BMX | Obese (BMI >= 30) |
| Confounder (demographic) | DEMO | Age, Sex, Education, Race/ethnicity, Marital status, Annual household income, Country of birth, Survey cycle year |
| Confounder (behavioral) | SMQ, PAQ, SLQ, DBQ | Smoking [2], Vigorous work activity, Sleep [3], Diet [4] |
| Confounder (health history / access) | DIQ, HUQ | Diabetes family history, Access to care [5] |
| Confounder (lab) | BPX, BPQ, BIOPRO | Blood pressure (systolic, diastolic [6]), Cholesterol, Uric acid, Total Protein, Total Bilirubin, Phosphorus, Sodium, Potassium, Globulin, Total Calcium |

  • 14 demographic, behavioral, and health-history variables
    • Mostly categorical
  • 11 lab variables
    • Mostly continuous
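
The exposure and outcome definitions can be sketched in a few lines of Python. This is a minimal illustration, not the analyst's actual code: the function names are hypothetical, the inputs are assumed to be already recoded to `True`/`False`/`None`, and reading "combination" of the three diabetes items as "any of the three is yes" is an assumption. Averaging only the blood pressure readings that are present is likewise an assumption about how missing readings are handled.

```python
from statistics import mean
from typing import Optional, Sequence


def is_obese(bmi: Optional[float]) -> Optional[bool]:
    """Exposure: obese when BMI >= 30; None if BMI is missing."""
    if bmi is None:
        return None
    return bmi >= 30


def has_diabetes(doctor_told: Optional[bool],
                 taking_insulin: Optional[bool],
                 taking_pills: Optional[bool]) -> Optional[bool]:
    """Outcome built from the three diabetes questionnaire items.

    ASSUMPTION: "combination" is read as "any of the three is yes".
    Returns None when no item is yes and at least one is missing.
    """
    answers = (doctor_told, taking_insulin, taking_pills)
    if any(a is True for a in answers):
        return True
    if all(a is False for a in answers):
        return False
    return None


def average_bp(readings: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the available blood pressure readings (up to 4).

    ASSUMPTION: missing readings are skipped rather than imputed.
    """
    observed = [r for r in readings if r is not None]
    return mean(observed) if observed else None


print(is_obese(31.2))
print(has_diabetes(False, True, None))
print(average_bp([120, 118, None, 122]))
```

Returning `None` for undetermined cases keeps missingness visible, so it can be handled explicitly later instead of being silently counted as "no".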

1.4 Analytic data

Three cycles of the National Health and Nutrition Examination Survey (NHANES) were merged:

flowchart LR
  A[NHANES] --> C1(2013-2014 cycle) --> ss1(10,175 \nparticipants)
  A --> C2(2015-2016 cycle) --> ss2(9,971 \nparticipants)
  A --> C3(2017-2018 cycle) --> ss3(9,254 \nparticipants)
  ss1 --> ss(7,585 \nafter \nimposing \neligibility \ncriteria)
  ss2 --> ss
  ss3 --> ss
  style A fill:#FFA500;
  style C1 fill:#FFA500;
  style C2 fill:#FFA500;
  style C3 fill:#FFA500;
  style ss1 fill:#FFA500;
  style ss2 fill:#FFA500;
  style ss3 fill:#FFA500;
  style ss fill:#FFA500;

Our study population was restricted to U.S. participants who

  • were 20 years or older,
  • were not pregnant at the time of survey data collection, and
  • had available International Classification of Diseases (ICD) codes, so that sufficient proxy information could be extracted for the analysis (discussed on the next page).
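
The merge-then-restrict step can be sketched as follows. This is a toy Python illustration under stated assumptions: the records and their values are made up, the field names (`age`, `pregnant`, `icd_codes`) and helper names are hypothetical rather than actual NHANES variable names, and "available ICD codes" is read as "at least one code recorded".

```python
# Toy records standing in for the three survey cycles; all values are made up.
CYCLES = {
    "2013-2014": [
        {"id": 1, "age": 45, "pregnant": False, "icd_codes": ["E11"]},
        {"id": 2, "age": 17, "pregnant": False, "icd_codes": ["J45"]},
    ],
    "2015-2016": [
        {"id": 3, "age": 30, "pregnant": True, "icd_codes": ["I10"]},
        {"id": 4, "age": 62, "pregnant": False, "icd_codes": []},
    ],
    "2017-2018": [
        {"id": 5, "age": 20, "pregnant": False, "icd_codes": ["I10"]},
    ],
}


def merge_cycles(cycles):
    """Stack all cycles into one list, tagging each record with its cycle."""
    return [dict(rec, cycle=name) for name, recs in cycles.items() for rec in recs]


def is_eligible(rec):
    """Apply the three restrictions: age 20+, not pregnant, has ICD codes."""
    return rec["age"] >= 20 and not rec["pregnant"] and len(rec["icd_codes"]) > 0


merged = merge_cycles(CYCLES)
analytic = [rec for rec in merged if is_eligible(rec)]
print(len(merged), len(analytic))  # prints 5 2
```

Tagging each record with its cycle before filtering preserves the survey cycle year, which the variable table lists among the demographic confounders.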

  1. Combination of (a) doctor told you have diabetes, (b) taking insulin now, (c) taking diabetic pills to lower blood sugar.

  2. Cigarette use (at least 100 cigarettes in life).

  3. Sleep hours on workdays.

  4. How healthy is the diet.

  5. Routine place to go for healthcare.

  6. Average of 4 measurements.