34 Merge three cycles
34.1 Analytic dataset
34.1.1 Load 2013-18 datasets
load("data/analytic13recoded.RData")
load("data/analytic15recoded.RData")
load("data/analytic17recoded.RData")
34.1.2 Merge 2013-18 datasets
# adults aged 20 years or more
data.merged0 <- rbind(analytic13, analytic15, analytic17)
dim(data.merged0)
#> [1] 17057 34
# remove factor levels that no longer occur after merging
data.merged <- droplevels(data.merged0)
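The merge step stacks the cycles row-wise and then drops unused factor levels. A minimal sketch of that pattern on toy data follows; the data frames `cycle_a` and `cycle_b` and their columns are hypothetical stand-ins, not the NHANES objects.

```r
# Toy stand-ins for two survey cycles (hypothetical data, not NHANES)
cycle_a <- data.frame(id = 1:3,
                      race = factor(c("White", "Black", "White"),
                                    levels = c("White", "Black", "Hispanic")))
cycle_b <- data.frame(id = 4:5,
                      race = factor(c("Black", "White"),
                                    levels = c("White", "Black", "Hispanic")))

# rbind() matches data frame columns by name, so check that both
# cycles carry the same set of variables before stacking them
stopifnot(setequal(names(cycle_a), names(cycle_b)))
merged0 <- rbind(cycle_a, cycle_b)

# "Hispanic" is still a level even though no row uses it
levels(merged0$race)

# droplevels() removes unused levels from every factor column
merged <- droplevels(merged0)
levels(merged$race)
```

Dropping unused levels keeps later frequency tables and model design matrices free of empty categories.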
34.1.3 Check missingness
library(DataExplorer)  # provides plot_missing() and profile_missing()
plot_missing(data.merged)
# profile_missing(data.merged)
dim(data.merged)
#> [1] 17057 34
The data contain variables with some missing information.
# keep only rows with no missing values in any variable
data.complete <- na.omit(data.merged)
dim(data.complete)
#> [1] 6850 34
- Only complete cases are retained; survey features and weights are ignored for simplicity.
- In a realistic analysis, we would examine the missingness pattern before deleting or imputing missing values.
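One way to look at the missingness pattern before deleting rows is to tabulate missing values per variable and count how many rows would survive. The sketch below uses only base R on a toy data frame; `toy` and its columns are hypothetical, not the NHANES data.

```r
# Toy data with scattered missing values (hypothetical, not NHANES)
toy <- data.frame(age    = c(25, 40, NA, 61, 33),
                  income = c(NA, 50, 42, NA, 38),
                  sleep  = c(7, 8, 6, 7, 5))

# Number and percentage of missing values per variable
n.missing   <- colSums(is.na(toy))
pct.missing <- round(100 * colMeans(is.na(toy)), 1)
data.frame(n.missing, pct.missing)

# How many rows would survive a complete-case analysis?
n.complete <- sum(complete.cases(toy))
n.complete

# na.omit() keeps exactly those rows
nrow(na.omit(toy)) == n.complete
```

When a single variable accounts for most of the loss, it may be worth reconsidering whether that variable is needed before discarding rows.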
34.2 Summary statistics
 | No (N=4291) | Yes (N=2559) | Overall (N=6850) |
---|---|---|---|
age.cat | |||
20-49 | 2208 (51.5%) | 1227 (47.9%) | 3435 (50.1%) |
50-64 | 1085 (25.3%) | 767 (30.0%) | 1852 (27.0%) |
65+ | 998 (23.3%) | 565 (22.1%) | 1563 (22.8%) |
sex | |||
Male | 2086 (48.6%) | 1106 (43.2%) | 3192 (46.6%) |
Female | 2205 (51.4%) | 1453 (56.8%) | 3658 (53.4%) |
education | |||
Less than high school | 597 (13.9%) | 419 (16.4%) | 1016 (14.8%) |
High school | 1809 (42.2%) | 1375 (53.7%) | 3184 (46.5%) |
College graduate or above | 1885 (43.9%) | 765 (29.9%) | 2650 (38.7%) |
race | |||
White | 1496 (34.9%) | 932 (36.4%) | 2428 (35.4%) |
Black | 583 (13.6%) | 581 (22.7%) | 1164 (17.0%) |
Hispanic | 955 (22.3%) | 763 (29.8%) | 1718 (25.1%) |
Others | 1257 (29.3%) | 283 (11.1%) | 1540 (22.5%) |
marital | |||
Never married | 757 (17.6%) | 408 (15.9%) | 1165 (17.0%) |
Married/with partner | 2756 (64.2%) | 1533 (59.9%) | 4289 (62.6%) |
Other | 778 (18.1%) | 618 (24.2%) | 1396 (20.4%) |
income | |||
less than $20,000 | 668 (15.6%) | 443 (17.3%) | 1111 (16.2%) |
$20,000 to $74,999 | 1955 (45.6%) | 1353 (52.9%) | 3308 (48.3%) |
$75,000 and Over | 1668 (38.9%) | 763 (29.8%) | 2431 (35.5%) |
born | |||
Born in US | 2269 (52.9%) | 1745 (68.2%) | 4014 (58.6%) |
Other place | 2022 (47.1%) | 814 (31.8%) | 2836 (41.4%) |
year | |||
NHANES 2013-2014 public release | 1976 (46.0%) | 1100 (43.0%) | 3076 (44.9%) |
NHANES 2015-2016 public release | 740 (17.2%) | 337 (13.2%) | 1077 (15.7%) |
NHANES 2017-2018 public release | 1575 (36.7%) | 1122 (43.8%) | 2697 (39.4%) |
diabetes.family.history | |||
No | 3656 (85.2%) | 1971 (77.0%) | 5627 (82.1%) |
Yes | 635 (14.8%) | 588 (23.0%) | 1223 (17.9%) |
smoking | |||
Never smoker | 2760 (64.3%) | 1591 (62.2%) | 4351 (63.5%) |
Previous smoker | 917 (21.4%) | 636 (24.9%) | 1553 (22.7%) |
Current smoker | 614 (14.3%) | 332 (13.0%) | 946 (13.8%) |
diet.healthy | |||
Poor or fair | 876 (20.4%) | 1006 (39.3%) | 1882 (27.5%) |
Good | 1747 (40.7%) | 1039 (40.6%) | 2786 (40.7%) |
Very good or excellent | 1668 (38.9%) | 514 (20.1%) | 2182 (31.9%) |
physical.activity | |||
No | 3590 (83.7%) | 2007 (78.4%) | 5597 (81.7%) |
Yes | 701 (16.3%) | 552 (21.6%) | 1253 (18.3%) |
medical.access | |||
No | 767 (17.9%) | 319 (12.5%) | 1086 (15.9%) |
Yes | 3524 (82.1%) | 2240 (87.5%) | 5764 (84.1%) |
sleep | |||
Mean (SD) | 7.32 (1.42) | 7.21 (1.54) | 7.28 (1.47) |
Median [Min, Max] | 7.00 [2.00, 14.0] | 7.00 [2.00, 14.0] | 7.00 [2.00, 14.0] |
systolicBP | |||
Mean (SD) | 122 (18.2) | 127 (17.4) | 124 (18.1) |
Median [Min, Max] | 118 [64.7, 229] | 125 [74.0, 212] | 121 [64.7, 229] |
diastolicBP | |||
Mean (SD) | 70.2 (11.1) | 72.8 (11.5) | 71.2 (11.3) |
Median [Min, Max] | 70.7 [12.0, 123] | 72.7 [26.0, 124] | 71.3 [12.0, 124] |
uric.acid | |||
Mean (SD) | 5.19 (1.36) | 5.74 (1.48) | 5.39 (1.43) |
Median [Min, Max] | 5.10 [1.10, 12.3] | 5.60 [2.10, 13.3] | 5.30 [1.10, 13.3] |
protein.total | |||
Mean (SD) | 7.14 (0.454) | 7.10 (0.443) | 7.12 (0.450) |
Median [Min, Max] | 7.10 [4.70, 10.2] | 7.10 [5.40, 9.10] | 7.10 [4.70, 10.2] |
bilirubin.total | |||
Mean (SD) | 0.594 (0.307) | 0.513 (0.304) | 0.564 (0.308) |
Median [Min, Max] | 0.500 [0, 3.30] | 0.500 [0, 7.10] | 0.500 [0, 7.10] |
phosphorus | |||
Mean (SD) | 3.73 (0.545) | 3.66 (0.575) | 3.70 (0.557) |
Median [Min, Max] | 3.70 [2.00, 6.10] | 3.60 [1.80, 8.90] | 3.70 [1.80, 8.90] |
sodium | |||
Mean (SD) | 140 (2.45) | 140 (2.58) | 140 (2.50) |
Median [Min, Max] | 140 [124, 150] | 140 [121, 154] | 140 [121, 154] |
potassium | |||
Mean (SD) | 4.01 (0.358) | 4.04 (0.363) | 4.02 (0.360) |
Median [Min, Max] | 4.00 [2.80, 6.00] | 4.00 [2.80, 6.60] | 4.00 [2.80, 6.60] |
globulin | |||
Mean (SD) | 2.88 (0.438) | 3.02 (0.450) | 2.93 (0.448) |
Median [Min, Max] | 2.80 [1.60, 6.50] | 3.00 [1.40, 5.20] | 2.90 [1.40, 6.50] |
calcium.total | |||
Mean (SD) | 9.39 (0.364) | 9.32 (0.381) | 9.36 (0.371) |
Median [Min, Max] | 9.40 [6.40, 14.8] | 9.30 [6.60, 12.0] | 9.40 [6.40, 14.8] |
high.cholesterol | |||
No | 2833 (66.0%) | 1504 (58.8%) | 4337 (63.3%) |
Yes | 1458 (34.0%) | 1055 (41.2%) | 2513 (36.7%) |
- Investigator-specified covariates, stratified by the exposure (obesity).
- This table includes participants with and without ICD-10-CM proxy information, so the sample is larger than in the original analysis.
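The call that produced the table is not shown here. As a rough illustration of how such stratified summaries can be computed, the sketch below uses base R on toy data; the data frame `toy` and the column name `obese` are assumptions made for the example.

```r
# Toy data (hypothetical): exposure, one categorical and one
# continuous covariate
toy <- data.frame(
  obese = factor(c("No", "No", "No", "Yes", "Yes", "Yes")),
  sex   = factor(c("Male", "Female", "Female", "Male", "Male", "Female")),
  sleep = c(7, 8, 6, 6, 5, 7)
)

# Categorical covariate: counts and column percentages by exposure
counts  <- table(toy$sex, toy$obese)
col.pct <- round(100 * prop.table(counts, margin = 2), 1)
counts
col.pct

# Continuous covariate: mean and SD within each exposure group
means <- tapply(toy$sleep, toy$obese, mean)
sds   <- tapply(toy$sleep, toy$obese, sd)
round(rbind(mean = means, sd = sds), 2)
```

Column percentages (`margin = 2`) match the layout of the table above, where each percentage is taken within an exposure group.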
34.3 Proxy data from ICD10 codes
dat.proxy.long <- rbind(rx2013, rx2015, rx2017)
# Drop the original ICD-10 column
dat.proxy.long$icd10 <- NULL
# Rename 3 digits ICD-10 codes as icd10
colnames(dat.proxy.long)[names(dat.proxy.long)=="icd10.new"] <- "icd10"
We combine the ICD-10-CM information from all 3 cycles.
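The drop-and-rename step can be checked on a small example. In this sketch, `toy.rx` and its values are hypothetical; only the column names `icd10` and `icd10.new` come from the code above.

```r
# Toy long-format data (hypothetical values): one row per
# participant-code pair, with full and 3-character codes
toy.rx <- data.frame(id        = c(1, 1, 2),
                     icd10     = c("E11.9", "I10", "E78.0"),
                     icd10.new = c("E11", "I10", "E78"),
                     stringsAsFactors = FALSE)

# Drop the full code, then let the short code take over its name
toy.rx$icd10 <- NULL
colnames(toy.rx)[names(toy.rx) == "icd10.new"] <- "icd10"

names(toy.rx)
toy.rx
```

Reusing the name `icd10` means downstream code can refer to one column regardless of which code resolution is stored in it.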
34.4 Save dataset for later use
save(data.merged,
data.complete,
dat.proxy.long, file = "data/analytic3cycles.RData")
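A saved `.RData` file can be verified by loading it into a separate environment and comparing the contents. The sketch below writes to a temporary file rather than to `data/`; the objects `a` and `b` are hypothetical placeholders.

```r
# Placeholder objects standing in for the saved datasets
a <- data.frame(x = 1:3)
b <- data.frame(y = letters[1:2])

# Save to a temporary file so nothing in data/ is touched
tmp <- tempfile(fileext = ".RData")
save(a, b, file = tmp)

# load() returns the names of the restored objects; loading into a
# fresh environment avoids overwriting objects in the workspace
check.env <- new.env()
loaded <- load(tmp, envir = check.env)
loaded

# Confirm the round trip preserved the data
identical(check.env$a, a)
identical(check.env$b, b)

unlink(tmp)  # clean up the temporary file
```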