Useful Packages
This tutorial introduces a variety of methods to explore a dataset, including summary statistics, variable distributions, correlations, and handling missing data.
This tutorial uses the same dataset as previous tutorials, including working with a predictive question, machine learning with a continuous outcome, and machine learning with a binary outcome.
TableOne
TableOne (installed and loaded in R as the package tableone) provides a simple way to create the classic “Table 1” seen in health research papers, which summarizes the characteristics of the dataset. Its svyCreateTableOne function handles survey data, so users can account for strata and weights and display counts and proportions for both weighted and unweighted data.
library(tableone)
# Note: ObsData spells this column "Immunosupperssion", so the
# "Immunosuppression" entry below is not found and is dropped with a warning.
CreateTableOne(vars = c("Disease.category", "Cancer", "Cardiovascular", "Congestive.HF",
                        "Dementia", "Psychiatric", "Pulmonary", "Renal", "Hepatic", "GI.Bleed", "Tumor",
                        "Immunosuppression", "Transfer.hx", "MI", "age", "sex", "edu", "DASIndex",
                        "APACHE.score", "Glasgow.Coma.Score", "blood.pressure", "WBC", "Heart.rate",
                        "Respiratory.rate", "Temperature", "PaO2vs.FIO2", "Albumin", "Hematocrit",
                        "Bilirubin", "Creatinine", "Sodium", "Potassium", "PaCo2", "PH", "Weight",
                        "DNR.status", "Medical.insurance", "Respiratory.Diag", "Cardiovascular.Diag",
                        "Neurological.Diag", "Gastrointestinal.Diag", "Renal.Diag", "Metabolic.Diag",
                        "Hematologic.Diag", "Sepsis.Diag", "Trauma.Diag", "Orthopedic.Diag", "race",
                        "income", "Length.of.Stay", "Death"),
               strata = "RHC.use",
               data = ObsData,
               includeNA = TRUE,
               test = TRUE)
#> Warning in ModuleReturnVarsExist(vars, data): The data frame does not have:
#> Immunosuppression Dropped
#> Stratified by RHC.use
#> 0 1 p test
#> n 3551 2184
#> Disease.category (%) <0.001
#> ARF 1581 (44.5) 909 (41.6)
#> CHF 247 ( 7.0) 209 ( 9.6)
#> Other 955 (26.9) 208 ( 9.5)
#> MOSF 768 (21.6) 858 (39.3)
#> Cancer (%) 0.001
#> None 2652 (74.7) 1727 (79.1)
#> Localized (Yes) 638 (18.0) 334 (15.3)
#> Metastatic 261 ( 7.4) 123 ( 5.6)
#> Cardiovascular = 1 (%) 567 (16.0) 446 (20.4) <0.001
#> Congestive.HF = 1 (%) 596 (16.8) 425 (19.5) 0.011
#> Dementia = 1 (%) 413 (11.6) 151 ( 6.9) <0.001
#> Psychiatric = 1 (%) 286 ( 8.1) 100 ( 4.6) <0.001
#> Pulmonary = 1 (%) 774 (21.8) 315 (14.4) <0.001
#> Renal = 1 (%) 149 ( 4.2) 106 ( 4.9) 0.268
#> Hepatic = 1 (%) 265 ( 7.5) 136 ( 6.2) 0.084
#> GI.Bleed = 1 (%) 131 ( 3.7) 54 ( 2.5) 0.014
#> Tumor = 1 (%) 872 (24.6) 444 (20.3) <0.001
#> Transfer.hx = 1 (%) 335 ( 9.4) 327 (15.0) <0.001
#> MI = 1 (%) 105 ( 3.0) 95 ( 4.3) 0.007
#> age (%) <0.001
#> [-Inf,50) 884 (24.9) 540 (24.7)
#> [50,60) 546 (15.4) 371 (17.0)
#> [60,70) 812 (22.9) 577 (26.4)
#> [70,80) 809 (22.8) 529 (24.2)
#> [80, Inf) 500 (14.1) 167 ( 7.6)
#> sex = Female (%) 1637 (46.1) 906 (41.5) 0.001
#> edu (mean (SD)) 11.57 (3.13) 11.86 (3.16) 0.001
#> DASIndex (mean (SD)) 20.37 (5.48) 20.70 (5.03) 0.023
#> APACHE.score (mean (SD)) 50.93 (18.81) 60.74 (20.27) <0.001
#> Glasgow.Coma.Score (mean (SD)) 22.25 (31.37) 18.97 (28.26) <0.001
#> blood.pressure (mean (SD)) 84.87 (38.87) 68.20 (34.24) <0.001
#> WBC (mean (SD)) 15.26 (11.41) 16.27 (12.55) 0.002
#> Heart.rate (mean (SD)) 112.87 (40.94) 118.93 (41.47) <0.001
#> Respiratory.rate (mean (SD)) 28.98 (13.95) 26.65 (14.17) <0.001
#> Temperature (mean (SD)) 37.63 (1.74) 37.59 (1.83) 0.429
#> PaO2vs.FIO2 (mean (SD)) 240.63 (116.66) 192.43 (105.54) <0.001
#> Albumin (mean (SD)) 3.16 (0.67) 2.98 (0.93) <0.001
#> Hematocrit (mean (SD)) 32.70 (8.79) 30.51 (7.42) <0.001
#> Bilirubin (mean (SD)) 2.00 (4.43) 2.71 (5.33) <0.001
#> Creatinine (mean (SD)) 1.92 (2.03) 2.47 (2.05) <0.001
#> Sodium (mean (SD)) 137.04 (7.68) 136.33 (7.60) 0.001
#> Potassium (mean (SD)) 4.08 (1.04) 4.05 (1.01) 0.321
#> PaCo2 (mean (SD)) 39.95 (14.24) 36.79 (10.97) <0.001
#> PH (mean (SD)) 7.39 (0.11) 7.38 (0.11) <0.001
#> Weight (mean (SD)) 65.04 (29.50) 72.36 (27.73) <0.001
#> DNR.status = Yes (%) 499 (14.1) 155 ( 7.1) <0.001
#> Medical.insurance (%) <0.001
#> Medicaid 454 (12.8) 193 ( 8.8)
#> Medicare 947 (26.7) 511 (23.4)
#> Medicare & Medicaid 251 ( 7.1) 123 ( 5.6)
#> No insurance 186 ( 5.2) 136 ( 6.2)
#> Private 967 (27.2) 731 (33.5)
#> Private & Medicare 746 (21.0) 490 (22.4)
#> Respiratory.Diag = Yes (%) 1481 (41.7) 632 (28.9) <0.001
#> Cardiovascular.Diag = Yes (%) 1007 (28.4) 924 (42.3) <0.001
#> Neurological.Diag = Yes (%) 575 (16.2) 118 ( 5.4) <0.001
#> Gastrointestinal.Diag = Yes (%) 522 (14.7) 420 (19.2) <0.001
#> Renal.Diag = Yes (%) 147 ( 4.1) 148 ( 6.8) <0.001
#> Metabolic.Diag = Yes (%) 172 ( 4.8) 93 ( 4.3) 0.337
#> Hematologic.Diag = Yes (%) 239 ( 6.7) 115 ( 5.3) 0.029
#> Sepsis.Diag = Yes (%) 515 (14.5) 516 (23.6) <0.001
#> Trauma.Diag = Yes (%) 18 ( 0.5) 34 ( 1.6) <0.001
#> Orthopedic.Diag = Yes (%) 3 ( 0.1) 4 ( 0.2) 0.516
#> race (%) 0.425
#> white 2753 (77.5) 1707 (78.2)
#> black 585 (16.5) 335 (15.3)
#> other 213 ( 6.0) 142 ( 6.5)
#> income (%) <0.001
#> $11-$25k 713 (20.1) 452 (20.7)
#> $25-$50k 500 (14.1) 393 (18.0)
#> > $50k 257 ( 7.2) 194 ( 8.9)
#> Under $11k 2081 (58.6) 1145 (52.4)
#> Length.of.Stay (mean (SD)) 19.53 (23.59) 24.86 (28.90) <0.001
#> Death (mean (SD)) 0.63 (0.48) 0.68 (0.47) <0.001
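The survey variant mentioned above, svyCreateTableOne, takes a survey design object instead of a data frame. The following is a minimal sketch only: ObsData has no sampling weights, so the toy data frame and its wt column are invented for illustration, and the design (no clustering, one weight per row) is an assumption.

```r
library(survey)
library(tableone)

# Toy data standing in for ObsData; values and weights are made up.
set.seed(42)
toy <- data.frame(
  RHC.use      = rep(c(0, 1), times = c(30, 20)),
  sex          = factor(sample(c("Male", "Female"), 50, replace = TRUE)),
  APACHE.score = round(rnorm(50, mean = 55, sd = 20)),
  wt           = runif(50, min = 0.5, max = 2)  # hypothetical sampling weights
)

# Describe the design: no clustering (ids = ~1), one weight per row.
toy_design <- svydesign(ids = ~1, weights = ~wt, data = toy)

# Weighted Table 1 stratified by treatment; tests are skipped for brevity.
svy_tab <- svyCreateTableOne(vars = c("sex", "APACHE.score"),
                             strata = "RHC.use",
                             data = toy_design,
                             test = FALSE)
print(svy_tab)
```

With real survey data, the ids, strata, and weights arguments of svydesign should follow the survey's documentation rather than the simplified design used here.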
table1
The table1 package generates the descriptive summary tables commonly used in medical research. Below are examples of how to use it.
Basic usage of table1:
library(table1)
# Turn the 0/1 treatment indicator into a labelled factor for the column headers
ObsData$RHC.use.factor <- factor(ObsData$RHC.use,
                                 levels = c(0, 1),
                                 labels = c("No RHC", "Received RHC"))
# Generate a basic Table 1 summarizing characteristics of ObsData, grouped by RHC use
table1(~ age + sex + APACHE.score + Medical.insurance | RHC.use.factor, data = ObsData)
 | No RHC (N=3551) | Received RHC (N=2184) | Overall (N=5735) |
---|---|---|---|
age | |||
[-Inf,50) | 884 (24.9%) | 540 (24.7%) | 1424 (24.8%) |
[50,60) | 546 (15.4%) | 371 (17.0%) | 917 (16.0%) |
[60,70) | 812 (22.9%) | 577 (26.4%) | 1389 (24.2%) |
[70,80) | 809 (22.8%) | 529 (24.2%) | 1338 (23.3%) |
[80, Inf) | 500 (14.1%) | 167 (7.6%) | 667 (11.6%) |
sex | |||
Male | 1914 (53.9%) | 1278 (58.5%) | 3192 (55.7%) |
Female | 1637 (46.1%) | 906 (41.5%) | 2543 (44.3%) |
APACHE.score | |||
Mean (SD) | 50.9 (18.8) | 60.7 (20.3) | 54.7 (20.0) |
Median [Min, Max] | 50.0 [3.00, 147] | 60.0 [9.00, 135] | 54.0 [3.00, 147] |
Medical.insurance | |||
Medicaid | 454 (12.8%) | 193 (8.8%) | 647 (11.3%) |
Medicare | 947 (26.7%) | 511 (23.4%) | 1458 (25.4%) |
Medicare & Medicaid | 251 (7.1%) | 123 (5.6%) | 374 (6.5%) |
No insurance | 186 (5.2%) | 136 (6.2%) | 322 (5.6%) |
Private | 967 (27.2%) | 731 (33.5%) | 1698 (29.6%) |
Private & Medicare | 746 (21.0%) | 490 (22.4%) | 1236 (21.6%) |
Customizing labels and formats
# Label variables and add a caption.
# table1 reads display labels from each column's label attribute (set with label()),
# so the labels are attached to a copy of the data to leave ObsData unchanged.
ObsData.labelled <- ObsData
label(ObsData.labelled$age) <- "Age (years)"
label(ObsData.labelled$APACHE.score) <- "APACHE II Score"
label(ObsData.labelled$Medical.insurance) <- "Medical Insurance Status"
table1(~ age + sex + APACHE.score + Medical.insurance | RHC.use.factor,
       data = ObsData.labelled,
       caption = "Table 1: Summary of patient characteristics")
 | No RHC (N=3551) | Received RHC (N=2184) | Overall (N=5735) |
---|---|---|---|
Age (years) | |||
[-Inf,50) | 884 (24.9%) | 540 (24.7%) | 1424 (24.8%) |
[50,60) | 546 (15.4%) | 371 (17.0%) | 917 (16.0%) |
[60,70) | 812 (22.9%) | 577 (26.4%) | 1389 (24.2%) |
[70,80) | 809 (22.8%) | 529 (24.2%) | 1338 (23.3%) |
[80, Inf) | 500 (14.1%) | 167 (7.6%) | 667 (11.6%) |
sex | |||
Male | 1914 (53.9%) | 1278 (58.5%) | 3192 (55.7%) |
Female | 1637 (46.1%) | 906 (41.5%) | 2543 (44.3%) |
APACHE II Score | |||
Mean (SD) | 50.9 (18.8) | 60.7 (20.3) | 54.7 (20.0) |
Median [Min, Max] | 50.0 [3.00, 147] | 60.0 [9.00, 135] | 54.0 [3.00, 147] |
Medical Insurance Status | |||
Medicaid | 454 (12.8%) | 193 (8.8%) | 647 (11.3%) |
Medicare | 947 (26.7%) | 511 (23.4%) | 1458 (25.4%) |
Medicare & Medicaid | 251 (7.1%) | 123 (5.6%) | 374 (6.5%) |
No insurance | 186 (5.2%) | 136 (6.2%) | 322 (5.6%) |
Private | 967 (27.2%) | 731 (33.5%) | 1698 (29.6%) |
Private & Medicare | 746 (21.0%) | 490 (22.4%) | 1236 (21.6%) |
Handling missing data
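# Including missing values in the summary table.
# table1 adds a "Missing" row automatically for any variable that contains NA;
# none of these four variables has missing values, so no such row appears here.
# The overall argument is the label of the pooled column (passing TRUE would print "TRUE").
table1(~ age + sex + APACHE.score + Medical.insurance | RHC.use.factor,
       data = ObsData,
       overall = "Overall")
 | No RHC (N=3551) | Received RHC (N=2184) | Overall (N=5735) |
---|---|---|---|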
age | |||
[-Inf,50) | 884 (24.9%) | 540 (24.7%) | 1424 (24.8%) |
[50,60) | 546 (15.4%) | 371 (17.0%) | 917 (16.0%) |
[60,70) | 812 (22.9%) | 577 (26.4%) | 1389 (24.2%) |
[70,80) | 809 (22.8%) | 529 (24.2%) | 1338 (23.3%) |
[80, Inf) | 500 (14.1%) | 167 (7.6%) | 667 (11.6%) |
sex | |||
Male | 1914 (53.9%) | 1278 (58.5%) | 3192 (55.7%) |
Female | 1637 (46.1%) | 906 (41.5%) | 2543 (44.3%) |
APACHE.score | |||
Mean (SD) | 50.9 (18.8) | 60.7 (20.3) | 54.7 (20.0) |
Median [Min, Max] | 50.0 [3.00, 147] | 60.0 [9.00, 135] | 54.0 [3.00, 147] |
Medical.insurance | |||
Medicaid | 454 (12.8%) | 193 (8.8%) | 647 (11.3%) |
Medicare | 947 (26.7%) | 511 (23.4%) | 1458 (25.4%) |
Medicare & Medicaid | 251 (7.1%) | 123 (5.6%) | 374 (6.5%) |
No insurance | 186 (5.2%) | 136 (6.2%) | 322 (5.6%) |
Private | 967 (27.2%) | 731 (33.5%) | 1698 (29.6%) |
Private & Medicare | 746 (21.0%) | 490 (22.4%) | 1236 (21.6%) |
gtsummary
gtsummary provides highly customizable functions for building tables. It can rename variables, add captions, and select which statistics to report for each variable type. It is particularly useful for creating clean, customized Table 1s and includes options for survey data.
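The code for the overall (unstratified) table is not shown in this section. Its footnote (n (%); Median (IQR)) matches the defaults of tbl_summary called without a by argument, so it was presumably produced along the lines of the sketch below. The toy data frame is invented so that the example runs on its own.

```r
library(gtsummary)

# Toy data standing in for ObsData (values are made up).
set.seed(1)
toy <- data.frame(
  sex          = factor(sample(c("Male", "Female"), 40, replace = TRUE)),
  APACHE.score = round(rnorm(40, mean = 55, sd = 20)),
  RHC.use      = rep(c(0, 1), each = 20)
)

# Without `by`, tbl_summary() builds a single overall column.
# Defaults: n (%) for categorical variables, median (IQR) for continuous ones.
overall_tbl <- tbl_summary(toy)
overall_tbl
```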
Characteristic | N = 5,735¹ |
---|---|
Disease.category | |
ARF | 2,490 (43%) |
CHF | 456 (8.0%) |
Other | 1,163 (20%) |
MOSF | 1,626 (28%) |
Cancer | |
None | 4,379 (76%) |
Localized (Yes) | 972 (17%) |
Metastatic | 384 (6.7%) |
Cardiovascular | |
0 | 4,722 (82%) |
1 | 1,013 (18%) |
Congestive.HF | |
0 | 4,714 (82%) |
1 | 1,021 (18%) |
Dementia | |
0 | 5,171 (90%) |
1 | 564 (9.8%) |
Psychiatric | |
0 | 5,349 (93%) |
1 | 386 (6.7%) |
Pulmonary | |
0 | 4,646 (81%) |
1 | 1,089 (19%) |
Renal | |
0 | 5,480 (96%) |
1 | 255 (4.4%) |
Hepatic | |
0 | 5,334 (93%) |
1 | 401 (7.0%) |
GI.Bleed | |
0 | 5,550 (97%) |
1 | 185 (3.2%) |
Tumor | |
0 | 4,419 (77%) |
1 | 1,316 (23%) |
Immunosupperssion | |
0 | 4,192 (73%) |
1 | 1,543 (27%) |
Transfer.hx | |
0 | 5,073 (88%) |
1 | 662 (12%) |
MI | |
0 | 5,535 (97%) |
1 | 200 (3.5%) |
age | |
[-Inf,50) | 1,424 (25%) |
[50,60) | 917 (16%) |
[60,70) | 1,389 (24%) |
[70,80) | 1,338 (23%) |
[80, Inf) | 667 (12%) |
sex | |
Male | 3,192 (56%) |
Female | 2,543 (44%) |
edu | 12.0 (10.0, 13.0) |
DASIndex | 19.7 (16.1, 23.4) |
APACHE.score | 54 (41, 67) |
Glasgow.Coma.Score | 0 (0, 41) |
blood.pressure | 63 (50, 115) |
WBC | 14 (8, 20) |
Heart.rate | 124 (97, 141) |
Respiratory.rate | 30 (14, 38) |
Temperature | 38.09 (36.09, 39.00) |
PaO2vs.FIO2 | 203 (133, 317) |
Albumin | 3.50 (2.60, 3.50) |
Hematocrit | 30 (26, 36) |
Bilirubin | 1.01 (0.80, 1.40) |
Creatinine | 1.50 (1.00, 2.40) |
Sodium | 136 (132, 142) |
Potassium | 3.80 (3.40, 4.60) |
PaCo2 | 37 (31, 42) |
PH | 7.40 (7.34, 7.46) |
Weight | 70 (56, 84) |
DNR.status | 654 (11%) |
Medical.insurance | |
Medicaid | 647 (11%) |
Medicare | 1,458 (25%) |
Medicare & Medicaid | 374 (6.5%) |
No insurance | 322 (5.6%) |
Private | 1,698 (30%) |
Private & Medicare | 1,236 (22%) |
Respiratory.Diag | 2,113 (37%) |
Cardiovascular.Diag | 1,931 (34%) |
Neurological.Diag | 693 (12%) |
Gastrointestinal.Diag | 942 (16%) |
Renal.Diag | 295 (5.1%) |
Metabolic.Diag | 265 (4.6%) |
Hematologic.Diag | 354 (6.2%) |
Sepsis.Diag | 1,031 (18%) |
Trauma.Diag | 52 (0.9%) |
Orthopedic.Diag | 7 (0.1%) |
race | |
white | 4,460 (78%) |
black | 920 (16%) |
other | 355 (6.2%) |
income | |
$11-$25k | 1,165 (20%) |
$25-$50k | 893 (16%) |
> $50k | 451 (7.9%) |
Under $11k | 3,226 (56%) |
Length.of.Stay | 14 (7, 25) |
Death | 3,722 (65%) |
RHC.use | 2,184 (38%) |
¹ n (%); Median (IQR)
tbl_summary(ObsData, by = RHC.use,
            statistic = list(
              all_continuous() ~ "{mean} ({sd})",
              all_categorical() ~ "{n} ({p}%)"
            ),
            digits = all_continuous() ~ 2) |>
  as_gt() |>
  gt::tab_source_note(gt::md("*Add note here.*"))
Characteristic | 0, N = 3,551¹ | 1, N = 2,184¹ |
---|---|---|
Disease.category | ||
ARF | 1,581 (45%) | 909 (42%) |
CHF | 247 (7.0%) | 209 (9.6%) |
Other | 955 (27%) | 208 (9.5%) |
MOSF | 768 (22%) | 858 (39%) |
Cancer | ||
None | 2,652 (75%) | 1,727 (79%) |
Localized (Yes) | 638 (18%) | 334 (15%) |
Metastatic | 261 (7.4%) | 123 (5.6%) |
Cardiovascular | ||
0 | 2,984 (84%) | 1,738 (80%) |
1 | 567 (16%) | 446 (20%) |
Congestive.HF | ||
0 | 2,955 (83%) | 1,759 (81%) |
1 | 596 (17%) | 425 (19%) |
Dementia | ||
0 | 3,138 (88%) | 2,033 (93%) |
1 | 413 (12%) | 151 (6.9%) |
Psychiatric | ||
0 | 3,265 (92%) | 2,084 (95%) |
1 | 286 (8.1%) | 100 (4.6%) |
Pulmonary | ||
0 | 2,777 (78%) | 1,869 (86%) |
1 | 774 (22%) | 315 (14%) |
Renal | ||
0 | 3,402 (96%) | 2,078 (95%) |
1 | 149 (4.2%) | 106 (4.9%) |
Hepatic | ||
0 | 3,286 (93%) | 2,048 (94%) |
1 | 265 (7.5%) | 136 (6.2%) |
GI.Bleed | ||
0 | 3,420 (96%) | 2,130 (98%) |
1 | 131 (3.7%) | 54 (2.5%) |
Tumor | ||
0 | 2,679 (75%) | 1,740 (80%) |
1 | 872 (25%) | 444 (20%) |
Immunosupperssion | ||
0 | 2,644 (74%) | 1,548 (71%) |
1 | 907 (26%) | 636 (29%) |
Transfer.hx | ||
0 | 3,216 (91%) | 1,857 (85%) |
1 | 335 (9.4%) | 327 (15%) |
MI | ||
0 | 3,446 (97%) | 2,089 (96%) |
1 | 105 (3.0%) | 95 (4.3%) |
age | ||
[-Inf,50) | 884 (25%) | 540 (25%) |
[50,60) | 546 (15%) | 371 (17%) |
[60,70) | 812 (23%) | 577 (26%) |
[70,80) | 809 (23%) | 529 (24%) |
[80, Inf) | 500 (14%) | 167 (7.6%) |
sex | ||
Male | 1,914 (54%) | 1,278 (59%) |
Female | 1,637 (46%) | 906 (41%) |
edu | 11.57 (3.13) | 11.86 (3.16) |
DASIndex | 20.37 (5.48) | 20.70 (5.03) |
APACHE.score | 50.93 (18.81) | 60.74 (20.27) |
Glasgow.Coma.Score | 22.25 (31.37) | 18.97 (28.26) |
blood.pressure | 84.87 (38.87) | 68.20 (34.24) |
WBC | 15.26 (11.41) | 16.27 (12.55) |
Heart.rate | 112.87 (40.94) | 118.93 (41.47) |
Respiratory.rate | 28.98 (13.95) | 26.65 (14.17) |
Temperature | 37.63 (1.74) | 37.59 (1.83) |
PaO2vs.FIO2 | 240.63 (116.66) | 192.43 (105.54) |
Albumin | 3.16 (0.67) | 2.98 (0.93) |
Hematocrit | 32.70 (8.79) | 30.51 (7.42) |
Bilirubin | 2.00 (4.43) | 2.71 (5.33) |
Creatinine | 1.92 (2.03) | 2.47 (2.05) |
Sodium | 137.04 (7.68) | 136.33 (7.60) |
Potassium | 4.08 (1.04) | 4.05 (1.01) |
PaCo2 | 39.95 (14.24) | 36.79 (10.97) |
PH | 7.39 (0.11) | 7.38 (0.11) |
Weight | 65.04 (29.50) | 72.36 (27.73) |
DNR.status | 499 (14%) | 155 (7.1%) |
Medical.insurance | ||
Medicaid | 454 (13%) | 193 (8.8%) |
Medicare | 947 (27%) | 511 (23%) |
Medicare & Medicaid | 251 (7.1%) | 123 (5.6%) |
No insurance | 186 (5.2%) | 136 (6.2%) |
Private | 967 (27%) | 731 (33%) |
Private & Medicare | 746 (21%) | 490 (22%) |
Respiratory.Diag | 1,481 (42%) | 632 (29%) |
Cardiovascular.Diag | 1,007 (28%) | 924 (42%) |
Neurological.Diag | 575 (16%) | 118 (5.4%) |
Gastrointestinal.Diag | 522 (15%) | 420 (19%) |
Renal.Diag | 147 (4.1%) | 148 (6.8%) |
Metabolic.Diag | 172 (4.8%) | 93 (4.3%) |
Hematologic.Diag | 239 (6.7%) | 115 (5.3%) |
Sepsis.Diag | 515 (15%) | 516 (24%) |
Trauma.Diag | 18 (0.5%) | 34 (1.6%) |
Orthopedic.Diag | 3 (<0.1%) | 4 (0.2%) |
race | ||
white | 2,753 (78%) | 1,707 (78%) |
black | 585 (16%) | 335 (15%) |
other | 213 (6.0%) | 142 (6.5%) |
income | ||
$11-$25k | 713 (20%) | 452 (21%) |
$25-$50k | 500 (14%) | 393 (18%) |
> $50k | 257 (7.2%) | 194 (8.9%) |
Under $11k | 2,081 (59%) | 1,145 (52%) |
Length.of.Stay | 19.53 (23.59) | 24.86 (28.90) |
Death | 2,236 (63%) | 1,486 (68%) |
Add note here. | ||
¹ n (%); Mean (SD)
DataExplorer
DataExplorer offers functions for initial data exploration, including various visualizations. Below are some examples using the RHC dataset.
The introduce function provides an overview of the dataset dimensions, variable types, and missingness.
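The original code chunk is not shown here, so the following is a minimal sketch of introduce on an invented toy data frame (one NA is added on purpose so the missingness counts are non-zero).

```r
library(DataExplorer)

# Toy data standing in for ObsData (values are made up).
toy <- data.frame(
  age     = c(45, 62, 71, NA, 58, 80),
  sex     = factor(c("Male", "Female", "Female", "Male", "Male", "Female")),
  RHC.use = c(0, 1, 0, 0, 1, 1)
)

# One-row summary: counts of rows, columns, discrete/continuous columns,
# missing values, and complete rows.
overview <- introduce(toy)
t(overview)  # transposed for easier reading
```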
Plot the amount of missing data per variable:
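A possible way to do this, assuming the DataExplorer functions plot_missing (plot) and profile_missing (the same information as a table); the toy data and its missing values are invented:

```r
library(DataExplorer)

# Toy data with deliberately missing values (made up for illustration).
toy <- data.frame(
  age    = c(45, NA, 71, NA, 58, 80),
  weight = c(70, 82, NA, 65, 90, 77),
  sex    = factor(c("Male", "Female", "Female", "Male", "Male", "Female"))
)

# Tabular view: one row per variable with the number and share of missing values.
missing_tbl <- profile_missing(toy)
print(missing_tbl)

# Graphical view of the same information.
plot_missing(toy)
```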
Visualize categorical variable distributions with the plot_bar function:
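A minimal sketch on invented toy data; with ObsData the call would simply be plot_bar(ObsData).

```r
library(DataExplorer)

# Toy data standing in for ObsData (values are made up).
toy <- data.frame(
  sex       = factor(c("Male", "Female", "Female", "Male", "Male", "Female")),
  insurance = factor(c("Private", "Medicare", "Private", "Medicaid", "Private", "Medicare")),
  age       = c(45, 62, 71, 39, 58, 80)
)

# One bar chart per discrete column; the continuous column (age) is skipped.
bars <- plot_bar(toy)
```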
Visualize the distribution of numerical variables with histograms:
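A minimal sketch on invented toy data; with ObsData the call would be plot_histogram(ObsData).

```r
library(DataExplorer)

# Toy data standing in for ObsData (values are made up).
set.seed(2)
toy <- data.frame(
  APACHE.score   = round(rnorm(40, mean = 55, sd = 20)),
  blood.pressure = round(rnorm(40, mean = 80, sd = 35)),
  sex            = factor(sample(c("Male", "Female"), 40, replace = TRUE))
)

# One histogram per continuous column; discrete columns are ignored.
hists <- plot_histogram(toy)
```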
Quantile-quantile plots can be used to assess whether numerical variables are normally distributed:
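A minimal sketch with plot_qq on invented toy data: one variable is drawn from a normal distribution and one from a skewed distribution, so the two panels should look visibly different.

```r
library(DataExplorer)

# Toy data (made up): one roughly normal and one right-skewed variable.
set.seed(3)
toy <- data.frame(
  normal.var = rnorm(100, mean = 50, sd = 10),
  skewed.var = rexp(100, rate = 0.5)
)

# Points close to the reference line suggest approximate normality;
# systematic curvature (expected for skewed.var) suggests a departure from it.
qq <- plot_qq(toy)
```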
Generate a correlation plot to show relationships between variables:
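A minimal sketch with plot_correlation on invented toy data. The type = "continuous" argument restricts the heatmap to numeric columns; without it, discrete columns are also included after being expanded into indicator variables.

```r
library(DataExplorer)

# Toy data (made up): blood.pressure is constructed to fall as APACHE.score rises.
set.seed(4)
apache <- rnorm(60, mean = 55, sd = 20)
toy <- data.frame(
  APACHE.score   = apache,
  blood.pressure = 120 - 0.6 * apache + rnorm(60, sd = 15),
  Temperature    = rnorm(60, mean = 37.6, sd = 1.8)
)

# Heatmap of pairwise correlations between the continuous columns.
corr_plot <- plot_correlation(toy, type = "continuous")
```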
Boxplots can visualize variable distributions based on treatment or outcome:
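A minimal sketch with plot_boxplot on invented toy data, grouping by the treatment indicator; with ObsData the by argument could equally name the outcome.

```r
library(DataExplorer)

# Toy data standing in for ObsData (values are made up).
set.seed(5)
toy <- data.frame(
  RHC.use        = factor(rep(c("0", "1"), each = 25)),
  APACHE.score   = round(rnorm(50, mean = 55, sd = 20)),
  blood.pressure = round(rnorm(50, mean = 80, sd = 35))
)

# One panel per continuous column, each split by the levels of RHC.use.
boxes <- plot_boxplot(toy, by = "RHC.use")
```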
Automatically generate a full data-profiling report (an HTML file by default) with the create_report function:
GGally
GGally provides methods to combine multiple ggplot2 plots, enabling visualization of several variables at once.
ggpairs(ObsData,
        columns = c('age', 'sex', 'edu', 'blood.pressure', 'Medical.insurance'),
        ggplot2::aes(color = as.factor(RHC.use)))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
modelsummary
modelsummary provides functions to summarize data in tables, including variable overviews, correlations, and Table 1-style tables.
require(modelsummary)
#> Loading required package: modelsummary
#> Warning: package 'modelsummary' was built under R version 4.2.3
require(rmarkdown)
#> Loading required package: rmarkdown
#> Warning: package 'rmarkdown' was built under R version 4.2.3
require(markdown)
#> Loading required package: markdown
#> Warning: package 'markdown' was built under R version 4.2.3
Overview of each variable:
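The overview table was presumably produced with datasummary_skim; the call is not shown, so this is a minimal sketch on invented toy data. The type = "numeric" argument limits the summary to numeric columns, and output = "data.frame" is used here only so the result can be inspected as a regular data frame.

```r
library(modelsummary)

# Toy data standing in for ObsData (values are made up).
set.seed(6)
toy <- data.frame(
  edu          = round(rnorm(40, mean = 12, sd = 3)),
  APACHE.score = round(rnorm(40, mean = 55, sd = 20)),
  sex          = factor(sample(c("Male", "Female"), 40, replace = TRUE))
)

# Per-variable overview: unique values, missingness, mean, SD, min, median, max.
skim_tbl <- datasummary_skim(toy, type = "numeric", output = "data.frame")
skim_tbl
```

Depending on the installed version, modelsummary may warn that the inline histogram column is not available for this output format; the numeric summaries are unaffected.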
 | Unique (#) | Missing (%) | Mean | SD | Min | Median | Max | |
---|---|---|---|---|---|---|---|---|
edu | 42 | 0 | 11.7 | 3.1 | 0.0 | 12.0 | 30.0 | |
DASIndex | 1023 | 0 | 20.5 | 5.3 | 11.0 | 19.7 | 33.0 | |
APACHE.score | 123 | 0 | 54.7 | 20.0 | 3.0 | 54.0 | 147.0 | |
Glasgow.Coma.Score | 11 | 0 | 21.0 | 30.3 | 0.0 | 0.0 | 100.0 | |
blood.pressure | 178 | 0 | 78.5 | 38.0 | 0.0 | 63.0 | 259.0 | |
WBC | 520 | 0 | 15.6 | 11.9 | 0.0 | 14.1 | 192.0 | |
Heart.rate | 189 | 0 | 115.2 | 41.2 | 0.0 | 124.0 | 250.0 | |
Respiratory.rate | 72 | 0 | 28.1 | 14.1 | 0.0 | 30.0 | 100.0 | |
Temperature | 118 | 0 | 37.6 | 1.8 | 27.0 | 38.1 | 43.0 | |
PaO2vs.FIO2 | 1342 | 0 | 222.3 | 115.0 | 11.6 | 202.5 | 937.5 | |
Albumin | 57 | 0 | 3.1 | 0.8 | 0.3 | 3.5 | 29.0 | |
Hematocrit | 450 | 0 | 31.9 | 8.4 | 2.0 | 30.0 | 66.2 | |
Bilirubin | 266 | 0 | 2.3 | 4.8 | 0.1 | 1.0 | 58.2 | |
Creatinine | 148 | 0 | 2.1 | 2.1 | 0.1 | 1.5 | 25.1 | |
Sodium | 73 | 0 | 136.8 | 7.7 | 101.0 | 136.0 | 178.0 | |
Potassium | 81 | 0 | 4.1 | 1.0 | 1.1 | 3.8 | 11.9 | |
PaCo2 | 266 | 0 | 38.7 | 13.2 | 1.0 | 37.0 | 156.0 | |
PH | 96 | 0 | 7.4 | 0.1 | 6.6 | 7.4 | 7.8 | |
Weight | 922 | 0 | 67.8 | 29.1 | 0.0 | 70.0 | 244.0 | |
Length.of.Stay | 164 | 0 | 21.6 | 25.9 | 2.0 | 14.0 | 394.0 | |
Death | 2 | 0 | 0.6 | 0.5 | 0.0 | 1.0 | 1.0 | |
RHC.use | 2 | 0 | 0.4 | 0.5 | 0.0 | 0.0 | 1.0 |
Create a Table 1 using datasummary_balance:
0 / Mean | 0 / Std. Dev. | 1 / Mean | 1 / Std. Dev. | ||
---|---|---|---|---|---|
edu | 11.6 | 3.1 | 11.9 | 3.2 | |
DASIndex | 20.4 | 5.5 | 20.7 | 5.0 | |
APACHE.score | 50.9 | 18.8 | 60.7 | 20.3 | |
Glasgow.Coma.Score | 22.3 | 31.4 | 19.0 | 28.3 | |
blood.pressure | 84.9 | 38.9 | 68.2 | 34.2 | |
WBC | 15.3 | 11.4 | 16.3 | 12.5 | |
Heart.rate | 112.9 | 40.9 | 118.9 | 41.5 | |
Respiratory.rate | 29.0 | 13.9 | 26.7 | 14.2 | |
Temperature | 37.6 | 1.7 | 37.6 | 1.8 | |
PaO2vs.FIO2 | 240.6 | 116.7 | 192.4 | 105.5 | |
Albumin | 3.2 | 0.7 | 3.0 | 0.9 | |
Hematocrit | 32.7 | 8.8 | 30.5 | 7.4 | |
Bilirubin | 2.0 | 4.4 | 2.7 | 5.3 | |
Creatinine | 1.9 | 2.0 | 2.5 | 2.1 | |
Sodium | 137.0 | 7.7 | 136.3 | 7.6 | |
Potassium | 4.1 | 1.0 | 4.0 | 1.0 | |
PaCo2 | 40.0 | 14.2 | 36.8 | 11.0 | |
PH | 7.4 | 0.1 | 7.4 | 0.1 | |
Weight | 65.0 | 29.5 | 72.4 | 27.7 | |
Length.of.Stay | 19.5 | 23.6 | 24.9 | 28.9 | |
Death | 0.6 | 0.5 | 0.7 | 0.5 | |
N | Pct. | N | Pct. | ||
Disease.category | ARF | 1581 | 44.5 | 909 | 41.6 |
CHF | 247 | 7.0 | 209 | 9.6 | |
Other | 955 | 26.9 | 208 | 9.5 | |
MOSF | 768 | 21.6 | 858 | 39.3 | |
Cancer | None | 2652 | 74.7 | 1727 | 79.1 |
Localized (Yes) | 638 | 18.0 | 334 | 15.3 | |
Metastatic | 261 | 7.4 | 123 | 5.6 | |
Cardiovascular | 0 | 2984 | 84.0 | 1738 | 79.6 |
1 | 567 | 16.0 | 446 | 20.4 | |
Congestive.HF | 0 | 2955 | 83.2 | 1759 | 80.5 |
1 | 596 | 16.8 | 425 | 19.5 | |
Dementia | 0 | 3138 | 88.4 | 2033 | 93.1 |
1 | 413 | 11.6 | 151 | 6.9 | |
Psychiatric | 0 | 3265 | 91.9 | 2084 | 95.4 |
1 | 286 | 8.1 | 100 | 4.6 | |
Pulmonary | 0 | 2777 | 78.2 | 1869 | 85.6 |
1 | 774 | 21.8 | 315 | 14.4 | |
Renal | 0 | 3402 | 95.8 | 2078 | 95.1 |
1 | 149 | 4.2 | 106 | 4.9 | |
Hepatic | 0 | 3286 | 92.5 | 2048 | 93.8 |
1 | 265 | 7.5 | 136 | 6.2 | |
GI.Bleed | 0 | 3420 | 96.3 | 2130 | 97.5 |
1 | 131 | 3.7 | 54 | 2.5 | |
Tumor | 0 | 2679 | 75.4 | 1740 | 79.7 |
1 | 872 | 24.6 | 444 | 20.3 | |
Immunosupperssion | 0 | 2644 | 74.5 | 1548 | 70.9 |
1 | 907 | 25.5 | 636 | 29.1 | |
Transfer.hx | 0 | 3216 | 90.6 | 1857 | 85.0 |
1 | 335 | 9.4 | 327 | 15.0 | |
MI | 0 | 3446 | 97.0 | 2089 | 95.7 |
1 | 105 | 3.0 | 95 | 4.3 | |
age | [-Inf,50) | 884 | 24.9 | 540 | 24.7 |
[50,60) | 546 | 15.4 | 371 | 17.0 | |
[60,70) | 812 | 22.9 | 577 | 26.4 | |
[70,80) | 809 | 22.8 | 529 | 24.2 | |
[80, Inf) | 500 | 14.1 | 167 | 7.6 | |
sex | Male | 1914 | 53.9 | 1278 | 58.5 |
Female | 1637 | 46.1 | 906 | 41.5 | |
DNR.status | No | 3052 | 85.9 | 2029 | 92.9 |
Yes | 499 | 14.1 | 155 | 7.1 | |
Medical.insurance | Medicaid | 454 | 12.8 | 193 | 8.8 |
Medicare | 947 | 26.7 | 511 | 23.4 | |
Medicare & Medicaid | 251 | 7.1 | 123 | 5.6 | |
No insurance | 186 | 5.2 | 136 | 6.2 | |
Private | 967 | 27.2 | 731 | 33.5 | |
Private & Medicare | 746 | 21.0 | 490 | 22.4 | |
Respiratory.Diag | No | 2070 | 58.3 | 1552 | 71.1 |
Yes | 1481 | 41.7 | 632 | 28.9 | |
Cardiovascular.Diag | No | 2544 | 71.6 | 1260 | 57.7 |
Yes | 1007 | 28.4 | 924 | 42.3 | |
Neurological.Diag | No | 2976 | 83.8 | 2066 | 94.6 |
Yes | 575 | 16.2 | 118 | 5.4 | |
Gastrointestinal.Diag | No | 3029 | 85.3 | 1764 | 80.8 |
Yes | 522 | 14.7 | 420 | 19.2 | |
Renal.Diag | No | 3404 | 95.9 | 2036 | 93.2 |
Yes | 147 | 4.1 | 148 | 6.8 | |
Metabolic.Diag | No | 3379 | 95.2 | 2091 | 95.7 |
Yes | 172 | 4.8 | 93 | 4.3 | |
Hematologic.Diag | No | 3312 | 93.3 | 2069 | 94.7 |
Yes | 239 | 6.7 | 115 | 5.3 | |
Sepsis.Diag | No | 3036 | 85.5 | 1668 | 76.4 |
Yes | 515 | 14.5 | 516 | 23.6 | |
Trauma.Diag | No | 3533 | 99.5 | 2150 | 98.4 |
Yes | 18 | 0.5 | 34 | 1.6 | |
Orthopedic.Diag | No | 3548 | 99.9 | 2180 | 99.8 |
Yes | 3 | 0.1 | 4 | 0.2 | |
race | white | 2753 | 77.5 | 1707 | 78.2 |
black | 585 | 16.5 | 335 | 15.3 | |
other | 213 | 6.0 | 142 | 6.5 | |
income | $11-$25k | 713 | 20.1 | 452 | 20.7 |
$25-$50k | 500 | 14.1 | 393 | 18.0 | |
> $50k | 257 | 7.2 | 194 | 8.9 | |
Under $11k | 2081 | 58.6 | 1145 | 52.4 |
You can also customize the appearance of tables generated with modelsummary. For example, you can adjust the number of digits displayed in the summaries.
datasummary_balance(~ RHC.use, ObsData,
                    fmt = "%.2f",
                    output = "markdown")
#> Warning: Please install the `estimatr` package or set `dinm=FALSE` to suppress
#> this warning.
0 / Mean | 0 / Std. Dev. | 1 / Mean | 1 / Std. Dev. | ||
---|---|---|---|---|---|
edu | 11.57 | 3.13 | 11.86 | 3.16 | |
DASIndex | 20.37 | 5.48 | 20.70 | 5.03 | |
APACHE.score | 50.93 | 18.81 | 60.74 | 20.27 | |
Glasgow.Coma.Score | 22.25 | 31.37 | 18.97 | 28.26 | |
blood.pressure | 84.87 | 38.87 | 68.20 | 34.24 | |
WBC | 15.26 | 11.41 | 16.27 | 12.55 | |
Heart.rate | 112.87 | 40.94 | 118.93 | 41.47 | |
Respiratory.rate | 28.98 | 13.95 | 26.65 | 14.17 | |
Temperature | 37.63 | 1.74 | 37.59 | 1.83 | |
PaO2vs.FIO2 | 240.63 | 116.66 | 192.43 | 105.54 | |
Albumin | 3.16 | 0.67 | 2.98 | 0.93 | |
Hematocrit | 32.70 | 8.79 | 30.51 | 7.42 | |
Bilirubin | 2.00 | 4.43 | 2.71 | 5.33 | |
Creatinine | 1.92 | 2.03 | 2.47 | 2.05 | |
Sodium | 137.04 | 7.68 | 136.33 | 7.60 | |
Potassium | 4.08 | 1.04 | 4.05 | 1.01 | |
PaCo2 | 39.95 | 14.24 | 36.79 | 10.97 | |
PH | 7.39 | 0.11 | 7.38 | 0.11 | |
Weight | 65.04 | 29.50 | 72.36 | 27.73 | |
Length.of.Stay | 19.53 | 23.59 | 24.86 | 28.90 | |
Death | 0.63 | 0.48 | 0.68 | 0.47 | |
N | Pct. | N | Pct. | ||
Disease.category | ARF | 1581 | 44.5 | 909 | 41.6 |
CHF | 247 | 7.0 | 209 | 9.6 | |
Other | 955 | 26.9 | 208 | 9.5 | |
MOSF | 768 | 21.6 | 858 | 39.3 | |
Cancer | None | 2652 | 74.7 | 1727 | 79.1 |
Localized (Yes) | 638 | 18.0 | 334 | 15.3 | |
Metastatic | 261 | 7.4 | 123 | 5.6 | |
Cardiovascular | 0 | 2984 | 84.0 | 1738 | 79.6 |
1 | 567 | 16.0 | 446 | 20.4 | |
Congestive.HF | 0 | 2955 | 83.2 | 1759 | 80.5 |
1 | 596 | 16.8 | 425 | 19.5 | |
Dementia | 0 | 3138 | 88.4 | 2033 | 93.1 |
1 | 413 | 11.6 | 151 | 6.9 | |
Psychiatric | 0 | 3265 | 91.9 | 2084 | 95.4 |
1 | 286 | 8.1 | 100 | 4.6 | |
Pulmonary | 0 | 2777 | 78.2 | 1869 | 85.6 |
1 | 774 | 21.8 | 315 | 14.4 | |
Renal | 0 | 3402 | 95.8 | 2078 | 95.1 |
1 | 149 | 4.2 | 106 | 4.9 | |
Hepatic | 0 | 3286 | 92.5 | 2048 | 93.8 |
1 | 265 | 7.5 | 136 | 6.2 | |
GI.Bleed | 0 | 3420 | 96.3 | 2130 | 97.5 |
1 | 131 | 3.7 | 54 | 2.5 | |
Tumor | 0 | 2679 | 75.4 | 1740 | 79.7 |
1 | 872 | 24.6 | 444 | 20.3 | |
Immunosupperssion | 0 | 2644 | 74.5 | 1548 | 70.9 |
1 | 907 | 25.5 | 636 | 29.1 | |
Transfer.hx | 0 | 3216 | 90.6 | 1857 | 85.0 |
1 | 335 | 9.4 | 327 | 15.0 | |
MI | 0 | 3446 | 97.0 | 2089 | 95.7 |
1 | 105 | 3.0 | 95 | 4.3 | |
age | [-Inf,50) | 884 | 24.9 | 540 | 24.7 |
[50,60) | 546 | 15.4 | 371 | 17.0 | |
[60,70) | 812 | 22.9 | 577 | 26.4 | |
[70,80) | 809 | 22.8 | 529 | 24.2 | |
[80, Inf) | 500 | 14.1 | 167 | 7.6 | |
sex | Male | 1914 | 53.9 | 1278 | 58.5 |
Female | 1637 | 46.1 | 906 | 41.5 | |
DNR.status | No | 3052 | 85.9 | 2029 | 92.9 |
Yes | 499 | 14.1 | 155 | 7.1 | |
Medical.insurance | Medicaid | 454 | 12.8 | 193 | 8.8 |
Medicare | 947 | 26.7 | 511 | 23.4 | |
Medicare & Medicaid | 251 | 7.1 | 123 | 5.6 | |
No insurance | 186 | 5.2 | 136 | 6.2 | |
Private | 967 | 27.2 | 731 | 33.5 | |
Private & Medicare | 746 | 21.0 | 490 | 22.4 | |
Respiratory.Diag | No | 2070 | 58.3 | 1552 | 71.1 |
Yes | 1481 | 41.7 | 632 | 28.9 | |
Cardiovascular.Diag | No | 2544 | 71.6 | 1260 | 57.7 |
Yes | 1007 | 28.4 | 924 | 42.3 | |
Neurological.Diag | No | 2976 | 83.8 | 2066 | 94.6 |
Yes | 575 | 16.2 | 118 | 5.4 | |
Gastrointestinal.Diag | No | 3029 | 85.3 | 1764 | 80.8 |
Yes | 522 | 14.7 | 420 | 19.2 | |
Renal.Diag | No | 3404 | 95.9 | 2036 | 93.2 |
Yes | 147 | 4.1 | 148 | 6.8 | |
Metabolic.Diag | No | 3379 | 95.2 | 2091 | 95.7 |
Yes | 172 | 4.8 | 93 | 4.3 | |
Hematologic.Diag | No | 3312 | 93.3 | 2069 | 94.7 |
Yes | 239 | 6.7 | 115 | 5.3 | |
Sepsis.Diag | No | 3036 | 85.5 | 1668 | 76.4 |
Yes | 515 | 14.5 | 516 | 23.6 | |
Trauma.Diag | No | 3533 | 99.5 | 2150 | 98.4 |
Yes | 18 | 0.5 | 34 | 1.6 | |
Orthopedic.Diag | No | 3548 | 99.9 | 2180 | 99.8 |
Yes | 3 | 0.1 | 4 | 0.2 | |
race | white | 2753 | 77.5 | 1707 | 78.2 |
black | 585 | 16.5 | 335 | 15.3 | |
other | 213 | 6.0 | 142 | 6.5 | |
income | $11-$25k | 713 | 20.1 | 452 | 20.7 |
$25-$50k | 500 | 14.1 | 393 | 18.0 | |
> $50k | 257 | 7.2 | 194 | 8.9 | |
Under $11k | 2081 | 58.6 | 1145 | 52.4 |
Extract correlations between variables with datasummary_correlation:
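The call that produced the correlation table is not shown, so here is a minimal sketch on invented toy data. By default datasummary_correlation reports Pearson correlations of the numeric columns; the method argument selects alternatives such as "spearman".

```r
library(modelsummary)

# Toy data (made up): blood.pressure is constructed to fall as APACHE.score rises.
set.seed(7)
apache <- rnorm(60, mean = 55, sd = 20)
toy <- data.frame(
  APACHE.score   = apache,
  blood.pressure = 120 - 0.6 * apache + rnorm(60, sd = 15),
  Temperature    = rnorm(60, mean = 37.6, sd = 1.8)
)

# Lower-triangular correlation table, returned as a data frame for inspection.
corr_tbl <- datasummary_correlation(toy, output = "data.frame")
corr_tbl

# Rank-based alternative.
corr_spearman <- datasummary_correlation(toy, method = "spearman", output = "data.frame")
```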
 | edu | DASIndex | APACHE.score | Glasgow.Coma.Score | blood.pressure | WBC | Heart.rate | Respiratory.rate | Temperature | PaO2vs.FIO2 | Albumin | Hematocrit | Bilirubin | Creatinine | Sodium | Potassium | PaCo2 | PH | Weight | Length.of.Stay | Death | RHC.use |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
edu | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
DASIndex | .10 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
APACHE.score | .02 | −.06 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Glasgow.Coma.Score | −.02 | .04 | .03 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
blood.pressure | −.04 | .06 | −.40 | .02 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
WBC | −.02 | .03 | .13 | .04 | −.03 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Heart.rate | .05 | .02 | .22 | −.11 | .06 | .03 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Respiratory.rate | .03 | .00 | .26 | −.14 | .03 | .01 | .28 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
Temperature | .07 | .15 | −.04 | .05 | .01 | −.01 | .22 | .14 | 1 | . | . | . | . | . | . | . | . | . | . | . | . | . |
PaO2vs.FIO2 | −.01 | −.07 | −.23 | .09 | .07 | −.06 | −.08 | −.10 | −.10 | 1 | . | . | . | . | . | . | . | . | . | . | . | . |
Albumin | .00 | .00 | −.29 | .02 | .13 | −.06 | −.08 | .00 | −.01 | .07 | 1 | . | . | . | . | . | . | . | . | . | . | . |
Hematocrit | −.05 | .03 | −.24 | .03 | .16 | −.03 | −.03 | .01 | −.02 | −.01 | .29 | 1 | . | . | . | . | . | . | . | . | . | . |
Bilirubin | .07 | −.01 | .28 | .05 | −.05 | .01 | .03 | .01 | −.03 | .03 | −.09 | −.15 | 1 | . | . | . | . | . | . | . | . | . |
Creatinine | −.02 | −.05 | .38 | .01 | −.09 | .05 | −.06 | −.04 | −.11 | .06 | .00 | −.20 | .12 | 1 | . | . | . | . | . | . | . | . |
Sodium | −.01 | .03 | .02 | .13 | .02 | −.04 | .03 | .02 | .05 | −.02 | .00 | .06 | .02 | −.01 | 1 | . | . | . | . | . | . | . |
Potassium | −.02 | −.06 | .15 | −.01 | −.07 | .08 | −.11 | −.01 | −.13 | .04 | .03 | −.01 | .00 | .30 | −.10 | 1 | . | . | . | . | . | . |
PaCo2 | −.05 | −.09 | −.09 | −.12 | .04 | −.05 | −.01 | .00 | −.04 | −.17 | .09 | .24 | −.08 | −.13 | .06 | .04 | 1 | . | . | . | . | . |
PH | .04 | .05 | −.33 | .03 | .13 | −.06 | .03 | −.01 | .14 | .11 | .04 | −.03 | .02 | −.16 | −.02 | −.20 | −.47 | 1 | . | . | . | . |
Weight | −.06 | .05 | .08 | −.08 | −.02 | −.01 | .02 | −.01 | .02 | −.05 | −.05 | .05 | .01 | .10 | .00 | .06 | .03 | −.05 | 1 | . | . | . |
Length.of.Stay | .02 | .04 | .02 | .00 | −.02 | .03 | .07 | −.01 | .09 | −.08 | −.11 | −.09 | .00 | .01 | .04 | −.02 | .00 | .02 | .02 | 1 | . | . |
Death | −.03 | −.18 | .19 | .12 | −.10 | .03 | −.02 | .01 | −.10 | .02 | −.03 | −.09 | .08 | .08 | .00 | .05 | −.04 | −.04 | −.05 | −.08 | 1 | . |
RHC.use | .04 | .03 | .24 | −.05 | −.21 | .04 | .07 | −.08 | −.01 | −.20 | −.12 | −.13 | .07 | .13 | −.04 | −.01 | −.12 | −.06 | .12 | .10 | .05 | 1 |
Generate a contingency table using datasummary_crosstab:
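The table's layout (counts plus row percentages, with an All margin) matches the default statistics of datasummary_crosstab, so the call was presumably of the form datasummary_crosstab(age ~ RHC.use, data = ObsData). A minimal sketch on invented toy data, with a base R cross-check of the row percentages:

```r
library(modelsummary)

# Toy data standing in for ObsData (values are made up).
toy <- data.frame(
  age     = factor(c("<60", "<60", "<60", "60+", "60+", "60+", "60+", "<60")),
  RHC.use = factor(c(0, 1, 0, 0, 1, 1, 0, 0))
)

# Rows: age group; columns: RHC.use. Default cells are N and row percent.
datasummary_crosstab(age ~ RHC.use, data = toy, output = "markdown")

# Base R cross-check: row percentages should sum to 100 within each age group.
row_pct <- 100 * prop.table(table(toy$age, toy$RHC.use), margin = 1)
round(row_pct, 1)
```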
age | | 0 | 1 | All |
---|---|---|---|---|
[-Inf,50) | N | 884 | 540 | 1424 |
 | % row | 62.1 | 37.9 | 100.0 |
[50,60) | N | 546 | 371 | 917 |
 | % row | 59.5 | 40.5 | 100.0 |
[60,70) | N | 812 | 577 | 1389 |
 | % row | 58.5 | 41.5 | 100.0 |
[70,80) | N | 809 | 529 | 1338 |
 | % row | 60.5 | 39.5 | 100.0 |
[80, Inf) | N | 500 | 167 | 667 |
 | % row | 75.0 | 25.0 | 100.0 |
All | N | 3551 | 2184 | 5735 |
 | % row | 61.9 | 38.1 | 100.0 |
The modelsummary package can also display regression models professionally, combining results from multiple models into one table for easy comparison.
# Example: Fit two regression models
model1 <- lm(Length.of.Stay ~ age + sex + APACHE.score, data = ObsData)
model2 <- lm(Length.of.Stay ~ age + sex + APACHE.score + Medical.insurance, data = ObsData)
# Display both models side by side, with p-values in parentheses under each estimate
modelsummary(list(model1, model2),
             statistic = "p.value",
             stars = TRUE)
 | (1) | (2) |
---|---|---|
(Intercept) | 21.312*** | 21.200*** |
 | (<0.001) | (<0.001) |
age[50,60) | −1.065 | −1.116 |
 | (0.330) | (0.310) |
age[60,70) | −1.420 | −1.201 |
 | (0.145) | (0.255) |
age[70,80) | −2.763** | −2.334+ |
 | (0.005) | (0.056) |
age[80, Inf) | −6.762*** | −6.530*** |
 | (<0.001) | (<0.001) |
sexFemale | 0.932 | 1.021 |
 | (0.176) | (0.140) |
APACHE.score | 0.033+ | 0.032+ |
 | (0.057) | (0.063) |
Medical.insuranceMedicare | | 0.561 |
 | | (0.687) |
Medical.insuranceMedicare & Medicaid | | −2.545 |
 | | (0.145) |
Medical.insuranceNo insurance | | 0.475 |
 | | (0.788) |
Medical.insurancePrivate | | 0.385 |
 | | (0.748) |
Medical.insurancePrivate & Medicare | | −0.817 |
 | | (0.569) |
Num.Obs. | 5735 | 5735 |
R2 | 0.007 | 0.008 |
R2 Adj. | 0.006 | 0.006 |
AIC | 53563.6 | 53567.9 |
BIC | 53616.8 | 53654.4 |
Log.Lik. | −26773.805 | −26770.932 |
F | 6.388 | 4.006 |
RMSE | 25.78 | 25.77 |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
All tables generated by modelsummary can be saved in various formats such as LaTeX, HTML, or Word.
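As a rough illustration of saving tables, the sketch below writes a two-model table to LaTeX and HTML files by passing a file path to the output argument. The models are fitted on the built-in mtcars data rather than ObsData, and the object and file names (fit1, fit2, models.tex, models.html) are placeholders chosen for this example.

```r
library(modelsummary)

# Stand-in models on a built-in dataset (not the tutorial's ObsData)
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
models <- list("Base" = fit1, "Adjusted" = fit2)

# The file extension passed to `output` selects the format
tex_file  <- file.path(tempdir(), "models.tex")
html_file <- file.path(tempdir(), "models.html")
modelsummary(models, output = tex_file)
modelsummary(models, output = html_file)

# A path ending in ".docx" is handled the same way, but Word export
# may need additional software to be installed
```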
stargazer
The stargazer package is widely used for outputting regression results in LaTeX, HTML, or plain text formats. It allows you to include various statistics, such as standard errors, t-values, p-values, and confidence intervals.
# Example: Fit two regression models
model1 <- lm(Length.of.Stay ~ age + sex + APACHE.score, data = ObsData)
model2 <- lm(Length.of.Stay ~ age + sex + APACHE.score + Medical.insurance, data = ObsData)
# Display the regression results using stargazer.
# covariate.labels is omitted on purpose: it expects one label per coefficient, in order,
# and age and Medical.insurance are factors that expand into several terms,
# so a vector of four labels would mislabel the age rows.
stargazer(model1, model2, type = "text",
title = "Regression Results",
dep.var.labels = "Length of Stay",
out = "regression_table.txt")
#>
#> Regression Results
#> ==============================================================================
#> Dependent variable:
#> ------------------------------------------------
#> Length of Stay
#> (1) (2)
#> ------------------------------------------------------------------------------
#> age[50,60) -1.065 -1.116
#> (1.093) (1.099)
#>
#> age[60,70) -1.420 -1.201
#> (0.974) (1.055)
#>
#> age[70,80) -2.763*** -2.334*
#> (0.982) (1.221)
#>
#> age[80, Inf) -6.762*** -6.530***
#> (1.213) (1.424)
#>
#> sexFemale 0.932 1.021
#> (0.688) (0.691)
#>
#> APACHE.score 0.033* 0.032*
#> (0.017) (0.017)
#>
#> Medical.insuranceMedicare 0.561
#> (1.391)
#>
#> Medical.insuranceMedicare Medicaid
#> (1.744)
#>
#> Medical.insuranceNo insurance 0.475
#> (1.763)
#>
#> Medical.insurancePrivate 0.385
#> (1.199)
#>
#> Medical.insurancePrivate Medicare
#> (1.436)
#>
#> Constant 21.312*** 21.200***
#> (1.220) (1.495)
#>
#> ------------------------------------------------------------------------------
#> Observations 5,735 5,735
#> R2 0.007 0.008
#> Adjusted R2 0.006 0.006
#> Residual Std. Error 25.795 (df = 5728) 25.793 (df = 5723)
#> F Statistic 6.388*** (df = 6; 5728) 4.006*** (df = 11; 5723)
#> ==============================================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
broom
The broom package converts regression model outputs into tidy data frames, making it easy to extract and manipulate specific parts of the model for custom summaries.
# Tidy the model results for model1
tidy(model1)
#> # A tibble: 7 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 21.3 1.22 17.5 1.21e-66
#> 2 age[50,60) -1.07 1.09 -0.975 3.30e- 1
#> 3 age[60,70) -1.42 0.974 -1.46 1.45e- 1
#> 4 age[70,80) -2.76 0.982 -2.81 4.94e- 3
#> 5 age[80, Inf) -6.76 1.21 -5.58 2.57e- 8
#> 6 sexFemale 0.932 0.688 1.35 1.76e- 1
#> 7 APACHE.score 0.0325 0.0171 1.90 5.71e- 2
# Glance at model1 for a quick summary of goodness-of-fit statistics
glance(model1)
#> # A tibble: 1 × 12
#> r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.00665 0.00561 25.8 6.39 0.00000102 6 -26774. 53564. 53617.
#> # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
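Because tidy() returns an ordinary data frame, a custom summary can be built with standard subsetting. The following sketch assumes a stand-in model on the built-in mtcars data (fit, coefs, and custom are names introduced here for illustration) and adds confidence intervals via conf.int = TRUE.

```r
library(broom)

# Stand-in model on a built-in dataset (not the tutorial's ObsData)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# conf.int = TRUE adds conf.low and conf.high columns to the tidy output
coefs <- tidy(fit, conf.int = TRUE, conf.level = 0.95)

# Keep the non-intercept terms and a reduced set of columns
custom <- coefs[coefs$term != "(Intercept)",
                c("term", "estimate", "conf.low", "conf.high", "p.value")]
print(custom)

# augment() returns the data with per-observation fitted values and residuals
obs_level <- augment(fit)
head(obs_level)
```

The same pattern should carry over to model1 and model2 from the examples above.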
texreg
The texreg package allows you to export regression tables into LaTeX, HTML, or Word. It supports outputting multiple regression models side by side for easy comparison.
require(texreg)
#> Loading required package: texreg
#> Version: 1.39.4
#> Date: 2024-07-23
#> Author: Philip Leifeld (University of Manchester)
#>
#> Consider submitting praise using the praise or praise_interactive functions.
#> Please cite the JSS article in your publications -- see citation("texreg").
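The text loads texreg without showing a table, so here is a minimal sketch of typical usage. It relies on two stand-in models fitted on the built-in mtcars data; the names m1, m2, and html_path are placeholders for this example only.

```r
library(texreg)

# Stand-in models on a built-in dataset (not the tutorial's ObsData)
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# screenreg() prints a plain-text table to the console,
# with the two models side by side
screenreg(list(m1, m2), custom.model.names = c("Base", "Adjusted"))

# htmlreg() produces HTML (texreg() produces LaTeX);
# the `file` argument writes the table to disk instead of printing it
html_path <- file.path(tempdir(), "texreg_models.html")
htmlreg(list(m1, m2), file = html_path)
```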
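You can also customize the output from these packages, for example by supplying your own standard errors (such as robust ones), changing the number of decimal places, or adjusting the significance-star cutoffs.
# se takes one vector of standard errors per model. The default OLS standard errors
# are passed here; robust standard errors would be supplied in the same way.
# star.cutoffs sets the p-value thresholds for one, two, and three stars.
stargazer(model1, model2, type = "text",
se = list(coef(summary(model1))[ , "Std. Error"],
coef(summary(model2))[ , "Std. Error"]),
star.cutoffs = c(0.05, 0.01, 0.001))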
#>
#> ==============================================================================
#> Dependent variable:
#> ------------------------------------------------
#> Length.of.Stay
#> (1) (2)
#> ------------------------------------------------------------------------------
#> age[50,60) -1.065 -1.116
#> (1.093) (1.099)
#>
#> age[60,70) -1.420 -1.201
#> (0.974) (1.055)
#>
#> age[70,80) -2.763** -2.334
#> (0.982) (1.221)
#>
#> age[80, Inf) -6.762*** -6.530***
#> (1.213) (1.424)
#>
#> sexFemale 0.932 1.021
#> (0.688) (0.691)
#>
#> APACHE.score 0.033 0.032
#> (0.017) (0.017)
#>
#> Medical.insuranceMedicare 0.561
#> (1.391)
#>
#> Medical.insuranceMedicare Medicaid
#> (1.744)
#>
#> Medical.insuranceNo insurance 0.475
#> (1.763)
#>
#> Medical.insurancePrivate 0.385
#> (1.199)
#>
#> Medical.insurancePrivate Medicare
#> (1.436)
#>
#> Constant 21.312*** 21.200***
#> (1.220) (1.495)
#>
#> ------------------------------------------------------------------------------
#> Observations 5,735 5,735
#> R2 0.007 0.008
#> Adjusted R2 0.006 0.006
#> Residual Std. Error 25.795 (df = 5728) 25.793 (df = 5723)
#> F Statistic 6.388*** (df = 6; 5728) 4.006*** (df = 11; 5723)
#> ==============================================================================
#> Note: *p<0.05; **p<0.01; ***p<0.001
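The call above passes the ordinary standard errors through the se argument. A sketch of supplying heteroskedasticity-robust standard errors instead is given below. It assumes the sandwich package is installed, uses a stand-in model on mtcars, and picks the HC1 estimator purely as an example.

```r
library(sandwich)
library(stargazer)

# Stand-in model on a built-in dataset (not the tutorial's ObsData)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Robust (HC1) standard errors: square roots of the diagonal
# of the heteroskedasticity-consistent covariance matrix
robust_se <- sqrt(diag(vcovHC(fit, type = "HC1")))

# Supply them through `se`; `digits` sets the number of decimal places
stargazer(fit, type = "text", se = list(robust_se), digits = 2)
```

By default stargazer recomputes the test statistics and stars from the supplied standard errors, so the significance markers may differ from those of the unadjusted table.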
janitor
The janitor package simplifies data cleaning tasks, such as checking for duplicate records, cleaning column names, and generating cross-tabulations.
# Clean column names into a consistent snake_case style
# (for example, Disease.category becomes disease_category)
ObsData_clean <- clean_names(ObsData)
# Method 1: Using the data frame name in the tabyl function
tabyl(ObsData, Disease.category, RHC.use)
#> Disease.category 0 1
#> ARF 1581 909
#> CHF 247 209
#> Other 955 208
#> MOSF 768 858
# Method 2: Using the pipe operator
ObsData %>% tabyl(Disease.category, RHC.use)
#> Disease.category 0 1
#> ARF 1581 909
#> CHF 247 209
#> Other 955 208
#> MOSF 768 858
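The duplicate check and table formatting mentioned above are not demonstrated, so the sketch below illustrates them on a small invented data frame. The data frame toy and its values are hypothetical; only clean_names(), get_dupes(), tabyl(), and the adorn_ helpers come from janitor.

```r
library(janitor)

# Hypothetical toy data with untidy column names and one duplicated record
toy <- data.frame(
  "Patient ID"       = c(1, 2, 2, 3, 4),
  "Disease.category" = c("ARF", "CHF", "CHF", "ARF", "MOSF"),
  "RHC.use"          = c(0, 1, 1, 0, 1),
  check.names = FALSE
)

# Standardize the column names to snake_case
toy_clean <- clean_names(toy)
names(toy_clean)

# Rows that share the same patient_id, with a dupe_count column
dupes <- get_dupes(toy_clean, patient_id)
print(dupes)

# Cross-tabulation with a totals row, shown as row percentages
xtab     <- tabyl(toy_clean, disease_category, rhc_use)
xtab_tot <- adorn_totals(xtab, where = "row")
xtab_pct <- adorn_pct_formatting(adorn_percentages(xtab_tot, "row"), digits = 1)
print(xtab_pct)
```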
skimr
The skimr package provides a more compact and readable summary compared to the default summary() function. It tailors its output for each variable type.
Name | ObsData |
Number of rows | 5735 |
Number of columns | 52 |
_______________________ | |
Column type frequency: | |
factor | 30 |
numeric | 22 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Disease.category | 0 | 1 | FALSE | 4 | ARF: 2490, MOS: 1626, Oth: 1163, CHF: 456 |
Cancer | 0 | 1 | FALSE | 3 | Non: 4379, Loc: 972, Met: 384 |
Cardiovascular | 0 | 1 | FALSE | 2 | 0: 4722, 1: 1013 |
Congestive.HF | 0 | 1 | FALSE | 2 | 0: 4714, 1: 1021 |
Dementia | 0 | 1 | FALSE | 2 | 0: 5171, 1: 564 |
Psychiatric | 0 | 1 | FALSE | 2 | 0: 5349, 1: 386 |
Pulmonary | 0 | 1 | FALSE | 2 | 0: 4646, 1: 1089 |
Renal | 0 | 1 | FALSE | 2 | 0: 5480, 1: 255 |
Hepatic | 0 | 1 | FALSE | 2 | 0: 5334, 1: 401 |
GI.Bleed | 0 | 1 | FALSE | 2 | 0: 5550, 1: 185 |
Tumor | 0 | 1 | FALSE | 2 | 0: 4419, 1: 1316 |
Immunosupperssion | 0 | 1 | FALSE | 2 | 0: 4192, 1: 1543 |
Transfer.hx | 0 | 1 | FALSE | 2 | 0: 5073, 1: 662 |
MI | 0 | 1 | FALSE | 2 | 0: 5535, 1: 200 |
age | 0 | 1 | FALSE | 5 | [-I: 1424, [60: 1389, [70: 1338, [50: 917 |
sex | 0 | 1 | FALSE | 2 | Mal: 3192, Fem: 2543 |
DNR.status | 0 | 1 | FALSE | 2 | No: 5081, Yes: 654 |
Medical.insurance | 0 | 1 | FALSE | 6 | Pri: 1698, Med: 1458, Pri: 1236, Med: 647 |
Respiratory.Diag | 0 | 1 | FALSE | 2 | No: 3622, Yes: 2113 |
Cardiovascular.Diag | 0 | 1 | FALSE | 2 | No: 3804, Yes: 1931 |
Neurological.Diag | 0 | 1 | FALSE | 2 | No: 5042, Yes: 693 |
Gastrointestinal.Diag | 0 | 1 | FALSE | 2 | No: 4793, Yes: 942 |
Renal.Diag | 0 | 1 | FALSE | 2 | No: 5440, Yes: 295 |
Metabolic.Diag | 0 | 1 | FALSE | 2 | No: 5470, Yes: 265 |
Hematologic.Diag | 0 | 1 | FALSE | 2 | No: 5381, Yes: 354 |
Sepsis.Diag | 0 | 1 | FALSE | 2 | No: 4704, Yes: 1031 |
Trauma.Diag | 0 | 1 | FALSE | 2 | No: 5683, Yes: 52 |
Orthopedic.Diag | 0 | 1 | FALSE | 2 | No: 5728, Yes: 7 |
race | 0 | 1 | FALSE | 3 | whi: 4460, bla: 920, oth: 355 |
income | 0 | 1 | FALSE | 4 | Und: 3226, $11: 1165, $25: 893, > $: 451 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
edu | 0 | 1 | 11.68 | 3.15 | 0.00 | 10.00 | 12.00 | 13.00 | 30.00 | ▁▇▃▁▁ |
DASIndex | 0 | 1 | 20.50 | 5.32 | 11.00 | 16.06 | 19.75 | 23.43 | 33.00 | ▃▇▆▂▃ |
APACHE.score | 0 | 1 | 54.67 | 19.96 | 3.00 | 41.00 | 54.00 | 67.00 | 147.00 | ▂▇▅▁▁ |
Glasgow.Coma.Score | 0 | 1 | 21.00 | 30.27 | 0.00 | 0.00 | 0.00 | 41.00 | 100.00 | ▇▂▂▁▁ |
blood.pressure | 0 | 1 | 78.52 | 38.05 | 0.00 | 50.00 | 63.00 | 115.00 | 259.00 | ▆▇▆▁▁ |
WBC | 0 | 1 | 15.65 | 11.87 | 0.00 | 8.40 | 14.10 | 20.05 | 192.00 | ▇▁▁▁▁ |
Heart.rate | 0 | 1 | 115.18 | 41.24 | 0.00 | 97.00 | 124.00 | 141.00 | 250.00 | ▁▂▇▂▁ |
Respiratory.rate | 0 | 1 | 28.09 | 14.08 | 0.00 | 14.00 | 30.00 | 38.00 | 100.00 | ▅▇▂▁▁ |
Temperature | 0 | 1 | 37.62 | 1.77 | 27.00 | 36.09 | 38.09 | 39.00 | 43.00 | ▁▁▅▇▁ |
PaO2vs.FIO2 | 0 | 1 | 222.27 | 114.95 | 11.60 | 133.31 | 202.50 | 316.62 | 937.50 | ▇▇▁▁▁ |
Albumin | 0 | 1 | 3.09 | 0.78 | 0.30 | 2.60 | 3.50 | 3.50 | 29.00 | ▇▁▁▁▁ |
Hematocrit | 0 | 1 | 31.87 | 8.36 | 2.00 | 26.10 | 30.00 | 36.30 | 66.19 | ▁▆▇▃▁ |
Bilirubin | 0 | 1 | 2.27 | 4.80 | 0.10 | 0.80 | 1.01 | 1.40 | 58.20 | ▇▁▁▁▁ |
Creatinine | 0 | 1 | 2.13 | 2.05 | 0.10 | 1.00 | 1.50 | 2.40 | 25.10 | ▇▁▁▁▁ |
Sodium | 0 | 1 | 136.77 | 7.66 | 101.00 | 132.00 | 136.00 | 142.00 | 178.00 | ▁▂▇▁▁ |
Potassium | 0 | 1 | 4.07 | 1.03 | 1.10 | 3.40 | 3.80 | 4.60 | 11.90 | ▂▇▁▁▁ |
PaCo2 | 0 | 1 | 38.75 | 13.18 | 1.00 | 31.00 | 37.00 | 42.00 | 156.00 | ▃▇▁▁▁ |
PH | 0 | 1 | 7.39 | 0.11 | 6.58 | 7.34 | 7.40 | 7.46 | 7.77 | ▁▁▂▇▁ |
Weight | 0 | 1 | 67.83 | 29.06 | 0.00 | 56.30 | 70.00 | 83.70 | 244.00 | ▂▇▁▁▁ |
Length.of.Stay | 0 | 1 | 21.56 | 25.87 | 2.00 | 7.00 | 14.00 | 25.00 | 394.00 | ▇▁▁▁▁ |
Death | 0 | 1 | 0.65 | 0.48 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▅▁▁▁▇ |
RHC.use | 0 | 1 | 0.38 | 0.49 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▅ |
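The summary above is the kind of output that skim() returns, although the call itself is not shown. A minimal sketch on the built-in iris data (used here only as a stand-in for ObsData) might look like this:

```r
library(skimr)

# Summarize every column; factor and numeric columns get separate sections
full <- skim(iris)
print(full)

# Restrict the summary to selected columns by naming them after the data
subset_skim <- skim(iris, Sepal.Length, Species)
print(subset_skim)
```

For the tutorial data, the equivalent call would presumably be skim(ObsData).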
corrplot
The corrplot package is useful for visualizing correlation matrices with different styles (circle, color, etc.), making correlations easier to interpret.
# Step 1: Select only numerical variables from ObsData
ObsData_num <- dplyr::select(ObsData, where(is.numeric))
# Step 2: Compute the correlation matrix
corr_matrix <- cor(ObsData_num, use = "complete.obs")
# Step 3: Plot the correlation matrix using corrplot
corrplot::corrplot(corr_matrix, method = "circle")
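To illustrate the other styles mentioned above, the sketch below draws a color-tile version of a correlation plot. It uses a handful of mtcars columns as a stand-in for ObsData_num, and the particular options are only one reasonable combination among many.

```r
library(corrplot)

# Stand-in correlation matrix from a built-in dataset
corr_demo <- cor(mtcars[, c("mpg", "wt", "hp", "disp", "qsec")])

corrplot(corr_demo,
         method = "color",        # filled tiles instead of circles
         type = "upper",          # show only the upper triangle
         order = "hclust",        # group similar variables together
         addCoef.col = "black",   # print the coefficients in each tile
         tl.col = "black",        # color of the variable labels
         diag = FALSE)            # hide the diagonal of ones
```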
visdat
The visdat package helps visualize missing data patterns, data types, and distributions.
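No example accompanies this package in the text, so a minimal sketch is given here. It uses the built-in airquality data, which already contains missing values, as a stand-in for ObsData; the object names p_types and p_miss are introduced for this example.

```r
library(visdat)

# Overview of column types, with missing cells marked
p_types <- vis_dat(airquality)
print(p_types)

# Missingness only, with the percentage missing per column
p_miss <- vis_miss(airquality)
print(p_miss)
```

Both functions return ggplot objects, so they can be customized further with ggplot2 layers.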
naniar
The naniar package provides tools to handle and visualize missing data, helping to explore missingness patterns in the data.
# Introduce some missing values for demonstration
set.seed(123)  # arbitrary seed so the same rows are set to NA on every run
ObsData_with_NA <- ObsData
ObsData_with_NA$age[sample(nrow(ObsData), 10)] <- NA
ObsData_with_NA$sex[sample(nrow(ObsData), 10)] <- NA
# Visualize missing data across variables
gg_miss_var(ObsData_with_NA)
# Plot the missing data upset plot
naniar::gg_miss_upset(ObsData_with_NA)
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
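Alongside the plots, naniar also offers numerical summaries of missingness. The sketch below applies a few of them to the built-in airquality data, used here as a stand-in because it already contains missing values.

```r
library(naniar)

# Total number of missing cells in the data frame
total_missing <- n_miss(airquality)
total_missing

# Missing count and percentage per variable, sorted by most missing
miss_summary <- miss_var_summary(airquality)
print(miss_summary)

# How many rows have 0, 1, 2, ... missing values
miss_cases <- miss_case_table(airquality)
print(miss_cases)
```

The same functions can be applied to ObsData_with_NA to confirm that ten values each were removed from age and sex.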