Useful Packages

This tutorial introduces a variety of methods to explore a dataset, including summary statistics, variable distributions, correlations, and handling missing data.

Note

This tutorial uses the same dataset as previous tutorials, including working with a predictive question, machine learning with a continuous outcome, and machine learning with a binary outcome.

TableOne

tableone is an R package that provides a simple way to create the classic “Table 1” seen in health research papers, which summarizes the characteristics of the study sample, usually stratified by exposure or treatment group. For survey data, its svyCreateTableOne function accounts for strata and sampling weights and can display both weighted and unweighted counts and proportions.

require(tableone)
#> Loading required package: tableone
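# Note: ObsData spells this column "Immunosupperssion", so the "Immunosuppression"
# entry below is not found and is dropped from the table (see the warning in the output).
CreateTableOne(vars = c("Disease.category", "Cancer", "Cardiovascular", "Congestive.HF", 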
                        "Dementia", "Psychiatric", "Pulmonary", "Renal", "Hepatic", "GI.Bleed", "Tumor",
                        "Immunosuppression", "Transfer.hx", "MI", "age", "sex", "edu", "DASIndex",
                        "APACHE.score", "Glasgow.Coma.Score", "blood.pressure", "WBC", "Heart.rate",
                        "Respiratory.rate", "Temperature", "PaO2vs.FIO2", "Albumin", "Hematocrit",
                        "Bilirubin", "Creatinine", "Sodium", "Potassium", "PaCo2", "PH", "Weight",
                        "DNR.status", "Medical.insurance", "Respiratory.Diag", "Cardiovascular.Diag",
                        "Neurological.Diag", "Gastrointestinal.Diag", "Renal.Diag", "Metabolic.Diag",
                        "Hematologic.Diag", "Sepsis.Diag", "Trauma.Diag", "Orthopedic.Diag", "race", 
                        "income", "Length.of.Stay", "Death"),
               strata = "RHC.use",
               data = ObsData,
               includeNA = TRUE,
               test = TRUE)
#> Warning in ModuleReturnVarsExist(vars, data): The data frame does not have:
#> Immunosuppression Dropped
#>                                  Stratified by RHC.use
#>                                   0               1               p      test
#>   n                                 3551            2184                     
#>   Disease.category (%)                                            <0.001     
#>      ARF                            1581 (44.5)      909 (41.6)              
#>      CHF                             247 ( 7.0)      209 ( 9.6)              
#>      Other                           955 (26.9)      208 ( 9.5)              
#>      MOSF                            768 (21.6)      858 (39.3)              
#>   Cancer (%)                                                       0.001     
#>      None                           2652 (74.7)     1727 (79.1)              
#>      Localized (Yes)                 638 (18.0)      334 (15.3)              
#>      Metastatic                      261 ( 7.4)      123 ( 5.6)              
#>   Cardiovascular = 1 (%)             567 (16.0)      446 (20.4)   <0.001     
#>   Congestive.HF = 1 (%)              596 (16.8)      425 (19.5)    0.011     
#>   Dementia = 1 (%)                   413 (11.6)      151 ( 6.9)   <0.001     
#>   Psychiatric = 1 (%)                286 ( 8.1)      100 ( 4.6)   <0.001     
#>   Pulmonary = 1 (%)                  774 (21.8)      315 (14.4)   <0.001     
#>   Renal = 1 (%)                      149 ( 4.2)      106 ( 4.9)    0.268     
#>   Hepatic = 1 (%)                    265 ( 7.5)      136 ( 6.2)    0.084     
#>   GI.Bleed = 1 (%)                   131 ( 3.7)       54 ( 2.5)    0.014     
#>   Tumor = 1 (%)                      872 (24.6)      444 (20.3)   <0.001     
#>   Transfer.hx = 1 (%)                335 ( 9.4)      327 (15.0)   <0.001     
#>   MI = 1 (%)                         105 ( 3.0)       95 ( 4.3)    0.007     
#>   age (%)                                                         <0.001     
#>      [-Inf,50)                       884 (24.9)      540 (24.7)              
#>      [50,60)                         546 (15.4)      371 (17.0)              
#>      [60,70)                         812 (22.9)      577 (26.4)              
#>      [70,80)                         809 (22.8)      529 (24.2)              
#>      [80, Inf)                       500 (14.1)      167 ( 7.6)              
#>   sex = Female (%)                  1637 (46.1)      906 (41.5)    0.001     
#>   edu (mean (SD))                  11.57 (3.13)    11.86 (3.16)    0.001     
#>   DASIndex (mean (SD))             20.37 (5.48)    20.70 (5.03)    0.023     
#>   APACHE.score (mean (SD))         50.93 (18.81)   60.74 (20.27)  <0.001     
#>   Glasgow.Coma.Score (mean (SD))   22.25 (31.37)   18.97 (28.26)  <0.001     
#>   blood.pressure (mean (SD))       84.87 (38.87)   68.20 (34.24)  <0.001     
#>   WBC (mean (SD))                  15.26 (11.41)   16.27 (12.55)   0.002     
#>   Heart.rate (mean (SD))          112.87 (40.94)  118.93 (41.47)  <0.001     
#>   Respiratory.rate (mean (SD))     28.98 (13.95)   26.65 (14.17)  <0.001     
#>   Temperature (mean (SD))          37.63 (1.74)    37.59 (1.83)    0.429     
#>   PaO2vs.FIO2 (mean (SD))         240.63 (116.66) 192.43 (105.54) <0.001     
#>   Albumin (mean (SD))               3.16 (0.67)     2.98 (0.93)   <0.001     
#>   Hematocrit (mean (SD))           32.70 (8.79)    30.51 (7.42)   <0.001     
#>   Bilirubin (mean (SD))             2.00 (4.43)     2.71 (5.33)   <0.001     
#>   Creatinine (mean (SD))            1.92 (2.03)     2.47 (2.05)   <0.001     
#>   Sodium (mean (SD))              137.04 (7.68)   136.33 (7.60)    0.001     
#>   Potassium (mean (SD))             4.08 (1.04)     4.05 (1.01)    0.321     
#>   PaCo2 (mean (SD))                39.95 (14.24)   36.79 (10.97)  <0.001     
#>   PH (mean (SD))                    7.39 (0.11)     7.38 (0.11)   <0.001     
#>   Weight (mean (SD))               65.04 (29.50)   72.36 (27.73)  <0.001     
#>   DNR.status = Yes (%)               499 (14.1)      155 ( 7.1)   <0.001     
#>   Medical.insurance (%)                                           <0.001     
#>      Medicaid                        454 (12.8)      193 ( 8.8)              
#>      Medicare                        947 (26.7)      511 (23.4)              
#>      Medicare & Medicaid             251 ( 7.1)      123 ( 5.6)              
#>      No insurance                    186 ( 5.2)      136 ( 6.2)              
#>      Private                         967 (27.2)      731 (33.5)              
#>      Private & Medicare              746 (21.0)      490 (22.4)              
#>   Respiratory.Diag = Yes (%)        1481 (41.7)      632 (28.9)   <0.001     
#>   Cardiovascular.Diag = Yes (%)     1007 (28.4)      924 (42.3)   <0.001     
#>   Neurological.Diag = Yes (%)        575 (16.2)      118 ( 5.4)   <0.001     
#>   Gastrointestinal.Diag = Yes (%)    522 (14.7)      420 (19.2)   <0.001     
#>   Renal.Diag = Yes (%)               147 ( 4.1)      148 ( 6.8)   <0.001     
#>   Metabolic.Diag = Yes (%)           172 ( 4.8)       93 ( 4.3)    0.337     
#>   Hematologic.Diag = Yes (%)         239 ( 6.7)      115 ( 5.3)    0.029     
#>   Sepsis.Diag = Yes (%)              515 (14.5)      516 (23.6)   <0.001     
#>   Trauma.Diag = Yes (%)               18 ( 0.5)       34 ( 1.6)   <0.001     
#>   Orthopedic.Diag = Yes (%)            3 ( 0.1)        4 ( 0.2)    0.516     
#>   race (%)                                                         0.425     
#>      white                          2753 (77.5)     1707 (78.2)              
#>      black                           585 (16.5)      335 (15.3)              
#>      other                           213 ( 6.0)      142 ( 6.5)              
#>   income (%)                                                      <0.001     
#>      $11-$25k                        713 (20.1)      452 (20.7)              
#>      $25-$50k                        500 (14.1)      393 (18.0)              
#>      > $50k                          257 ( 7.2)      194 ( 8.9)              
#>      Under $11k                     2081 (58.6)     1145 (52.4)              
#>   Length.of.Stay (mean (SD))       19.53 (23.59)   24.86 (28.90)  <0.001     
#>   Death (mean (SD))                 0.63 (0.48)     0.68 (0.47)   <0.001
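
The survey-weighted variant mentioned above can be sketched as follows. ObsData has no sampling weights, so this sketch uses the nhanes example data shipped with the survey package as a stand-in; the design variables (SDMVPSU, SDMVSTRA, WTMEC2YR) and the summarized variables belong to that example data set, not to the RHC data.

```r
library(survey)    # provides svydesign() and the nhanes example data
library(tableone)

# Stand-in complex-survey data (an assumption; not part of this tutorial's data)
data(nhanes, package = "survey")

# Describe the sampling design: PSUs nested within strata, with examination weights
nhanes_design <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA,
                           weights = ~WTMEC2YR, nest = TRUE, data = nhanes)

# Weighted Table 1, stratified by gender (RIAGENDR)
svy_tab <- svyCreateTableOne(vars = c("HI_CHOL", "race", "agecat"),
                             strata = "RIAGENDR",
                             data = nhanes_design,
                             factorVars = c("HI_CHOL", "race"))
print(svy_tab)
```

The only structural difference from CreateTableOne is that the data argument is a survey design object rather than a data frame.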

table1

The table1 package is useful for generating descriptive summary tables commonly used in medical research. Below are examples of how to use it.

Basic usage of table1:

require(table1)
#> Loading required package: table1
#> 
#> Attaching package: 'table1'
#> The following objects are masked from 'package:base':
#> 
#>     units, units<-
ObsData$RHC.use.factor <- factor(ObsData$RHC.use, 
                          levels = c(0, 1), 
                          labels = c("No RHC", "Received RHC"))
# Generate a basic Table 1 summarizing characteristics of ObsData, grouped by 'RHC.use'
table1(~ age + sex + APACHE.score + Medical.insurance | RHC.use.factor, data = ObsData)
                        No RHC             Received RHC       Overall
                        (N=3551)           (N=2184)           (N=5735)
age
  [-Inf,50)             884 (24.9%)        540 (24.7%)        1424 (24.8%)
  [50,60)               546 (15.4%)        371 (17.0%)        917 (16.0%)
  [60,70)               812 (22.9%)        577 (26.4%)        1389 (24.2%)
  [70,80)               809 (22.8%)        529 (24.2%)        1338 (23.3%)
  [80, Inf)             500 (14.1%)        167 (7.6%)         667 (11.6%)
sex
  Male                  1914 (53.9%)       1278 (58.5%)       3192 (55.7%)
  Female                1637 (46.1%)       906 (41.5%)        2543 (44.3%)
APACHE.score
  Mean (SD)             50.9 (18.8)        60.7 (20.3)        54.7 (20.0)
  Median [Min, Max]     50.0 [3.00, 147]   60.0 [9.00, 135]   54.0 [3.00, 147]
Medical.insurance
  Medicaid              454 (12.8%)        193 (8.8%)         647 (11.3%)
  Medicare              947 (26.7%)        511 (23.4%)        1458 (25.4%)
  Medicare & Medicaid   251 (7.1%)         123 (5.6%)         374 (6.5%)
  No insurance          186 (5.2%)         136 (6.2%)         322 (5.6%)
  Private               967 (27.2%)        731 (33.5%)        1698 (29.6%)
  Private & Medicare    746 (21.0%)        490 (22.4%)        1236 (21.6%)

Customizing labels and formats

# Label variables and add a caption.
# table1() reads labels from the data via label(); its formula interface has no
# `label` argument, so a list passed that way is silently ignored.
# The labels are attached to a copy so that ObsData itself stays unchanged.
ObsLabelled <- ObsData
label(ObsLabelled$age) <- "Age (years)"
label(ObsLabelled$APACHE.score) <- "APACHE II Score"
label(ObsLabelled$Medical.insurance) <- "Medical Insurance Status"

table1(~ age + sex + APACHE.score + Medical.insurance | RHC.use.factor, data = ObsLabelled,
       caption = "Table 1: Summary of patient characteristics")
Table 1: Summary of patient characteristics
                            No RHC             Received RHC       Overall
                            (N=3551)           (N=2184)           (N=5735)
Age (years)
  [-Inf,50)                 884 (24.9%)        540 (24.7%)        1424 (24.8%)
  [50,60)                   546 (15.4%)        371 (17.0%)        917 (16.0%)
  [60,70)                   812 (22.9%)        577 (26.4%)        1389 (24.2%)
  [70,80)                   809 (22.8%)        529 (24.2%)        1338 (23.3%)
  [80, Inf)                 500 (14.1%)        167 (7.6%)         667 (11.6%)
sex
  Male                      1914 (53.9%)       1278 (58.5%)       3192 (55.7%)
  Female                    1637 (46.1%)       906 (41.5%)        2543 (44.3%)
APACHE II Score
  Mean (SD)                 50.9 (18.8)        60.7 (20.3)        54.7 (20.0)
  Median [Min, Max]         50.0 [3.00, 147]   60.0 [9.00, 135]   54.0 [3.00, 147]
Medical Insurance Status
  Medicaid                  454 (12.8%)        193 (8.8%)         647 (11.3%)
  Medicare                  947 (26.7%)        511 (23.4%)        1458 (25.4%)
  Medicare & Medicaid       251 (7.1%)         123 (5.6%)         374 (6.5%)
  No insurance              186 (5.2%)         136 (6.2%)         322 (5.6%)
  Private                   967 (27.2%)        731 (33.5%)        1698 (29.6%)
  Private & Medicare        746 (21.0%)        490 (22.4%)        1236 (21.6%)

Handling missing data

# Missing values are summarized by default: a "Missing" row is added for any
# variable that contains NAs. These four variables have none, so no such row appears.
# `overall` takes the label of the overall column (or FALSE to omit that column).
table1(~ age + sex + APACHE.score + Medical.insurance | RHC.use.factor, data = ObsData,
       overall = "Overall")
                        No RHC             Received RHC       Overall
                        (N=3551)           (N=2184)           (N=5735)
age
  [-Inf,50)             884 (24.9%)        540 (24.7%)        1424 (24.8%)
  [50,60)               546 (15.4%)        371 (17.0%)        917 (16.0%)
  [60,70)               812 (22.9%)        577 (26.4%)        1389 (24.2%)
  [70,80)               809 (22.8%)        529 (24.2%)        1338 (23.3%)
  [80, Inf)             500 (14.1%)        167 (7.6%)         667 (11.6%)
sex
  Male                  1914 (53.9%)       1278 (58.5%)       3192 (55.7%)
  Female                1637 (46.1%)       906 (41.5%)        2543 (44.3%)
APACHE.score
  Mean (SD)             50.9 (18.8)        60.7 (20.3)        54.7 (20.0)
  Median [Min, Max]     50.0 [3.00, 147]   60.0 [9.00, 135]   54.0 [3.00, 147]
Medical.insurance
  Medicaid              454 (12.8%)        193 (8.8%)         647 (11.3%)
  Medicare              947 (26.7%)        511 (23.4%)        1458 (25.4%)
  Medicare & Medicaid   251 (7.1%)         123 (5.6%)         374 (6.5%)
  No insurance          186 (5.2%)         136 (6.2%)         322 (5.6%)
  Private               967 (27.2%)        731 (33.5%)        1698 (29.6%)
  Private & Medicare    746 (21.0%)        490 (22.4%)        1236 (21.6%)
ObsData$RHC.use.factor <- NULL
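
Because the selected ObsData variables contain no missing values, the table above cannot show what a "Missing" row looks like. A minimal sketch on a small made-up data frame (the names toy, arm, score and smoker are hypothetical and used only for illustration):

```r
library(table1)

# Hypothetical toy data with missing values in a numeric and a categorical variable
toy <- data.frame(
  arm    = factor(c("Control", "Control", "Control", "Treated", "Treated", "Treated")),
  score  = c(10, NA, 14, 21, 19, NA),
  smoker = factor(c("No", "Yes", NA, "No", "No", "Yes"))
)

# With the default settings, each variable that contains NAs gets a "Missing" row
toy_tab <- table1(~ score + smoker | arm, data = toy)
toy_tab
```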

gtsummary

gtsummary provides highly customizable functions to construct tables. It allows renaming variables, adding captions, and selecting specific measures for variable types. It is particularly useful for creating clean, customized Table 1s and includes options for survey data.

require(gt)
#> Loading required package: gt
#> Warning: package 'gt' was built under R version 4.2.3
require(gtsummary)
#> Loading required package: gtsummary
#> Warning: package 'gtsummary' was built under R version 4.2.3
tbl_summary(ObsData)
Characteristic    N = 5,735¹
Disease.category
    ARF 2,490 (43%)
    CHF 456 (8.0%)
    Other 1,163 (20%)
    MOSF 1,626 (28%)
Cancer
    None 4,379 (76%)
    Localized (Yes) 972 (17%)
    Metastatic 384 (6.7%)
Cardiovascular
    0 4,722 (82%)
    1 1,013 (18%)
Congestive.HF
    0 4,714 (82%)
    1 1,021 (18%)
Dementia
    0 5,171 (90%)
    1 564 (9.8%)
Psychiatric
    0 5,349 (93%)
    1 386 (6.7%)
Pulmonary
    0 4,646 (81%)
    1 1,089 (19%)
Renal
    0 5,480 (96%)
    1 255 (4.4%)
Hepatic
    0 5,334 (93%)
    1 401 (7.0%)
GI.Bleed
    0 5,550 (97%)
    1 185 (3.2%)
Tumor
    0 4,419 (77%)
    1 1,316 (23%)
Immunosupperssion
    0 4,192 (73%)
    1 1,543 (27%)
Transfer.hx
    0 5,073 (88%)
    1 662 (12%)
MI
    0 5,535 (97%)
    1 200 (3.5%)
age
    [-Inf,50) 1,424 (25%)
    [50,60) 917 (16%)
    [60,70) 1,389 (24%)
    [70,80) 1,338 (23%)
    [80, Inf) 667 (12%)
sex
    Male 3,192 (56%)
    Female 2,543 (44%)
edu 12.0 (10.0, 13.0)
DASIndex 19.7 (16.1, 23.4)
APACHE.score 54 (41, 67)
Glasgow.Coma.Score 0 (0, 41)
blood.pressure 63 (50, 115)
WBC 14 (8, 20)
Heart.rate 124 (97, 141)
Respiratory.rate 30 (14, 38)
Temperature 38.09 (36.09, 39.00)
PaO2vs.FIO2 203 (133, 317)
Albumin 3.50 (2.60, 3.50)
Hematocrit 30 (26, 36)
Bilirubin 1.01 (0.80, 1.40)
Creatinine 1.50 (1.00, 2.40)
Sodium 136 (132, 142)
Potassium 3.80 (3.40, 4.60)
PaCo2 37 (31, 42)
PH 7.40 (7.34, 7.46)
Weight 70 (56, 84)
DNR.status 654 (11%)
Medical.insurance
    Medicaid 647 (11%)
    Medicare 1,458 (25%)
    Medicare & Medicaid 374 (6.5%)
    No insurance 322 (5.6%)
    Private 1,698 (30%)
    Private & Medicare 1,236 (22%)
Respiratory.Diag 2,113 (37%)
Cardiovascular.Diag 1,931 (34%)
Neurological.Diag 693 (12%)
Gastrointestinal.Diag 942 (16%)
Renal.Diag 295 (5.1%)
Metabolic.Diag 265 (4.6%)
Hematologic.Diag 354 (6.2%)
Sepsis.Diag 1,031 (18%)
Trauma.Diag 52 (0.9%)
Orthopedic.Diag 7 (0.1%)
race
    white 4,460 (78%)
    black 920 (16%)
    other 355 (6.2%)
income
    $11-$25k 1,165 (20%)
    $25-$50k 893 (16%)
    > $50k 451 (7.9%)
    Under $11k 3,226 (56%)
Length.of.Stay 14 (7, 25)
Death 3,722 (65%)
RHC.use 2,184 (38%)
¹ n (%); Median (IQR)
tbl_summary(ObsData, by = RHC.use,
            statistic = list(
                              all_continuous() ~ "{mean} ({sd})",
                              all_categorical() ~ "{n} ({p}%)"
                            ),
            digits = all_continuous() ~ 2) |> 
  as_gt() |>
  gt::tab_source_note(gt::md("*Add note here.*"))
Characteristic    0, N = 3,551¹    1, N = 2,184¹
Disease.category

    ARF 1,581 (45%) 909 (42%)
    CHF 247 (7.0%) 209 (9.6%)
    Other 955 (27%) 208 (9.5%)
    MOSF 768 (22%) 858 (39%)
Cancer

    None 2,652 (75%) 1,727 (79%)
    Localized (Yes) 638 (18%) 334 (15%)
    Metastatic 261 (7.4%) 123 (5.6%)
Cardiovascular

    0 2,984 (84%) 1,738 (80%)
    1 567 (16%) 446 (20%)
Congestive.HF

    0 2,955 (83%) 1,759 (81%)
    1 596 (17%) 425 (19%)
Dementia

    0 3,138 (88%) 2,033 (93%)
    1 413 (12%) 151 (6.9%)
Psychiatric

    0 3,265 (92%) 2,084 (95%)
    1 286 (8.1%) 100 (4.6%)
Pulmonary

    0 2,777 (78%) 1,869 (86%)
    1 774 (22%) 315 (14%)
Renal

    0 3,402 (96%) 2,078 (95%)
    1 149 (4.2%) 106 (4.9%)
Hepatic

    0 3,286 (93%) 2,048 (94%)
    1 265 (7.5%) 136 (6.2%)
GI.Bleed

    0 3,420 (96%) 2,130 (98%)
    1 131 (3.7%) 54 (2.5%)
Tumor

    0 2,679 (75%) 1,740 (80%)
    1 872 (25%) 444 (20%)
Immunosupperssion

    0 2,644 (74%) 1,548 (71%)
    1 907 (26%) 636 (29%)
Transfer.hx

    0 3,216 (91%) 1,857 (85%)
    1 335 (9.4%) 327 (15%)
MI

    0 3,446 (97%) 2,089 (96%)
    1 105 (3.0%) 95 (4.3%)
age

    [-Inf,50) 884 (25%) 540 (25%)
    [50,60) 546 (15%) 371 (17%)
    [60,70) 812 (23%) 577 (26%)
    [70,80) 809 (23%) 529 (24%)
    [80, Inf) 500 (14%) 167 (7.6%)
sex

    Male 1,914 (54%) 1,278 (59%)
    Female 1,637 (46%) 906 (41%)
edu 11.57 (3.13) 11.86 (3.16)
DASIndex 20.37 (5.48) 20.70 (5.03)
APACHE.score 50.93 (18.81) 60.74 (20.27)
Glasgow.Coma.Score 22.25 (31.37) 18.97 (28.26)
blood.pressure 84.87 (38.87) 68.20 (34.24)
WBC 15.26 (11.41) 16.27 (12.55)
Heart.rate 112.87 (40.94) 118.93 (41.47)
Respiratory.rate 28.98 (13.95) 26.65 (14.17)
Temperature 37.63 (1.74) 37.59 (1.83)
PaO2vs.FIO2 240.63 (116.66) 192.43 (105.54)
Albumin 3.16 (0.67) 2.98 (0.93)
Hematocrit 32.70 (8.79) 30.51 (7.42)
Bilirubin 2.00 (4.43) 2.71 (5.33)
Creatinine 1.92 (2.03) 2.47 (2.05)
Sodium 137.04 (7.68) 136.33 (7.60)
Potassium 4.08 (1.04) 4.05 (1.01)
PaCo2 39.95 (14.24) 36.79 (10.97)
PH 7.39 (0.11) 7.38 (0.11)
Weight 65.04 (29.50) 72.36 (27.73)
DNR.status 499 (14%) 155 (7.1%)
Medical.insurance

    Medicaid 454 (13%) 193 (8.8%)
    Medicare 947 (27%) 511 (23%)
    Medicare & Medicaid 251 (7.1%) 123 (5.6%)
    No insurance 186 (5.2%) 136 (6.2%)
    Private 967 (27%) 731 (33%)
    Private & Medicare 746 (21%) 490 (22%)
Respiratory.Diag 1,481 (42%) 632 (29%)
Cardiovascular.Diag 1,007 (28%) 924 (42%)
Neurological.Diag 575 (16%) 118 (5.4%)
Gastrointestinal.Diag 522 (15%) 420 (19%)
Renal.Diag 147 (4.1%) 148 (6.8%)
Metabolic.Diag 172 (4.8%) 93 (4.3%)
Hematologic.Diag 239 (6.7%) 115 (5.3%)
Sepsis.Diag 515 (15%) 516 (24%)
Trauma.Diag 18 (0.5%) 34 (1.6%)
Orthopedic.Diag 3 (<0.1%) 4 (0.2%)
race

    white 2,753 (78%) 1,707 (78%)
    black 585 (16%) 335 (15%)
    other 213 (6.0%) 142 (6.5%)
income

    $11-$25k 713 (20%) 452 (21%)
    $25-$50k 500 (14%) 393 (18%)
    > $50k 257 (7.2%) 194 (8.9%)
    Under $11k 2,081 (59%) 1,145 (52%)
Length.of.Stay 19.53 (23.59) 24.86 (28.90)
Death 2,236 (63%) 1,486 (68%)
Add note here.
¹ n (%); Mean (SD)
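
The renaming and caption options described at the start of this section can be sketched as below. The sketch uses gtsummary's bundled trial data rather than ObsData so that it runs on its own; the label text and caption are illustrative choices.

```r
library(gtsummary)

# `trial` is an example data set shipped with gtsummary (a stand-in for ObsData)
tbl <- trial |>
  tbl_summary(by = trt,
              include = c(age, grade, response),
              label = list(age ~ "Age (years)"),   # rename a variable in the table
              missing_text = "Missing") |>
  add_overall() |>                                 # column for the full sample
  add_p() |>                                       # p-values comparing the groups
  modify_caption("**Table 1. Patient characteristics**")
tbl
```

For the RHC data, the same pattern would apply with `by = RHC.use` and the variables of interest listed in `include`.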

DataExplorer

DataExplorer offers functions for initial data exploration, including various visualizations. Below are some examples using the RHC dataset.

require(DataExplorer)
#> Loading required package: DataExplorer

The introduce function provides an overview of the dataset dimensions, variable types, and missingness.

introduce(ObsData)

Plot the amount of missing data per variable:

plot_missing(ObsData)
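
The quantity that plot_missing displays, the share of missing values per variable, can also be computed directly. A base R sketch, using the built-in airquality data as a stand-in because it contains missing values:

```r
# Count and percentage of missing values per variable (base R only)
miss <- data.frame(
  variable    = names(airquality),
  n_missing   = colSums(is.na(airquality)),
  pct_missing = round(100 * colMeans(is.na(airquality)), 1),
  row.names   = NULL
)

# Sort so that the variables with the most missing values come first
miss[order(-miss$n_missing), ]
```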

Visualize categorical variable distributions with the plot_bar function:

plot_bar(ObsData)

Visualize the distribution of numerical variables with histograms:

plot_histogram(ObsData)

Quantile-quantile plots can be used to assess whether numerical variables are normally distributed:

plot_qq(ObsData)

Generate a correlation plot to show relationships between variables:

plot_correlation(ObsData)

Boxplots can visualize variable distributions based on treatment or outcome:

plot_boxplot(ObsData, by="RHC.use")

Automatically generate a full report with the create_report function. By default the report is written as an HTML file (report.html) in the working directory:

create_report(ObsData)

GGally

GGally provides methods to combine multiple ggplot2 plots, enabling visualization of several variables at once.

require(GGally)
#> Loading required package: GGally
#> Warning: package 'GGally' was built under R version 4.2.3
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.2.3
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2
ggpairs(ObsData, 
        columns = c('age', 'sex', 'edu', 'blood.pressure', 'Medical.insurance'),
        ggplot2::aes(color=as.factor(RHC.use)))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
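
The repeated `stat_bin()` messages come from the histogram panels that pair a continuous with a categorical variable. One way to address them, sketched here on the built-in iris data as a stand-in, is to pass an explicit binwidth to those panels with GGally's wrap() helper; the value 0.25 is an arbitrary choice for this example.

```r
library(GGally)   # also attaches ggplot2

# Set the bin width of the faceted-histogram ("combo") panels explicitly
p <- ggpairs(iris,
             columns = c("Sepal.Length", "Petal.Length", "Species"),
             mapping = aes(color = Species),
             lower = list(combo = wrap("facethist", binwidth = 0.25)))
p
```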

modelsummary

modelsummary provides functions to summarize data (variable overviews, correlations, cross-tabulations, and Table 1-style balance tables) and to display regression models.

require(modelsummary)
#> Loading required package: modelsummary
#> Warning: package 'modelsummary' was built under R version 4.2.3
require(rmarkdown)
#> Loading required package: rmarkdown
#> Warning: package 'rmarkdown' was built under R version 4.2.3
require(markdown)
#> Loading required package: markdown
#> Warning: package 'markdown' was built under R version 4.2.3

Overview of each variable:

datasummary_skim(ObsData)
                      Unique (#)  Missing (%)  Mean    SD      Min     Median  Max
edu                   42          0            11.7    3.1     0.0     12.0    30.0
DASIndex              1023        0            20.5    5.3     11.0    19.7    33.0
APACHE.score          123         0            54.7    20.0    3.0     54.0    147.0
Glasgow.Coma.Score    11          0            21.0    30.3    0.0     0.0     100.0
blood.pressure        178         0            78.5    38.0    0.0     63.0    259.0
WBC                   520         0            15.6    11.9    0.0     14.1    192.0
Heart.rate            189         0            115.2   41.2    0.0     124.0   250.0
Respiratory.rate      72          0            28.1    14.1    0.0     30.0    100.0
Temperature           118         0            37.6    1.8     27.0    38.1    43.0
PaO2vs.FIO2           1342        0            222.3   115.0   11.6    202.5   937.5
Albumin               57          0            3.1     0.8     0.3     3.5     29.0
Hematocrit            450         0            31.9    8.4     2.0     30.0    66.2
Bilirubin             266         0            2.3     4.8     0.1     1.0     58.2
Creatinine            148         0            2.1     2.1     0.1     1.5     25.1
Sodium                73          0            136.8   7.7     101.0   136.0   178.0
Potassium             81          0            4.1     1.0     1.1     3.8     11.9
PaCo2                 266         0            38.7    13.2    1.0     37.0    156.0
PH                    96          0            7.4     0.1     6.6     7.4     7.8
Weight                922         0            67.8    29.1    0.0     70.0    244.0
Length.of.Stay        164         0            21.6    25.9    2.0     14.0    394.0
Death                 2           0            0.6     0.5     0.0     1.0     1.0
RHC.use               2           0            0.4     0.5     0.0     0.0     1.0

Create a Table 1 using datasummary_balance:

datasummary_balance(~ RHC.use, ObsData)
                                            0                    1
                                            Mean     Std. Dev.   Mean     Std. Dev.
edu                                         11.6     3.1         11.9     3.2
DASIndex                                    20.4     5.5         20.7     5.0
APACHE.score                                50.9     18.8        60.7     20.3
Glasgow.Coma.Score                          22.3     31.4        19.0     28.3
blood.pressure                              84.9     38.9        68.2     34.2
WBC                                         15.3     11.4        16.3     12.5
Heart.rate                                  112.9    40.9        118.9    41.5
Respiratory.rate                            29.0     13.9        26.7     14.2
Temperature                                 37.6     1.7         37.6     1.8
PaO2vs.FIO2                                 240.6    116.7       192.4    105.5
Albumin                                     3.2      0.7         3.0      0.9
Hematocrit                                  32.7     8.8         30.5     7.4
Bilirubin                                   2.0      4.4         2.7      5.3
Creatinine                                  1.9      2.0         2.5      2.1
Sodium                                      137.0    7.7         136.3    7.6
Potassium                                   4.1      1.0         4.0      1.0
PaCo2                                       40.0     14.2        36.8     11.0
PH                                          7.4      0.1         7.4      0.1
Weight                                      65.0     29.5        72.4     27.7
Length.of.Stay                              19.5     23.6        24.9     28.9
Death                                       0.6      0.5         0.7      0.5
                                            N        Pct.        N        Pct.
Disease.category      ARF                   1581     44.5        909      41.6
                      CHF                   247      7.0         209      9.6
                      Other                 955      26.9        208      9.5
                      MOSF                  768      21.6        858      39.3
Cancer                None                  2652     74.7        1727     79.1
                      Localized (Yes)       638      18.0        334      15.3
                      Metastatic            261      7.4         123      5.6
Cardiovascular        0                     2984     84.0        1738     79.6
                      1                     567      16.0        446      20.4
Congestive.HF         0                     2955     83.2        1759     80.5
                      1                     596      16.8        425      19.5
Dementia              0                     3138     88.4        2033     93.1
                      1                     413      11.6        151      6.9
Psychiatric           0                     3265     91.9        2084     95.4
                      1                     286      8.1         100      4.6
Pulmonary             0                     2777     78.2        1869     85.6
                      1                     774      21.8        315      14.4
Renal                 0                     3402     95.8        2078     95.1
                      1                     149      4.2         106      4.9
Hepatic               0                     3286     92.5        2048     93.8
                      1                     265      7.5         136      6.2
GI.Bleed              0                     3420     96.3        2130     97.5
                      1                     131      3.7         54       2.5
Tumor                 0                     2679     75.4        1740     79.7
                      1                     872      24.6        444      20.3
Immunosupperssion     0                     2644     74.5        1548     70.9
                      1                     907      25.5        636      29.1
Transfer.hx           0                     3216     90.6        1857     85.0
                      1                     335      9.4         327      15.0
MI                    0                     3446     97.0        2089     95.7
                      1                     105      3.0         95       4.3
age                   [-Inf,50)             884      24.9        540      24.7
                      [50,60)               546      15.4        371      17.0
                      [60,70)               812      22.9        577      26.4
                      [70,80)               809      22.8        529      24.2
                      [80, Inf)             500      14.1        167      7.6
sex                   Male                  1914     53.9        1278     58.5
                      Female                1637     46.1        906      41.5
DNR.status            No                    3052     85.9        2029     92.9
                      Yes                   499      14.1        155      7.1
Medical.insurance     Medicaid              454      12.8        193      8.8
                      Medicare              947      26.7        511      23.4
                      Medicare & Medicaid   251      7.1         123      5.6
                      No insurance          186      5.2         136      6.2
                      Private               967      27.2        731      33.5
                      Private & Medicare    746      21.0        490      22.4
Respiratory.Diag      No                    2070     58.3        1552     71.1
                      Yes                   1481     41.7        632      28.9
Cardiovascular.Diag   No                    2544     71.6        1260     57.7
                      Yes                   1007     28.4        924      42.3
Neurological.Diag     No                    2976     83.8        2066     94.6
                      Yes                   575      16.2        118      5.4
Gastrointestinal.Diag No                    3029     85.3        1764     80.8
                      Yes                   522      14.7        420      19.2
Renal.Diag            No                    3404     95.9        2036     93.2
                      Yes                   147      4.1         148      6.8
Metabolic.Diag        No                    3379     95.2        2091     95.7
                      Yes                   172      4.8         93       4.3
Hematologic.Diag      No                    3312     93.3        2069     94.7
                      Yes                   239      6.7         115      5.3
Sepsis.Diag           No                    3036     85.5        1668     76.4
                      Yes                   515      14.5        516      23.6
Trauma.Diag           No                    3533     99.5        2150     98.4
                      Yes                   18       0.5         34       1.6
Orthopedic.Diag       No                    3548     99.9        2180     99.8
                      Yes                   3        0.1         4        0.2
race                  white                 2753     77.5        1707     78.2
                      black                 585      16.5        335      15.3
                      other                 213      6.0         142      6.5
income                $11-$25k              713      20.1        452      20.7
                      $25-$50k              500      14.1        393      18.0
                      > $50k                257      7.2         194      8.9
                      Under $11k            2081     58.6        1145     52.4

You can also customize the appearance of tables generated with modelsummary. For example, you can adjust the number of digits displayed in the summaries.

datasummary_balance(~ RHC.use, ObsData, 
                    fmt="%.2f", 
                    output="markdown")
#> Warning: Please install the `estimatr` package or set `dinm=FALSE` to suppress
#> this warning.
                                            0 / Mean   0 / Std. Dev.   1 / Mean   1 / Std. Dev.
edu                                         11.57      3.13            11.86      3.16
DASIndex                                    20.37      5.48            20.70      5.03
APACHE.score                                50.93      18.81           60.74      20.27
Glasgow.Coma.Score                          22.25      31.37           18.97      28.26
blood.pressure                              84.87      38.87           68.20      34.24
WBC                                         15.26      11.41           16.27      12.55
Heart.rate                                  112.87     40.94           118.93     41.47
Respiratory.rate                            28.98      13.95           26.65      14.17
Temperature                                 37.63      1.74            37.59      1.83
PaO2vs.FIO2                                 240.63     116.66          192.43     105.54
Albumin                                     3.16       0.67            2.98       0.93
Hematocrit                                  32.70      8.79            30.51      7.42
Bilirubin                                   2.00       4.43            2.71       5.33
Creatinine                                  1.92       2.03            2.47       2.05
Sodium                                      137.04     7.68            136.33     7.60
Potassium                                   4.08       1.04            4.05       1.01
PaCo2                                       39.95      14.24           36.79      10.97
PH                                          7.39       0.11            7.38       0.11
Weight                                      65.04      29.50           72.36      27.73
Length.of.Stay                              19.53      23.59           24.86      28.90
Death                                       0.63       0.48            0.68       0.47
                                            N          Pct.            N          Pct.
Disease.category      ARF                   1581       44.5            909        41.6
                      CHF                   247        7.0             209        9.6
                      Other                 955        26.9            208        9.5
                      MOSF                  768        21.6            858        39.3
Cancer                None                  2652       74.7            1727       79.1
                      Localized (Yes)       638        18.0            334        15.3
                      Metastatic            261        7.4             123        5.6
Cardiovascular        0                     2984       84.0            1738       79.6
                      1                     567        16.0            446        20.4
Congestive.HF         0                     2955       83.2            1759       80.5
                      1                     596        16.8            425        19.5
Dementia              0                     3138       88.4            2033       93.1
                      1                     413        11.6            151        6.9
Psychiatric           0                     3265       91.9            2084       95.4
                      1                     286        8.1             100        4.6
Pulmonary             0                     2777       78.2            1869       85.6
                      1                     774        21.8            315        14.4
Renal                 0                     3402       95.8            2078       95.1
                      1                     149        4.2             106        4.9
Hepatic               0                     3286       92.5            2048       93.8
                      1                     265        7.5             136        6.2
GI.Bleed              0                     3420       96.3            2130       97.5
                      1                     131        3.7             54         2.5
Tumor                 0                     2679       75.4            1740       79.7
                      1                     872        24.6            444        20.3
Immunosupperssion     0                     2644       74.5            1548       70.9
                      1                     907        25.5            636        29.1
Transfer.hx           0                     3216       90.6            1857       85.0
                      1                     335        9.4             327        15.0
MI                    0                     3446       97.0            2089       95.7
                      1                     105        3.0             95         4.3
age                   [-Inf,50)             884        24.9            540        24.7
                      [50,60)               546        15.4            371        17.0
                      [60,70)               812        22.9            577        26.4
                      [70,80)               809        22.8            529        24.2
                      [80, Inf)             500        14.1            167        7.6
sex                   Male                  1914       53.9            1278       58.5
                      Female                1637       46.1            906        41.5
DNR.status            No                    3052       85.9            2029       92.9
                      Yes                   499        14.1            155        7.1
Medical.insurance     Medicaid              454        12.8            193        8.8
                      Medicare              947        26.7            511        23.4
                      Medicare & Medicaid   251        7.1             123        5.6
                      No insurance          186        5.2             136        6.2
                      Private               967        27.2            731        33.5
                      Private & Medicare    746        21.0            490        22.4
Respiratory.Diag      No                    2070       58.3            1552       71.1
                      Yes                   1481       41.7            632        28.9
Cardiovascular.Diag   No                    2544       71.6            1260       57.7
                      Yes                   1007       28.4            924        42.3
Neurological.Diag     No                    2976       83.8            2066       94.6
                      Yes                   575        16.2            118        5.4
Gastrointestinal.Diag No                    3029       85.3            1764       80.8
                      Yes                   522        14.7            420        19.2
Renal.Diag            No                    3404       95.9            2036       93.2
                      Yes                   147        4.1             148        6.8
Metabolic.Diag        No                    3379       95.2            2091       95.7
                      Yes                   172        4.8             93         4.3
Hematologic.Diag      No                    3312       93.3            2069       94.7
                      Yes                   239        6.7             115        5.3
Sepsis.Diag           No                    3036       85.5            1668       76.4
                      Yes                   515        14.5            516        23.6
Trauma.Diag           No                    3533       99.5            2150       98.4
                      Yes                   18         0.5             34         1.6
Orthopedic.Diag       No                    3548       99.9            2180       99.8
                      Yes                   3          0.1             4          0.2
race                  white                 2753       77.5            1707       78.2
                      black                 585        16.5            335        15.3
                      other                 213        6.0             142        6.5
income                $11-$25k              713        20.1            452        20.7
                      $25-$50k              500        14.1            393        18.0
                      > $50k                257        7.2             194        8.9
                      Under $11k            2081       58.6            1145       52.4

Extract correlations between variables with datasummary_correlation:

datasummary_correlation(ObsData)
edu  DASIndex APACHE.score Glasgow.Coma.Score blood.pressure WBC Heart.rate Respiratory.rate Temperature  PaO2vs.FIO2 Albumin Hematocrit Bilirubin Creatinine Sodium Potassium  PaCo2 PH Weight Length.of.Stay Death RHC.use
edu 1 . . . . . . . . . . . . . . . . . . . . .
DASIndex .10 1 . . . . . . . . . . . . . . . . . . . .
APACHE.score .02 −.06 1 . . . . . . . . . . . . . . . . . . .
Glasgow.Coma.Score −.02 .04 .03 1 . . . . . . . . . . . . . . . . . .
blood.pressure −.04 .06 −.40 .02 1 . . . . . . . . . . . . . . . . .
WBC −.02 .03 .13 .04 −.03 1 . . . . . . . . . . . . . . . .
Heart.rate .05 .02 .22 −.11 .06 .03 1 . . . . . . . . . . . . . . .
Respiratory.rate .03 .00 .26 −.14 .03 .01 .28 1 . . . . . . . . . . . . . .
Temperature .07 .15 −.04 .05 .01 −.01 .22 .14 1 . . . . . . . . . . . . .
PaO2vs.FIO2 −.01 −.07 −.23 .09 .07 −.06 −.08 −.10 −.10 1 . . . . . . . . . . . .
Albumin .00 .00 −.29 .02 .13 −.06 −.08 .00 −.01 .07 1 . . . . . . . . . . .
Hematocrit −.05 .03 −.24 .03 .16 −.03 −.03 .01 −.02 −.01 .29 1 . . . . . . . . . .
Bilirubin .07 −.01 .28 .05 −.05 .01 .03 .01 −.03 .03 −.09 −.15 1 . . . . . . . . .
Creatinine −.02 −.05 .38 .01 −.09 .05 −.06 −.04 −.11 .06 .00 −.20 .12 1 . . . . . . . .
Sodium −.01 .03 .02 .13 .02 −.04 .03 .02 .05 −.02 .00 .06 .02 −.01 1 . . . . . . .
Potassium −.02 −.06 .15 −.01 −.07 .08 −.11 −.01 −.13 .04 .03 −.01 .00 .30 −.10 1 . . . . . .
PaCo2 −.05 −.09 −.09 −.12 .04 −.05 −.01 .00 −.04 −.17 .09 .24 −.08 −.13 .06 .04 1 . . . . .
PH .04 .05 −.33 .03 .13 −.06 .03 −.01 .14 .11 .04 −.03 .02 −.16 −.02 −.20 −.47 1 . . . .
Weight −.06 .05 .08 −.08 −.02 −.01 .02 −.01 .02 −.05 −.05 .05 .01 .10 .00 .06 .03 −.05 1 . . .
Length.of.Stay .02 .04 .02 .00 −.02 .03 .07 −.01 .09 −.08 −.11 −.09 .00 .01 .04 −.02 .00 .02 .02 1 . .
Death −.03 −.18 .19 .12 −.10 .03 −.02 .01 −.10 .02 −.03 −.09 .08 .08 .00 .05 −.04 −.04 −.05 −.08 1 .
RHC.use .04 .03 .24 −.05 −.21 .04 .07 −.08 −.01 −.20 −.12 −.13 .07 .13 −.04 −.01 −.12 −.06 .12 .10 .05 1
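
The table above reports Pearson correlations, the default. Several of the variables are strongly skewed (Bilirubin, for example, has a median of 1.0 but a maximum of 58.2), so a rank-based coefficient may be a useful complement. A minimal sketch on the built-in mtcars data, assuming the method and output arguments of datasummary_correlation:

```r
library(modelsummary)

# Stand-in numeric data (an assumption; ObsData would be used in the tutorial)
num_vars <- mtcars[, c("mpg", "wt", "hp")]

# Spearman rank correlations, returned as a data frame instead of a rendered table
rho <- datasummary_correlation(num_vars, method = "spearman", output = "data.frame")
rho
```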

Generate a contingency table using datasummary_crosstab:

datasummary_crosstab(age ~ RHC.use, data=ObsData)
age                  0       1       All
[-Inf,50)   N        884     540     1424
            % row    62.1    37.9    100.0
[50,60)     N        546     371     917
            % row    59.5    40.5    100.0
[60,70)     N        812     577     1389
            % row    58.5    41.5    100.0
[70,80)     N        809     529     1338
            % row    60.5    39.5    100.0
[80, Inf)   N        500     167     667
            % row    75.0    25.0    100.0
All         N        3551    2184    5735
            % row    61.9    38.1    100.0
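
The "% row" entries are each cell count divided by its row total. As a check, the same percentages can be reproduced in base R from the counts shown in the table above:

```r
# Cell counts copied from the cross-tabulation above (rows: age group, columns: RHC.use)
counts <- matrix(c(884, 540,
                   546, 371,
                   812, 577,
                   809, 529,
                   500, 167),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(age = c("[-Inf,50)", "[50,60)", "[60,70)",
                                         "[70,80)", "[80, Inf)"),
                                 RHC.use = c("0", "1")))

# Row percentages: divide each cell by its row total
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct
```

With ObsData loaded, `prop.table(table(ObsData$age, ObsData$RHC.use), margin = 1)` would give the same proportions without typing the counts.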

The modelsummary package also allows for professional display of regression models, combining results from multiple models into one table for easy comparison.

# Example: Fit two regression models
model1 <- lm(Length.of.Stay ~ age + sex + APACHE.score, data = ObsData)
model2 <- lm(Length.of.Stay ~ age + sex + APACHE.score + Medical.insurance, data = ObsData)

# Display both models side by side
modelsummary(list(model1, model2), 
             statistic = "p.value", 
             stars = TRUE)
                                      (1)          (2)
(Intercept)                           21.312***    21.200***
                                      (<0.001)     (<0.001)
age[50,60)                            −1.065       −1.116
                                      (0.330)      (0.310)
age[60,70)                            −1.420       −1.201
                                      (0.145)      (0.255)
age[70,80)                            −2.763**     −2.334+
                                      (0.005)      (0.056)
age[80, Inf)                          −6.762***    −6.530***
                                      (<0.001)     (<0.001)
sexFemale                             0.932        1.021
                                      (0.176)      (0.140)
APACHE.score                          0.033+       0.032+
                                      (0.057)      (0.063)
Medical.insuranceMedicare                          0.561
                                                   (0.687)
Medical.insuranceMedicare & Medicaid               −2.545
                                                   (0.145)
Medical.insuranceNo insurance                      0.475
                                                   (0.788)
Medical.insurancePrivate                           0.385
                                                   (0.748)
Medical.insurancePrivate & Medicare                −0.817
                                                   (0.569)
Num.Obs.                              5735         5735
R2                                    0.007        0.008
R2 Adj.                               0.006        0.006
AIC                                   53563.6      53567.9
BIC                                   53616.8      53654.4
Log.Lik.                              −26773.805   −26770.932
F                                     6.388        4.006
RMSE                                  25.78        25.77
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

All tables generated by modelsummary can be saved in various formats such as LaTeX, HTML, or Word.

# Save as a LaTeX file
modelsummary(model1, output = "table.tex")

# Save as an HTML file
modelsummary(model1, output = "table.html")
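
Besides writing files, modelsummary can return the table as a data frame, which is convenient for checking values or post-processing. The sketch below uses the built-in mtcars data so that it runs on its own; the model names and the readable coefficient labels are illustrative choices.

```r
library(modelsummary)

# Two nested models on stand-in data (an assumption; not the RHC models)
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Named list gives the column headers; coef_rename gives readable row labels
tab <- modelsummary(list("Base" = fit1, "Adjusted" = fit2),
                    statistic = "conf.int",
                    coef_rename = c(wt = "Weight (1000 lbs)", hp = "Horsepower"),
                    output = "data.frame")
tab
```

A file name ending in .docx passed to `output` should produce a Word document in the same way as the .tex and .html examples, provided the packages modelsummary relies on for that format are installed.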

stargazer

The stargazer package is widely used for outputting regression results in LaTeX, HTML, or plain text formats. It allows you to include various statistics, such as standard errors, t-values, p-values, and confidence intervals.

require(stargazer)
#> Loading required package: stargazer
#> 
#> Please cite as:
#>  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
#>  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
# Example: Fit two regression models
model1 <- lm(Length.of.Stay ~ age + sex + APACHE.score, data = ObsData)
model2 <- lm(Length.of.Stay ~ age + sex + APACHE.score + Medical.insurance, data = ObsData)

# Display the regression results using stargazer.
# covariate.labels is not used here: stargazer assigns those labels to the
# coefficients by position, so a short vector such as
# c("Age", "Sex", "APACHE Score", "Medical Insurance") would end up labelling
# the four age categories.
stargazer(model1, model2, type = "text", 
          title = "Regression Results",
          dep.var.labels = "Length of Stay",
          out = "regression_table.txt")
#> 
#> Regression Results
#> ==============================================================================
#>                                             Dependent variable:               
#>                               ------------------------------------------------
#>                                                Length of Stay                 
#>                                         (1)                     (2)           
#> ------------------------------------------------------------------------------
#> age[50,60)                            -1.065                   -1.116         
#>                                       (1.093)                 (1.099)         
#>                                                                               
#> age[60,70)                            -1.420                   -1.201         
#>                                       (0.974)                 (1.055)         
#>                                                                               
#> age[70,80)                           -2.763***                -2.334*         
#>                                       (0.982)                 (1.221)         
#>                                                                               
#> age[80, Inf)                         -6.762***               -6.530***        
#>                                       (1.213)                 (1.424)         
#>                                                                               
#> sexFemale                              0.932                   1.021          
#>                                       (0.688)                 (0.691)         
#>                                                                               
#> APACHE.score                          0.033*                   0.032*         
#>                                       (0.017)                 (0.017)         
#>                                                                               
#> Medical.insuranceMedicare                                      0.561          
#>                                                               (1.391)         
#>                                                                               
#> Medical.insuranceMedicare            Medicaid                                 
#>                                                               (1.744)         
#>                                                                               
#> Medical.insuranceNo insurance                                  0.475          
#>                                                               (1.763)         
#>                                                                               
#> Medical.insurancePrivate                                       0.385          
#>                                                               (1.199)         
#>                                                                               
#> Medical.insurancePrivate             Medicare                                 
#>                                                               (1.436)         
#>                                                                               
#> Constant                             21.312***               21.200***        
#>                                       (1.220)                 (1.495)         
#>                                                                               
#> ------------------------------------------------------------------------------
#> Observations                           5,735                   5,735          
#> R2                                     0.007                   0.008          
#> Adjusted R2                            0.006                   0.006          
#> Residual Std. Error             25.795 (df = 5728)       25.793 (df = 5723)   
#> F Statistic                   6.388*** (df = 6; 5728) 4.006*** (df = 11; 5723)
#> ==============================================================================
#> Note:                                              *p<0.1; **p<0.05; ***p<0.01

In the text output above, the rows for the two insurance categories whose names contain an ampersand (Medicare & Medicaid and Private & Medicare) are garbled: the label is split at the & and the estimate is not shown. This appears to be a limitation of how stargazer handles the & character in coefficient names. The modelsummary table above reports these two estimates as −2.545 and −0.817.

broom

The broom package converts regression model outputs into tidy data frames, making it easy to extract and manipulate specific parts of the model for custom summaries.

require(broom)
#> Loading required package: broom
# Tidy the model results for model1
tidy(model1)
#> # A tibble: 7 × 5
#>   term         estimate std.error statistic  p.value
#>   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)   21.3       1.22      17.5   1.21e-66
#> 2 age[50,60)    -1.07      1.09      -0.975 3.30e- 1
#> 3 age[60,70)    -1.42      0.974     -1.46  1.45e- 1
#> 4 age[70,80)    -2.76      0.982     -2.81  4.94e- 3
#> 5 age[80, Inf)  -6.76      1.21      -5.58  2.57e- 8
#> 6 sexFemale      0.932     0.688      1.35  1.76e- 1
#> 7 APACHE.score   0.0325    0.0171     1.90  5.71e- 2

# Glance at model1 for a quick summary of goodness-of-fit statistics
glance(model1)
#> # A tibble: 1 × 12
#>   r.squared adj.r.squared sigma statistic    p.value    df  logLik    AIC    BIC
#>       <dbl>         <dbl> <dbl>     <dbl>      <dbl> <dbl>   <dbl>  <dbl>  <dbl>
#> 1   0.00665       0.00561  25.8      6.39 0.00000102     6 -26774. 53564. 53617.
#> # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
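
Because tidy() returns an ordinary data frame, the result can be filtered and reshaped like any other data. A small sketch, assuming the built-in mtcars data as a stand-in for ObsData, that adds confidence intervals and pulls observation-level output with augment():

```r
library(broom)

# Stand-in model on built-in data (assumption: replace with your own model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table with 95% confidence intervals added
coefs <- tidy(fit, conf.int = TRUE, conf.level = 0.95)

# Keep the slope terms and only the columns needed for a custom summary
slopes <- coefs[coefs$term != "(Intercept)",
                c("term", "estimate", "conf.low", "conf.high")]
slopes

# Observation-level results: fitted values and residuals next to the data
obs <- augment(fit)
head(obs[, c("mpg", ".fitted", ".resid")])
```

tidy() works at the coefficient level, glance() at the model level, and augment() at the observation level, so the three together cover most custom summaries.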

texreg

The texreg package allows you to export regression tables into LaTeX, HTML, or Word. It supports outputting multiple regression models side by side for easy comparison.

require(texreg)
#> Loading required package: texreg
#> Version:  1.39.4
#> Date:     2024-07-23
#> Author:   Philip Leifeld (University of Manchester)
#> 
#> Consider submitting praise using the praise or praise_interactive functions.
#> Please cite the JSS article in your publications -- see citation("texreg").
# Export models into a LaTeX table
texreg(list(model1, model2), 
       file = "regression_table.tex", 
       label = "tab:regression", 
       caption = "Regression Models for Length of Stay")
#> The table was written to the file 'regression_table.tex'.
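
Before writing a file, it can help to preview the table in the console. The sketch below uses screenreg() for a plain-text preview and htmlreg() for an HTML file. It assumes the built-in mtcars data as a stand-in for ObsData, and the model names and temporary file path are illustrative.

```r
library(texreg)

# Stand-in models on built-in data (assumption: replace with model1 and model2)
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Plain-text preview in the console
screenreg(list(m1, m2),
          custom.model.names = c("Base", "Extended"),
          digits = 2)

# HTML export to a temporary file (hypothetical path, for illustration only)
html_file <- file.path(tempdir(), "regression_table.html")
htmlreg(list(m1, m2),
        file = html_file,
        custom.model.names = c("Base", "Extended"),
        caption = "Stand-in regression models")
```

The same list of models and the same labelling arguments can be passed to texreg() for the LaTeX version, which keeps the console preview and the exported table consistent.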

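You can also customize the output from these packages, for example by supplying different standard errors, changing the number of decimal places, or changing the cutoffs for the significance stars. In stargazer, the se argument takes a list with one vector of standard errors per model (this is where robust standard errors would be supplied), and star.cutoffs sets the p-value thresholds for the stars. The example below passes the ordinary standard errors from summary(), so the estimates and standard errors match the earlier table and only the stars change.

# se: one vector of standard errors per model (ordinary standard errors here)
# star.cutoffs: p-value thresholds for *, **, and ***
stargazer(model1, model2, type = "text", 
          se = list(coef(summary(model1))[ , "Std. Error"], 
                    coef(summary(model2))[ , "Std. Error"]), 
          star.cutoffs = c(0.05, 0.01, 0.001))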
#> 
#> ==============================================================================
#>                                             Dependent variable:               
#>                               ------------------------------------------------
#>                                                Length.of.Stay                 
#>                                         (1)                     (2)           
#> ------------------------------------------------------------------------------
#> age[50,60)                            -1.065                   -1.116         
#>                                       (1.093)                 (1.099)         
#>                                                                               
#> age[60,70)                            -1.420                   -1.201         
#>                                       (0.974)                 (1.055)         
#>                                                                               
#> age[70,80)                           -2.763**                  -2.334         
#>                                       (0.982)                 (1.221)         
#>                                                                               
#> age[80, Inf)                         -6.762***               -6.530***        
#>                                       (1.213)                 (1.424)         
#>                                                                               
#> sexFemale                              0.932                   1.021          
#>                                       (0.688)                 (0.691)         
#>                                                                               
#> APACHE.score                           0.033                   0.032          
#>                                       (0.017)                 (0.017)         
#>                                                                               
#> Medical.insuranceMedicare                                      0.561          
#>                                                               (1.391)         
#>                                                                               
#> Medical.insuranceMedicare            Medicaid                                 
#>                                                               (1.744)         
#>                                                                               
#> Medical.insuranceNo insurance                                  0.475          
#>                                                               (1.763)         
#>                                                                               
#> Medical.insurancePrivate                                       0.385          
#>                                                               (1.199)         
#>                                                                               
#> Medical.insurancePrivate             Medicare                                 
#>                                                               (1.436)         
#>                                                                               
#> Constant                             21.312***               21.200***        
#>                                       (1.220)                 (1.495)         
#>                                                                               
#> ------------------------------------------------------------------------------
#> Observations                           5,735                   5,735          
#> R2                                     0.007                   0.008          
#> Adjusted R2                            0.006                   0.006          
#> Residual Std. Error             25.795 (df = 5728)       25.793 (df = 5723)   
#> F Statistic                   6.388*** (df = 6; 5728) 4.006*** (df = 11; 5723)
#> ==============================================================================
#> Note:                                            *p<0.05; **p<0.01; ***p<0.001
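
To report robust standard errors, the vectors passed to se can be computed from a heteroskedasticity-consistent covariance matrix. This is only a sketch under stated assumptions: it relies on the sandwich package, which is not used elsewhere in this tutorial, the HC1 estimator is an illustrative choice, and the built-in mtcars data stands in for ObsData.

```r
library(sandwich)   # assumption: installed; provides vcovHC()
library(stargazer)

# Stand-in model on built-in data (assumption: replace with your own model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Robust standard errors: square roots of the diagonal of the
# heteroskedasticity-consistent (HC1) covariance matrix
robust_se <- sqrt(diag(vcovHC(fit, type = "HC1")))

# One vector of standard errors per model, in the same order as the models
stargazer(fit, type = "text",
          se = list(robust_se),
          digits = 2,
          star.cutoffs = c(0.05, 0.01, 0.001))
```

The coefficient estimates are unchanged; only the standard errors in parentheses, and the stars derived from them, differ from the default table.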

janitor

The janitor package simplifies data cleaning tasks, such as checking for duplicate records, cleaning column names, and generating cross-tabulations.

require(janitor)
#> Loading required package: janitor
#> Warning: package 'janitor' was built under R version 4.2.3
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test
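# Convert column names to a consistent snake_case style
# (e.g., Disease.category becomes disease_category).
# The examples below continue to use ObsData with its original names.
ObsData_clean <- clean_names(ObsData)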

# Method 1: Using the data frame name in the tabyl function
tabyl(ObsData, Disease.category, RHC.use)
#>  Disease.category    0   1
#>               ARF 1581 909
#>               CHF  247 209
#>             Other  955 208
#>              MOSF  768 858

# Method 2: Using the pipe operator
ObsData %>% tabyl(Disease.category, RHC.use)
#>  Disease.category    0   1
#>               ARF 1581 909
#>               CHF  247 209
#>             Other  955 208
#>              MOSF  768 858
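
The cross-tabulations above show counts only. A short sketch of two other janitor tasks mentioned earlier, adding percentages to a tabyl and checking for duplicate records, is given below. It assumes the built-in mtcars data and a small made-up data frame (df) rather than ObsData, and it uses the native pipe, which requires R 4.1 or later.

```r
library(janitor)

# Cross-tabulation with a totals row, row percentages, and counts in parentheses
tab <- mtcars |>
  tabyl(cyl, am) |>
  adorn_totals(where = "row") |>
  adorn_percentages(denominator = "row") |>
  adorn_pct_formatting(digits = 1) |>
  adorn_ns()
tab

# Duplicate check on a made-up data frame (illustrative values only)
df <- data.frame(id = c(1, 2, 2, 3),
                 value = c("a", "b", "b", "c"))

# Returns the rows that share an id, with a dupe_count column
dupes <- get_dupes(df, id)
dupes
```

The adorn_ functions are applied in sequence: percentages are computed first, then formatted, and the counts are attached last.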

skimr

The skimr package provides a more compact and readable summary than the default summary() function. It tailors its output to each variable type, as the separate factor and numeric tables below show.

require(skimr)
#> Loading required package: skimr
#> Warning: package 'skimr' was built under R version 4.2.3
# Quick summary of the entire dataset
skim(ObsData)
Data summary
Name ObsData
Number of rows 5735
Number of columns 52
_______________________
Column type frequency:
factor 30
numeric 22
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Disease.category 0 1 FALSE 4 ARF: 2490, MOS: 1626, Oth: 1163, CHF: 456
Cancer 0 1 FALSE 3 Non: 4379, Loc: 972, Met: 384
Cardiovascular 0 1 FALSE 2 0: 4722, 1: 1013
Congestive.HF 0 1 FALSE 2 0: 4714, 1: 1021
Dementia 0 1 FALSE 2 0: 5171, 1: 564
Psychiatric 0 1 FALSE 2 0: 5349, 1: 386
Pulmonary 0 1 FALSE 2 0: 4646, 1: 1089
Renal 0 1 FALSE 2 0: 5480, 1: 255
Hepatic 0 1 FALSE 2 0: 5334, 1: 401
GI.Bleed 0 1 FALSE 2 0: 5550, 1: 185
Tumor 0 1 FALSE 2 0: 4419, 1: 1316
Immunosupperssion 0 1 FALSE 2 0: 4192, 1: 1543
Transfer.hx 0 1 FALSE 2 0: 5073, 1: 662
MI 0 1 FALSE 2 0: 5535, 1: 200
age 0 1 FALSE 5 [-I: 1424, [60: 1389, [70: 1338, [50: 917
sex 0 1 FALSE 2 Mal: 3192, Fem: 2543
DNR.status 0 1 FALSE 2 No: 5081, Yes: 654
Medical.insurance 0 1 FALSE 6 Pri: 1698, Med: 1458, Pri: 1236, Med: 647
Respiratory.Diag 0 1 FALSE 2 No: 3622, Yes: 2113
Cardiovascular.Diag 0 1 FALSE 2 No: 3804, Yes: 1931
Neurological.Diag 0 1 FALSE 2 No: 5042, Yes: 693
Gastrointestinal.Diag 0 1 FALSE 2 No: 4793, Yes: 942
Renal.Diag 0 1 FALSE 2 No: 5440, Yes: 295
Metabolic.Diag 0 1 FALSE 2 No: 5470, Yes: 265
Hematologic.Diag 0 1 FALSE 2 No: 5381, Yes: 354
Sepsis.Diag 0 1 FALSE 2 No: 4704, Yes: 1031
Trauma.Diag 0 1 FALSE 2 No: 5683, Yes: 52
Orthopedic.Diag 0 1 FALSE 2 No: 5728, Yes: 7
race 0 1 FALSE 3 whi: 4460, bla: 920, oth: 355
income 0 1 FALSE 4 Und: 3226, $11: 1165, $25: 893, > $: 451

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
edu 0 1 11.68 3.15 0.00 10.00 12.00 13.00 30.00 ▁▇▃▁▁
DASIndex 0 1 20.50 5.32 11.00 16.06 19.75 23.43 33.00 ▃▇▆▂▃
APACHE.score 0 1 54.67 19.96 3.00 41.00 54.00 67.00 147.00 ▂▇▅▁▁
Glasgow.Coma.Score 0 1 21.00 30.27 0.00 0.00 0.00 41.00 100.00 ▇▂▂▁▁
blood.pressure 0 1 78.52 38.05 0.00 50.00 63.00 115.00 259.00 ▆▇▆▁▁
WBC 0 1 15.65 11.87 0.00 8.40 14.10 20.05 192.00 ▇▁▁▁▁
Heart.rate 0 1 115.18 41.24 0.00 97.00 124.00 141.00 250.00 ▁▂▇▂▁
Respiratory.rate 0 1 28.09 14.08 0.00 14.00 30.00 38.00 100.00 ▅▇▂▁▁
Temperature 0 1 37.62 1.77 27.00 36.09 38.09 39.00 43.00 ▁▁▅▇▁
PaO2vs.FIO2 0 1 222.27 114.95 11.60 133.31 202.50 316.62 937.50 ▇▇▁▁▁
Albumin 0 1 3.09 0.78 0.30 2.60 3.50 3.50 29.00 ▇▁▁▁▁
Hematocrit 0 1 31.87 8.36 2.00 26.10 30.00 36.30 66.19 ▁▆▇▃▁
Bilirubin 0 1 2.27 4.80 0.10 0.80 1.01 1.40 58.20 ▇▁▁▁▁
Creatinine 0 1 2.13 2.05 0.10 1.00 1.50 2.40 25.10 ▇▁▁▁▁
Sodium 0 1 136.77 7.66 101.00 132.00 136.00 142.00 178.00 ▁▂▇▁▁
Potassium 0 1 4.07 1.03 1.10 3.40 3.80 4.60 11.90 ▂▇▁▁▁
PaCo2 0 1 38.75 13.18 1.00 31.00 37.00 42.00 156.00 ▃▇▁▁▁
PH 0 1 7.39 0.11 6.58 7.34 7.40 7.46 7.77 ▁▁▂▇▁
Weight 0 1 67.83 29.06 0.00 56.30 70.00 83.70 244.00 ▂▇▁▁▁
Length.of.Stay 0 1 21.56 25.87 2.00 7.00 14.00 25.00 394.00 ▇▁▁▁▁
Death 0 1 0.65 0.48 0.00 0.00 1.00 1.00 1.00 ▅▁▁▁▇
RHC.use 0 1 0.38 0.49 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▅
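
For a wide dataset, the full summary can be long. As a hedged sketch, skim() can also be restricted to selected columns or applied to grouped data. The example assumes the built-in iris data as a stand-in for ObsData.

```r
library(skimr)

# Summarize only selected columns
s_selected <- skim(iris, Sepal.Length, Species)
s_selected

# Summarize one variable separately within each group
s_grouped <- skim(dplyr::group_by(iris, Species), Sepal.Length)
s_grouped
```

With ObsData, the same pattern could be used to compare a few key variables between the RHC.use groups.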

corrplot

The corrplot package is useful for visualizing correlation matrices with different styles (circle, color, etc.), making correlations easier to interpret.

require(corrplot)
#> Loading required package: corrplot
#> corrplot 0.92 loaded
# Step 1: Select only numerical variables from ObsData
ObsData_num <- dplyr::select(ObsData, where(is.numeric))

# Step 2: Compute the correlation matrix
corr_matrix <- cor(ObsData_num, use = "complete.obs")

# Step 3: Plot the correlation matrix using corrplot
corrplot::corrplot(corr_matrix, method = "circle")
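
Other display styles are selected through arguments to corrplot(). The sketch below shows one possible combination: colored tiles, the upper triangle only, variables ordered by hierarchical clustering, and the coefficients printed in each cell. It assumes a handful of columns from the built-in mtcars data as a stand-in for ObsData.

```r
library(corrplot)

# Stand-in numeric data (assumption: replace with ObsData_num)
num_vars <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
corr_matrix <- cor(num_vars, use = "complete.obs")

corrplot(corr_matrix,
         method = "color",        # filled tiles instead of circles
         type = "upper",          # show only the upper triangle
         order = "hclust",        # group correlated variables together
         addCoef.col = "black",   # print the correlation coefficients
         number.cex = 0.7,        # size of the printed coefficients
         tl.col = "black")        # color of the variable labels
```

Ordering by hierarchical clustering tends to place strongly correlated variables next to each other, which makes blocks of related variables easier to spot.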

visdat

The visdat package helps visualize missing data patterns, data types, and distributions.

require(visdat)
#> Loading required package: visdat
#> Warning: package 'visdat' was built under R version 4.2.3
# Visualize missing data patterns
vis_miss(ObsData)


# Visualize data types and missingness
vis_dat(ObsData)
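
ObsData has no missing values, so vis_miss() has little to show for it. As an illustration on data that does contain missing values, the sketch below uses the built-in airquality data (an assumption, chosen only because it has NAs) and sorts the columns and clusters the rows by missingness.

```r
library(visdat)

# airquality is a built-in dataset with missing values in Ozone and Solar.R
p <- vis_miss(airquality,
              sort_miss = TRUE,   # columns with the most missing values first
              cluster = TRUE)     # group rows with similar missingness patterns
p
```

Sorting and clustering make it easier to see whether the same rows tend to be missing across several variables.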

naniar

The naniar package provides tools to handle and visualize missing data, helping to explore missingness patterns in the data.

require(naniar)
#> Loading required package: naniar
#> Warning: package 'naniar' was built under R version 4.2.3
#> 
#> Attaching package: 'naniar'
#> The following object is masked from 'package:skimr':
#> 
#>     n_complete
# Introduce some missing values for demonstration
set.seed(123)  # arbitrary seed so the same rows are selected on every run
ObsData_with_NA <- ObsData
ObsData_with_NA$age[sample(nrow(ObsData), 10)] <- NA
ObsData_with_NA$sex[sample(nrow(ObsData), 10)] <- NA

# Visualize missing data across variables
gg_miss_var(ObsData_with_NA)


# Plot the missing data upset plot
naniar::gg_miss_upset(ObsData_with_NA)
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
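
The plots can be complemented with numeric summaries of missingness. A minimal sketch, assuming the built-in airquality data as a stand-in for ObsData_with_NA:

```r
library(naniar)

# Total number of missing values in the data frame
total_missing <- n_miss(airquality)
total_missing

# Percentage of all cells that are missing
pct_miss(airquality)

# Number and percentage of missing values for each variable,
# sorted from most to least missing
miss_summary <- miss_var_summary(airquality)
miss_summary
```

These summaries give exact counts that are easy to report in text alongside the plots.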