R Functions (E)

Note

This review page lists the R functions for exploratory data analysis that we used in this chapter. Each entry names the function, the package it comes from, and what it was used for.

To learn more about these functions, readers can:

  1. Use R’s Built-in Help System: For each function, access its documentation by prefixing the function name with a question mark in the R console, e.g., ?summary. This displays the function’s manual page with descriptions, usage, and examples.

  2. Search Websites: Use a search engine, or visit the CRAN website, to find documentation for a specific function. Sites such as Stack Overflow and Posit Community (formerly RStudio Community) often host discussions about R functions.

  3. Tutorials and Online Courses: Platforms like DataCamp, Coursera, and edX offer R courses that cover many functions in depth. There are also dedicated R tutorial websites that you might find useful, for example “Introduction to R for health data analysis” by Ehsan Karim, An Hoang and Qu.

  4. Books: There are numerous R programming books, such as “R for Data Science” by Hadley Wickham and Garrett Grolemund and “The Art of R Programming” by Norman Matloff.

  5. Workshops and Webinars: Institutions and organizations occasionally offer R programming workshops or webinars.

Whenever in doubt, exploring existing resources can be highly beneficial.
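The built-in help system from item 1 can be sketched as follows. This is a minimal illustration using only functions that ship with R; the topics looked up (`summary`, `lm`, `mean`, `cor`, `head`) and the variable name `where_head` are arbitrary choices for demonstration, not part of the chapter's analysis.

```r
# Open the manual page for summary(); ?summary is shorthand for help("summary").
?summary
help("summary")

# Look up a function in a named package without attaching that package.
help("lm", package = "stats")

# Search installed documentation by keyword when the function name is unknown.
help.search("correlation")   # equivalent to typing ??correlation

# Run the examples from a manual page directly in the console.
example("mean")

# Show a function's arguments without opening the full manual page.
args(cor)

# Find which attached package provides a function.
where_head <- find("head")
where_head
```

`find()` only searches packages that are currently attached, so a function from a package that has not been loaded with `library()` will not be located this way.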

| Function name | Package | Use |
|---|---|---|
| abs | base | Returns the absolute value, used here to flag Z-scores beyond a threshold. |
| aes | ggplot2 | Defines aesthetic mappings (e.g., x, y, fill) for a ggplot. |
| as.factor | base | Converts a variable to a factor. `as.factor` is a shorthand for the `factor` function. |
| as_gt | gtsummary | Converts a `gtsummary` table object to a `gt` table for further styling. |
| class | base | Returns the class (type) of an object or variable. |
| clean_names | janitor | Cleans column names to make them syntactically valid and consistent. |
| coef | stats | Extracts model coefficients (e.g., from a fitted regression model). |
| colnames | base | Returns the column names (variable names) of a data frame. |
| cor | stats | Computes correlations between numerical variables, returning a correlation matrix for a data frame. |
| CreateTableOne | tableone | Creates the classic 'Table 1' summarizing dataset characteristics, optionally by strata. |
| create_report | DataExplorer | Generates a full automated exploratory data analysis report as a document. |
| datasummary_balance | modelsummary | Creates a Table 1-style balance table comparing groups. |
| datasummary_correlation | modelsummary | Extracts and displays correlations between variables. |
| datasummary_crosstab | modelsummary | Generates a contingency (cross-tabulation) table. |
| datasummary_skim | modelsummary | Produces a quick skim-style overview of each variable in a dataset. |
| dim | base | Returns the dimensions of a data frame (rows x columns). |
| element_text | ggplot2 | Controls text appearance (e.g., axis text angle) in a ggplot theme. |
| factor | base | Creates a factor variable with specified levels and labels. |
| geom_bar | ggplot2 | Draws bar charts to visualize counts of categorical variables. |
| geom_boxplot | ggplot2 | Draws boxplots to summarize distributions and detect outliers. |
| geom_density | ggplot2 | Draws density plots, useful for comparing distributions across groups. |
| geom_histogram | ggplot2 | Draws histograms to visualize the distribution of continuous variables. |
| geom_point | ggplot2 | Draws scatterplots to examine relationships between two numerical variables. |
| geom_tile | ggplot2 | Draws tiles, used here to build a correlation heatmap. |
| gg_miss_upset | naniar | Plots an UpSet chart of missing data combinations. |
| gg_miss_var | naniar | Plots the number of missing values per variable. |
| ggarrange | ggpubr | Arranges multiple ggplot objects into a single figure. |
| ggpairs | GGally | Creates a matrix of pairwise plots for several variables. |
| ggplot | ggplot2 | Initializes a ggplot object for building plots layer by layer. |
| glance | broom | Returns a one-row summary of goodness-of-fit statistics for a model. |
| head | utils | Displays the first elements of an object, e.g., the first rows of a dataset (six by default). |
| introduce | DataExplorer | Provides an overview of dataset dimensions, variable types, and missingness. |
| is.na | base | Checks for missing values in a variable. |
| kable | knitr | Renders a data frame as a formatted table. |
| lm | stats | Fits a linear regression model. |
| md | gt | Formats text as markdown (used for table notes). |
| mean | base | Computes the arithmetic mean, used here for mean imputation. |
| melt | reshape2 | Reshapes a matrix from wide to long format (e.g., a correlation matrix). |
| modelsummary | modelsummary | Displays one or more regression models in a formatted table. |
| na.omit | stats | Removes all rows with missing values from a dataset. |
| plot_bar | DataExplorer | Plots categorical variable distributions as bar charts. |
| plot_boxplot | DataExplorer | Draws boxplots of variables grouped by a treatment or outcome. |
| plot_correlation | DataExplorer | Plots a correlation matrix of variables. |
| plot_histogram | DataExplorer | Plots histograms of numerical variables. |
| plot_missing | DataExplorer | Plots the amount of missing data per variable. |
| plot_qq | DataExplorer | Draws quantile-quantile plots to assess normality. |
| readRDS | base | Reads a single R object stored in an RDS file. |
| require | base | Loads a package, returning a logical value rather than erroring if unavailable. |
| round | base | Rounds numeric values to a specified number of digits. |
| sapply | base | Applies a function over elements (e.g., columns) of an object. |
| scale | base | Standardizes a numeric variable (computes Z-scores). |
| select | dplyr | Selects specified variables from a dataset. |
| skim | skimr | Provides a compact, type-aware summary of a dataset. |
| stargazer | stargazer | Outputs regression results in text, LaTeX, or HTML formats. |
| sum | base | Returns the sum of values; combined with `is.na`, it counts missing values. |
| summary | base | Provides a summary of an object, such as descriptive statistics for each variable. |
| tab_source_note | gt | Adds a source note below a `gt` table. |
| table1 | table1 | Generates descriptive summary tables common in medical research. |
| tabyl | janitor | Generates cross-tabulations from a data frame. |
| tbl_summary | gtsummary | Creates customizable summary tables; the companion function `tbl_svysummary` handles survey data. |
| texreg | texreg | Exports regression tables to LaTeX; the companion functions `htmlreg` and `wordreg` produce HTML and Word output. |
| theme | ggplot2 | Customizes non-data elements of a ggplot (e.g., legend, axis text). |
| tidy | broom | Converts a model output into a tidy data frame. |
| vis_dat | visdat | Visualizes data types and missingness across a dataset. |
| vis_miss | visdat | Visualizes missing data patterns across a dataset. |
| which | base | Returns the indices of elements satisfying a condition (e.g., outliers). |
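To show how several of the base, stats, and utils functions listed above fit together, here is a minimal sketch. It assumes the built-in `airquality` dataset rather than the data analyzed in this chapter, and the Z-score threshold of 3, the regression formula, and all object names (`dat`, `n_missing`, `complete`, `fit`, and so on) are illustrative choices only.

```r
# airquality ships with R (package datasets); Ozone and Solar.R contain NAs.
dat <- airquality

dim(dat)              # number of rows and columns
colnames(dat)         # variable names
head(dat)             # first six rows by default
sapply(dat, class)    # class of every column
summary(dat)          # descriptive statistics per variable

# Count missing values per variable: is.na() flags them, sum() counts them.
n_missing <- sapply(dat, function(x) sum(is.na(x)))
n_missing

# Two simple ways to handle missingness: drop incomplete rows or impute the mean.
complete <- na.omit(dat)
dat$Ozone_imp <- dat$Ozone
dat$Ozone_imp[is.na(dat$Ozone_imp)] <- mean(dat$Ozone, na.rm = TRUE)

# Flag potential outliers: Z-scores whose absolute value exceeds the threshold.
z <- as.numeric(scale(complete$Wind))
outlier_rows <- which(abs(z) > 3)
outlier_rows

# Correlation matrix of the numeric variables, rounded for display.
round(cor(complete[, c("Ozone", "Solar.R", "Wind", "Temp")]), 2)

# Treat Month as categorical, then fit a linear model and extract coefficients.
complete$Month <- factor(complete$Month, levels = 5:9, labels = month.abb[5:9])
fit <- lm(Ozone ~ Temp + Wind, data = complete)
round(coef(fit), 2)
```

Mean imputation and a fixed Z-score cutoff are used here only because they are the techniques the table mentions; whether they suit a given analysis depends on the data and the missingness mechanism.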