| Function | Package | Use |
|---|---|---|
| abs | base | Returns the absolute value, used here to flag Z-scores beyond a threshold. |
| aes | ggplot2 | Defines aesthetic mappings (e.g., x, y, fill) for a ggplot. |
| as.factor | base | Converts a variable to a factor. `as.factor` is a wrapper for the `factor` function. |
| as_gt | gtsummary | Converts a `gtsummary` table object to a `gt` table for further styling. |
| class | base | Returns the class (type) of an object or variable. |
| clean_names | janitor | Cleans column names to make them syntactically valid and consistent. |
| coef | stats | Extracts model coefficients (e.g., from a fitted regression model). |
| colnames | base | Returns the column names (variable names) of a data frame. |
| cor | stats | Computes the correlation matrix between numerical variables. |
| CreateTableOne | tableone | Creates the classic 'Table 1' summarizing dataset characteristics, optionally by strata. |
| create_report | DataExplorer | Generates a full automated exploratory data analysis report as a document. |
| datasummary_balance | modelsummary | Creates a Table 1-style balance table comparing groups. |
| datasummary_correlation | modelsummary | Extracts and displays correlations between variables. |
| datasummary_crosstab | modelsummary | Generates a contingency (cross-tabulation) table. |
| datasummary_skim | modelsummary | Produces a quick skim-style overview of each variable in a dataset. |
| dim | base | Returns the dimensions of a data frame (rows x columns). |
| element_text | ggplot2 | Controls text appearance (e.g., axis text angle) in a ggplot theme. |
| factor | base | Creates a factor variable with specified levels and labels. |
| geom_bar | ggplot2 | Draws bar charts to visualize counts of categorical variables. |
| geom_boxplot | ggplot2 | Draws boxplots to summarize distributions and detect outliers. |
| geom_density | ggplot2 | Draws density plots, useful for comparing distributions across groups. |
| geom_histogram | ggplot2 | Draws histograms to visualize the distribution of continuous variables. |
| geom_point | ggplot2 | Draws scatterplots to examine relationships between two numerical variables. |
| geom_tile | ggplot2 | Draws tiles, used here to build a correlation heatmap. |
| gg_miss_upset | naniar | Plots an UpSet chart of missing data combinations. |
| gg_miss_var | naniar | Plots the number of missing values per variable. |
| ggarrange | ggpubr | Arranges multiple ggplot objects into a single figure. |
| ggpairs | GGally | Creates a matrix of pairwise plots for several variables. |
| ggplot | ggplot2 | Initializes a ggplot object for building plots layer by layer. |
| glance | broom | Returns a one-row summary of goodness-of-fit statistics for a model. |
| head | utils | Displays the first part of an object (by default, the first six rows of a dataset). |
| introduce | DataExplorer | Provides an overview of dataset dimensions, variable types, and missingness. |
| is.na | base | Checks for missing values in a variable. |
| kable | knitr | Renders a data frame as a formatted table. |
| lm | stats | Fits a linear regression model. |
| md | gt | Formats text as markdown (used for table notes). |
| mean | base | Computes the arithmetic mean, used here for mean imputation. |
| melt | reshape2 | Reshapes a matrix from wide to long format (e.g., a correlation matrix). |
| modelsummary | modelsummary | Displays one or more regression models in a formatted table. |
| na.omit | stats | Removes all rows with missing values from a dataset. |
| plot_bar | DataExplorer | Plots categorical variable distributions as bar charts. |
| plot_boxplot | DataExplorer | Draws boxplots of variables grouped by a treatment or outcome. |
| plot_correlation | DataExplorer | Plots a correlation matrix of variables. |
| plot_histogram | DataExplorer | Plots histograms of numerical variables. |
| plot_missing | DataExplorer | Plots the amount of missing data per variable. |
| plot_qq | DataExplorer | Draws quantile-quantile plots to assess normality. |
| readRDS | base | Reads a single R object stored in an RDS file. |
| require | base | Loads a package, returning a logical value rather than erroring if unavailable. |
| round | base | Rounds numeric values to a specified number of digits. |
| sapply | base | Applies a function over elements (e.g., columns) of an object. |
| scale | base | Standardizes a numeric variable (computes Z-scores). |
| select | dplyr | Selects specified variables from a dataset. |
| skim | skimr | Provides a compact, type-aware summary of a dataset. |
| stargazer | stargazer | Outputs regression results in text, LaTeX, or HTML formats. |
| sum | base | Returns the sum of values; with `is.na` counts missing values. |
| summary | base | Provides a summary of an object, such as per-variable statistics for a data frame. |
| tab_source_note | gt | Adds a source note (caption) to a `gt` table. |
| table1 | table1 | Generates descriptive summary tables common in medical research. |
| tabyl | janitor | Generates cross-tabulations from a data frame. |
| tbl_summary | gtsummary | Creates customizable summary tables (`tbl_svysummary` is the counterpart for survey data). |
| texreg | texreg | Exports regression tables to LaTeX; the package's `htmlreg` and `wordreg` functions produce HTML and Word output. |
| theme | ggplot2 | Customizes non-data elements of a ggplot (e.g., legend, axis text). |
| tidy | broom | Converts a model output into a tidy data frame. |
| vis_dat | visdat | Visualizes data types and missingness across a dataset. |
| vis_miss | visdat | Visualizes missing data patterns across a dataset. |
| which | base | Returns the indices of elements satisfying a condition (e.g., outliers). |
R Functions (E)
This review page lists the R functions for exploratory data analysis that we have used in this chapter. For each function, the table on this page gives its source package and how it was used.
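As a quick refresher, several of the base, stats, and utils functions listed on this page can be combined into a short exploratory pass. The sketch below is only an illustration: it uses R's built-in `airquality` dataset rather than the chapter's data, and the Z-score cutoff of 3 and the choice of `Ozone` for imputation are assumptions made for the example.

```r
# Built-in dataset with some missing values; used purely for illustration
dat <- airquality

dim(dat)        # rows x columns
colnames(dat)   # variable names
head(dat)       # first six rows by default
summary(dat)    # per-variable summary statistics

# Count missing values per column: sum() over the logical output of is.na()
n_missing <- sapply(dat, function(x) sum(is.na(x)))
n_missing

# Mean imputation for one variable (a simple approach, shown only as an example)
dat_imp <- dat
dat_imp$Ozone[is.na(dat_imp$Ozone)] <- mean(dat$Ozone, na.rm = TRUE)

# Z-scores via scale(); flag observations more than 3 SDs from the mean
# (the cutoff of 3 is an assumption for this sketch)
z <- as.numeric(scale(dat_imp$Ozone))
outlier_rows <- which(abs(z) > 3)

# Complete-case alternative: drop every row that has any missing value
dat_cc <- na.omit(dat)

# Correlation matrix and a simple linear model on the complete cases
round(cor(dat_cc[, c("Ozone", "Wind", "Temp")]), 2)
fit <- lm(Ozone ~ Wind + Temp, data = dat_cc)
coef(fit)
```

Everything here ships with a standard R installation, so the block runs without installing any of the add-on packages from the table.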
To learn more about these functions, readers can:
- Use R’s built-in help system: prefix the function name with a question mark in the R console, e.g., `?summary`. This displays the function’s manual page with its description, usage, and examples.
- Search websites: search the web, or visit the CRAN website to look up documentation for a specific function. Websites like Stack Overflow and RStudio Community often have discussions related to R functions.
- Tutorials and online courses: platforms like DataCamp, Coursera, and edX offer R courses that cover many functions in depth. Dedicated R tutorial websites can also be useful; one example is “Introduction to R for health data analysis” by Ehsan Karim, An Hoang and Qu.
- Books: there are numerous R programming books, such as “R for Data Science” by Hadley Wickham and “The Art of R Programming” by Norman Matloff.
- Workshops and webinars: institutions and organizations occasionally offer R programming workshops or webinars.
Whenever in doubt, exploring existing resources can be highly beneficial.
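The built-in help route mentioned above can be sketched as follows. The functions looked up here (`summary`, `na.omit`, `mean`, `head`) are just examples taken from the table, and the keyword "correlation" is an arbitrary search term chosen for illustration.

```r
# Open the manual page for a function (two equivalent forms)
?summary
help("summary")

# Look up a function within a specific package
help("na.omit", package = "stats")

# Search the installed documentation by keyword when the function name is unknown
help.search("correlation")

# Run the examples shown at the bottom of a help page
example(mean)

# Check which package a (non-primitive) function comes from
pkg_of_head    <- environmentName(environment(head))     # "utils"
pkg_of_na_omit <- environmentName(environment(na.omit))  # "stats"
pkg_of_mean    <- environmentName(environment(mean))     # "base"
```

The last three lines are a handy way to confirm the package column of the table for functions that are already loaded; they do not work for primitives such as `sum`, which have no enclosing environment.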