Chapter 2 R packages (listed separately)

2.1 abd

The abd package contains data sets and sample code for The Analysis of Biological Data by Michael Whitlock and Dolph Schluter (2009; Roberts & Company Publishers).

2.2 ade4

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) doi:10.18637/jss.v022.i04.

2.3 AER

Functions, data sets, examples, demos, and vignettes for the book Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York. ISBN 978-0-387-77316-2. (See the vignette “AER” for a package overview.)

2.4 agridat

Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.

2.5 alr4

Datasets to accompany S. Weisberg (2014, ISBN: 978-1-118-38608-8), “Applied Linear Regression,” 4th edition. Many data files in this package are also included in the alr3 package, so only one of the two packages should be loaded at a time.

2.6 boot

Functions and datasets for bootstrapping from the book “Bootstrap Methods and Their Application” by A. C. Davison and D. V. Hinkley (1997, CUP), originally written by Angelo Canty for S.
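
As a minimal sketch of how the package is typically used, the example below bootstraps the sample mean and requests a percentile confidence interval. The data are simulated, and the names mean_stat, x, and b are illustrative assumptions, not taken from the book.

```r
library(boot)

# boot() calls the statistic with the data and a vector of resampled indices.
mean_stat <- function(data, indices) mean(data[indices])

set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)  # simulated data, for illustration only

# Draw 999 bootstrap replicates of the sample mean.
b <- boot(data = x, statistic = mean_stat, R = 999)

# Percentile bootstrap confidence interval for the mean.
boot.ci(b, type = "perc")
```

The statistic function must accept the index vector as its second argument; that is how boot() hands each resample to it.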

2.7 carData

Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage (2019).

2.8 datasets

The data sets distributed with base R.
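
Because this package is attached by default, its contents can be listed and used without installing anything. The sketch below shows one way to do so; the variable name info is an illustrative assumption.

```r
# data() returns a list whose "results" element is a character matrix
# with one row per data set (columns include "Item" and "Title").
info <- data(package = "datasets")$results
head(info[, c("Item", "Title")])

# Data sets in 'datasets' are available without an explicit data() call.
str(mtcars)
```

The same data(package = "...") call lists the data sets of any other installed package in this chapter.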

2.9 dslabs

Datasets and functions that can be used for data analysis practice, homework, and projects in data science courses and workshops. Twenty-six datasets are available for case studies in data visualization, statistical inference, modeling, linear regression, data wrangling, and machine learning.

2.10 EnvStats

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book “EnvStats: An R Package for Environmental Statistics” (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, https://www.springer.com/book/9781461484554).

2.11 faraway

Datasets and functions for the books by Julian Faraway: “Linear Models with R” (CRC Press; 1st ed. August 2004, 2nd ed. July 2014, ISBN 9781439887332), “Extending the Linear Model with R” (CRC Press; 1st ed. December 2005, 2nd ed. March 2016, ISBN 9781584884248), and “Practical Regression and ANOVA in R”, contributed documentation on CRAN (now very dated).

2.12 fivethirtyeight

Datasets and code published by the data journalism website ‘FiveThirtyEight’ available at https://github.com/fivethirtyeight/data. Note that while we received guidance from editors at ‘FiveThirtyEight’, this package is not officially published by ‘FiveThirtyEight’.

2.13 FSAdata

The datasets to support the Fish Stock Assessment (‘FSA’) package.

2.14 ggplot2

A system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
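
The three steps in this description (provide the data, map variables to aesthetics, choose graphical primitives) can be sketched as follows. The choice of the built-in mtcars data and of these particular variables is an assumption made for illustration.

```r
library(ggplot2)

# Data: mtcars. Aesthetic mappings: weight to x, fuel economy to y,
# and number of cylinders to colour. Primitive: points.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```

Axis scales, the legend, and the layout are filled in by ‘ggplot2’ without further instructions.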

2.15 likert

An approach to analyzing Likert response items, with an emphasis on visualizations. The stacked bar plot is the preferred method for presenting Likert results. Tabular results are also implemented along with density plots to assist researchers in determining whether Likert responses can be used quantitatively instead of qualitatively. See the likert(), summary.likert(), and plot.likert() functions to get started.

2.16 Lock5withR

Data sets and other utilities for ‘Statistics: Unlocking the Power of Data’ by Lock, Lock, Lock, Lock and Lock (ISBN: 978-0-470-60187-7, http://lock5stat.com/).

2.17 MASS

Functions and datasets to support Venables and Ripley, “Modern Applied Statistics with S” (4th edition, 2002).

2.18 mlbench

A collection of artificial and real-world machine learning benchmark problems, including, e.g., several data sets from the UCI repository.

2.19 openintro

Supplemental functions and data for ‘OpenIntro’ resources, which include open-source textbooks and resources for introductory statistics (https://www.openintro.org/). The package contains data sets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.

2.20 Sleuth3

Data sets from Ramsey, F.L. and Schafer, D.W. (2013), “The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed)”, Cengage Learning.

2.21 tidyr

Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. ‘tidyr’ contains tools for changing the shape (pivoting) and hierarchy (nesting and ‘unnesting’) of a dataset, turning deeply nested lists into rectangular data frames (‘rectangling’), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).
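
A minimal sketch of two of these tools, pivoting and handling explicit missing values, is given below. The data frame wide and its column names are hypothetical and exist only for this example.

```r
library(tidyr)

# A small "wide" table: one row per id, one column per year.
wide <- data.frame(
  id = 1:2,
  score_2020 = c(10, 20),
  score_2021 = c(11, NA)
)

# Pivot to "long" form: one row per id-year combination.
long <- pivot_longer(
  wide,
  cols = starts_with("score_"),
  names_to = "year",
  names_prefix = "score_",
  values_to = "score"
)

# Drop rows where the score is explicitly missing.
long_complete <- drop_na(long, score)

long
long_complete
```

After pivoting, each row is a single observation (one id in one year), which matches the tidy-data layout described above.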

2.22 vcd

Visualization techniques, data sets, summary and inference procedures aimed particularly at categorical data. Special emphasis is given to highly extensible grid graphics. The package was originally inspired by the book “Visualizing Categorical Data” by Michael Friendly and is now the main support package for a new book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer (2015).

2.23 vcdExtra

Provides additional data sets, methods and documentation to complement the ‘vcd’ package for Visualizing Categorical Data and the ‘gnm’ package for Generalized Nonlinear Models. In particular, ‘vcdExtra’ extends mosaic, assoc and sieve plots from ‘vcd’ to handle ‘glm()’ and ‘gnm()’ models and adds a 3D version in ‘mosaic3d’. Additionally, methods are provided for comparing and visualizing lists of ‘glm’ and ‘loglm’ objects. This package is now a support package for the book, “Discrete Data Analysis with R” by Michael Friendly and David Meyer.