32 Parallel coordinate plots cheatsheet
Kechengjie Zhu
32.1 Overview
A parallel coordinate plot draws each row of a data table as a line across a set of parallel axes, one axis per variable. In R, packages such as GGally and parcoords help build and customize these plots. The examples below use ggparcoord() from GGally.
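The code in this cheatsheet calls functions from several packages without showing where they are attached. As a minimal setup sketch, assuming the examples rely on GGally (ggparcoord), ggplot2 (ggtitle, scale_color_manual, facet_wrap), dplyr (%>%, mutate, arrange) and openintro (the data), the following checks that they are installed before attaching them. The package list is inferred from the function names used here, not stated by the author.

```r
# Assumed package list, inferred from the functions used in the examples
pkgs <- c("GGally", "ggplot2", "dplyr", "openintro")

# Which of them are not installed?
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  library(GGally)   # ggparcoord()
  library(ggplot2)  # ggtitle(), scale_color_manual(), facet_wrap()
  library(dplyr)    # %>%, mutate(), arrange()
}
```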
32.3 Load Data
The examples use the mariokart data set from the openintro package.
df <- as.data.frame(openintro::mariokart)
32.4 Basics
ggparcoord(data = df,
           columns = c(2:7, 9, 11),
           alphaLines = 0.5) +
  ggtitle("Relations across auction details")
data:image/s3,"s3://crabby-images/815f3/815f33b9cf65a6ece079767056ed8fd197771d3b" alt=""
32.4.1 Group by column
Pass a categorical variable to the groupColumn argument to color the lines by group.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond") +
  ggtitle("Relations across auction details grouped")
data:image/s3,"s3://crabby-images/4a194/4a194605e7dfdeec2cd568d4ef2d4671f7a350a3" alt=""
32.4.2 Grouping Application: Highlight Certain Data Entries
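This requires adding a grouping column to the data frame first: flag each sale by whether its total price exceeds $60, then sort the rows so that the highlighted group comes last.
# Flag each sale, then put the "Over 60" rows at the end
modified <- df %>%
  mutate(thresh = factor(ifelse(total_pr > 60, "Over 60", "Under 60"))) %>%
  arrange(desc(thresh))
# Color the flagged group red and everything else grey
ggparcoord(data = modified,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "thresh") +
  scale_color_manual(values = c("red", "grey")) +
  ggtitle("Highlight sales with total price over $60")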
data:image/s3,"s3://crabby-images/658cf/658cf22fe57102af0448cb6f802b6485c4f4a527" alt=""
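The highlight example depends on two details that are easy to miss: factor levels sort alphabetically, so the first color in scale_color_manual goes to "Over 60", and the descending sort places the "Over 60" rows last. A small base R sketch of both, using made-up prices rather than the mariokart data:

```r
# Hypothetical total prices standing in for df$total_pr
total_pr <- c(51.55, 37.04, 65.02, 44.00, 71.00)

thresh <- factor(ifelse(total_pr > 60, "Over 60", "Under 60"))

# Levels sort alphabetically, so the first manual color goes to "Over 60"
color_map <- setNames(c("red", "grey"), levels(thresh))
color_map

# Base R counterpart of arrange(desc(thresh)):
# "Under 60" rows first, "Over 60" rows last
row_order <- order(thresh, decreasing = TRUE)
thresh[row_order]
```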
32.4.3 Add data points
Toggle the logical argument showPoints to display/hide data points.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           showPoints = TRUE) +
  ggtitle("Relations across auction details with points")
data:image/s3,"s3://crabby-images/2002f/2002f44389421941384bc56a5f0a9e40ec5601cf" alt=""
32.4.4 Spline interpolation
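Smooth the lines with the splineFactor argument, which accepts a logical or numeric value. FALSE (the default) or 0 disables smoothing, and a numeric value is multiplied by the number of columns to set how finely each line is interpolated.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           splineFactor = 7) +
  ggtitle("Smoothed relations across auction details")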
data:image/s3,"s3://crabby-images/abe11/abe11993f176ad4241f947b2f2bd1218ae8bcd34" alt=""
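To get a feel for what spline smoothing does to a single line, the sketch below interpolates one made-up observation across seven axes with base R's spline(). It only illustrates the idea. It is not GGally's internal code, and the choice of splineFactor times the number of axes as the point count is an assumption based on how the argument is described.

```r
# Hypothetical scaled values of one observation on 7 axes
y <- c(0.2, -1.1, 0.7, 1.5, -0.3, 0.9, 0.1)
n_axes <- length(y)
spline_factor <- 7

# Interpolate the 7 points with a cubic spline on a finer grid
smooth_line <- spline(x = seq_len(n_axes), y = y, n = spline_factor * n_axes)

length(smooth_line$x)  # number of interpolated points
range(smooth_line$x)   # still spans the first to the last axis
```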
32.4.5 Add box plots
Set boxplot = TRUE to overlay a box plot on each axis.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.2,
           groupColumn = "cond",
           boxplot = TRUE) +
  ggtitle("Relations across auction details with box plots")
data:image/s3,"s3://crabby-images/ac52b/ac52b8bd9585a6d6ee2484744ddf0fc882aede88" alt=""
32.5 Scaling methods
Select the scaling method with the scale argument. The default is “std”: for each variable, subtract the mean and divide by the standard deviation.
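As a quick illustration of the default “std” scaling, here is the same arithmetic applied by hand to a small made-up vector standing in for one column:

```r
# Hypothetical toy values standing in for one column of the data
x <- c(2, 4, 4, 6, 14)

# "std": subtract the mean, divide by the standard deviation
z_std <- (x - mean(x)) / sd(x)

# The scaled values have mean 0 and standard deviation 1
round(c(mean = mean(z_std), sd = sd(z_std)), 10)
```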
32.5.1 “robust”
Subtract the median and divide by the median absolute deviation.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           scale = "robust")
data:image/s3,"s3://crabby-images/503b1/503b11a4e9caaa67b2e722d78784e32ca450a951" alt=""
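The “robust” arithmetic can be sketched by hand on a made-up vector. Note that base R's mad() multiplies the raw median absolute deviation by a consistency constant (1.4826) by default. Whether ggparcoord applies the same constant is an assumption here, so treat this as an illustration of the idea rather than a reproduction of the plot's values.

```r
# Hypothetical toy values with one outlier
x <- c(2, 4, 4, 6, 14)

# "robust": subtract the median, divide by the median absolute deviation
z_robust <- (x - median(x)) / mad(x)

# The outlier pulls the mean and sd, but not the median and MAD
c(median = median(z_robust), mad = mad(z_robust))
```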
32.5.2 “uniminmax”
Scale so the minimum of the variable is zero, and the maximum is one.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           scale = "uniminmax")
data:image/s3,"s3://crabby-images/4886b/4886b7b620f83863d8ba819f5d7de67c27298926" alt=""
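A by-hand version of “uniminmax” on a made-up vector, showing that every variable ends up on the same 0 to 1 range:

```r
# Hypothetical toy values standing in for one column of the data
x <- c(2, 4, 4, 6, 14)

# "uniminmax": map the minimum to 0 and the maximum to 1
z_unit <- (x - min(x)) / (max(x) - min(x))

range(z_unit)
```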
32.5.3 “globalminmax”
No scaling: the range of the graphs is defined by the global minimum and the global maximum.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           scale = "globalminmax")
data:image/s3,"s3://crabby-images/ce733/ce733af94323274c2657a42a7ffe491e772faade" alt=""
32.5.4 “center”
Scale using method “uniminmax”, and then center each variable at the summary statistic specified by the scaleSummary argument.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           scale = "center",
           scaleSummary = "mean")
data:image/s3,"s3://crabby-images/f65c0/f65c00f2e306f5d0a200ebca76cbbb311135d1e3" alt=""
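The two steps of “center” can be sketched by hand: rescale to the unit range, then shift so the chosen summary statistic sits at zero. The vector is made up, and using mean() mirrors scaleSummary = "mean" in the call above.

```r
# Hypothetical toy values standing in for one column of the data
x <- c(2, 4, 4, 6, 14)

# Step 1: "uniminmax" scaling to the range [0, 1]
u <- (x - min(x)) / (max(x) - min(x))

# Step 2: shift so the summary statistic (here the mean) is at 0
z_center <- u - mean(u)

c(mean = mean(z_center), height = diff(range(z_center)))
```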
32.5.5 “centerObs”
Scale using method “uniminmax”, and then center each variable at the value of the observation in the row given by the centerObsID argument, so that this observation becomes a flat line at zero.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           scale = "centerObs",
           centerObsID = 5)
data:image/s3,"s3://crabby-images/85858/85858b278d8f1e1b934ccaedf741db351c455ec3" alt=""
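A by-hand sketch of “centerObs” on a made-up vector: after unit-range scaling, the value of one chosen observation is subtracted from all values. The index center_id is a hypothetical stand-in for centerObsID.

```r
# Hypothetical toy values standing in for one column of the data
x <- c(2, 4, 4, 6, 14)
center_id <- 2  # hypothetical row to center on

# Step 1: "uniminmax" scaling to the range [0, 1]
u <- (x - min(x)) / (max(x) - min(x))

# Step 2: shift so the chosen observation sits at 0
z_obs <- u - u[center_id]

z_obs[center_id]
```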
32.6 Ordering methods
Select how the axes are ordered with the order argument.
32.6.1 “anyClass”
For each variable, calculate the F-statistic for each class vs. the rest, then order the variables by their maximum F-statistic.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           order = "anyClass")
data:image/s3,"s3://crabby-images/30b8e/30b8efc5fd9d4f1f103e50cdd009e535b8156897" alt=""
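The “anyClass” idea can be sketched in base R. This is an illustration of the description above, not GGally's own code, and it uses the built-in iris data (three classes) instead of mariokart so that "one class vs. the rest" differs from an overall comparison. The helper f_stat is a hypothetical name.

```r
# F-statistic from a one-way ANOVA of x on a grouping variable g
f_stat <- function(x, g) {
  anova(lm(x ~ g))[["F value"]][1]
}

vars <- names(iris)[1:4]
classes <- levels(iris$Species)

# For each variable, keep the largest "one class vs. the rest" F-statistic
max_f <- sapply(vars, function(v) {
  max(sapply(classes, function(k) f_stat(iris[[v]], iris$Species == k)))
})

# Axis order: decreasing maximum F-statistic
any_class_order <- names(sort(max_f, decreasing = TRUE))
any_class_order
```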
32.6.2 “allClass”
Order variables by their overall F-statistic from an ANOVA with groupColumn as the explanatory variable.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           order = "allClass")
data:image/s3,"s3://crabby-images/71d96/71d9665e6c82eba682e3883266efe30b9aa69916" alt=""
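A base R sketch of the “allClass” ordering, again on the built-in iris data rather than mariokart: fit one ANOVA per variable with the grouping factor as the explanatory variable, then sort by the overall F-statistic. This follows the description above and is not taken from GGally's source.

```r
vars <- names(iris)[1:4]

# Overall F-statistic of each variable against the grouping factor
overall_f <- sapply(vars, function(v) {
  anova(lm(iris[[v]] ~ iris$Species))[["F value"]][1]
})

# Axis order: decreasing overall F-statistic
all_class_order <- names(sort(overall_f, decreasing = TRUE))
all_class_order
```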
32.6.3 “skewness”
Order variables by their skewness.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond",
           order = "skewness")
data:image/s3,"s3://crabby-images/ad38b/ad38b5a43bee63dd46ede6dd1728d2f1bdf0affd" alt=""
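To make the skewness ordering concrete, the sketch below computes the moment-based sample skewness for three made-up columns and sorts them from most to least skewed. The helper skew and the toy data are hypothetical. Whether ggparcoord uses this exact estimator, or ranks by signed rather than absolute skewness, is an assumption.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical columns with different shapes
toy <- data.frame(
  symmetric    = c(1, 2, 3, 4, 5),
  right_skewed = c(1, 1, 2, 2, 10),
  left_skewed  = c(-10, -2, -2, -1, -1)
)

sk <- sapply(toy, skew)

# Order columns by decreasing (signed) skewness
skew_order <- names(sort(sk, decreasing = TRUE))
skew_order
```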
32.7 Make Plots for Each Group with Facets
Because ggparcoord returns a ggplot object, facet_wrap can be added to draw one panel per group.
ggparcoord(data = df,
           columns = c(2:3, 5:7, 9, 11),
           alphaLines = 0.5,
           groupColumn = "cond") +
  facet_wrap(~ cond) +
  ggtitle("Relations across auction details")
data:image/s3,"s3://crabby-images/a08c7/a08c7031d6d37bdf40378e959d25deab6243b56b" alt=""