30 Interactive plot introduction

Yujie Zhou

Interactive plots are widely used in modern data analysis. This tutorial introduces four commonly used interactive plots: the interactive bubble plot, interactive area plot, interactive heatmap, and interactive network. Unlike static plots, interactive plots let users zoom in and out, which gives a better user experience and simplifies the analysis process.

30.1 Interactive Bubble Plot:

A bubble plot is a scatterplot with a third dimension added: the size of each bubble represents an additional numeric variable.

We can make the plot interactive by passing a bubble plot drawn with “ggplot2” directly to “plotly”. The following example shows how GDP per capita and life expectancy are associated across nations, with bubble size showing population.

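knitr::opts_chunk$set(warning = FALSE, message = FALSE)
# Packages used in this section
library(gapminder)  # example data
library(dplyr)      # %>% and filter()
library(ggplot2)
library(plotly)     # ggplotly()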

head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
p <- gapminder %>%
  filter(year==1967) %>%
  ggplot( aes(x=gdpPercap, y=lifeExp, size=pop,color=continent)) +
  geom_point() +
  scale_x_log10() +
  theme_bw()

ggplotly(p)

While the interactive bubble plot shows a positive relationship between GDP per capita and life expectancy, it also gives additional information on the population behind each dot. However, a drawback of bubble plots is that it is hard to judge the relationship between the x and y variables and the third variable (population size in this example).
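
One way to soften this drawback is to let the hover tooltip report the exact values behind each bubble. The sketch below assumes the gapminder, dplyr, ggplot2, and plotly packages are installed; the `label` column, its wording, and the `scale_size()` range are illustrative choices, not part of the original example.

```r
library(gapminder)
library(dplyr)
library(ggplot2)
library(plotly)

bubbles <- gapminder %>%
  filter(year == 1967) %>%
  # Build the hover label by hand so population is shown as a readable number
  mutate(label = paste0(
    country, "<br>",
    "Population: ", formatC(pop, format = "d", big.mark = ","), "<br>",
    "GDP per capita: ", round(gdpPercap), "<br>",
    "Life expectancy: ", round(lifeExp, 1)
  ))

p_hover <- ggplot(bubbles,
                  aes(x = gdpPercap, y = lifeExp, size = pop,
                      color = continent, text = label)) +
  geom_point(alpha = 0.6) +
  scale_x_log10() +
  scale_size(range = c(1, 15)) +  # illustrative bubble size range
  theme_bw()

# tooltip = "text" asks ggplotly to show only the custom label on hover
fig_hover <- ggplotly(p_hover, tooltip = "text")
fig_hover
```

With the exact population available on hover, bubble size only has to convey a rough ranking.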

30.2 Interactive Area Plot:

Area plots do not differ much from regular line graphs, with two exceptions: 1. The area between the x-axis and each individual line is filled with a color. 2. The y-axis must start at zero, so that each filled area reflects the full value.

Area plots use different colors to give users a good sense of how quantities have changed over a period of time. However, area plots should be used with care, because overlapping areas can hide information and the area under each line is hard to estimate precisely just by looking at the graph.

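knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(dplyr)    # filter(), group_by(), summarise(), mutate()
library(tidyr)    # pivot_longer()
library(ggplot2)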
df <- read.csv("resources/interactive_plot_plotly_cheatsheet/StudentsPerformancePart.csv")
head(df)
##   gender race.ethnicity parental.level.of.education        lunch
## 1 female        group B           bachelor's degree     standard
## 2 female        group C                some college     standard
## 3 female        group B             master's degree     standard
## 4   male        group A          associate's degree free/reduced
## 5   male        group C                some college     standard
## 6 female        group B          associate's degree     standard
##   test.preparation.course math.score reading.score writing.score
## 1                    none         72            72            74
## 2               completed         69            90            88
## 3                    none         90            95            93
## 4                    none         47            57            44
## 5                    none         76            78            75
## 6                    none         71            83            78
df %>%
  filter(gender == "female",race.ethnicity=="group B")%>%
  pivot_longer(
    cols = c('math.score','reading.score','writing.score'),
    names_to = "Subject",
    values_to = "score"
  ) %>%
  group_by(Subject, parental.level.of.education) %>%
  summarise(score = mean(score)) %>%
  mutate(parental.level.of.education = factor(parental.level.of.education,levels=c("associate's degree", "bachelor's degree", "high school", "master's degree", "some college", "some high school")))%>%
  mutate(parental.level.of.education = as.numeric(parental.level.of.education)) %>%
  ggplot(aes(x= parental.level.of.education ,y=score,fill=Subject))+
  geom_area(alpha = 0.7) +
  scale_fill_manual(values = c("#F6D7A7", "#C8E3D4", "#87AAAA")) +
  scale_x_continuous(breaks =1:6,labels = c("associate's degree", "bachelor's degree", "high school", "master's degree", "some college", "some high school"))+
  theme_minimal() -> p2

ggplotly(p2)  # render the area plot as an interactive plotly widget

We can easily see how students’ mean performance in each subject is associated with their parents’ educational background just by glancing at the area under each line.
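
The earlier caution about overlapping areas can be made concrete with a small sketch. It assumes only ggplot2 and uses a hypothetical toy data frame (`toy`, with series "A" and "B"). `geom_area()` stacks series by default, so the top edge shows the running total; `position = "identity"` draws every series from zero, which is where areas start to hide one another.

```r
library(ggplot2)

# Hypothetical toy data: two series measured at the same five time points
toy <- data.frame(
  time   = rep(1:5, times = 2),
  series = rep(c("A", "B"), each = 5),
  value  = c(2, 3, 4, 3, 2,  1, 2, 2, 3, 4)
)

# Default behaviour: series are stacked on top of each other
p_stack <- ggplot(toy, aes(x = time, y = value, fill = series)) +
  geom_area(alpha = 0.7)

# position = "identity": each series starts at zero, so the areas overlap
p_overlap <- ggplot(toy, aes(x = time, y = value, fill = series)) +
  geom_area(alpha = 0.4, position = "identity")

# The top edge of the stacked version is the per-time total of both series
totals <- tapply(toy$value, toy$time, sum)
totals
```

Either plot object can be passed to `ggplotly()` in the same way as the bubble plot.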

30.3 Interactive Heatmap:

A heatmap is a very useful tool for visualizing two-dimensional data and revealing patterns and correlations between rows and columns. To make heatmaps easier to explore, we introduce the interactive heatmap. As we are already familiar with drawing static heatmaps in R, we will use a useful R package, “heatmaply”, to build an interactive cluster heatmap.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(heatmaply)
data("mtcars")
heatmaply(mtcars, scale = "column", Colv = NULL, col = topo.colors(10),
          xlab = "design", ylab = "car type", main = "interactive heatmap")

mtcars records fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. Note that we should first transform the data, for example by normalizing or percentizing, to make the variables more comparable. Scaling the matrix is done with the scale argument, which can be applied to rows or to columns. Here the column option is chosen.
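
To see what such a transformation does to the numbers, here is a base R sketch on the same data. It shows the usual meaning of column scaling (center, then divide by the standard deviation) and a hand-written min-max normalization; the helper `min_max()` is hypothetical, and heatmaply's internal implementation may differ in detail.

```r
data("mtcars")

# Column scaling: subtract each column's mean, divide by its standard deviation
scaled <- scale(mtcars)

# Hypothetical helper: rescale a numeric vector to the range [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

normalized <- as.data.frame(lapply(mtcars, min_max))
rownames(normalized) <- rownames(mtcars)

round(colMeans(scaled), 10)  # column means are numerically zero
sapply(normalized, range)    # every column now spans 0 to 1
```

After either transformation, a variable measured in the hundreds such as disp no longer dominates the color scale over a variable such as wt.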

We pass NULL to Colv because the heatmap otherwise reorders the columns with a clustering algorithm. Removing the column dendrogram keeps the columns in their original order, which lets users compare against the raw data.
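
The kind of reordering that clustering produces can be sketched in base R. This is an illustration under stated assumptions (Euclidean distance and complete-linkage clustering on standardized columns), not a reproduction of the exact column order heatmaply would draw.

```r
data("mtcars")

# Cluster the columns: transpose so that each variable becomes one observation
col_clusters <- hclust(dist(t(scale(mtcars))))

original_order  <- colnames(mtcars)
clustered_order <- colnames(mtcars)[col_clusters$order]

original_order   # order of the columns in the raw data
clustered_order  # same variables, rearranged so similar ones sit together
```

Comparing the two vectors shows why a clustered heatmap is harder to read side by side with the raw table.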

We can also use terrain.colors(), rainbow(), heat.colors(), topo.colors(), or cm.colors() interchangeably to select a different color palette for the heatmap.

Instead of printing the x-axis labels vertically as a static heatmap does, the interactive heatmap automatically rotates the x-axis labels, since they are too long to fit horizontally.

A more visual and user-friendly tool for creating interactive heatmaps, shinyHeatmaply, was developed by Jonathan Sidi. To use this tool, we can install the ‘shinyHeatmaply’ R package or, alternatively, install it from GitHub by entering “devtools::install_github("yonicd/shinyHeatmaply")”. The heatmap output from shinyHeatmaply provides very detailed parameter summaries.
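
Each of these palette functions simply returns a character vector of color codes, which is why they are interchangeable. A short base R sketch follows; the `custom` palette is a hypothetical extra built with `colorRampPalette()`.

```r
# Each palette function takes the number of colors wanted
palettes <- list(
  terrain = terrain.colors(10),
  rainbow = rainbow(10),
  heat    = heat.colors(10),
  topo    = topo.colors(10),
  cm      = cm.colors(10)
)

# Every entry is a character vector of 10 color codes
lengths(palettes)

# Hypothetical custom palette interpolated between two chosen colors
custom <- colorRampPalette(c("white", "steelblue"))(10)
custom
```

Any of these vectors could be supplied in place of topo.colors(10) in the heatmaply call.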

30.4 Interactive Network

A network consists of two main parts: nodes and edges. The graph reflects the interrelationships between nodes (i.e. entities). The advantage of using a network is that it includes all important entities and analyzes them as a whole instead of looking at each entity separately. Two additional packages we need to import for interactive networks are igraph and networkD3. Let’s first see how igraph is used to plot a static network from a square matrix that we generate randomly.

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(igraph)  # graph_from_adjacency_matrix()

set.seed(12345)
randomdf <- matrix(sample(0:3, 16, replace=TRUE), nrow=4)

output <- graph_from_adjacency_matrix(randomdf)
plot(output)

The arrows indicate the direction of the relationship between two nodes. For example, there is one relationship from 4 to 2, but there is no relationship from 2 to 4.

Now, let’s move on to interactive networks. networkD3 provides a very simple function, simpleNetwork, which generates an interactive network in a handy way.
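
How an adjacency matrix turns into directed edges is easier to follow with a small fixed matrix than with random data. The sketch below assumes the igraph package; the 3 x 3 matrix `adj` is hypothetical, with entry [i, j] giving the number of edges from node i to node j.

```r
library(igraph)

# Hypothetical adjacency matrix: row = source node, column = target node
adj <- matrix(c(0, 1, 0,
                0, 0, 2,
                1, 0, 0),
              nrow = 3, byrow = TRUE)

g <- graph_from_adjacency_matrix(adj, mode = "directed")

as_edgelist(g)          # one row per edge, including the double edge 2 -> 3
are_adjacent(g, 1, 2)   # is there an edge from 1 to 2?
are_adjacent(g, 2, 1)   # is there an edge back from 2 to 1?
```

Because the matrix is not symmetric, an edge in one direction does not imply an edge in the other, which is what the arrows in the plot convey.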

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(networkD3)  # simpleNetwork()

data <- data.frame(
  from=c("A", "A", "B", "D", "C", "D", "E", "B", "C", "D", "K", "A", "M"),
  to=c("B", "E", "F", "A", "C", "A", "B", "Z", "A", "C", "A", "B", "K") #reference:https://www.r-graph-gallery.com/network-interactive.html
)
p <- simpleNetwork(data, height = "50px", width = "50px",
                   Source = 1,
                   Target = 2,
                   fontSize = 25,
                   linkColour = "#123",
                   nodeColour = "#F47E5E",
                   opacity = 0.9,
                   zoom = TRUE)
p

In the interactive network above, we can drag nodes and zoom in and out of the network. Compared to a static network, an interactive network is easier to read because, as the data gets bigger, the layout can be adjusted to avoid overlapping links.
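
Since this section uses both igraph and networkD3, it may help to see one way of moving a graph from the first to the second. This is a sketch, assuming both packages are installed; the edge list `edges` is hypothetical, and grouping nodes with `cluster_walktrap()` is just one possible choice for coloring them.

```r
library(igraph)
library(networkD3)

# Hypothetical edge list, in the same from/to layout used above
edges <- data.frame(
  from = c("A", "A", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "E")
)

g <- graph_from_data_frame(edges, directed = FALSE)

# Group nodes by detected community so the plot can color them
groups <- membership(cluster_walktrap(g))

# Convert to the links/nodes tables that networkD3 works with
d3 <- igraph_to_networkD3(g, group = groups)

forceNetwork(Links = d3$links, Nodes = d3$nodes,
             Source = "source", Target = "target",
             NodeID = "name", Group = "group",
             opacity = 0.9, zoom = TRUE)
```

Compared with simpleNetwork, forceNetwork takes separate link and node tables, which leaves room for per-node attributes such as the group used here.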