49 Network Visualization in R
Yunze Pan
49.1 Introduction
visNetwork is an R package for drawing interactive network graphs, which makes it easier to describe a network and explore its structure visually. This tutorial offers a quick introduction for newcomers to the concepts of creating networks in R. We hope you enjoy it!
49.2 Installation
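The main packages we are going to use for network visualization in R are visNetwork and igraph. They can be installed with install.packages("visNetwork") and install.packages("igraph"). The examples below also use dplyr (for mutate(), select(), and the %>% pipe) and RColorBrewer (for brewer.pal()), which can be installed the same way.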
49.3 Dataframe
In this section we will create a small network that simulates student interactions on campus. Our objective is to get you familiar with visNetwork as quickly as possible. To visualize an interactive network, we first create two data frames (a nodes data.frame and an edges data.frame). Then we can explore the various display options by adding columns to the nodes data.frame and the edges data.frame.
49.3.1 Nodes
A nodes data.frame must include an id column. Each id represents a node we want to display in our graph. Other optional columns can be added to the nodes data.frame to help us distinguish nodes in the graph. In our example, each node is a student with a unique id, a name, a major, and a major.type.
nodes <- data.frame(id=1:7, # id column (must be called id)
                    name=c("Asher","Bella","Chloe","Daniel","Emma","Frank","Gabriel"), # student names
                    major=c("CS","CS","CS","STAT","DS","DS","DS"), # CS: computer science major, STAT: statistics major, DS: data science major
                    major.type=c(1,1,1,2,3,3,3)) # 1: CS, 2: STAT, 3: DS
data.frame(nodes)
##   id    name major major.type
## 1  1   Asher    CS          1
## 2  2   Bella    CS          1
## 3  3   Chloe    CS          1
## 4  4  Daniel  STAT          2
## 5  5    Emma    DS          3
## 6  6   Frank    DS          3
## 7  7 Gabriel    DS          3
49.3.2 Edges
An edges data.frame must include a from column and a to column denoting the starting node and ending node of each edge. Both columns refer to nodes by their id. We also add a weight column to the edges data.frame to describe the frequency of interactions between two nodes. For example, the first row tells us that student 1 reached out to student 2 once.
edges <- data.frame(from=c(1,1,2,3,5,5,6,7),
                    to=c(2,4,3,1,4,6,7,5),
                    weight=c(1,1,1,1,1,1,1,1))
data.frame(edges)
##   from to weight
## 1    1  2      1
## 2    1  4      1
## 3    2  3      1
## 4    3  1      1
## 5    5  4      1
## 6    5  6      1
## 7    6  7      1
## 8    7  5      1
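Before plotting, it can help to confirm that the two data frames agree with each other: every id should be unique, and every from/to value should match an id. The following is a minimal sketch in base R; check_network() is a hypothetical helper written for this tutorial, not a visNetwork function.

```r
# Rebuild the tutorial's two data frames so this check runs on its own.
nodes <- data.frame(id = 1:7,
                    name = c("Asher", "Bella", "Chloe", "Daniel",
                             "Emma", "Frank", "Gabriel"))
edges <- data.frame(from = c(1, 1, 2, 3, 5, 5, 6, 7),
                    to = c(2, 4, 3, 1, 4, 6, 7, 5),
                    weight = rep(1, 8))

# check_network() is a hypothetical helper, not part of visNetwork.
check_network <- function(nodes, edges) {
  # The required columns must be present before anything else is checked.
  stopifnot("id" %in% names(nodes),
            all(c("from", "to") %in% names(edges)))
  known <- edges$from %in% nodes$id & edges$to %in% nodes$id
  list(
    duplicated_ids = nodes$id[duplicated(nodes$id)], # ids that appear twice
    dangling_edges = edges[!known, ]                 # edges with an unknown endpoint
  )
}

result <- check_network(nodes, edges)
length(result$duplicated_ids) == 0 # TRUE when every id is unique
nrow(result$dangling_edges) == 0   # TRUE when every edge joins two known nodes
```

If either check returns FALSE, inspecting result shows which ids or edges need fixing.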
49.4 Visualization
Now we can visualize our student interaction network using visNetwork. Examples are shown below. We start from the default settings and then customize the network for a better interactive visualization.
49.4.1 Minimal Example
library(visNetwork)
visNetwork(nodes, edges)
49.4.2 Customize Node
library(dplyr)        # mutate(), select(), and the %>% pipe
library(RColorBrewer) # brewer.pal()
colors <- colorRampPalette(brewer.pal(3, "RdBu"))(3) # use three colors to distinguish students by their majors
nodes <- nodes %>% mutate(shape="dot", # "shape" variable: customize shape of nodes ("dot", "square", "triangle")
                          shadow=TRUE, # "shadow" variable: include/exclude shadow of nodes
                          title=major, # "title" variable: tooltip (html or character), when the mouse is above
                          label=name, # "label" variable: add labels on nodes
                          size=20, # "size" variable: set size of nodes
                          borderWidth=1, # "borderWidth" variable: set border width of nodes
                          color.background=colors[major.type], # "color.background" variable: set color of nodes
                          color.border="grey", # "color.border" variable: set frame color
                          color.highlight.background="yellow", # "color.highlight.background" variable: set color of the selected node
                          color.highlight.border="black") # "color.highlight.border" variable: set frame color of the selected node
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>% # "main" variable: add a title
  visLayout(randomSeed=4) # give a random seed manually so that the layout will be the same every time
49.4.3 Customize Edge
edges <- edges %>% mutate(width=weight*3, # "width" variable: set width of each edge
                          color="lightgrey", # "color" variable: set color of edges
                          arrows="to", # "arrows" variable: set arrow for each edge ("to", "middle", "from")
                          smooth=TRUE) # "smooth" variable: each edge to be curved or not
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>% 
  visLayout(randomSeed=4)
49.4.4 Add Legend Based on Groups
nodes <- nodes %>% mutate(group=major) # add a "group" column on node data.frame and add groups on nodes
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
  visLayout(randomSeed=4) %>% 
  visGroups(groupname="CS", color=colors[1]) %>% # color "colors[1]" for "CS" group 
  visGroups(groupname="STAT", color=colors[2]) %>%
  visGroups(groupname="DS", color=colors[3]) %>%
  visLegend(width=0.1, position="right", main="Academic Major") # "position" variable: set position ("left", "right")
49.4.5 Select by Node
nodes <- nodes %>% select(-group) # remove "group" column because we don't want to show legend this time
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
  visLayout(randomSeed=4) %>% 
  visOptions(nodesIdSelection=TRUE, # "nodesIdSelection" variable: select a node by id
             selectedBy="major") %>% # "selectedBy" variable: select a node by the values of a column such as "major" column
  visLegend()
49.4.6 Highlight Nearest Nodes
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>% 
  visLayout(randomSeed=4) %>% 
  visOptions(highlightNearest = list(enabled = TRUE, # "enabled" variable: highlight nearest nodes and edges by clicking on a node
                                     degree = 2)) # "degree" variable: set degree of depth
49.4.7 Edit Network
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
  visLayout(randomSeed=4) %>% 
  visOptions(highlightNearest=TRUE, # degree of depth = 1
             nodesIdSelection=TRUE,
             selectedBy="major",
             manipulation=TRUE) %>%  # "manipulation" variable: add/delete nodes/edges or change edges
  visLegend()
49.5 Export
Finally, we use visSave() to save the network to an HTML file.
our_network <- visNetwork(nodes, edges)
visSave(our_network, file = "Student Interaction Network.html", background="white")
49.6 Help?
For more information about visNetwork, see the help pages:
?visNodes
?visEdges
?visOptions
?visGroups
?visLegend
?visLayout
49.8 External Resource
- visNetwork package;
- star-wars-network.
49.7 Social Network Analysis
We have already learned how to visualize an interactive network. To help you better understand its application, we will use visNetwork and igraph to perform a social network analysis.
49.7.1 Dataset
We will investigate interactions in the movie Star Wars Episode IV. First, we import two csv files (“nodes.csv” and “edges.csv”). Each node in “nodes.csv” is a character and each edge in “edges.csv” tells whether two characters appeared together in a scene of the movie. Thus, edges are undirected. Since characters may appear in multiple scenes together, each edge has a weight.
We group our characters (“dark side” or “light side” or “other”).
Let’s try another network package called igraph to explore the network.
First, we use the graph_from_data_frame function, which needs two arguments: d and vertices. The igraph object g indicates that there are 22 nodes and 66 edges.
Next, we output a portion of the adjacency matrix for our network.
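The steps above can be sketched on a toy graph. The data frames below are made-up stand-ins for “nodes.csv” and “edges.csv”; their column names and values are assumptions for illustration and are not the real Star Wars data.

```r
library(igraph)

# Toy stand-ins for nodes.csv and edges.csv (hypothetical names and values).
toy_nodes <- data.frame(name = c("A", "B", "C", "D"),
                        side = c("light", "light", "dark", "other"))
toy_edges <- data.frame(from = c("A", "A", "B", "C"),
                        to = c("B", "C", "C", "D"),
                        weight = c(3, 1, 2, 1))

# d is the edge list (its first two columns are the endpoints);
# vertices has one row per node, with the node name in the first column.
g <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = FALSE)

vcount(g) # number of nodes
ecount(g) # number of edges

# Weighted adjacency matrix: each cell holds the edge weight, or 0 if no edge.
adj <- as_adjacency_matrix(g, attr = "weight", sparse = FALSE)
adj[1:3, 1:3] # a portion of the matrix
```

Because the graph is undirected, the matrix is symmetric, so adj["A", "B"] and adj["B", "A"] hold the same weight.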
49.7.2 Visualization
Alternatively, we can show a heat map of our adjacency matrix. The number in each square equals the weight of the corresponding edge. We observe that LUKE is a very popular character.
We also compute each character’s importance with the strength() function, based on the number of scenes they appear in, and rank the characters in descending order. The strength() function sums the weights of the edges adjacent to each node.
Again, we use visNetwork to visualize.
You may wonder how important a character is in our Star Wars network. Therefore, we use three measures (degree centrality, betweenness centrality, and closeness centrality) to quantify each node’s importance in the network and visualize how it differs from the others.
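As a rough illustration of the strength() ranking described above, the sketch below uses a made-up four-node graph (not the Star Wars data) and passes each node's strength to visNetwork through the value column, which scales node size.

```r
library(igraph)
library(visNetwork)

# Made-up weighted edge list (hypothetical values for illustration only).
toy_edges <- data.frame(from = c("A", "A", "B", "C"),
                        to = c("B", "C", "C", "D"),
                        weight = c(3, 1, 2, 1))
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# strength() adds up the "weight" attribute of the edges touching each node.
s <- sort(strength(g), decreasing = TRUE)
s

# Node size follows strength via "value"; edge width follows weight.
vis_nodes <- data.frame(id = names(s), label = names(s), value = as.numeric(s))
vis_edges <- transform(toy_edges, width = weight)
visNetwork(vis_nodes, vis_edges)
```

Here node B ranks first because its two edges carry the largest combined weight.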
49.7.3 Centrality Measurement
Degree centrality is defined as the number of edges adjacent to each node. After ranking the degree centrality, we find LUKE has the greatest value. It implies that LUKE interacts with a large number of distinct characters. We color each node based on its degree centrality value. The node with the greatest value has the warmest color.
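One way to map a centrality value to a color, so that the largest value gets the warmest color, is to bin the scores and index into a palette. This is a minimal sketch in base R; the scores, the palette colors, and the number of bins are all assumptions chosen for illustration.

```r
# Hypothetical centrality scores for four nodes (made-up values).
score <- c(A = 2, B = 2, C = 3, D = 1)

# Five-step palette running from cool (blue) to warm (red); an assumed choice.
n_bins <- 5
pal <- colorRampPalette(c("steelblue", "khaki", "firebrick"))(n_bins)

# Cut the score range into n_bins equal-width bins numbered 1..n_bins,
# so the smallest score lands in bin 1 and the largest in bin n_bins.
bin <- cut(score, breaks = n_bins, labels = FALSE, include.lowest = TRUE)

# Look up one color per node; this vector could fill a color column in nodes.
node_color <- pal[bin]
names(node_color) <- names(score)
node_color
```

Nodes with equal scores (A and B here) share a bin and therefore a color.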
Betweenness centrality is defined as the number of shortest paths between other nodes that pass through a particular node. After ranking the betweenness centrality, we find LEIA has the greatest value. It implies that LEIA often sits on the shortest route between two other characters, which makes her critical to the communication process. We color each node based on its betweenness centrality value. The node with the greatest value has the warmest color.
Closeness centrality measures how many steps are required to reach every other node from a given node. It is defined as the inverse of the total length of the shortest paths to all other nodes, so fewer steps mean a larger value. After ranking the closeness centrality, we find BIGGS has the greatest value. It implies that BIGGS is close to many other characters. We color each node based on its closeness centrality value. The node with the greatest value has the warmest color.
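The three measures can be compared side by side on a small graph. The sketch below uses a made-up, unweighted four-node graph (not the Star Wars data) so the results are easy to check by hand.

```r
library(igraph)

# Made-up unweighted graph: a triangle A-B-C with D attached to C.
toy_edges <- data.frame(from = c("A", "A", "B", "C"),
                        to = c("B", "C", "C", "D"))
g <- graph_from_data_frame(toy_edges, directed = FALSE)

deg <- degree(g)      # number of edges adjacent to each node
btw <- betweenness(g) # shortest paths between other nodes that pass through each node
cls <- closeness(g)   # larger when a node needs fewer steps to reach the others

# Collect the measures in one table and sort by degree.
centrality <- data.frame(node = names(deg), degree = deg,
                         betweenness = btw, closeness = cls)
centrality[order(-centrality$degree), ]
```

Node C comes out on top for all three measures here because every path to D runs through it. Note that when a graph has a weight edge attribute, igraph interprets the weights as distances for betweenness and closeness, so heavily weighted edges count as longer paths; this is worth keeping in mind when weights record how often two characters appear together.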
Lastly, we output our network and look at where the three measures disagree.