49 Network Visualization in R
Yunze Pan
49.1 Introduction
visNetwork is an R package for drawing interactive network graphs, which help us describe a network and explore its structure visually. Interacting with the graph is often the quickest way to extract useful information from it. This tutorial gives newcomers a quick introduction to the core concepts of creating networks in R. We hope you enjoy it!
49.2 Installation
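The main packages we use for network visualization in R are visNetwork and igraph. They can be installed with install.packages("visNetwork") and install.packages("igraph"). The examples below also use dplyr (for %>%, mutate() and select()) and RColorBrewer (for brewer.pal()), which can be installed with install.packages(c("dplyr", "RColorBrewer")).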
49.3 Data Frames
In this section we create a small network that simulates student interactions on campus, with the goal of getting you familiar with visNetwork as quickly as possible. To visualize an interactive network, we first build two data frames: a nodes data frame and an edges data frame. We can then explore the various display options by adding columns to these two data frames.
49.3.1 Nodes
A nodes data frame must include an id column. Each id identifies one node that we want to display in the graph. Optional columns can also be added to the nodes data frame to help distinguish nodes in the graph. In our example, each node is a student with a unique id, their name, major, and major.type (an integer code for the major).
nodes <- data.frame(id=1:7, # id column (must be called id)
name=c("Asher","Bella","Chloe","Daniel","Emma","Frank","Gabriel"), # student names
major=c("CS","CS","CS","STAT","DS","DS","DS"), # CS: computer science major, STAT: statistics major, DS: data science major
major.type=c(1,1,1,2,3,3,3)) # 1: CS, 2: STAT, 3: DS
data.frame(nodes)
## id name major major.type
## 1 1 Asher CS 1
## 2 2 Bella CS 1
## 3 3 Chloe CS 1
## 4 4 Daniel STAT 2
## 5 5 Emma DS 3
## 6 6 Frank DS 3
## 7 7 Gabriel DS 3
49.3.2 Edges
An edges data frame must include a from column and a to column, which give the starting node and ending node of each edge as node ids. We also add a weight column to the edges data frame to record how often two students interact. For example, the first row shows that student 1 reached out to student 2 once.
edges <- data.frame(from=c(1,1,2,3,5,5,6,7),
to=c(2,4,3,1,4,6,7,5),
weight=c(1,1,1,1,1,1,1,1))
data.frame(edges)
## from to weight
## 1 1 2 1
## 2 1 4 1
## 3 2 3 1
## 4 3 1 1
## 5 5 4 1
## 6 5 6 1
## 7 6 7 1
## 8 7 5 1
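Here every weight is typed in by hand as 1. If interactions were instead recorded as a raw log with one row per contact, the weight column could be derived by counting repeated (from, to) pairs. The base-R sketch below assumes such a log; `interaction_log`, `weighted_edges` and `node_ids` are hypothetical names, and the log itself is invented for illustration.

```r
# Hypothetical raw interaction log (invented, not the tutorial's data):
# one row per time a student ("from") reached out to another student ("to").
interaction_log <- data.frame(from = c(1, 1, 1, 2, 5, 5),
                              to   = c(2, 2, 4, 3, 4, 4))

# Give every logged interaction a count of 1, then sum the counts per (from, to) pair.
interaction_log$weight <- 1
weighted_edges <- aggregate(weight ~ from + to, data = interaction_log, FUN = sum)

# Sort rows by sender, then receiver, so the result reads like the edges table above.
weighted_edges <- weighted_edges[order(weighted_edges$from, weighted_edges$to), ]
rownames(weighted_edges) <- NULL

# Sanity check: every edge endpoint must be an id of the nodes data frame (ids 1 to 7).
node_ids <- 1:7
stopifnot(all(c(weighted_edges$from, weighted_edges$to) %in% node_ids))

weighted_edges
```

Repeated pairs such as 1 → 2 collapse into a single edge whose weight is the number of contacts, which is the form visNetwork expects.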
49.4 Visualization
Now we can visualize our student interaction network with visNetwork. The examples below start from the default settings and then customize the network step by step for a better interactive visualization.
49.4.1 Minimal Example
library(visNetwork) # interactive network graphs
visNetwork(nodes, edges)
49.4.2 Customize Node
library(RColorBrewer) # brewer.pal()
library(dplyr) # %>%, mutate(), select()
colors <- colorRampPalette(brewer.pal(3, "RdBu"))(3) # use three colors to distinguish students by their majors
nodes <- nodes %>% mutate(shape="dot", # "shape" variable: customize shape of nodes ("dot", "square", "triangle")
shadow=TRUE, # "shadow" variable: include/exclude shadow of nodes
title=major, # "title" variable: tooltip (HTML or plain text) shown when the mouse hovers over a node
label=name, # "label" variable: add labels on nodes
size=20, # "size" variable: set size of nodes
borderWidth=1, # "borderWidth" variable: set border width of nodes
color.background=colors[major.type], # "color.background" variable: set color of nodes
color.border="grey", # "color.border" variable: set frame color
color.highlight.background="yellow", # "color.highlight.background" variable: set color of the selected node
color.highlight.border="black") # "color.highlight.border" variable: set frame color of the selected node
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>% # "main" variable: add a title
visLayout(randomSeed=4) # give a random seed manually so that the layout will be the same every time
49.4.3 Customize Edge
edges <- edges %>% mutate(width=weight*3, # "width" variable: set width of each edge
color="lightgrey", # "color" variable: set color of edges
arrows="to", # "arrows" variable: set arrow for each edge ("to", "middle", "from")
smooth=TRUE) # "smooth" variable: draw edges as curves (TRUE) or straight lines (FALSE)
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
visLayout(randomSeed=4)
49.4.4 Add Legend Based on Groups
nodes <- nodes %>% mutate(group=major) # add a "group" column on node data.frame and add groups on nodes
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
visLayout(randomSeed=4) %>%
visGroups(groupname="CS", color=colors[1]) %>% # color "colors[1]" for "CS" group
visGroups(groupname="STAT", color=colors[2]) %>%
visGroups(groupname="DS", color=colors[3]) %>%
visLegend(width=0.1, position="right", main="Academic Major") # "position" variable: set position ("left", "right")
49.4.5 Select by Node
nodes <- nodes %>% select(-group) # remove "group" column because we don't want to show legend this time
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
visLayout(randomSeed=4) %>%
visOptions(nodesIdSelection=TRUE, # "nodesIdSelection" variable: select a node by id
selectedBy="major") %>% # "selectedBy" variable: select a node by the values of a column such as "major" column
visLegend()
49.4.6 Highlight Nearest Nodes
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
visLayout(randomSeed=4) %>%
visOptions(highlightNearest = list(enabled = TRUE, # "enabled" variable: highlight nearest nodes and edges by clicking on a node
degree = 2)) # "degree" variable: set degree of depth
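To make `degree = 2` concrete, the base-R sketch below lists the nodes within a given number of hops of a starting node in the student network. It assumes that highlighting follows edges in both directions, which is our reading of the default behaviour rather than something this tutorial states, and `nodes_within()` is a hypothetical helper, not part of visNetwork.

```r
# Student edge list from the tutorial (weights omitted; they do not affect hop counts).
edges <- data.frame(from = c(1, 1, 2, 3, 5, 5, 6, 7),
                    to   = c(2, 4, 3, 1, 4, 6, 7, 5))

# Return the ids of all nodes within `depth` hops of `start`, ignoring edge direction.
nodes_within <- function(edges, start, depth) {
  visited  <- start
  frontier <- start
  for (step in seq_len(depth)) {
    # Neighbours of the current frontier, following edges in either direction.
    nbrs <- c(edges$to[edges$from %in% frontier],
              edges$from[edges$to %in% frontier])
    frontier <- setdiff(unique(nbrs), visited)
    if (length(frontier) == 0) break   # nothing new to reach
    visited <- c(visited, frontier)
  }
  sort(visited)
}

nodes_within(edges, start = 1, depth = 1)  # Asher plus direct contacts
nodes_within(edges, start = 1, depth = 2)  # also reaches Emma through Daniel
```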
49.4.7 Edit Network
visNetwork(nodes, edges, width="100%", main="Student Interaction Network") %>%
visLayout(randomSeed=4) %>%
visOptions(highlightNearest=TRUE, # degree of depth = 1
nodesIdSelection=TRUE,
selectedBy="major",
manipulation=TRUE) %>% # "manipulation" variable: add/delete nodes/edges or change edges
visLegend()
49.5 Export
Finally, we use visSave()
to save network in html file.
our_network <- visNetwork(nodes, edges)
visSave(our_network, file = "Student Interaction Network.html", background="white")
49.6 Getting Help
For more information about visNetwork, see the help pages of its main functions:
?visNodes
?visEdges
?visOptions
?visGroups
?visLegend
?visLayout
49.8 External Resources
- visNetwork package
- star-wars-network
49.7 Social Network Analysis
We have already learned how to visualize an interactive network. To show how this applies in practice, we now use visNetwork and igraph to perform a social network analysis.
49.7.1 Dataset
We will investigate character interactions in the movie Star Wars Episode IV. First, we import two CSV files (“nodes.csv” and “edges.csv”). Each node in “nodes.csv” is a character, and each edge in “edges.csv” indicates that two characters appeared together in a scene of the movie, so the edges are undirected. Since two characters may appear together in multiple scenes, each edge has a weight that counts those scenes. We also group the characters into “dark side”, “light side”, and “other”.
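The step from "appeared together in a scene" to a weighted, undirected edge can be sketched in base R. The scene lists below are invented (they are not taken from the movie or from "edges.csv"), and `scenes`, `pair_df` and `co_edges` are hypothetical names. Sorting each pair before counting is what makes the edges undirected: A-B and B-A become the same edge.

```r
# Hypothetical scenes (invented, NOT the movie data): characters present in each scene.
scenes <- list(c("A", "B", "C"), c("A", "B"), c("C", "D"), c("B", "A"))

# Turn each scene into all unordered character pairs. Sorting first gives every pair
# a canonical order, so "B"-"A" and "A"-"B" count as the same undirected edge.
scene_pairs <- do.call(rbind, lapply(scenes, function(chars) {
  chars <- sort(unique(chars))
  if (length(chars) < 2) return(NULL)   # a one-character scene creates no edge
  t(combn(chars, 2))                    # one row per pair
}))

pair_df <- data.frame(source = scene_pairs[, 1], target = scene_pairs[, 2],
                      weight = 1, stringsAsFactors = FALSE)

# Edge weight = number of scenes the two characters share.
co_edges <- aggregate(weight ~ source + target, data = pair_df, FUN = sum)
co_edges
```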
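Let’s try another network package, igraph, to explore the network. First, we use the graph_from_data_frame() function, which takes the edges data frame as its argument d and the nodes data frame as its argument vertices. Since the edges are undirected, the call should also pass directed = FALSE (the function's default is TRUE). The resulting igraph object g has 22 nodes and 66 edges. Next, we output a portion of the adjacency matrix of our network.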
49.7.2 Visualization
Alternatively, we can show the adjacency matrix as a heat map, where the number in each square equals the weight of the corresponding edge. The heat map shows that LUKE is a very popular character.
We also measure each character’s importance with the strength() function, which is based on the number of scenes the character shares with others, and rank the characters in descending order. strength() sums the weights of the edges adjacent to each node. Again, we use visNetwork to visualize the result.
You may wonder how important a character is in our Star Wars network. To answer this, we use three common measures (degree centrality, betweenness centrality, and closeness centrality) to quantify each node’s importance in the network and to visualize how its importance compares with that of the others.
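Because strength() adds up the weights of the edges touching each node, it equals the row sums of the weighted adjacency matrix. The base-R sketch below computes it directly from a toy undirected edge list (invented values, not the Star Wars data); `node_strength()` is a hypothetical helper. With igraph, `strength(g)` uses the `weight` edge attribute by default when the graph has one.

```r
# Toy undirected, weighted edge list (invented values, NOT the Star Wars data).
sw_edges <- data.frame(source = c("A", "A", "B", "C"),
                       target = c("B", "C", "C", "D"),
                       weight = c(3, 1, 2, 5),
                       stringsAsFactors = FALSE)

node_strength <- function(edges) {
  # Each undirected edge adds its weight to both of its endpoints.
  s <- tapply(c(edges$weight, edges$weight), c(edges$source, edges$target), sum)
  s <- setNames(as.vector(s), names(s))
  sort(s, decreasing = TRUE)  # most "important" character first
}

node_strength(sw_edges)
```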
49.7.3 Centrality Measurement
Degree centrality is defined as the number of edges adjacent to a node. Ranking the nodes by degree centrality, we find that LUKE has the greatest value, which implies that LUKE interacts with a large number of distinct characters. We color each node based on its degree centrality value; the node with the greatest value has the warmest color.
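Both steps of this paragraph, counting adjacent edges and mapping the counts to colors, can be sketched in base R. The sketch assumes "warmest" means the red end of a light-blue-to-red ramp and uses five equal-width bins; these choices, the toy edge list (invented, not the Star Wars data), and the names `sw_edges` and `value_to_color()` are illustrative assumptions.

```r
# Toy undirected edge list (invented, NOT the Star Wars data).
sw_edges <- data.frame(source = c("A", "A", "B", "C"),
                       target = c("B", "C", "C", "D"),
                       stringsAsFactors = FALSE)

# Degree: number of edge endpoints on each node (every edge counts once at each end).
deg_table <- table(c(sw_edges$source, sw_edges$target))
deg <- sort(setNames(as.vector(deg_table), names(deg_table)), decreasing = TRUE)

# Map values to a cold-to-warm palette: equal-width bins, highest bin = warmest color.
value_to_color <- function(values, n_bins = 5) {
  pal  <- colorRampPalette(c("lightblue", "red"))(n_bins)
  bins <- as.integer(cut(values, breaks = n_bins, include.lowest = TRUE))
  setNames(pal[bins], names(values))
}

deg_colors <- value_to_color(deg)
deg
deg_colors
```

The resulting colors could be placed in a `color` column of a nodes data frame before calling `visNetwork()`; the same mapping applies unchanged to betweenness or closeness values.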
Betweenness centrality measures how often a node lies on the shortest paths between other nodes: for every pair of other nodes, it adds up the fraction of their shortest paths that pass through the given node. Ranking the nodes by betweenness centrality, we find that LEIA has the greatest value, which implies that LEIA often acts as a bridge between other characters and is critical to the flow of communication. We color each node based on its betweenness centrality value; the node with the greatest value has the warmest color.
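Betweenness is usually computed with Brandes' algorithm. It runs one breadth-first search from each source node, counting shortest paths, and then accumulates each node's share of those paths in reverse order. The base-R sketch below implements it for unweighted, undirected graphs only, on a toy edge list with invented names; `edges_to_adj_list()` and `betweenness_unweighted()` are hypothetical helpers. In practice igraph's `betweenness()` does this for you. Note that when the graph has a `weight` edge attribute, igraph uses it by default and treats the weights as distances, so frequent interactions count as longer paths.

```r
# Build a named list: node -> character vector of its neighbours (undirected).
edges_to_adj_list <- function(source, target) {
  ids <- sort(unique(c(source, target)))
  setNames(lapply(ids, function(v) unique(c(target[source == v], source[target == v]))), ids)
}

# Brandes' algorithm for an unweighted, undirected graph.
betweenness_unweighted <- function(adj_list) {
  ids <- names(adj_list)
  cb  <- setNames(numeric(length(ids)), ids)
  for (s in ids) {
    stack <- character(0)
    pred  <- setNames(vector("list", length(ids)), ids)   # predecessors on shortest paths
    sigma <- setNames(numeric(length(ids)), ids); sigma[s] <- 1   # shortest-path counts
    dist  <- setNames(rep(-1, length(ids)), ids); dist[s] <- 0    # -1 = not yet reached
    queue <- s
    while (length(queue) > 0) {                 # breadth-first search from s
      v <- queue[1]; queue <- queue[-1]
      stack <- c(stack, v)
      for (w in adj_list[[v]]) {
        if (dist[w] < 0) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
        if (dist[w] == dist[v] + 1) {           # v is on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          pred[[w]] <- c(pred[[w]], v)
        }
      }
    }
    delta <- setNames(numeric(length(ids)), ids)
    for (w in rev(stack)) {                     # farthest nodes first
      for (v in pred[[w]]) delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb / 2  # undirected: each pair was counted once from each endpoint
}

# Toy graph (invented): A-B, A-C, B-C, C-D. Only C lies between other nodes.
adj_list <- edges_to_adj_list(c("A", "A", "B", "C"), c("B", "C", "C", "D"))
bc <- betweenness_unweighted(adj_list)
bc
```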
Closeness centrality is based on the number of steps required to reach every other node from a given node: it is the reciprocal of the sum of the shortest-path distances to all other nodes, so a node that needs fewer steps has a higher value. Ranking the nodes by closeness centrality, we find that BIGGS has the greatest value, which implies that BIGGS is close to many other characters. We color each node based on its closeness centrality value; the node with the greatest value has the warmest color.
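In formula form, the closeness of node v is 1 / Σ d(v, u), summed over all other nodes u, where d is the shortest-path distance. The base-R sketch below computes it with one breadth-first search per node. It assumes a connected, unweighted, undirected graph and uses a toy edge list with invented names; `edges_to_adj_list()` and `closeness_unweighted()` are hypothetical helpers. igraph's `closeness()` also uses the `weight` edge attribute as distances by default, which can change the ranking on a weighted network.

```r
# Build a named list: node -> character vector of its neighbours (undirected).
edges_to_adj_list <- function(source, target) {
  ids <- sort(unique(c(source, target)))
  setNames(lapply(ids, function(v) unique(c(target[source == v], source[target == v]))), ids)
}

closeness_unweighted <- function(adj_list) {
  ids <- names(adj_list)
  sapply(ids, function(s) {
    dist <- setNames(rep(NA_real_, length(ids)), ids)
    dist[s] <- 0
    queue <- s
    while (length(queue) > 0) {               # breadth-first search from s
      v <- queue[1]; queue <- queue[-1]
      for (w in adj_list[[v]]) {
        if (is.na(dist[w])) { dist[w] <- dist[v] + 1; queue <- c(queue, w) }
      }
    }
    1 / sum(dist[names(dist) != s])           # NA if some node is unreachable
  })
}

# Toy graph (invented): A-B, A-C, B-C, C-D.
adj_list <- edges_to_adj_list(c("A", "A", "B", "C"), c("B", "C", "C", "D"))
cl <- closeness_unweighted(adj_list)
sort(cl, decreasing = TRUE)
```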
Finally, we output our network and observe the discrepancies among the three measures.