114 ggplot2_treemapify
114.1 Download treemapify
In August 2017, R language has added new geometric objects supporting ggplot2 dendrograms. From then, there is no need to resort to the assistance of third-party packages. And it can be downloaded either using “install.package(”treemapify“)” or “devtools::install_github(”wilkox/treemapify“)”.
After downloading this package, we will have an additional functon in our ggplot2 which is named “geom_treemap()”.
## Load packages needed for this instruction
library("ggplot2")
library("treemapify")
library("RColorBrewer")
114.2 Basic Visualization of Treemap
Before we give example using treemapify package to draw treemap figure, we can first get to know this kind of figures:
In the tree diagram, the numerical variables are converted into rectangular area sizes, and the types of variables are distinguished by labels.
example <- data.frame(value<-c(100, 23, 1300, 1083, 260, 17, 30),
kind<-c('A','B','C','D','E','F','G'))
example$kind <- as.factor(example$kind)
ggplot(example, aes(area=value,fill=kind,label=kind)) +
geom_treemap()+
geom_treemap_text(place='center')
library(treemap)
group=c(rep("group-1", 3), rep("group-2",2), rep("group-3",4))
subgroup=paste("subgroup" , c(1,2,3,1,2,1,2,3,4), sep="-")
value=c(10, 6, 7, 9, 10, 7, 2, 2,20)
data=data.frame(group,subgroup,value)
# treemap
treemap(data,
index=c("group","subgroup"), #sub plots two groups
vSize="value", # Sizes are allocated according to the numeric variable
type="index") # Color according to the classification
114.3 Visualization using treemapify package
We will have an example using the data set G20 included in treemapfiy package. We can first have a preview of this data set:
str(G20)
## 'data.frame': 20 obs. of 6 variables:
## $ region : Factor w/ 8 levels "Africa","Asia",..: 1 6 6 6 8 8 2 2 2 2 ...
## $ country : Factor w/ 20 levels "Argentina","Australia",..: 16 20 4 13 3 1 5 12 17 9 ...
## $ gdp_mil_usd : int 384315 15684750 1819081 1177116 2395968 474954 8227037 5963969 1155872 1824832 ...
## $ hdi : num 0.629 0.937 0.911 0.775 0.73 0.811 0.699 0.912 0.909 0.554 ...
## $ econ_classification: Factor w/ 2 levels "Advanced","Developing": 2 1 1 2 2 2 2 1 1 2 ...
## $ hemisphere : Factor w/ 2 levels "Northern","Southern": 2 1 1 1 2 2 1 1 1 1 ...
head(G20)
## region country gdp_mil_usd hdi econ_classification hemisphere
## 1 Africa South Africa 384315 0.629 Developing Southern
## 2 North America United States 15684750 0.937 Advanced Northern
## 3 North America Canada 1819081 0.911 Advanced Northern
## 4 North America Mexico 1177116 0.775 Developing Northern
## 5 South America Brazil 2395968 0.730 Developing Southern
## 6 South America Argentina 474954 0.811 Developing Southern
This data set describes the economic indicators of the 20 countries participating in the summit. It contains five fields, namely the global region (region), country name (country), and GDP indicator (gdp_mil_usd) (should be some kind of secondary calculation Indicator), human development index (hdi), already economic development level (econ_classification).
The tree diagram is a special type of graph without an explicit coordinate system. Relying on the square algorithm, the sample population square is divided into a single rectangular box according to the actual observation value accounted for in the population. Therefore, it needs at least one numeric variable as an input parameter.
114.3.1 Plot a simple Treemap figure
ggplot(G20, aes(area = gdp_mil_usd)) +
geom_treemap()
Because area only defines the square size of a numeric variable, the fill color can be defined separately. But color can often be used alone as a way of expressing a numerical metric.
## Draw the plot with simple blue color
ggplot(G20, aes(area = gdp_mil_usd)) +
geom_treemap(fill="light blue")
## Draw the plot using scale_fill_distiller with palette
ggplot(G20, aes(area = gdp_mil_usd, fill = hdi)) +
geom_treemap()+
scale_fill_distiller(palette="Blues")
114.3.2 Add labels onto the treemap
The author of the package wrote an optimized text label function geom_treemap_text for the ggplot treemap (Because treemap exceeds the category of the three traditional coordinate systems. There is no explicit coordinate system. The algorithm is relatively special and one cannot use Geom_text() to add tags).
The place parameter controls the position of the label in each box relative to the surroundings, and grow (which is set default to be True) controls whether the label is adaptive to the box size (largely scaled up and down)
ggplot(G20, aes(area = gdp_mil_usd, fill = hdi, label = country)) +
geom_treemap() +
geom_treemap_text(colour = "Blue",
grow = TRUE, # default value
place = "center") +
scale_fill_distiller(palette="Blues")
114.3.3 Subgroups
This package supports sub-groups, which is extensively applied in practical application scenarios. For example, while observing the size of national indicators, we also want to obtain the overall indicators of the region to which the country belongs. By adding sub-groups, We can obtain information in two dimensions. By setting the subgroup parameter (a categorical variable) in the aesthetic mapping, the function can automatically complete the variable aggregation calculation of the subgroup within the function, and use a frame to show the size of the subcategory in the graph formation.
The reflow parameter is used to control whether the label adapts to the size of the rectangular block. If the original size exceeds the rectangular block, it will be automatically displayed in a new line.
ggplot(G20, aes(area = gdp_mil_usd,
fill = hdi,
label = country,
subgroup = region)) +
geom_treemap() +
geom_treemap_subgroup_border() +
geom_treemap_subgroup_text(place = "center",
alpha = 0.3,
colour ="black",
min.size = 0) +
geom_treemap_text(colour = "Blue",
place = "topleft",
reflow = TRUE)+
scale_fill_distiller(palette="Blues")
When you feel that using sub-group cannot achieve a good visualization, geom_treemap also supports the facet_grid function in ggplot2. This is the benefit of all ggplot2 extension functions and can inherit the advanced graphics properties derived from ggplot2.
ggplot(G20, aes(area = gdp_mil_usd,
fill = region,
label = country)) +
geom_treemap() +
geom_treemap_text(colour = "black") +
facet_wrap( ~ econ_classification) +
scale_fill_brewer(palette="Blues")+
labs(title = "The G-20 major economies by economic classification",
caption = "The area of each country is proportional to its relative GDP within the economic group (advanced or developing)",
fill = "Region" ) +
theme(legend.position = "bottom",
plot.caption=element_text(hjust=0))