127 Visualization in R vs. Python
Zining Chen
For data scientists, RStudio and Python are probably the two most familiar tools. However, for people who are already proficient in Python (like me), the syntax and behavior of R can feel tough and unfamiliar at first. Since R and Python use completely different packages for plotting, I will introduce a collection of common code and syntax used for data visualization in R and Python, side by side. It also works as a cheat sheet to help those who come from Python get started in R faster.
127.1 Basic setup
The dataset is retrieved from Kaggle: https://www.kaggle.com/anandhuh/covid-in-african-countries-latest-data
Generally, in RStudio, visualization is done with ggplot2 (loaded here as part of tidyverse), and packages are loaded with library(). For example:
library(tidyverse)
library(dplyr)
library(vcd)
df_r <- read_csv("resources/r_vs_python/covid_africa.csv")
In Python, plotting can be done with matplotlib, and packages are loaded with import statements:
data:image/s3,"s3://crabby-images/9fb4f/9fb4f94630c5102d1f940213e8cc76c9cd06c547" alt=""
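As a minimal sketch of the Python setup, assuming pandas and matplotlib are installed: the name `df_py` and the small inline CSV are my own placeholders (made-up numbers standing in for `covid_africa.csv`), so treat this as an illustration rather than the exact code used for the figures.

```python
import io

import pandas as pd               # data frames, roughly the role of readr/dplyr
import matplotlib.pyplot as plt   # plotting, roughly the role of ggplot2

# Placeholder rows with made-up numbers. With the real file you would call
# pd.read_csv("resources/r_vs_python/covid_africa.csv") instead.
csv_text = """Country,Total Cases,Total Deaths,Total Tests,Population
A,1000,20,50000,1000000
B,2500,60,80000,3000000
C,400,5,12000,500000
"""
df_py = pd.read_csv(io.StringIO(csv_text))
print(df_py.head())
```

Unlike `library()`, a Python `import` usually gives the package a short alias (`pd`, `plt`), and every function is then called through that alias.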
127.2 Histogram
R:
# base R histogram
hist(df_r$`Total Deaths`, xlab = "deaths", main = "Total deaths histogram")
data:image/s3,"s3://crabby-images/9f18d/9f18d8e570fd0e6ed64d69af4fa036fb47e4fafe" alt=""
# ggplot histogram
ggplot(df_r, aes(x = `Total Deaths`)) +
geom_histogram(color = "white", fill = "lightblue") +
ggtitle("Total deaths histogram") + labs(x = "deaths")
data:image/s3,"s3://crabby-images/8e01b/8e01bedb855d738bac709024a6b5a136af45edb3" alt=""
Python:
data:image/s3,"s3://crabby-images/d7717/d77175af4f0fddad487efb01af05b9ee5facfd22" alt=""
data:image/s3,"s3://crabby-images/6576a/6576a4803b2470a3c0b12e15d53a6259c6959b3b" alt=""
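A rough matplotlib counterpart of the ggplot histogram could look like the sketch below. The data frame `df_py` holds made-up values standing in for the `Total Deaths` column, and `bins=30` is my choice to mirror the default bin count of `geom_histogram()`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values standing in for the `Total Deaths` column.
df_py = pd.DataFrame({"Total Deaths": [5, 12, 20, 35, 60, 150, 300, 900]})

fig, ax = plt.subplots()
# ax.hist returns the bin counts, the bin edges and the drawn bars
counts, edges, patches = ax.hist(
    df_py["Total Deaths"], bins=30, color="lightblue", edgecolor="white"
)
ax.set_title("Total deaths histogram")   # ggtitle(...)
ax.set_xlabel("deaths")                  # labs(x = ...)
ax.set_ylabel("count")
# In a script, call plt.show() to display the figure; notebooks render it automatically.
```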
127.3 Barplot
R:
#basic barplot
barplot(`Total Cases` ~ Country, data = df_r, las=2, main = "Total cases barplot")
data:image/s3,"s3://crabby-images/6d305/6d305239198c67468a88972eb060dddf0769b501" alt=""
#ggplot barplot
ggplot(df_r, aes(x = Country, y = `Total Cases`)) +
geom_bar(stat='identity', fill = "cornflowerblue") +
ggtitle("Total cases barchart") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
data:image/s3,"s3://crabby-images/24a69/24a69bac63c48635b67daaceb62b11baececc795" alt=""
Python:
data:image/s3,"s3://crabby-images/7f6dc/7f6dcf9998a8038bd77c70dad5db0ecb39439b8f" alt=""
data:image/s3,"s3://crabby-images/e0c63/e0c6376aa0933f63c56b9533fbaf25f7d993b921" alt=""
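For the bar chart, a matplotlib sketch along the same lines might be the following. The three countries and their counts are placeholders, not rows from the Kaggle file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows standing in for the `Country` and `Total Cases` columns.
df_py = pd.DataFrame(
    {"Country": ["A", "B", "C"], "Total Cases": [1000, 2500, 400]}
)

fig, ax = plt.subplots()
# One bar per row, like geom_bar(stat = "identity")
bars = ax.bar(df_py["Country"], df_py["Total Cases"], color="cornflowerblue")
ax.set_title("Total cases barchart")
ax.set_xlabel("Country")
ax.set_ylabel("Total Cases")
# Rotate the country names, like las = 2 or element_text(angle = 90) in R
ax.tick_params(axis="x", labelrotation=90)
```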
127.4 Boxplot
R:
#basic boxplot
boxplot(df_r$Population, main = "Population boxplot", xlab = "population")
data:image/s3,"s3://crabby-images/5c5b4/5c5b4fcd033706ab18e28825ebcf0685c92f9b3d" alt=""
#ggplot boxplot
ggplot(df_r, aes(x = Population)) +
geom_boxplot() +
ggtitle("Population boxplot")
data:image/s3,"s3://crabby-images/ac692/ac692d409a0e554f3dbe753b043aa1f4f47e41ee" alt=""
Python:
data:image/s3,"s3://crabby-images/0a6c9/0a6c98805d1247828d1af3e1b3b0d8f7b98372ed" alt=""
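A boxplot in matplotlib can be sketched as below, again with made-up population values. One difference worth knowing: R drops missing values silently, while matplotlib does not, so the sketch removes them with `dropna()` first.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values standing in for the `Population` column, with one missing entry.
df_py = pd.DataFrame(
    {"Population": [500_000, 1_000_000, None, 3_000_000, 12_000_000, 200_000_000]}
)

fig, ax = plt.subplots()
# Drop missing values explicitly before plotting
population = df_py["Population"].dropna().to_numpy()
result = ax.boxplot(population)   # vertical by default, like base R boxplot()
ax.set_title("Population boxplot")
ax.set_ylabel("population")
```

The returned dictionary (`result` here) gives access to the drawn pieces, such as `result["boxes"]` and `result["medians"]`, if you want to restyle them afterwards.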
127.5 Scatterplot
R:
# with a regression line
ggplot(na.omit(df_r), aes(x = `Total Tests`, y =`Population`)) +
geom_point() +
geom_smooth(method=lm, se=FALSE, color="blue")
data:image/s3,"s3://crabby-images/e5cf0/e5cf06e2f31d1d082fe42f11b3e5396fa8ee8876" alt=""
Python:
data:image/s3,"s3://crabby-images/b2ec7/b2ec7adafe6b9d270433c21da3269aa271003850" alt=""
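matplotlib has no direct equivalent of `geom_smooth(method = lm)`, so one way to get a similar picture is to fit the line yourself. The sketch below uses `numpy.polyfit` for a least-squares line. The numbers are made up, and the use of NumPy here is my assumption rather than something taken from the original figures.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values standing in for `Total Tests` and `Population`.
df_py = pd.DataFrame(
    {
        "Total Tests": [12_000, 50_000, 80_000, None, 400_000],
        "Population": [500_000, 1_000_000, 3_000_000, 2_000_000, 12_000_000],
    }
).dropna()  # same role as na.omit(df_r)

x = df_py["Total Tests"].to_numpy(dtype=float)
y = df_py["Population"].to_numpy(dtype=float)

# Degree-1 least-squares fit; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y)                                             # geom_point()
x_line = np.array([x.min(), x.max()])
ax.plot(x_line, slope * x_line + intercept, color="blue")    # fitted line, no confidence band
ax.set_xlabel("Total Tests")
ax.set_ylabel("Population")
```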
127.6 Parallel Coordinates
R:
# choose the first 10 countries for better readability
GGally::ggparcoord(df_r[1:10,], columns = c(2:5), scale = "globalminmax", groupColumn = "Country") +
xlab("variable") + ylab("count")
data:image/s3,"s3://crabby-images/fe1bf/fe1bf96adcfda5128c85acfa47c5d2f97d23d6ce" alt=""
Python:
data:image/s3,"s3://crabby-images/40c32/40c327387f3ba3109444e086053be3c9811a0384" alt=""
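On the Python side, pandas ships a helper for this plot, `pandas.plotting.parallel_coordinates`. The sketch below uses placeholder rows, and the list `cols` is a hypothetical choice of columns, since I am only guessing which columns `2:5` refers to in the real file.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Made-up rows standing in for the first countries of the dataset.
df_py = pd.DataFrame(
    {
        "Country": ["A", "B", "C"],
        "Total Cases": [1000, 2500, 400],
        "Total Deaths": [20, 60, 5],
        "Total Tests": [50_000, 80_000, 12_000],
    }
)
cols = ["Total Cases", "Total Deaths", "Total Tests"]  # hypothetical column choice

fig, ax = plt.subplots()
# One line per row, colored by Country; values are plotted unscaled on a shared y-axis
parallel_coordinates(
    df_py.head(10), class_column="Country", cols=cols, ax=ax, colormap="tab10"
)
ax.set_xlabel("variable")
ax.set_ylabel("count")
```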
127.7 Heatmap
We use another dataset as an example. The dataset is retrieved from Kaggle: https://www.kaggle.com/sonukumari47/students-performance-in-exams
R:
df_r2 <- read_csv("resources/r_vs_python/student_performance.csv")
df_r2 <- df_r2[,-1]  # drop the first column
Python:
data:image/s3,"s3://crabby-images/0a171/0a17106f3fe9f586899f9402d3181a571f8ecadd" alt=""
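Loading the second dataset and dropping its first column might look like this in pandas. The inline CSV is a made-up stand-in for `student_performance.csv`, and I am assuming the dropped first column is an unnamed index column.

```python
import io

import pandas as pd

# Made-up rows. With the real file you would call
# pd.read_csv("resources/r_vs_python/student_performance.csv") instead.
csv_text = """,race/ethnicity,parental level of education,math percentage
0,group A,high school,0.55
1,group B,some college,0.72
2,group A,some college,0.64
"""
df_py2 = pd.read_csv(io.StringIO(csv_text))

# Keep every column except the first, like df_r2[, -1] in R.
# Note that Python counts positions from 0, while R counts from 1.
df_py2 = df_py2.iloc[:, 1:]
print(df_py2.columns.tolist())
```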
R:
ggplot(df_r2, aes(x = `parental level of education`, y = `race/ethnicity`,fill = `math percentage`)) +
geom_tile() +
scale_fill_viridis_c(direction = -1) + ggtitle("Square heatmap") +
theme(axis.text.x = element_text(angle = 10))
data:image/s3,"s3://crabby-images/64b14/64b146de67b2bbc21e9e37515c44a020489b4aed" alt=""
Python:
data:image/s3,"s3://crabby-images/5164f/5164fd456f1eff929fd8dee939a866de7c6840c4" alt=""
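With plain matplotlib, one possible route to a similar heatmap is to reshape the data into a grid first and then draw it with `imshow`. The sketch below uses made-up rows. Averaging with `pivot_table` is my own choice for handling several students per cell, and `viridis_r` is the reversed viridis palette, matching `direction = -1`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows standing in for the student performance data.
df_py2 = pd.DataFrame(
    {
        "race/ethnicity": ["group A", "group A", "group A", "group B", "group B"],
        "parental level of education": [
            "high school", "high school", "some college", "high school", "some college",
        ],
        "math percentage": [0.4, 0.6, 0.7, 0.5, 0.9],
    }
)

# Rows become the y-axis, columns the x-axis, cells the mean math percentage
table = df_py2.pivot_table(
    index="race/ethnicity",
    columns="parental level of education",
    values="math percentage",
    aggfunc="mean",
)

fig, ax = plt.subplots()
im = ax.imshow(table.to_numpy(), cmap="viridis_r", aspect="auto")
ax.set_xticks(range(table.shape[1]))
ax.set_xticklabels(table.columns, rotation=10)
ax.set_yticks(range(table.shape[0]))
ax.set_yticklabels(table.index)
ax.set_title("Square heatmap")
fig.colorbar(im, ax=ax, label="math percentage")
```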