library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.1 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(leaflet)
1. Introduction
In this blog post, we compare two graphing methods, maps and bar charts, in R and Tableau. The data set is the Boston crime data (2016 - 2018), available on Kaggle (https://www.kaggle.com/ankkur13/boston-crime-data).
The data consists of 260,760 rows and 17 columns. Each row represents an individual incident. The columns are:
INCIDENT_NUMBER, OFFENSE_CODE, OFFENSE_CODE_GROUP, OFFENSE_DESCRIPTION, DISTRICT, REPORTING_AREA, SHOOTING, OCCURRED_ON_DATE, YEAR, MONTH, DAY_OF_WEEK, HOUR, UCR_PART, STREET, Lat, Long, Location.
2. General comparison of R and Tableau
Generally, Tableau is much easier for beginners to start with than R because its learning curve is much gentler. Here is a learning curve comparison of different statistical software packages:
To use R, users need to know the basics of data structures and how to code. For example, they need to know how to handle matrices, data frames, lists, and so on to get even simple jobs done. Tableau, by contrast, requires no coding: with some trial and error, users can work out how to use it. Essentially every task in Tableau can be completed by dragging and clicking.
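To make the point about data structures concrete, here is a minimal sketch of the three structures mentioned above, using base R only. The district labels and counts are toy values invented for illustration; they are not taken from the crime data.

```r
# A matrix: a rectangular block in which every cell has the same type
m <- matrix(1:6, nrow = 2)

# A data frame: a table whose columns may have different types
# (toy values, invented for illustration)
df <- data.frame(
  district = c("A1", "B2", "C11"),
  incidents = c(120, 95, 143),
  stringsAsFactors = FALSE
)

# A list: a container whose elements may be of any type and length
lst <- list(name = "crime", dims = dim(m), data = df)

# Typical ways of reaching into each structure
m[2, 3]                             # row 2, column 3 of the matrix
df$incidents[df$district == "B2"]   # filter one column by another
lst$data$district                   # drill into a data frame stored in a list
```

Even this small example needs three different indexing styles (`[row, col]`, `$` with a logical condition, and chained `$`), which is part of what makes the first steps in R slower than dragging fields in Tableau.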
This is the initial page when Tableau is opened. You can simply drag variables into the Rows and Columns boxes, and Tableau builds the graph automatically.
Here is the initial page for R:
R offers more options for customization because the plot is built in code: the color scheme, margins, plot size, and everything else can be set explicitly. Tableau offers fewer customization options, but it is well suited to changing minor details because edits are quick.
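As an illustration of that kind of customization, the sketch below builds a bar chart with ggplot2 and sets the colors, margins, and output size by hand. The data frame `toy`, its values, the hex colors, and the file name in the commented `ggsave()` call are all assumptions made up for this example.

```r
library(ggplot2)

# Toy counts, invented for illustration only
toy <- data.frame(
  district = c("A1", "B2", "C11"),
  incidents = c(120, 95, 143)
)

p <- ggplot(toy, aes(x = district, y = incidents, fill = district)) +
  geom_col() +
  # Color scheme: one hand-picked color per district
  scale_fill_manual(values = c(A1 = "#1b9e77", B2 = "#d95f02", C11 = "#7570b3")) +
  labs(title = "Incidents by district (toy data)", x = "District", y = "Incidents") +
  theme_minimal() +
  theme(
    plot.margin = margin(10, 20, 10, 20),  # top, right, bottom, left, in points
    legend.position = "none"               # the x axis already names the districts
  )

# Plot size is chosen when the plot is saved, for example 6 x 4 inches:
# ggsave("toy_bar.png", p, width = 6, height = 4)
```

Every visual decision here is a line of code, which makes the result reproducible but slower to tweak than a setting in a Tableau menu.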
3. Comparisons in real examples
This section shows graphical examples from both tools, R and Tableau, and answers some questions that are worth further consideration. We start by comparing the mapping methods.
First, here are two maps describing crime frequencies by district in Boston, one made with R and one with Tableau.
crime <- read_csv("crime.csv")
## Parsed with column specification:
## cols(
## INCIDENT_NUMBER = col_character(),
## OFFENSE_CODE = col_character(),
## OFFENSE_CODE_GROUP = col_character(),
## OFFENSE_DESCRIPTION = col_character(),
## DISTRICT = col_character(),
## REPORTING_AREA = col_double(),
## SHOOTING = col_logical(),
## OCCURRED_ON_DATE = col_datetime(format = ""),
## YEAR = col_double(),
## MONTH = col_double(),
## DAY_OF_WEEK = col_character(),
## HOUR = col_double(),
## UCR_PART = col_character(),
## STREET = col_character(),
## Lat = col_double(),
## Long = col_double(),
## Location = col_character()
## )
## Warning: 1055 parsing failures.
## row col expected actual file
## 1053 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1054 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1075 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1908 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## 1909 SHOOTING 1/0/T/F/TRUE/FALSE Y 'crime.csv'
## .... ........ .................. ...... ...........
## See problems(...) for more details.
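The warning above appears to arise because readr guesses the type of SHOOTING from the first rows, where the column is empty, settles on logical, and then cannot parse the value Y. One way to avoid this, sketched here on a tiny inline CSV rather than on crime.csv, is to declare the column as character and convert it afterwards. The inline rows and the name `toy` are made up for the example.

```r
library(readr)

# A tiny stand-in for crime.csv (invented rows): SHOOTING is empty or "Y"
csv_text <- "INCIDENT_NUMBER,SHOOTING,Lat,Long
I1,,42.35,-71.06
I2,Y,42.33,-71.08
I3,,-1,-1
"

# Declaring the column types up front means readr does not have to guess
toy <- read_csv(
  csv_text,
  col_types = cols(
    INCIDENT_NUMBER = col_character(),
    SHOOTING = col_character(),
    Lat = col_double(),
    Long = col_double()
  )
)

# Convert the "Y"/empty flag into a proper logical column
toy$SHOOTING <- !is.na(toy$SHOOTING) & toy$SHOOTING == "Y"
```

On the full file it should be enough to pass `col_types = cols(SHOOTING = col_character())` to `read_csv()`, since columns that are not named in `cols()` are still guessed by default.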
crime <- crime %>%
  filter(!is.na(Lat), !is.na(Long)) %>%  # drop incidents with missing coordinates
  filter(Lat != -1, Long != -1)          # drop incidents whose coordinates are coded as -1
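# One color per district; colorFactor() returns a function that maps a district to a color.
# The palette is named pal so that it does not shadow graphics::par().
n <- length(levels(as.factor(crime$DISTRICT)))
pal <- colorFactor(topo.colors(n), domain = crime$DISTRICT)
leaflet(crime) %>%
  addProviderTiles("CartoDB.Positron") %>%  # light basemap; the default addTiles() layer would be hidden beneath it
  addCircleMarkers(~Long, ~Lat,
    radius = 1,
    fillColor = ~pal(DISTRICT),
    stroke = FALSE, fillOpacity = 0.5
  )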
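The map colors each incident by district, but it does not tally incidents. As a complement, here is a minimal sketch of how per-district frequencies could be computed with dplyr. It runs on a toy data frame with invented district labels; on the real data the same pipeline would start from `crime`.

```r
library(dplyr)

# Toy incidents, one row each (invented values, including one missing district)
toy <- data.frame(
  DISTRICT = c("A1", "B2", "A1", "C11", "B2", "A1", NA),
  stringsAsFactors = FALSE
)

district_counts <- toy %>%
  filter(!is.na(DISTRICT)) %>%   # ignore incidents with no district recorded
  count(DISTRICT, sort = TRUE)   # one row per district, most frequent first

district_counts
```

The resulting table has one row per district and a count column `n`, which is the shape a bar chart of crime frequencies would need.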