library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  2.0.1     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
library(leaflet)

1. Introduction

In this blog post, we compare how two kinds of graphs, maps and bar charts, are made in R and in Tableau. The data set used here is the Boston crime data (2016 - 2018), which can be found on Kaggle (https://www.kaggle.com/ankkur13/boston-crime-data).

The data consists of 260,760 rows and 17 columns. Each row represents an individual incident. The columns are:

INCIDENT_NUMBER, OFFENSE_CODE, OFFENSE_CODE_GROUP, OFFENSE_DESCRIPTION, DISTRICT, REPORTING_AREA, SHOOTING, OCCURRED_ON_DATE, YEAR, MONTH, DAY_OF_WEEK, HOUR, UCR_PART, STREET, Lat, Long, Location.

2. General comparison of R and Tableau

Generally, Tableau is much easier for beginners to start with than R because its learning curve is much gentler. Here is a learning curve comparison of different statistical software packages:

To use R, users need to know the basics of data structures and how to code. For example, they need to know how to handle a matrix, a data frame, a list, and so on to get even simple jobs done. Tableau, however, does not require coding. With some trial and error, users will be able to work out how to use it. Basically, every job in Tableau can be completed by dragging and clicking various features.
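As a rough illustration of the data structures mentioned above, the following base R sketch builds a matrix, a data frame, and a list. The object names and values are invented for illustration and are not taken from the crime data.

```r
# Toy examples of three basic R data structures (all values are invented).

# A matrix: a single type arranged in two dimensions
m <- matrix(1:6, nrow = 2, ncol = 3)

# A data frame: columns may hold different types
df <- data.frame(
  district = c("A1", "B2", "C11"),
  incidents = c(10, 25, 7),
  stringsAsFactors = FALSE
)

# A list: elements may be of any type and length
lst <- list(name = "crime", dims = dim(m), data = df)

dim(m)                              # number of rows and columns
df$incidents[df$district == "B2"]   # look up a value by condition
lst$data$district                   # reach into a nested element
```

Even this small example needs three different ways of indexing, which is part of why R takes longer to pick up than a drag-and-drop tool.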

This is the initial page when Tableau is opened. You can simply drag variables into the rows and columns boxes and Tableau will make the graph automatically.

Here is the initial page for R:

R has more options for customization because we are writing code to make the plot: the color scheme, margins, plot size, and everything else. Tableau, on the other hand, offers fewer customization options, but it works really well for changing minor details because it is quick.

3. Comparisons in real examples

In this section we look at graphical examples from both tools, R and Tableau, and then answer some questions worth further consideration. We start by comparing the mapping methods.

3.1 Mapping

First, here are two maps showing crime frequencies by district in Boston, one made in R and one in Tableau.

3.1.1 Mapping in R

crime <- read_csv("crime.csv")
## Parsed with column specification:
## cols(
##   INCIDENT_NUMBER = col_character(),
##   OFFENSE_CODE = col_character(),
##   OFFENSE_CODE_GROUP = col_character(),
##   OFFENSE_DESCRIPTION = col_character(),
##   DISTRICT = col_character(),
##   REPORTING_AREA = col_double(),
##   SHOOTING = col_logical(),
##   OCCURRED_ON_DATE = col_datetime(format = ""),
##   YEAR = col_double(),
##   MONTH = col_double(),
##   DAY_OF_WEEK = col_character(),
##   HOUR = col_double(),
##   UCR_PART = col_character(),
##   STREET = col_character(),
##   Lat = col_double(),
##   Long = col_double(),
##   Location = col_character()
## )
## Warning: 1055 parsing failures.
##  row      col           expected actual        file
## 1053 SHOOTING 1/0/T/F/TRUE/FALSE      Y 'crime.csv'
## 1054 SHOOTING 1/0/T/F/TRUE/FALSE      Y 'crime.csv'
## 1075 SHOOTING 1/0/T/F/TRUE/FALSE      Y 'crime.csv'
## 1908 SHOOTING 1/0/T/F/TRUE/FALSE      Y 'crime.csv'
## 1909 SHOOTING 1/0/T/F/TRUE/FALSE      Y 'crime.csv'
## .... ........ .................. ...... ...........
## See problems(...) for more details.
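# Drop incidents with missing coordinates or coordinates recorded as -1
crime <- crime %>%
  filter(!is.na(Lat), !is.na(Long)) %>%
  filter(Lat != -1, Long != -1)

# One color per district; named pal so it does not mask graphics::par()
n <- nlevels(as.factor(crime$DISTRICT))
pal <- colorFactor(topo.colors(n), domain = crime$DISTRICT)

leaflet(crime) %>%
  # CartoDB.Positron basemap (used in place of the default addTiles() layer)
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(~Long, ~Lat,
    radius = 1,
    fillColor = ~pal(DISTRICT),
    stroke = FALSE, fillOpacity = 0.5
  )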
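The parsing failures reported by read_csv() above come from the SHOOTING column: readr guessed a logical type, but the file marks shootings with "Y". One possible fix, sketched here on a small invented file rather than the real crime.csv, is to read that column as text and convert it afterwards. The sample rows and the `shooting_flag` column name are assumptions for illustration.

```r
library(readr)
library(dplyr)

# Write a tiny stand-in for crime.csv (invented rows, same idea).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "INCIDENT_NUMBER,SHOOTING,HOUR",
  "I001,,13",
  "I002,Y,22",
  "I003,,2"
), tmp)

# Read SHOOTING as character so "Y" is no longer a parsing failure.
sample_crime <- read_csv(
  tmp,
  col_types = cols(
    INCIDENT_NUMBER = col_character(),
    SHOOTING = col_character(),
    HOUR = col_double()
  )
)

# Convert to a logical flag: "Y" means a shooting, blank means none.
sample_crime <- sample_crime %>%
  mutate(shooting_flag = !is.na(SHOOTING) & SHOOTING == "Y")

sample_crime$shooting_flag
```

With the real file, passing `col_types = cols(SHOOTING = col_character())` to read_csv() should be enough, since cols() leaves the remaining columns to be guessed as before.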

As shown above, we can make an interactive map using the leaflet package. We colored each incident by its Boston district. In the R code, we can change the color scheme by changing the color palette in the ‘colorFactor’ line. We can also change the size of the circles by changing the radius in the ‘addCircleMarkers’ line, and change their opacity by changing the value of ‘fillOpacity.’
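To make those three settings concrete, here is a sketch that wraps the map in a small function so that the palette, radius, and opacity can be changed in one place. The function name `crime_map()`, its default values, and the three sample rows are assumptions for illustration; the coordinates are invented points near Boston.

```r
library(leaflet)

# Hypothetical helper: draw incidents colored by district.
crime_map <- function(data, colors, radius = 1, opacity = 0.5) {
  pal <- colorFactor(colors, domain = data$DISTRICT)
  leaflet(data) %>%
    addProviderTiles("CartoDB.Positron") %>%
    addCircleMarkers(~Long, ~Lat,
      radius = radius,
      fillColor = ~pal(DISTRICT),
      stroke = FALSE, fillOpacity = opacity
    )
}

# Invented sample rows standing in for the crime data
sample_points <- data.frame(
  DISTRICT = c("A1", "B2", "C11"),
  Lat = c(42.36, 42.33, 42.30),
  Long = c(-71.06, -71.08, -71.06),
  stringsAsFactors = FALSE
)

# Same data, two different looks
map_default <- crime_map(sample_points, topo.colors(3))
map_bold <- crime_map(sample_points, "Set1", radius = 4, opacity = 0.9)
```

Here `"Set1"` is the name of a ColorBrewer palette, which colorFactor() accepts in place of a vector of colors.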

3.1.2 Mapping in Tableau

As the picture above shows, to reproduce the same map in Tableau we need to drag ‘longitude’ into the columns box and ‘latitude’ into the rows box, then drag ‘district’ into the marks box and change the color for ‘district.’ If we add the ‘Highlight’ box, we can easily highlight only the district we are interested in, and the map looks like this:

  • Comparisons in mapping
R                 | Tableau
User specific     | Handle easily
Customizable Cell | Easy, fast to learn

3.2 Stacked Bar Chart Examples

3.2.1 R

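# Stacked counts of incidents per hour, one color per offense group
ggplot(data = crime, aes(x = HOUR, fill = OFFENSE_CODE_GROUP)) +
  geom_histogram(bins = 24, position = "stack") +
  # Hide the long legend; guide = FALSE is deprecated in newer ggplot2
  scale_fill_discrete(guide = "none") +
  labs(x = "Hours", y = "Counts", title = "Number of crimes by hour")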
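Behind the stacked chart is a simple count of incidents for each hour and offense group. The sketch below shows that tabulation with dplyr on a few invented rows; the sample data is an assumption for illustration, not part of the Boston data set.

```r
library(dplyr)

# Invented sample rows with the two columns the chart uses
sample_crime <- data.frame(
  HOUR = c(0, 0, 0, 13, 13, 22),
  OFFENSE_CODE_GROUP = c("Larceny", "Larceny", "Vandalism",
                         "Larceny", "Towed", "Vandalism"),
  stringsAsFactors = FALSE
)

# One row per hour/offense combination; n is the height of a bar segment
counts <- sample_crime %>%
  count(HOUR, OFFENSE_CODE_GROUP)

counts
```

Each row of `counts` corresponds to one colored segment in the stacked chart, and the segments for a given hour add up to the full bar.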

3.2.2 Tableau

This is a simple bar chart of the number of crimes committed by hour. To create this plot, we just need to drag ‘hour’ to columns and the sum of ‘number of records’ to rows. If we then drag ‘offense code group’ onto color in the marks box, we get this graph:

So the simple histogram has changed into a stacked bar chart. We can also exclude particular offenses by right-clicking the factor in the legend and clicking exclude.
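For comparison, excluding offenses in R means filtering the data before plotting. This is a minimal sketch on invented rows; the `excluded` vector and the sample data are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Invented sample rows with the two columns the chart uses
sample_crime <- data.frame(
  HOUR = c(0, 0, 0, 13, 13, 22),
  OFFENSE_CODE_GROUP = c("Larceny", "Larceny", "Vandalism",
                         "Larceny", "Towed", "Vandalism"),
  stringsAsFactors = FALSE
)

# Hypothetical list of offense groups to leave out of the chart
excluded <- c("Towed")

filtered <- sample_crime %>%
  filter(!OFFENSE_CODE_GROUP %in% excluded)

# Stacked bar chart of the remaining offenses
p <- ggplot(filtered, aes(x = HOUR, fill = OFFENSE_CODE_GROUP)) +
  geom_bar() +
  labs(x = "Hours", y = "Counts", title = "Number of crimes by hour")
```

Adding a group back means editing `excluded` and rerunning the code, where Tableau needs only a click on the legend.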

  • Comparisons in bar charts
R            | Tableau
Customizable | Easy to visualize, add/drop factors

4. Questions to consider

1. Do R and Tableau do the same thing or complement each other?

Both tools were developed to accomplish similar tasks. However, there are some big differences in efficiency and purpose. The biggest is that R lets users clean the data however they want, which is not possible in Tableau, where the data cannot be modified. Which tool to use comes down to the user's taste. In short, they can do the same tasks, but they do not complement each other.

2. Was one designed to replace the other?

In general, R is broadly used for both data processing and visualization, while Tableau is solely for visualizing data. Users can therefore choose one or the other depending on their preferences.

3. Is one better suited for certain types of tasks?

Tableau is very well suited to visualizing data. The software was developed for that purpose, so it is very easy to use and to produce graphics with, and no coding is necessary.

4. Do their features differ? If so, how?

Their features differ a lot. In R, users need to code most of the specifics, including color, bin size, margins, etc. Tableau, by contrast, avoids coding completely: if users want to change these details, they simply click and drag by hand. This is a double-edged sword, because R can automate such edits, whereas in Tableau users might have to do them all manually.
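As a sketch of what automating the process can look like in R, the code below defines one plotting function and reuses it to build a chart for every year. The function name `plot_by_hour()` and the sample rows are assumptions for illustration.

```r
library(ggplot2)

# Invented sample rows
sample_crime <- data.frame(
  YEAR = c(2016, 2016, 2017, 2017, 2018),
  HOUR = c(1, 13, 13, 22, 8)
)

# Hypothetical helper: one bar chart of incidents by hour
plot_by_hour <- function(data, title) {
  ggplot(data, aes(x = HOUR)) +
    geom_bar() +
    labs(x = "Hours", y = "Counts", title = title)
}

# Build one plot per year with identical settings
years <- sort(unique(sample_crime$YEAR))
plots <- lapply(years, function(y) {
  plot_by_hour(
    sample_crime[sample_crime$YEAR == y, ],
    paste("Number of crimes by hour,", y)
  )
})
names(plots) <- years
```

Changing a label or a color inside `plot_by_hour()` updates every chart at once, which is the kind of repeated edit that would be done by hand in Tableau.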

5. How does the quality of documentation compare?

The quality of documentation really depends on the user's coding abilities. Needless to say, results improve as users become better trained in either tool. In general, however, Tableau produces more polished-looking graphs even at a beginner's skill level.

6. Are both in active development?

Yes. Both are widely recommended tools for analyzing and presenting data, and both are in active development. The difference is that R has thousands of packages and anyone is free to contribute a package for a particular purpose, whereas Tableau is not open source, so users have to rely on the developer.

7. Do you experience any problems with either?

One significant drawback of R is processing speed. It depends heavily on the size of the data, and processing time grows once the data exceeds a certain size. For Tableau, the main problem is that it is hard to edit the data set.

8. Do the learning curves differ?

Yes. To work in R, users need to know the basics of data structures and how to code. To use Tableau, users are not required to know how to code. With a few hours of playing around in Tableau, users will be able to understand how to use it. Basically, all of the tasks can be completed by dragging and clicking different features.