Chapter 31 Intro to Tableau with R
Daniel Schmidle and Pranav Gopal
With an increasing number of job listings requesting experience with Tableau for Data Scientist, Analyst and ML Engineer positions, it is becoming increasingly likely that we might find ourselves needing this tool as part of our repertoire. Since 2003, Tableau has been helping people see and understand data to create actionable insights from their analysis. And without a doubt, Tableau has easy and extensive data visualization capabilities.
However, as we travel through the Tableau-verse in wonderment of all the beautiful graphs and charts, we still might feel as if something is missing: the computational, data manipulating, statistical power of R we have grown to love. Fortunately, through the Tableau Desktop and the Tableau Server we can still harness the power of R to create incredible data visualizations and analysis.
Thus, here is how to start utilizing R in Tableau:
Setup
First, you will need R installed on your device (https://www.r-project.org/ ) and Tableau Desktop through the Tableau homepage (free for students: https://www.tableau.com/academic/students)
Tableau will now be able to find and connect to R through a socket server.
Now, on the Tableau Desktop go to Help → Settings and Performance → Manage Analytics Extension Connection..
Set the extension Rserve, and set the Server: Localhost and Port: 6311
You can check the connection by the Test Connection
And now you are all set up to use R in Tableau. So, let’s do one easy example to show how to use R functions in the Tableau Desktop.
Creating Your First R Function
To begin using the computational power of R we must first understand how Tableau implements R.
Here, the common Tableau built-in Sample - Superstore is used for the base dataset in the example.
Start, by creating a new calculated field (one easy way is to right click the Data panel on the left hand side and choose Create Calculated Field)
Now, here is where the R magic happens:
Inside the Calculated Field window, we can start by renaming our field, here we name it “First R Example (expenses)”
Next, inside the function field we need to declare to Tableau what data type we want the R function to return. The types are SCRIPT_BOOL , SCRIPT_INT , SCRIPT_REAL , SCRIPT_STR . The functions list window on the right will allow you to select the data types / Tableau functions and provide further details for each:
In the above example SCRIPT_REAL is chosen since we want floating point numbers to be returned.
Next, Inside the SCRIPT_REAL(“ #Your R function goes here”,…) with each argument that you want to pass in being represented by .arg1 , .arg2 ,… , argN for N arguments. In this example the variable x is calculating the difference of .arg1 and .arg2
Next, the final part of the function SCRIPT_REAL(“ Your R function goes here”, specified data arguments for (.arg1, .arg2, etc) go here ). In this example SUM(Sales) is being passed to .arg1 and SUM([Profit]) is being passed to .arg2 so that x = SUM(Sales) - SUM([Profit])
Attention! Tableau will only allow aggregate data to be passed to the SCRIPT function, and thus R function. Therefore, the arguments must be of the form SUM() or AVG() etc. and not in vector format. This limits the normal R functionality however, the limitations become more versatile as aggregate manipulation occurs in Tableau.
So, to finish off our first example we can now use the R output to visualize Sales on the First R Example (calculated expenses) by dragging the calculated expenses to the columns section, sales to the Rows, and Sub-Category to the Color sections:
Example 2 (getting some more use out of R)
To get a little deeper with an R function let us create a linear model for predictions. Still using the Sample - Superstore data from the previous example, we create a new calculated field:
Here, we are using the lm() function from R to create a linear model of SUM([Profit]) on AVG(Quantity), AVG(Sales), and from our first example [First R Example (calculated expenses)]. This linear model will estimate an expected profit for a category we choose. So, plotting Profit on Expected Profit with aggregated color by Sub-Category creates a nice visualization of the expected vs real outcomes from the model. To note here, a scatter plot was chosen to represent the model outcomes, but the possibilities of other plot types are just a click away in tableau.
From this example we can start to see the possibilities of R computation within Tableau.
Example 3 (The R Tableau Tango)
A great example of how easy Tableau is to create dynamic, visually stunning graphs is no better demonstrated than with geospatial data. Tableau can create amazing, interactive visualizations for any region with just a couple clicks of the mouse and the right data.
For this example we used a Covid-19 dataset taken from the JHU CSSE COVID-19 Data as well as The New York Times. https://data.world/covid-19-data-resource-hub/covid-19-case-counts
But, say we wanted to do a point estimate for the Death Rate parameter of Covid-19 by region. And let’s say we wanted to use bootstrapping from the daily reported data. This sort of computation is where R shines. So, since this type of simulation would not be possible from the Tableau Desktop, using R separately and feeding the result back to Tableau would create the perfect team for all your statistical needs. So, lets begin:
library(tidyverse)
library(dplyr)
covid <- read.csv("data/COVID-19 Activity.csv")
covid_clean <- select(covid, 1, 4, 10, 13)
covid_clean$Death_Rate <- covid_clean$PEOPLE_DEATH_COUNT/covid_clean$PEOPLE_POSITIVE_CASES_COUNT
covid_clean <- do.call(data.frame, lapply(covid_clean, function(x) replace(x, is.infinite(x), NA)))
covid_clean <- covid_clean %>% drop_na("Death_Rate")
countries <- unique(covid_clean$COUNTRY_SHORT_NAME)
Est_Death_Rate <- rep(NA,length(countries))
death_tbl <- data.frame(countries, Est_Death_Rate)
for (c in countries){
country <- covid_clean[covid_clean$COUNTRY_SHORT_NAME == c,]
c_death <- replicate(1000, mean(sample(country$Death_Rate, 100, replace = TRUE)))
death_tbl$Est_Death_Rate[death_tbl$countries == c] <- mean(c_death)
}
It is clear from the above examples that Tableau is extremely user-friendly in data visualization. It allows users to create beautiful and colorful plots with no coding knowledge at all. Through it’s drag and drop environment, users can easily change color palettes, filter plots by certain variables, and resize plots instead of having to write time-consuming R code to achieve the same result. However, Tableau’s limited ability for data manipulation makes it much better suited for simple calculations and plotting data that is already pre-processed.
Example 2 showcased the ability to use R within Tableau for more advanced capabilities such as linear regression and other machine learning models. Additionally, Example 3 dealt with statistical analysis of bootstrapping Covid-19 data to calculate a point estimated Death Rate per country. This is where combining the packages available in R to pre-process the data, and then passing it back to Tableau creates a powerhouse for statistical analysis and data visualization.
All three of these examples showcase the capabilities of using Tableau and R together. Combining the analytical ability of R packages with the drag and drop data visualization of Tableau will make you a better data scientist, and hopefully will help you land the job of your dreams!
Sources
https://www.tableau.com/learn/whitepapers/using-r-and-tableau
https://www.youtube.com/watch?v=E092x4a9_Y4