2 R Basics

So…there is soooo much to the world of R: textbooks, cheatsheets, exercises, and plenty of other resources you could go through. There are over 13,800 packages on CRAN, the network through which R code and packages are distributed. It can be overwhelming. However, bear in mind that R is used for a lot of different things, not all of which are relevant to EDAV.

To help you navigate the landscape, here we provide a collection of resources that you should be familiar with in the context of this course. This is not to say that any of these resources are prerequisites, but they will come up in the course and we want to give you places to learn about them.

Since people come with a variety of backgrounds, we will try to provide the essentials as well as some resources for more advanced users. Do not feel you have to go through all of these resources, but know that they are here if/when you need them.

2.1 Essentials checklist

In an effort to get everyone on the same page, here is a checklist of essentials so you can get up and running with this course. It repeats and references a lot of the information in the sections below, but we want to make sure everything mentioned is clear and understood.

Okay, then. Here are the essentials, in checklist form:

2.1.1 Learn R

  1. Download R and RStudio: This is the biggest thing to do by far. Make sure to download both R and RStudio, as mentioned in Setting up R and RStudio.
  2. Learn your way around RStudio: RStudio is powerful…if you know how to use it. Take the time to look through the DataCamp sections on the RStudio IDE so you feel comfortable (see Use RStudio like a pro section).
  3. Try something!: Getting comfortable with an IDE is all about practice. So while the DataCamp vids are great, don’t solely rely on them. Try things out for yourself! Here are some things to play around with:
    • Create an R Script file, paste in print("Hello, World!"), and run it
    • Create an R Markdown file and have it generate an HTML page
    • Install some packages like tidyverse or MASS
    • Do some math in the console
  4. Study R: See Learning about R below.
  5. Learn how to get help: Make sure you are comfortable searching for answers when you get stuck. See the section below on getting help for some…help.
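
For the "Try something!" step, here is a minimal sketch of things you could type into an R script or the console. The variable names and the numbers in `heights` are made up for illustration.

```r
# Print a greeting, as suggested in the checklist
print("Hello, World!")

# Do some math in the console
2 + 3 * 4
sqrt(16)

# Store a few (hypothetical) values in a vector and summarize them
heights <- c(150, 162, 171, 180)
mean_height <- mean(heights)
mean_height
```

In RStudio you can run a single line of a script with Ctrl+Enter (Cmd+Enter on a Mac) and watch the result appear in the console.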

2.1.2 Prepare for class

  1. Get the Textbook: This course uses Graphical Data Analysis with R as its textbook. Here is an Amazon link for a physical copy and a link to the book’s website.
  2. Set Up DataCamp Account: A lot of the references and support materials discussed on edav.info/ are from DataCamp, an online collection of courses and articles on data science. Some of the sections are free, but most are behind a paywall. However, DataCamp currently provides full access to the site for students with .edu email addresses. If you are enrolled in this course, you will receive an invitation during the first week of class to create an account using your columbia.edu email address, which will grant you full access.

2.2 Getting started

2.2.1 Setting up R and RStudio

It is super important to get up and running with R and RStudio as soon as you can.

  • This video from DataCamp pretty much covers it. Know that you will be downloading two separate things: R, which is a programming language; and RStudio, which is an IDE (integrated development environment…fancy tool for working with R) that will make working with R a lot more enjoyable.

2.2.2 Use RStudio like a pro

Great! RStudio is up and running on your computer! Now make sure you get comfy with what it can do.

  • Don’t know your way around the RStudio IDE? I highly recommend this DataCamp course. Sections from Part 1 (Orientation, Programming, and Projects) are the most relevant for this course. They include videos about all the regions in RStudio, how to program efficiently/effectively in the IDE (gotta love those keyboard shortcuts), and the benefits of setting up R projects. A little hazy on that last sentence? The course will help.

  • Just want a quick reference to brush up with? Take a look at the RStudio Cheatsheets page. Another option is this RStudio webinar.

  • Want to make the RStudio IDE your own? Look into modifying the preferences. You can customize the look of the IDE (default colors, typefaces), tweak default behaviors like clearing the environment on load, and integrate a session with a Git repository. If something about the IDE bugs you, chances are you can make it more to your liking.

2.2.3 Learning about R

R is just like any language, programming or otherwise: you need to use it to get used to it.

General advice: don’t get caught up in the details. Keep a list of questions and move on.

2.3 Packages and imports

2.3.1 Installing packages

A lot of the cool stuff comes from installing packages into R.

  • How do you install packages? The main function we use is install.packages("<package_name>"), which installs from CRAN, the main repository where R packages are stored. You only need to install a package once. After that, load it in each new R session by passing its name to library().

  • Still confused? This DataCamp video should help explain the process. Also be sure to try the accompanying exercise to make sure you have a feel for loading a package.

  • Want more info? Check out this DataCamp article on everything about installing packages in R. Besides covering the basics, it shows you how to install packages that are not on CRAN using devtools, and how to monitor the status/health of your installed packages.
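
The install-then-load pattern described above can be sketched as follows. MASS is used only because it is mentioned in the checklist; the `requireNamespace()` guard is an optional convenience that skips the download if the package is already installed.

```r
# Install from CRAN only if the package is not already installed
if (!requireNamespace("MASS", quietly = TRUE)) {
  install.packages("MASS")
}

# Load the package so its functions and datasets are available in this session
library(MASS)

# Check that the package is attached to the current session
"package:MASS" %in% search()
```

Installing happens once per machine; `library()` has to be called again in every new R session or R Markdown document.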

2.3.2 Tidyverse

Don’t know what the tidyverse is? It’s great and we use it throughout this course, specifically ggplot2 and dplyr, two packages within the tidyverse.

  • What’s ggplot? Check out this DataCamp course. This course is split up into three parts and it is quite long, but it does go over pretty much everything ggplot has to offer. If you are starting out, stick with Part 1.

  • What’s dplyr? Make friends with this DataCamp course. It goes through the main dplyr verbs: select, mutate, filter, arrange, summarise; as well as the lovely pipe operator.
    Check out these super cool animations, which depict a data frame as it is transformed by dplyr and tidyr functions (a great application of gganimate!)

  • Want case studies to go through? Try this one or this one.
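
To give a feel for the dplyr verbs and the pipe mentioned above, here is a small sketch using the built-in `mtcars` dataset. It assumes dplyr and ggplot2 are installed. The column `kpl` and the object names are made up for this example, and `group_by()` is an extra verb added so that `summarise()` has groups to work on.

```r
library(dplyr)
library(ggplot2)

# Chain the main dplyr verbs with the pipe operator
summary_by_cyl <- mtcars %>%
  select(mpg, cyl, wt) %>%          # keep three columns
  filter(wt < 4) %>%                # keep lighter cars only
  mutate(kpl = mpg * 0.425) %>%     # approximate km per liter
  arrange(desc(mpg)) %>%            # sort by fuel efficiency
  group_by(cyl) %>%                 # one group per cylinder count
  summarise(mean_mpg = mean(mpg), n_cars = n())

summary_by_cyl

# A basic ggplot2 scatterplot of the same data
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p
```

Each verb takes a data frame and returns a data frame, which is what makes chaining them with the pipe possible.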

2.3.3 Importing data

We will often need to pull data into R to work with it.

  • “Pull data”? I’m already confused. But wait! Here’s a DataCamp course on importing data. Note: This course explains how to import every kind of data format under the sun…all you need to be familiar with for this course (mostly) is pulling in CSV files using read_csv. So, if you are overwhelmed, just stick to the read_csv stuff.

  • Importing every data format under the sun you say? I want to know how to do that. Here’s Part 1, as well as Part 2, which focuses on databases and HTTP requests. Go nuts.
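
Since read_csv is the import function you will use most, here is a minimal sketch of it. It assumes the readr package (part of the tidyverse) is installed. The file contents and the names `csv_path` and `scores` are hypothetical; the file is written to a temporary location so the example is self-contained.

```r
library(readr)

# Write a tiny CSV to a temporary file (hypothetical columns and values)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ana,90", "Ben,85", "Cy,77"), csv_path)

# read_csv() guesses the column types and returns a tibble
scores <- read_csv(csv_path)
scores
```

With your own data, replace `csv_path` with the path to your file, for example a CSV saved in your RStudio project folder.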

2.4 Communicating Results

2.4.1 R Markdown & Knitr

R Markdown is how you will be writing assignments for this course and Knitr is how you will generate an output file for submission. In general, they’re a great way to communicate your findings to others (for the Python lovers among you, this is the Jupyter Notebook of the R world).

  • Want to jump right in? Open a new R Markdown file (File > New File > R Markdown…), and set its Default Output Format to HTML. You will get an R Markdown template you can tinker with. Try knitting the document to see what everything does. For more info on what is happening behind the scenes, check out this R Markdown Quick Tour.

  • Want a simple description of R Markdown? Check out this RStudio article for a description of how to combine text, source code, and output into one document.

  • Prefer videos? DataCamp course to the rescue! There is also an RStudio webinar about it.

  • Don’t know about Knitr? Here’s the specific section on Knitr from the DataCamp course cited above. With this package, you can embed code directly into your R Markdown files and generate output documents. Make sure to go through the later exercises to learn about code chunks and chunk options so you can fine-tune your final output document with ease.

  • Wondering what chunk options are? Have you ever wanted to align graphs in your output PDF differently? Or resize a plot in your output document? Or suppress an annoying message a package raises? Chunk options address this. We have made an R Markdown file showing off different chunk options that you can download from our GitHub repo and play around with. Also make sure to check out the documentation on chunk and package options for a full list of what’s possible.

  • The R Markdown page from RStudio has lessons with extensive info. Also, more cheatsheets.
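
Chunk options can be set on an individual chunk or as defaults for the whole document. As a sketch of the second approach, the code below uses `knitr::opts_chunk$set()`, which would typically go in a setup chunk near the top of an .Rmd file. The specific values are illustrative choices, not course requirements.

```r
library(knitr)

# Set defaults for every chunk that follows in the document
opts_chunk$set(
  fig.align = "center",  # center plots in the output
  fig.width = 6,         # plot width in inches
  fig.height = 4,        # plot height in inches
  message = FALSE,       # hide messages, e.g. from library() calls
  warning = FALSE        # hide warnings
)

# Inspect one of the current defaults
opts_chunk$get("fig.align")
```

Any of these can still be overridden on a single chunk by putting the option in that chunk's header.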

2.4.2 Submitting Assignments

Here’s a quick run-down of how to submit your assignments using R Markdown and Knitr.

  • Create R Markdown file with PDF output format: We will often provide you with a template, and feel free to add on to it directly, but make sure its output format is set to pdf_document. Write out your explanations and insert code chunks to answer the questions provided. If you want to make a new file, go to File > New File > R Markdown… and set the Default Output Format to PDF. Either way, the header of the .Rmd file should include the line output: pdf_document.

  • Add PDF Dependencies: As stated when you create a new R Markdown file, the PDF output format requires TeX.

  • Make sure you download TeX for your machine. Here are some Medium articles on the process of creating PDF reports (the articles cover starting from scratch with no installs at all, but you can skip over to installing TeX only):

This can be a little complicated, but it will make that Knit button near the top of the IDE magically generate a PDF for you.

If you are in a rush and want a shortcut, you can instead set the Default Output Format to HTML. When you open the file in your browser, you can save it as a PDF. It will not be as nicely formatted, but it will still work.
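
To make the header concrete, the sketch below writes a minimal .Rmd file from R, with `output: pdf_document` in its header. The title, author, file name, and body text are placeholders. The knitting step is commented out because it needs the rmarkdown package and a working TeX installation; the tinytex line is one possible way to get TeX and is our assumption, not something the course materials prescribe.

```r
# Three backticks, built with strrep() so they can mark a code chunk below
fence <- strrep("`", 3)

# Lines of a minimal .Rmd file; title and author are placeholders
rmd_lines <- c(
  "---",
  'title: "Homework 1"',
  'author: "Your Name"',
  "output: pdf_document",
  "---",
  "",
  "Write your explanation here.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Save the file to a temporary folder (hypothetical file name)
rmd_path <- file.path(tempdir(), "homework.Rmd")
writeLines(rmd_lines, rmd_path)

# Knitting to PDF requires rmarkdown and TeX, so it is left commented out:
# rmarkdown::render(rmd_path)

# One possible way to install TeX (an assumption, not a course requirement):
# tinytex::install_tinytex()
```

Clicking the Knit button in RStudio does the same job as the `rmarkdown::render()` call.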

2.5 Getting help


First off…breeeeeeathe. We can fix this. There are a bunch of resources out there that can help you.

2.5.1 Things to try

  • Remember: Always try to help yourself! This article has a great list of tools to help you learn about anything you may be confused by. This includes learning about functions and packages as well as searching for info about a function/package/problem/etc. This is the perfect place to learn how to get the info you need.

  • The RStudio Help menu (in the top toolbar) is a fantastic place to go for understanding/fixing any problems. There are links to documentation and manuals as well as cheatsheets and a lovely collection of keyboard shortcuts.

  • Vignettes are a great way to learn about packages and how they work. Vignettes are like stylized manuals that can do a better job at explaining a package’s contents. For example, ggplot2 has a vignette on aesthetics called ggplot2-specs that talks about different ways you can map data to different formats.
    • Typing browseVignettes() in the console will show you all the vignettes for all of the packages you have installed.
    • You can also see vignettes by package by typing vignette(package = "<package_name>") into the console.
    • To open a specific vignette, use vignette("<vignette_name>"). If the vignette can’t be resolved, include the package name as well: vignette("<vignette_name>", package = "<package_name>")
  • Don’t ignore errors. They are telling you so much! Instead of giving up when red text shows up in your console, take the time to see what that red text is saying. Learn how to read errors and what they are telling you. They usually include where the problem happened and what R thinks the problem stems from.
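
The vignette commands above can be tried out like this. The sketch lists the vignettes of grid, a package that ships with R, so nothing extra needs to be installed. The calls that open a viewer or browser are commented out so you can run them one at a time in the console.

```r
# List the vignettes that come with one package
grid_vignettes <- vignette(package = "grid")
grid_vignettes$results[, c("Item", "Title")]

# These open a browser or viewer, so run them interactively:
# browseVignettes()
# vignette("ggplot2-specs", package = "ggplot2")  # requires ggplot2
```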

More Advanced: Learn to love debugger mode. Debugging can have a steep learning curve, but huge payoffs. Take a look at these videos about debugging with R. Topics include running the debugger, setting breakpoints, customizing preferences, and more. Note: R Markdown files have some limitations for debugging, as discussed in this article. You could also consider working out your code in a .R file before including it in your R Markdown homework submission.
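
As a small illustration of reading an error rather than ignoring it, the sketch below triggers an error on purpose and then inspects its two main parts: the message and the call where it happened. The function `risky_log()` is hypothetical and exists only for this example.

```r
# A hypothetical function that fails on non-numeric input
risky_log <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be numeric, got ", class(x)[1])
  }
  log(x)
}

# Capture the error object instead of letting it halt the script
err <- tryCatch(risky_log("ten"), error = function(e) e)

# What R thinks went wrong
conditionMessage(err)

# Where the problem happened
conditionCall(err)

# To step through the function line by line in an interactive session:
# debugonce(risky_log)
# risky_log("ten")
```

`debugonce()` is one way into the debugger from the console; setting breakpoints in the RStudio editor gets you to the same place.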

2.5.2 Help me, R community!

Relax. There are a bunch of people using the same tools you are.






