8 Tibble vs. DataFrame

Jingfei Fang

8.0.1 Introduction

A tibble is often described as a tidier form of a data frame, and it is widely used across the tidyverse, including ggplot2. It holds the same information as a data frame, but it differs from a data frame in how it is printed and manipulated.

8.0.2 1. Getting started with tibbles

You can load tibble as part of the tidyverse:

#install.packages("tidyverse")
library(tidyverse)

Or you can install and load the tibble package directly:

#install.packages("tibble")
library(tibble)

8.0.3 2. Creating a tibble

You can create a tibble directly:

tib <- tibble(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
tib
## # A tibble: 3 × 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     4     7
## 2     2     5     8
## 3     3     6     9

Or you can create a tibble from an existing data frame with as_tibble(). We will use the built-in ‘iris’ dataset as an example:

df <- iris
class(df)
## [1] "data.frame"
tib <- as_tibble(df)
tib
## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # … with 140 more rows
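
The conversion also works in the other direction. As a minimal sketch, the base R function as.data.frame() turns a tibble back into a plain data frame; the object name df_again is only illustrative:

```r
library(tibble)

tib <- as_tibble(iris)            # data frame -> tibble
class(tib)                        # tibble classes sit on top of "data.frame"

df_again <- as.data.frame(tib)    # tibble -> plain data frame
class(df_again)
```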

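8.0.4 3. Unlike data frames, tibbles don’t show the entire dataset when you print them.

By default, printing a large tibble shows only the first 10 rows and the columns that fit on screen.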

tib
## # A tibble: 150 × 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # … with 140 more rows
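
If you do want to see more of a tibble, the print() method accepts arguments that control how much is shown. The sketch below assumes the tibble package is loaded and uses the n and width arguments of print():

```r
library(tibble)

tib <- as_tibble(iris)

print(tib, n = 15)       # show the first 15 rows instead of the default 10
print(tib, n = Inf)      # show every row
print(tib, width = Inf)  # show every column, even on a narrow console
```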

8.0.5 4. Tibbles cannot access a column when you provide a partial name of the column, but data frames can.

8.0.5.1 Tibble

If you try to access a column with only part of its name, the tibble returns NULL (recent versions of tibble also emit a warning about an unknown column).

tib <- tibble(str = c("a","b","c","d"), int = c(1,2,3,4))
tib$st
## NULL

It works only when you provide the full column name.

tib$str
## [1] "a" "b" "c" "d"

8.0.5.2 Data Frame

A data frame, however, lets you access the “str” column with the partial name “st”, as long as the partial name matches only one column.

df <- data.frame(str = c("a","b","c","d"), int = c(1,2,3,4))
df$st
## [1] "a" "b" "c" "d"
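
Partial matching on data frames is specific to $. As a small sketch (the result names below are illustrative), [[ ]] matches exactly by default and only matches partially when you ask for it with exact = FALSE:

```r
library(tibble)

tib <- tibble(str = c("a", "b", "c", "d"), int = c(1, 2, 3, 4))
df  <- data.frame(str = c("a", "b", "c", "d"), int = c(1, 2, 3, 4))

# Tibble: a partial name returns NULL (the accompanying warning is
# suppressed here to keep the output clean).
tib_partial <- suppressWarnings(tib$st)
is.null(tib_partial)

# Data frame: `$` matches partially, but `[[ ]]` is exact by default.
df_dollar  <- df$st
df_bracket <- df[["st"]]
is.null(df_bracket)

# Partial matching with `[[ ]]` has to be requested explicitly.
df_loose <- df[["st", exact = FALSE]]
df_loose
```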

8.0.6 5. When you select a single column of a tibble with [ ], it keeps the tibble structure. But when you select a single column of a data frame with [ ], it becomes a vector.

8.0.6.1 Tibble

tib[,"str"]
## # A tibble: 4 × 1
##   str  
##   <chr>
## 1 a    
## 2 b    
## 3 c    
## 4 d

Checking if it’s still a tibble:

is_tibble(tib[,"str"])
## [1] TRUE

We can see the tibble structure is preserved.

8.0.6.2 Data Frame

df[,"str"]
## [1] "a" "b" "c" "d"

Checking if it’s still a data frame:

is.data.frame(df[,"str"])
## [1] FALSE

It’s no longer a data frame.
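
Both behaviours can be overridden with the drop argument of [ ]. The following sketch rebuilds the two objects so it can run on its own; the result names are illustrative:

```r
library(tibble)

tib <- tibble(str = c("a", "b", "c", "d"), int = c(1, 2, 3, 4))
df  <- data.frame(str = c("a", "b", "c", "d"), int = c(1, 2, 3, 4))

# Data frame: drop = FALSE keeps a one-column data frame.
df_one <- df[, "str", drop = FALSE]
is.data.frame(df_one)

# Tibble: drop = TRUE collapses the single column to a plain vector.
tib_vec <- tib[, "str", drop = TRUE]
is_tibble(tib_vec)
```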

8.0.6.3 However, other forms of subsetting, including [[ ]] and $ with full column names, work the same for tibbles and data frames.

tib[["str"]]
## [1] "a" "b" "c" "d"
df[["str"]]
## [1] "a" "b" "c" "d"
tib$str
## [1] "a" "b" "c" "d"
df$str
## [1] "a" "b" "c" "d"

We can see that subsetting with [[ ]] or $ returns a plain vector in both cases, so the tibble structure is not preserved either.

8.0.7 6. When assigning a new column to a tibble, only inputs of length 1 are recycled, so any longer input must have the same length as the other columns. A data frame, by contrast, will recycle shorter inputs.

8.0.7.1 Tibble

tib
## # A tibble: 4 × 2
##   str     int
##   <chr> <dbl>
## 1 a         1
## 2 b         2
## 3 c         3
## 4 d         4
tib$newcol <- c(5,6)
## Error:
## ! Assigned data `c(5, 6)` must be compatible with existing data.
## ✖ Existing data has 4 rows.
## ✖ Assigned data has 2 rows.
## ℹ Only vectors of size 1 are recycled.

It gives an error because the tibble has columns of length 4, but the input c(5,6) only has length 2 and is not recycled. You have to provide an input of the same length:

tib$newcol <- rep(c(5,6),2)
tib
## # A tibble: 4 × 3
##   str     int newcol
##   <chr> <dbl>  <dbl>
## 1 a         1      5
## 2 b         2      6
## 3 c         3      5
## 4 d         4      6

8.0.7.2 Data Frame

Data frames will recycle the input, provided the number of rows is a multiple of the input length.

df
##   str int
## 1   a   1
## 2   b   2
## 3   c   3
## 4   d   4
df$newcol <- c(5,6)
df
##   str int newcol
## 1   a   1      5
## 2   b   2      6
## 3   c   3      5
## 4   d   4      6
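
Two edge cases round out the picture, sketched here with illustrative column names: a tibble does recycle an input of length 1, and a data frame refuses an input whose length does not divide the number of rows.

```r
library(tibble)

tib <- tibble(str = c("a", "b", "c", "d"), int = c(1, 2, 3, 4))
df  <- data.frame(str = c("a", "b", "c", "d"), int = c(1, 2, 3, 4))

# Tibble: a single value is recycled to all 4 rows.
tib$const <- 0
tib$const

# Data frame: 3 values cannot be recycled evenly into 4 rows, so the
# assignment fails; the error message is captured instead of stopping.
res <- tryCatch({
  df$bad <- c(5, 6, 7)
  "ok"
}, error = function(e) conditionMessage(e))
res
```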

8.0.8 7. Reading with the built-in read.csv() function returns a data frame, while reading with read_csv() from the “readr” package in the tidyverse returns a tibble.

8.0.8.1 Reading csv file with read.csv()

data <- read.csv("https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv")
class(data)
## [1] "data.frame"

8.0.8.2 Reading csv file with read_csv()

data <- read_csv("https://people.sc.fsu.edu/~jburkardt/data/csv/addresses.csv")
class(data)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"
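
The same comparison can be run without a network connection. This sketch writes a small, made-up CSV file to a temporary path (the file contents and object names are hypothetical) and then converts each result into the other type:

```r
library(readr)
library(tibble)

# Hypothetical sample data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,31", "Bob,27"), path)

base_df  <- read.csv(path)                           # data frame
readr_tb <- read_csv(path, show_col_types = FALSE)   # tibble

tb_from_df <- as_tibble(base_df)        # data frame -> tibble
df_from_tb <- as.data.frame(readr_tb)   # tibble -> data frame
```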

8.0.9 8. Tibbles don’t support arithmetic operations on all columns well: the result is silently converted into a data frame.

8.0.9.1 Tibble

When we multiply every element of the tibble by 2, the values are correct, but the result is turned into a data frame without any warning.

tib <- tibble(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
class(tib*2)
## [1] "data.frame"

8.0.9.2 Data Frame

Data frames have no such issue: the result is still a data frame.

df <- data.frame(a = c(1,2,3), b = c(4,5,6), c = c(7,8,9))
class(df*2)
## [1] "data.frame"
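
If the result needs to stay a tibble, one option is to convert it back with as_tibble(); another is to do the arithmetic column by column with dplyr. This is a sketch of two possible workarounds rather than the only way to do it, and the result names are illustrative:

```r
library(tibble)
library(dplyr)

tib <- tibble(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

# Option 1: do the arithmetic, then convert the result back to a tibble.
doubled <- as_tibble(tib * 2)
is_tibble(doubled)

# Option 2: multiply every column inside mutate(), which returns a tibble.
doubled2 <- mutate(tib, across(everything(), ~ .x * 2))
is_tibble(doubled2)
```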

8.0.10 9. Tibbles preserve all the variable types, while data frames have the option to convert strings into factors. (Before R 4.0.0, data frames converted strings into factors by default.)

8.0.10.1 Tibble

We can see that the original data types of variables are preserved in a tibble.

tib <- tibble(str = c("a","b","c","d"), int = c(1,2,3,4))
str(tib)
## tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
##  $ str: chr [1:4] "a" "b" "c" "d"
##  $ int: num [1:4] 1 2 3 4

8.0.10.2 Data Frame

A data frame will also preserve the original types, because “stringsAsFactors = FALSE” has been the default since R 4.0.0.

df <- data.frame(str = c("a","b","c","d"), int = c(1,2,3,4))
str(df)
## 'data.frame':    4 obs. of  2 variables:
##  $ str: chr  "a" "b" "c" "d"
##  $ int: num  1 2 3 4

However, we can still convert strings into factors when creating the data frame by setting “stringsAsFactors = TRUE”.

df <- data.frame(str = c("a","b","c","d"), int = c(1,2,3,4), stringsAsFactors = TRUE)
class(df$str)
## [1] "factor"

We can see that the “str” column has been converted into a factor.
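
A tibble can still hold factors. It simply never creates them for you, so the conversion has to be explicit. A minimal sketch, with illustrative object names:

```r
library(tibble)

# Create the factor column directly when building the tibble.
tib_fct <- tibble(str = factor(c("a", "b", "c", "d")), int = c(1, 2, 3, 4))
class(tib_fct$str)

# Or convert an existing character column afterwards.
tib_chr <- tibble(str = c("a", "b", "c", "d"), int = c(1, 2, 3, 4))
tib_chr$str <- as.factor(tib_chr$str)
class(tib_chr$str)
```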

8.0.11 10. Tibbles work well with ggplot2, just like data frames.

8.0.11.1 Tibble:

ggplot(data = tib, mapping = aes(x=str, y=int)) +
  geom_col(width = 0.3)

8.0.11.2 Data Frame:

ggplot(data = df, mapping = aes(x=str, y=int)) +
  geom_col(width = 0.3)