25 Dates in R
25.1 Introduction
Working with dates and times can be very frustrating. In general, work with the least cumbersome class. That means if your variable is years, store it as an integer; there’s no reason to use a date or date-time class. If your variable is a calendar date with no time-of-day component, use the Date class in R.
25.2 Converting to Date class
You can convert character data to Date class with as.Date():
dchar <- "2018-10-12"
ddate <- as.Date(dchar)
Note that the two appear the same, although the class is different:
dchar
## [1] "2018-10-12"
ddate
## [1] "2018-10-12"
class(dchar)
## [1] "character"
class(ddate)
## [1] "Date"
If the date is not in YYYY-MM-DD or YYYY/MM/DD form, you will need to specify the format to convert to Date class, using conversion specifications that begin with %, such as:
as.Date("Thursday, January 6, 2005", format = "%A, %B %d, %Y")
## [1] "2005-01-06"
For a list of the conversion specifications available in R, see ?strptime.
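As a rough illustration of how the format argument maps onto different layouts, the sketch below parses the same date written three ways. The example strings are made up for illustration, and numeric-only formats are used so that the results should not depend on the locale.

```r
# The same date written in three non-standard layouts (illustrative strings).
us_style <- as.Date("10/12/2018", format = "%m/%d/%Y")  # month/day/year
eu_style <- as.Date("12.10.2018", format = "%d.%m.%Y")  # day.month.year
compact  <- as.Date("20181012",   format = "%Y%m%d")    # no separators

us_style
eu_style
compact

# format() goes the other way: Date -> character in a chosen layout
format(us_style, "%d/%m/%Y")
```

All three calls should produce the same Date, 2018-10-12, and format() uses the same % notation for output.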
The tidyverse package lubridate makes it easy to convert dates that are not in standard format with ymd(), ydm(), mdy(), myd(), dmy(), and dym() (among many other useful date-time functions):
lubridate::mdy("April 13, 1907")
## [1] "1907-04-13"
Try as.Date("April 13, 1907") and you will see the benefit of using a lubridate function.
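To see the comparison without stopping your session, the following sketch wraps the base R call in tryCatch() and then shows the explicit format that base R would need. It assumes an English locale, since %B matches month names in the current locale.

```r
x <- "April 13, 1907"

# Without a format, as.Date() cannot interpret this string and signals an
# error; tryCatch() captures the message instead of halting.
result <- tryCatch(as.Date(x), error = function(e) conditionMessage(e))
result

# Base R can parse it once the format is spelled out
# (assumes English month names in the current locale).
as.Date(x, format = "%B %d, %Y")
```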
25.3 Working with Date class
It is well worth the effort to convert to Date class, because there’s a lot you can do with dates in a Date class that you can’t do if you store the dates as character data.
Number of days between dates:
as.Date("2017-11-02") - as.Date("2017-01-01")
## Time difference of 305 days
Compare dates:
as.Date("2017-11-12") > as.Date("2017-3-3")
## [1] TRUE
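A few more operations build on the same arithmetic. The sketch below, using only base R, shows how to get a plain number out of a date difference, shift a date by a number of days, and generate a regular sequence of dates. The variable names are chosen for illustration.

```r
start <- as.Date("2017-01-01")
end   <- as.Date("2017-11-02")

# Subtraction returns a "difftime" object; as.numeric() extracts the number.
gap <- end - start
as.numeric(gap)                            # 305

# difftime() lets you choose the units explicitly.
difftime(end, start, units = "weeks")

# Adding an integer shifts a date by that many days.
start + 30                                 # "2017-01-31"

# seq() generates regularly spaced dates.
seq(start, by = "month", length.out = 3)   # Jan 1, Feb 1, Mar 1 of 2017
```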
Note that Sys.Date() returns today’s date as a Date object:
Sys.Date()
## [1] "2022-01-30"
class(Sys.Date())
## [1] "Date"
R has functions to pull particular pieces of information from a date:
today <- Sys.Date()
weekdays(today)
## [1] "Sunday"
weekdays(today, abbreviate = TRUE)
## [1] "Sun"
months(today)
## [1] "January"
months(today, abbreviate = TRUE)
## [1] "Jan"
quarters(today)
## [1] "Q1"
The lubridate package provides additional functions to extract information from a date:
today <- Sys.Date()
lubridate::year(today)
## [1] 2022
lubridate::yday(today)
## [1] 30
lubridate::month(today)
## [1] 1
lubridate::month(today, label = TRUE)
## [1] Jan
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
lubridate::mday(today)
## [1] 30
lubridate::week(today)
## [1] 5
lubridate::wday(today)
## [1] 1
## [1] 1
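If lubridate is not available, similar pieces can be pulled out in base R with format() and the % conversion specifications. This is only a sketch of one alternative; a fixed date is used instead of Sys.Date() so the results are reproducible.

```r
d <- as.Date("2022-01-30")   # fixed date (a Sunday) for reproducibility

as.integer(format(d, "%Y"))  # year: 2022
as.integer(format(d, "%m"))  # month number: 1
as.integer(format(d, "%d"))  # day of month: 30
as.integer(format(d, "%j"))  # day of year: 30
format(d, "%u")              # weekday with Monday = 1, so Sunday is "7"
```

Note that %u counts from Monday, whereas lubridate::wday() by default counts from Sunday, which is why the two give different numbers for the same day.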
25.4 Plotting with a Date class variable
Both base R graphics and ggplot2 “know” how to work with a Date class variable, and label the axes properly:
25.4.1 base R
df <- read.csv("data/mortgage.csv")
df$DATE <- as.Date(df$DATE)
plot(df$DATE, df$X5.1.ARM, type = "l") # on the order of years
plot(df$DATE[1:30], df$X5.1.ARM[1:30], type = "l") # switch to months
Note the change in x-axis labels in the second graph.
25.4.2 ggplot2
# readr
library(tidyverse)
Note that unlike base R read.csv(), readr::read_csv() automatically reads DATE in as a Date class since it’s in YYYY-MM-DD format:
df <- readr::read_csv("data/mortgage.csv")

g <- ggplot(df, aes(DATE, `30 YR FIXED`)) +
  geom_line() +
  theme_grey(14)
g
ggplot(df %>% filter(DATE < as.Date("2006-01-01")),
       aes(DATE, `30 YR FIXED`)) +
  geom_line() +
  theme_grey(14)
Again, when the data is filtered, the x-axis labels switch from years to months.
25.4.2.1 Breaks, limits, labels
We can control the x-axis breaks, limits, and labels with scale_x_date():
library(lubridate)
g + scale_x_date(limits = c(ymd("2008-01-01"), ymd("2008-12-31"))) +
  ggtitle("limits = c(ymd(\"2008-01-01\"), ymd(\"2008-12-31\"))")
g + scale_x_date(date_breaks = "4 years") +
  ggtitle("scale_x_date(date_breaks = \"4 years\")")
g + scale_x_date(date_labels = "%Y-%m") +
  ggtitle("scale_x_date(date_labels = \"%Y-%m\")")
(Yes, even in the tidyverse we cannot completely escape the % conversion specification notation. Remember ?strptime for help.)
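The three arguments can also be combined in a single scale_x_date() call. The sketch below uses a small synthetic data frame (the toy object and its values are made up for illustration and are not the mortgage data) so that it can be run without the CSV file. It assumes ggplot2 is installed.

```r
library(ggplot2)

# Synthetic monthly series; the values are invented for illustration only.
toy <- data.frame(
  DATE  = seq(as.Date("2007-01-01"), by = "month", length.out = 36),
  value = seq(6, 4, length.out = 36)
)

# Limits, breaks, and labels set together in one scale_x_date() call.
p <- ggplot(toy, aes(DATE, value)) +
  geom_line() +
  scale_x_date(
    limits      = as.Date(c("2008-01-01", "2008-12-31")),
    date_breaks = "3 months",
    date_labels = "%b %Y"
  )
p
```

Expect a warning about removed rows when the plot is drawn, since observations outside the limits are dropped.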
25.4.2.2 Annotations
We can use geom_vline() with annotate() to mark specific events in a time series:
ggplot(df, aes(DATE, `30 YR FIXED`)) +
  geom_line() +
  geom_vline(xintercept = ymd("2008-09-29"), color = "blue") +
  annotate("text", x = ymd("2008-09-29"), y = 3.75,
           label = " Market crash\n 9/29/08", color = "blue",
           hjust = 0) +
  scale_x_date(limits = c(ymd("2008-01-01"), ymd("2009-12-31")),
               date_breaks = "1 year",
               date_labels = "%Y") +
  theme_grey(16) +
  ggtitle("`geom_vline()` with `annotate()`")
25.5 Date and time classes
Sys.time()
## [1] "2022-01-30 12:34:05 EST"
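Sys.time() returns a date-time rather than a Date. As a minimal sketch of how such values behave, the example below creates two POSIXct date-times by hand. The time zone is set to UTC explicitly, which is an assumption made here so the results do not depend on the machine's local settings.

```r
# Two date-times stored as POSIXct, with an explicit time zone.
t1 <- as.POSIXct("2022-01-30 12:34:05", tz = "UTC")
t2 <- as.POSIXct("2022-01-30 18:04:05", tz = "UTC")

class(t1)                          # "POSIXct" "POSIXt"
difftime(t2, t1, units = "hours")  # 5.5 hours apart
as.Date(t1)                        # drops the time of day: "2022-01-30"
format(t1, "%H:%M")                # "12:34"
```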
Consider submitting a pull request to expand this section.