Chapter 26 ggmosaic

Qiang Zhao and Mike Yao-Yi Wang


26.1 Overview

This cheat sheet is inspired by Chapter 15, “Chart: Mosaic”, of edav.info. Instead of using the mosaic() function from the vcd package to draw mosaic plots, it shows how to produce the same output with ggmosaic.

26.2 Introduction

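  • Mosaic plots are only for categorical data.
  • Aesthetics to set in geom_mosaic():
    • weight: the count (Freq) column, needed only for binned data
    • x: product(Y, X2, X1), listing the variables in reverse order of splitting (X1 is split first, Y last)
    • fill: the dependent variable Y
    • conds: the conditioning variable(s), an alternative way to specify the earlier splits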

26.4 Splitting on One Variable (Binned Data)

##     Age   Favorite     Music Freq
## 1   old bubble gum classical    1
## 2   old bubble gum      rock    1
## 3   old     coffee classical    3
## 4   old     coffee      rock    1
## 5 young bubble gum classical    2
## 6 young bubble gum      rock    5
## 7 young     coffee classical    1
## 8 young     coffee      rock    0
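The table above is printed console output; the code that created it is not shown. A minimal sketch that rebuilds it in base R (the name binned is our own choice, not necessarily the original's):

```r
# Binned data: one row per Age/Favorite/Music combination,
# with the number of observations stored in the Freq column.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)

print(binned)
```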

First, we show a ggmosaic plot split only on Age:
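A sketch of this one-variable plot, assuming the binned table above is stored in a data frame named binned (a name chosen here for illustration). Mapping weight = Freq makes each row count Freq times:

```r
library(ggplot2)
library(ggmosaic)

# Binned data from the table above (counts in Freq).
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)

# Split on Age only; weight = Freq supplies the counts.
# Aesthetics are mapped inside geom_mosaic().
p <- ggplot(binned) +
  geom_mosaic(aes(weight = Freq, x = product(Age), fill = Age)) +
  ggtitle("Split on Age")

# Print p to draw the plot.
```

With these counts the "old" column represents 6 observations and the "young" column 8.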

Important: ggmosaic accepts binned data when the count column is mapped to the weight aesthetic (here weight = Freq), so the count column can have any name. This differs from vcd::mosaic(), which expects binned data to store its counts in a column named Freq.

26.5 Splitting on One Variable (Unbinned Data)

For unbinned data (one row per observation), we can simply omit weight and leave it at its default, so that each row counts once.

The unbinned data:

##      Age   Favorite     Music
## 1    old bubble gum classical
## 2    old bubble gum      rock
## 3    old     coffee classical
## 4    old     coffee classical
## 5    old     coffee classical
## 6    old     coffee      rock
## 7  young bubble gum classical
## 8  young bubble gum classical
## 9  young bubble gum      rock
## 10 young bubble gum      rock
## 11 young bubble gum      rock
## 12 young bubble gum      rock
## 13 young bubble gum      rock
## 14 young     coffee classical
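A sketch of the same one-variable plot on unbinned data. Here the unbinned table is assumed to be derived by expanding the binned counts with base R (the names binned and df are our own choices); because there is no weight aesthetic, every row counts once:

```r
library(ggplot2)
library(ggmosaic)

# Binned counts from section 26.4.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)

# Expand to unbinned data: repeat each row Freq times (the Freq = 0 row
# disappears) and drop the count column, giving the 14 rows shown above.
df <- binned[rep(seq_len(nrow(binned)), binned$Freq), c("Age", "Favorite", "Music")]
rownames(df) <- NULL

# No weight aesthetic: each row is one observation.
p <- ggplot(df) +
  geom_mosaic(aes(x = product(Age), fill = Age)) +
  ggtitle("Split on Age (unbinned data)")

# Print p to draw the plot.
```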

Note: We use the unbinned data for the rest of the examples.

26.6 Splitting on Two Variables

Split on Age, then Music:
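A sketch of this two-variable split, assuming the unbinned data is held in a data frame named df (rebuilt below from the counts so the block runs on its own). Music is split last, so it comes first in product() and is used for fill:

```r
library(ggplot2)
library(ggmosaic)

# Unbinned data from section 26.5, rebuilt from the counts.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)
df <- binned[rep(seq_len(nrow(binned)), binned$Freq), c("Age", "Favorite", "Music")]

# First split: Age; second (last) split: Music.
p <- ggplot(df) +
  geom_mosaic(aes(x = product(Music, Age), fill = Music)) +
  ggtitle("Split on Age, then Music")

# Print p to draw the plot.
```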

Split on Music, then Age:
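Reversing the split order only means swapping the variables inside product() and changing fill to the variable that is now split last. A sketch under the same assumptions (df is an illustrative name for the unbinned data):

```r
library(ggplot2)
library(ggmosaic)

# Unbinned data from section 26.5, rebuilt from the counts.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)
df <- binned[rep(seq_len(nrow(binned)), binned$Freq), c("Age", "Favorite", "Music")]

# First split: Music; second (last) split: Age.
p <- ggplot(df) +
  geom_mosaic(aes(x = product(Age, Music), fill = Age)) +
  ggtitle("Split on Music, then Age")

# Print p to draw the plot.
```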

To draw a mosaic plot of Y ~ X, set x = product(Y, X) in aes(), because the dependent variable should always be split last and product() lists the variables in reverse order of splitting. Also set fill = Y so that the tiles are colored by the dependent variable.

26.7 Splitting on Three Variables

Split on Age, then Music, then Favorite:
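A sketch of the three-variable split with the default split directions, assuming the unbinned data frame df from section 26.5 (rebuilt here). Favorite is the dependent variable, so it is listed first in product() and used for fill:

```r
library(ggplot2)
library(ggmosaic)

# Unbinned data from section 26.5, rebuilt from the counts.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)
df <- binned[rep(seq_len(nrow(binned)), binned$Freq), c("Age", "Favorite", "Music")]

# Splits: Age first, then Music, then Favorite (last).
p <- ggplot(df) +
  geom_mosaic(aes(x = product(Favorite, Music, Age), fill = Favorite)) +
  ggtitle("Split on Age, then Music, then Favorite")

# Print p to draw the plot.
```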

Note that in the above example, the default order and direction of the splits are as follows:

  1. Age – vertical split

  2. Music – horizontal split

  3. Favorite – vertical split
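Spelling out these default directions with the divider argument should reproduce the plot above. This is a sketch, assuming the correspondence described in section 26.8 (“hspine” = vertical split, “vspine” = horizontal split) and that the divider vector follows the order of product(Favorite, Music, Age):

```r
library(ggplot2)
library(ggmosaic)

# Unbinned data from section 26.5, rebuilt from the counts.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)
df <- binned[rep(seq_len(nrow(binned)), binned$Freq), c("Age", "Favorite", "Music")]

# divider order matches product(): Favorite, Music, Age.
#   Favorite: vertical split   -> "hspine"
#   Music:    horizontal split -> "vspine"
#   Age:      vertical split   -> "hspine"
p <- ggplot(df) +
  geom_mosaic(aes(x = product(Favorite, Music, Age), fill = Favorite),
              divider = c("hspine", "vspine", "hspine")) +
  ggtitle("Default directions, written out explicitly")

# Print p to draw the plot.
```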

26.8 Adjusting the Direction of Splits

The split directions can be adjusted as needed. For example, to turn the above example into a doubledecker plot, we use the following splits:

Splitting order:

  1. Age – vertical split (“hspine”)

  2. Music – vertical split (“hspine”)

  3. Favorite (dependent variable) – horizontal split (“vspine”)

Note that the divider vector follows the order in which the variables appear in product(Favorite, Music, Age), whereas the splits are made in the order Age, Music, then Favorite, so the divider for this plot is c(“vspine”, “hspine”, “hspine”). Also note that in the divider vector, “vspine” means a horizontal split and “hspine” means a vertical split.
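A sketch of the doubledecker plot, assuming the unbinned data frame df from section 26.5 (rebuilt here). The divider entries are listed in product() order, not in splitting order:

```r
library(ggplot2)
library(ggmosaic)

# Unbinned data from section 26.5, rebuilt from the counts.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)
df <- binned[rep(seq_len(nrow(binned)), binned$Freq), c("Age", "Favorite", "Music")]

# divider order matches product(): Favorite, Music, Age.
#   Favorite (dependent): horizontal split -> "vspine"
#   Music:                vertical split   -> "hspine"
#   Age:                  vertical split   -> "hspine"
p <- ggplot(df) +
  geom_mosaic(aes(x = product(Favorite, Music, Age), fill = Favorite),
              divider = c("vspine", "hspine", "hspine")) +
  ggtitle("Doubledecker: Favorite by Age and Music")

# Print p to draw the plot.
```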

26.9 Alternative approach: Conditional

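We can also use the conds aesthetic to achieve the same result as above. In this case, the call takes the form geom_mosaic(aes(x = product(last_split), fill = last_split, conds = product(second_split, first_split))): only the last split goes in x, and the earlier splits, again in reverse order, go in conds.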
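A sketch of the doubledecker plot written with conds, assuming the unbinned data frame df from section 26.5 (rebuilt here). It also assumes the divider vector covers the x variable followed by the conditioning variables, in the same reversed order as before:

```r
library(ggplot2)
library(ggmosaic)

# Unbinned data from section 26.5, rebuilt from the counts.
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)
df <- binned[rep(seq_len(nrow(binned)), binned$Freq), c("Age", "Favorite", "Music")]

# Last split (Favorite) in x; earlier splits in conds, in reverse order.
p <- ggplot(df) +
  geom_mosaic(
    aes(x = product(Favorite), fill = Favorite, conds = product(Music, Age)),
    # Assumption: divider covers (Favorite, Music, Age), as in section 26.8.
    divider = c("vspine", "hspine", "hspine")
  ) +
  ggtitle("Favorite conditional on Age and Music")

# Print p to draw the plot.
```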

26.11 Comparison with vcd::mosaic

ggmosaic::geom_mosaic() and vcd::mosaic() are often confused because their syntax for the split order and split direction is quite different. vcd::mosaic() uses the formula mosaic(last_split ~ first_split + second_split), and its direction vector follows the order of the splits, direction = c(first_split, second_split, last_split), with “v” for a vertical split and “h” for a horizontal split.

ggmosaic::geom_mosaic() follows a different pattern: the split order is written as product(last_split, second_split, first_split), and the divider (the counterpart of direction in vcd::mosaic()) is given in the same reversed order, divider = c(last_split, second_split, first_split), with “vspine” for a horizontal split and “hspine” for a vertical split.
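To see the two conventions side by side, here is a sketch of the doubledecker plot from section 26.8 written both ways. It assumes vcd is installed and that, as noted in section 26.4, vcd::mosaic() picks up the counts from a column named Freq; binned is an illustrative name for the binned data:

```r
library(ggplot2)
library(ggmosaic)

# Binned counts from section 26.4 (count column named Freq for vcd).
binned <- data.frame(
  Age      = rep(c("old", "young"), each = 4),
  Favorite = rep(rep(c("bubble gum", "coffee"), each = 2), times = 2),
  Music    = rep(c("classical", "rock"), times = 4),
  Freq     = c(1, 1, 3, 1, 2, 5, 1, 0)
)

# vcd: dependent variable on the left; splits in order Age, Music, Favorite.
# direction follows that same order ("v" = vertical, "h" = horizontal split).
vcd::mosaic(Favorite ~ Age + Music, data = binned,
            direction = c("v", "v", "h"))

# ggmosaic: product() and divider both list the variables in reverse order,
# Favorite, Music, Age ("hspine" = vertical, "vspine" = horizontal split).
p <- ggplot(binned) +
  geom_mosaic(aes(weight = Freq, x = product(Favorite, Music, Age), fill = Favorite),
              divider = c("vspine", "hspine", "hspine"))

# Print p to draw the ggmosaic version.
```

The two direction vectors hold the same information, just read in opposite orders: c("v", "v", "h") for Age, Music, Favorite corresponds to c("vspine", "hspine", "hspine") for Favorite, Music, Age.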