Understanding the Binomial Recursively

library(tidyverse)
colors <- c("#00874F", "#67A9DF")

What is the probability that you’ll get an even number of heads if you flip a fair coin 20 times?

My go-to explanations are visual, so it bothered me that the visual for a binomial with \(n = 20, p = .5\) does not clearly make the case that the probability, as you might guess, is .5:

 plot_even_odd <- function(n) {
  data.frame(x = 0:n, prob = dbinom(0:n, n, .5),
             num_heads = ifelse((0:n) %% 2 == 0, "even", "odd")) %>% 
  
  ggplot(aes(x, prob, fill = num_heads)) + geom_col() +
    scale_x_discrete(limits = 0:n) +
  ggtitle(paste0("Binomial Distribution n = ", n, ", p = .5"))
}

plot_even_odd(20)

There are 11 ways to get an even number of heads and 10 ways to get an odd number of heads: it is not immediately evident that the length of the even bars is equal to that of the odd bars.

The question is much simpler for \(n = 19\):

plot_even_odd(19)

Every red bar (even number of heads) matches with a blue bar (odd number of heads): \(P(X = 9) = P(X = 10)\), \(P(X = 8) = P(X = 11)\), and so on. Therefore, the probability of all the reds or all the blues is .5. Now let’s return to \(n = 20\). We know that the number of arrangements of heads and tails for \(n = 20\) is:

last flip tails: all of the arrangements for 19 flips + “T” –> same number of heads as with 19 flips

last flip heads: all of the arrangements for 19 flips + “H” –> one more heads than with 19 flips

Since with \(n = 20\) we have twice as many arrangements as with \(n = 19\), the probability for each individual arrangement ending with “T” or “H” is half of what it was for \(n = 19\).

In general,

\[\frac{b(x; n, .5)}{2} + \frac{b(x-1; n, .5)}{2} = b(x; n+1, .5)\]

Graphically we can show the distribution of \(n = 20\) as that of \(n = 19\) plus the additional flip:

df20a <-
  data.frame(x = 1:20,
             prob = dbinom(0:19, 19, .5) / 2,
             fill = "last flip = heads")

df20b <-
  data.frame(x = 0:19,
             prob = dbinom(0:19, 19, .5) / 2,
             fill = "last flip = tails")
  

df20 <- rbind(df20a, df20b) %>% mutate(x = factor(x))

df20 %>%
  ggplot(aes(x, prob, fill = fill)) + geom_col() +
  facet_wrap( ~ fill, ncol = 1) +
  guides(fill = FALSE) +
  ggtitle("Binomial n = 20, shown as n = 19 + last flip")

Combining the two, we have:

df20 %>%
  ggplot(aes(x, prob, fill = fill)) + geom_col() +
  annotate("text", x = 11, y = .036, label = "X") +
  annotate("text", x = 12, y = .084, label = "X") +
  ggtitle("Binomial n = 20, fill = last flip") + 
  theme(legend.position = "bottom", legend.title = element_blank())

How does this help? We know that the area of the red bars is the same as the area of the blue bars. We also know that every blue bar matches with a red bar to its right–see for example the pair marked with X’s–since they represent the same arrangements with the exception that the red one ends with a heads while the blue one ends with a tails.

Thus we have matching sets of bars, one odd and one even, so the probability of either is one half:

df20 %>%
  ggplot(aes(x, prob, fill = factor(as.integer(x) + as.integer(fill)))) + 
  geom_col() +
  ggtitle("Binomial n = 20") + 
  scale_fill_manual(values = c('#1b9e77', '#d95f02', '#7570b3', '#e7298a', '#66a61e', '#e6ab02', '#a6761d', '#666666', '#1b9e77', '#d95f02', '#7570b3', '#e7298a', '#66a61e', '#e6ab02', '#a6761d', '#666666', '#1b9e77','#d95f02','#7570b3','#e7298a')) + guides(fill = FALSE)

And one more time with a smaller n:

df4a <-
  data.frame(x = 1:4,
             prob = dbinom(0:3, 3, .5) / 2,
             fill = "last flip = heads")

df4b <-
  data.frame(x = 0:3,
             prob = dbinom(0:3, 3, .5) / 2,
             fill = "last flip = tails")
  

df4 <- rbind(df4a, df4b) 

df4 %>%
  ggplot(aes(x, prob, fill = fill)) + geom_col() +
  facet_wrap( ~ fill, ncol = 1) +
  guides(fill = FALSE) +
  ggtitle("Binomial n = 4, shown as n = 3 + last flip")

Or:

df4 %>%
  ggplot(aes(x, prob, fill = fill)) + geom_col() +
  ggtitle("Binomial n = 4, fill = last flip")

Matching pairs:

df4 %>%
  mutate(label = ifelse(as.integer(x) %% 2 == 0, "even", "odd")) %>% 
  ggplot(aes(x, prob, fill = factor(as.integer(x) + as.integer(fill)), label = label)) + 
  geom_col() + 
  geom_text(position = position_stack(vjust = 0.5)) +
  ggtitle("Binomial n = 4") + 
  scale_fill_manual(values = c('#1b9e77', '#d95f02', '#7570b3', '#e7298a')) +
    guides(fill = FALSE)

Are you convinced that the probability of getting an even number of heads is equal to the probability of getting an odd number of heads and therefore equal to 50%?

Back

Understanding the Binomial Recursively

Joyce Robbins

3/5/2019