The Birthday Problem & More

Considering the Birthday Problem as a special case of a more general problem along with considering similar problems and extensions

What is commonly referred to as the Birthday Problem asks the question: What is the minimum number of people in a group so that the probability that at least two people in the group (ignoring leap years) is more than 50%. The answer of 23 surprises most people because it is considerably less than expected. For a group of 23 people the probability that at least two people share the same birthday is approximately 0.5073 (to 4 significant figures).

A great deal of literature has been produced discussing and explaining the Birthday Problem and the mathematics involved can be considered commensurate with IB Maths SL and HL. However, since so much has already been written about the Birthday Problem then it is far too easy for a student to simply transcribe a solution and relevant discussion found in a textbook, from a website, or professional journal article, etc. By doing so the student is most likely using mathematics at an appropriate level but will struggle to demonstrate much personal engagement. Being such a common problem it will be very difficult for a student to “personalize” an Exploration whose topic focuses on the Birthday Problem.

However, I think it is possible to use the Birthday Problem as the starting point for an Exploration that focuses on other questions that are related to, or are extensions of, the Birthday Problem. The key insight that opens up a wider, and possibly more ‘personal’, treatment of this kind of intriguing probability question is that the Birthday Problem is simply a special case of a general scenario that is not limited to just the matching of birthdays. At its simplest level, the Birthday Problem is the selection, with replacement, of a certain number of items from a larger set of 365 items. The essential characteristic is that after the selection of each item it is not removed from the larger set from which it was selected – so, there is the chance of it being selected again. Multiple people could have – or ‘select’ – the same birthday. Thus, the Birthday Problem is a special case of the following problem:

If an experiment with N equally likely outcomes is performed R times then what is the probability that at least one of the outcomes occurs two or more times?

Finding the probability that at least one of the N equally likely outcomes occurs two or more times during R repetitions for such an experiment is best done by using a ‘complement’ approach. Find the probability that during R repetitions of the experiment that none of the N outcomes occurs two or more times which is the complement of at least one occurring two or more times. Subtracting this ‘complement’ probability from 1 gives the probability that at least one of the N outcomes occurs at least twice during R repetitions.

For illustration purposes, here is the computation for the probability that at least two people in a group of 10 share the same birthday – assuming no leap year and that each of the 365 days is equally likely.

Probability of at least 2 share a birthday in a group of 10 $=1-\left(\frac{365}{365}\cdot \frac{364}{365}\cdot \frac{363}{365}\cdot \frac{362}{365}\cdot \frac{361}{365}\cdot \frac{360}{365}\cdot \frac{359}{365}\cdot \frac{358}{365}\cdot \frac{357}{365}\cdot \frac{356}{365}\right)\approx 0.1169481771$
This is equivalent to choosing $\text{R}=10$ times (with replacement) from $\text{N}=365$ equally likely items – or performing an experiment $\text{R}=10$ times that has $\text{N}=365$ equally likely outcomes. In general, for 10 selections with replacement, the probability of at least one of N outcomes occurring at least twice $=1-\left(\frac{\text{N}}{\text{N}}\cdot \frac{\text{N}-1}{\text{N}}\cdot \frac{\text{N}-2}{\text{N}}\cdot \frac{\text{N}-3}{\text{N}}\cdot \frac{\text{N}-4}{\text{N}}\cdot \frac{\text{N}-5}{\text{N}}\cdot \frac{\text{N}-6}{\text{N}}\cdot \frac{\text{N}-7}{\text{N}}\cdot \frac{\text{N}-8}{\text{N}}\cdot \frac{\text{N}-9}{\text{N}}\right)=$
$=1-\frac{\frac{\text{N}!}{\left(\text{N}-10\right)!}}{{\text{N}}^{10}}=1-\frac{\text{N}!}{{\text{N}}^{10}\left(\text{N}-10\right)!}$ .

And, for the most general case, the probability that at least one of the N equally likely outcomes occurs two or more times during R repetitions (trials) is:

$\text{P}\left(\text{N},\text{R}\right)=1-\frac{\frac{\text{N}!}{\left(\text{N}-\text{R}\right)!}}{{\text{N}}^{\text{R}}}=1-\frac{\text{N}!}{{\text{N}}^{\text{R}}\left(\text{N}-\text{R}\right)!}$

Alternatively, $\text{P}\left(\text{N},\text{R}\right)=1-\frac{\frac{\text{N}!}{\left(\text{N}-\text{R}\right)!}}{{\text{N}}^{\text{R}}}=1-\frac{{}_{\text{N}}\text{P}{}_{\text{R}}}{{\text{N}}^{\text{R}}}$ , where ${}_{\text{N}}\text{P}{}_{\text{R}}$ is the number of permutations of N items chosen R at a time.

Thus the probability P that any outcome occurs at least twice can be expressed as a function in terms of two variables N (total # of possible outcomes) and R (# of selections with replacement). Figure 1 shows this function being defined on a TI-Nspire CX and the computation for the probability that at least one number from the set of integers from 1 to 365 is selected at least twice when 23 selections are made with replacement – i.e. the ‘answer’ to the Birthday Problem.

Using this function, $p\left(n,r\right)$ , it is possible to use a trial-and-error approach to solve questions like the Birthday Problem that have different values for N. For example:

From a standard deck of 52 cards ( $\text{N}=52$ ), one person at a time chooses a card and then replaces the card before the next person does the same. What is the smallest number of people each choosing one card so that the probability of at least two people choosing the same card is greater than 50% ?

Just nine cards (Figure 2) need to be chosen with replacement from 52 different cards in order to have a probability greater than 50% that at least two of the cards will match. This result seems to be as counterintuitive as the result of 23 for the Birthday Problem. What about other values of N? We now know it’s 23 for $\text{N}=365$ and 9 for $\text{N}=52$ . It may be interesting to investigate the minimum value of R (# of selections with replacement) for other values of N. Figure 3 shows some results found by using the same trial-and-error approach shown in Figure 2 for $\text{N}=52$ , and these results were entered into the TI-Nspire CX (Figure 4).

Figure 5 is a scatter plot for the data in Figure 4. The range for each axis has been changed in Figure 6 to get a better view of the first seven data points that are close together in Figure 5.

It’s not surprising that as N increases the minimum value of R (min_R) for the probability to be at least 50% for at least two of the R selections (with replacement) to match also increases. But, it’s clear that min_R increases more slowly than N. It would be interesting to explore how to best describe (mathematically) the relationship between N and min_R as N changes.

Further thoughts / questions / extensions

All the probabilities in this article are, of course, theoretical probabilities for an ‘experiment’. The more often the experiment is repeated the closer the experimental probability should get to the theoretical probability. It might be worthwhile to include an experimental aspect in an Exploration – perhaps by writing a program (using a random number generator) to perform numerous trials of the experiment of making R selections with replacement from N items and determine for what percentage of the trials at least two of the selections matched.

There are other questions that can be asked about matching birthdays. One reason that a group of only 23 will have greater than a 50% chance of having at least two people sharing a birthday is because it can be any two (or more) in the group matching – not necessarily matching with a specific person. The following question has a much different answer:

You want to find someone whose birthday (day & month) matches yours. What is the least number of people you need to ask for their birthday so the probability is more than 50% ? [see 2nd article below for answer]

And a small change in the wording of the original Birthday Problem eliminates the application of a complement approach to solve it: What is the minimum number of people in a group so that the probability that exactly two people in the group (ignoring leap years) is more than 50% ?

two interesting – but quite different – articles on the Birthday Problem: