I've been following a course on probability and i will try to explain to you what is the expected value of a geometric distribution.

What is the expected value of a geometric distribution? An example question. Suppose you have a six sided die, how many times do you need to roll the die on average to get all six numbers? Interesting question which could apply to many situations. The answer is \(14.7\) but how do we get to that number?

Before i dive into the calculation part let's talk about the "expected value" and the "geometric distribution" separately.

Expected value

Suppose that you play a game where you can win x different bags with different amounts of money (losing can be represented by winning the 0 amount). If you play many times what is the average amount that you will win? After playing a real game many times you can just take a look at your winning and divide the amount of times you played and you will know. You could also play a hypothetical game (see the example below) then you would also get an average and thus an "expected value" for the "real game". Numerically they will be the same, but the term "expected value" is used here instead of "average" to indicate that it's about a game of chance and a chance implies you have yet to start (take your chances).

A numerical example

Suppose you can win three different bags with the amounts: \(0\), \(200\) and \(600\)
The house is greedy so it puts a \(0.5\) chance on winning \(0\) (nothing). \(0.4\) chance on winning \(200\) and only \(0.1\) chance on \(600\). After playing a \(1000\) times how much money do you expect to have gained?

Well the expectation is that in \(1000\) plays the \(0.5\) chance will occur \(500\) times. \(1000 * 0.4 = 400\) times and \(1000 * 0.1 = 100\) times. Therefor the money in 1000 times is: \(500 * 0 + 400 * 200 + 100 * 600 = 140000\)

The expected value is therefor \(\frac{140000}{1000} = 140\)
Which is a bit strange because when you play just 1 time you expect either 0, 200 or 600. Nevertheless take the term "expected value" as a statistical term, which we know is based on the average.

Geometric distribution

Bernoulli trial

When you play a game of chance and have several outcomes, what is the chance of one particular outcome? In the money bags example, what is the chance of a bag with 600? It was 0.1, which makes the chance of NOT having a bag of 600: \((1 - 0.1) = 0.9\) This seems rather obvious, however in math somebody had to formalize it. So credits to Jacob Bernoulli back in the 18th century who coined "the chance of something happening, or not happening" the Bernoulli trial. What is important to remember about the bernoulli trial that it's a game of chance (or experiment if you will) with exactly two possible outcomes. An often used example here is a coin toss which results in heads or tails.

Example: the money bag game

Now comes the question .. how many times do we need to play the money game until we get the big price? Potentially unlimited times because there is always a 90% chance that we don't win the \(600\). Let's start calculating the chances to win the big price when we play a couple of times.

When playing one time the chance is 10%
When playing two times, it implies no win on the first time, so there is a 90% chance of actually not getting that big price on the first time, and then a 10% chance of getting the big price on the second time. Making the total chance of winning the big price in exactly two plays: \(0.9 * 0.1 = 0.09\)

Let p be our chance to get the big price we can write it down like this: \((1 - p) * p\)

For the 3rd play the chance is: \((1 - p)^2 * p = (1 - p)^{n - 1} * p = 0.009\)

In tabular form for the first five plays we get:

Plays (n) Notation Numerically Chance of success in exactly n plays
\(1\) \((1-p)^{1-1} * p = 1 * p = p\) \(0.9^0 * 0.1 = 1 * 0.1 = 0.1\) \(0.1 \)
\(2\) \((1-p)^{2-1} * p \) \(0.9^1 * 0.1 \) \(0.09 \)
\(3\) \((1-p)^{3-1} * p \) \(0.9^2 * 0.1 \) \(0.081 \)
\(4\) \((1-p)^{4-1} * p \) \(0.9^3 * 0.1 \) \(0.0729 \)
\(5\) \((1-p)^{5-1} * p \) \(0.9^4 * 0.1 \) \(0.06561 \)

The price can be won in one play (the first), or in two plays, or in three or in many many plays. The thing is that in an infinite amount of plays that big prize is eventually going to be won. In other words the chance that the big prize is won when trying an unlimited times is always 100% That means that all the chances calculated in the last column of our table have to sum to 1.

Let's write the chance that a price will be won in \(X\) plays as \(P(X)\). Note that this is different from the chance p that the big price is won !!

Geometric series & distribution

The table in the last paragraph where the chance P(X) is calculated for each N is the probability mass function for a geometric series. Probability mass function because we the probability P(X) is given for each mass N (mass or "weights" will be explained further down). A geometric series because each next value of P(X) can be calculated by multiplying another time by 0.9 From wikipedia the definition of a geometric series is given as:

In mathematics, a geometric series is a series with a constant ratio between successive terms.

That's why when we talk about the probability for each term in the series we speak of a geometric distribution.

Putting things together

Remember the expected value? Just as there are now two different probabilities \(p\) and \(P(X)\) there are now also two kinds of expected values. The first is already explained. The second expected value is the answer to the previous stated question "how many times do we need to play the money game until we get the big price?". This answer doesn't have to be exactly \(1\), \(2\), \(3\) or any other integer bigger than \(0\). For the same reason as we get \(140\) instead of \(0\), \(200\) or \(600\) before. In every day speach an answer like "we will have to try \(4.3476\) times on average to win the big price" is an acceptable answer. Let's try to find this answer.

When we try one time the chance is \(0.1\)
When we try two times the chance is \(0.09\)
Just the average of this would be: \( \frac{1 * 0.1 + 2 * 0.09}{0.1 + 0.09} = 1,473\)
However we can not stop at two tries, we would need infinitively many tries.

Also note that the numbers here:

(1 * 0.1 + 2 * 0.09 + 3 * 0.009 + ..) / (0.1 + 0.09 + 0.009 + ..) = ?
 ^         ^          ^

is a number incrementing from \(1\) to infinity. Skipping a number would be silly. Let's play \(1\), \(2\) and \(4\) times. But you would have to say on the third time you have no chance of winning the big prize, otherwise you would have to stop the game on a win and only have played three times. But this would make chance (\(p\)) different than if you would be playing less or more times. While that could be a rule of the game it's not a geometric distribution anymore because the chances of the third try depend on the tries before that.

Compare this the calculation of the expected value for the amount of money. Where you had the money as weights times it's respective chance to win that particular amount. Here our number n takes the place of the weights. Below the previous expected value calculation with the numbers used as weights annotated.

(500 * 0 + 400 * 200 + 100 * 600) / 1000 = 0.5 * 0 + 0.4 * 200 + 0.1 * 600
       ^         ^           ^                   ^         ^           ^

Let's continue the series we had, the formula before was \(\frac{1 * 0.1 + 2 * 0.09}{0.1 + 0.09} = 1,473\). The denominator \(0.1 + 0.09 + ..\) here has the first two values of the last column in the table. We know that the chance for infinitive many times will be 1. So the denominator will become 1. For the numerator we have \(1 * 0.1 + 2 * 0.09 + .. + ..\) which we can write as: \(n * p * (1 - p)^{n - 1}\) for n starts at 1 to infinity. Writing it with a sum sign we get the formula \(\sum_{n=1}^{\infty} np(1 - p)^{n - 1}\)

Simplification of formula and proof

This big formula can be simplified to simply \(\frac{1}{p}\). There are various ways how to get to \(\frac{1}{p}\) i will not attempt to proof this in this blog post. But i will link a few resources about this proof before continuing.

On this page when you scroll down to Moments you can click to unfold 3 different proofs. Here are two videos by Khan Academy and Club Academia which both cover the same style of proof.

But let's consider if this simplified formula is intuitive. If i have a six sided die, the chance of rolling a \(4\) is \(\frac{1}{6}\). The expected value is \(\frac{1}{p}\), so i expect to roll on average \(\frac{1}{1/p} = p = 6\) times before that 4 shows up when i try this experiment many times. It seems intuitive numerically that \(\frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = 1\) so six times. But to me it was surprising that the chance of one roll is so directly related to the average amount of rolls which are needed to get the "success" condition (the \(4\) in this example). Maybe it is a little mind-warping, after all the proofs go to infinity and back 😉

Solving the problem

Now we have the formula for the expected amount of rolls until we see a particular number (the win condition). What about rolling that dice until all six possible numbers appeared?

Brute force

Before we dive into the elegance of math let's first try simulating it. Below is an annotated program written in rust that starts the rolls 10 million times, which on average gives fairly accurate results like \(14.6977991\) or \(14.7015299\) compared to the actual answer of 14.7

use rand::prelude::*;
use std::collections::HashSet;

fn main() {
  let iterations = 10_u64.pow(7); // amount of times to do the experiment = 10^7

  let mut rng = rand::thread_rng(); // prepare randomly rolling the dice

  let mut rolls = Vec::new();
  for _ in 0..iterations {
    let mut n = 0;
    let mut seen = HashSet::new();
    while seen.len() != 6 { // while not all numbers have been rolled (seen), start the experiment
      n += 1; // count the amount of rolls
      let nr: u8 = rng.gen_range(0, 6); // roll the dice
      if seen.contains(&nr) {
        continue; // try again when the number already occured
      seen.insert(nr); // the number did not happen before, mark as seen
    rolls.push(n); // remember the amount of rolls needed in this experiment

  println!("Converged to: {}", rolls.iter().sum::<u64>() as f64 / rolls.len() as f64); // sum all the amount of rolls divided by amount of expirements

The program helps to understand the steps needed to perform the experiment and additionally the result will verify that the answer of the math will indeed approach 14.7

The solution

The solution follows by the realisation that to get six unique numbers we have to restart the geometric distribution a couple of times. What do i mean by that? Let's just start rolling the dice and consider what we need to do. On the first roll we already get a number we haven't seen before, the chance of this happening is \(p = 1\). Obvious because it's the first role. Now there are five numbers left which we need. The chance of getting another unique number is \(p = \frac{5}{6}\) (\(\frac{1}{6}\) that we roll the same number again). The third unique number will chance of \(p = \frac{4}{6}\). Continuing let's calculate the average amount of rolls needed (expected value) for each next unique number that we need:

n p P(X)
\(1\) \(1 \) \(1/1 = 1 \)
\(2\) \(\frac{5}{6} = \frac{6-n}{6} = 0.833\) \(\frac{1}{p} = \frac{1}{0.833} = \frac{6}{5} = 1.2\)
\(3\) \(\frac{4}{6} = \frac{6-n}{6} = 0.666\) \(\frac{1}{p} = \frac{1}{0.666} = \frac{6}{4} = 1.5\)
\(4\) \(\frac{3}{6} = \frac{6-n}{6} = 0.5 \) \(\frac{1}{p} = \frac{1}{0.5 } = \frac{6}{3} = 2 \)
\(5\) \(\frac{2}{6} = \frac{6-n}{6} = 0.333\) \(\frac{1}{p} = \frac{1}{0.333} = \frac{6}{2} = 3 \)
\(6\) \(\frac{1}{6} = \frac{6-n}{6} = 0.166\) \(\frac{1}{p} = \frac{1}{0.166} = \frac{6}{1} = 6 \)

Summing all \(P(X)\) gives us \(1 + 1.2 + 1.5 + 2 + 3 + 6 = 14.7\)

As formula: \(\sum_{n=1}^{6} \frac{6}{n}\)