Some intuitions: answering questions when faced with uncertainty

Some intuitions: answering questions when faced with uncertainty#

Before we get started with any maths, let’s explore an everyday problem to illustrate what statistics are about. In daily life, we often wonder about things we don’t know. I think most people would like to figure out how much pasta they need to cook for dinner, to finally stop eating pasta for three days when we just wanted a light meal for a single evening. Or when we get dressed in the morning, we would like to know whether it is hot or cold outside, so that we dress appropriately. In empirical sciences, we might want to know whehther and how some quantities relate to other, such as length of penguin wings and their body size, or whether the activation in a particular brain regions reflects the attentional demands of a cognitive task.

In some cases, the answer is straightforward: if we want to know if it is warm outside, we can stick our arm out of the window, the answer is straightforward: we can open the window and stick our arm out. Sometimes it is a bit more complicated: if we want to know how much pasta we should cook, we should probably cook different amounts of pasta over different meals (and make sure we are equally hungry for each meal) and check which amount was the most adequate. The field of probability and statistics are basically dedicated to describe this kind of problems: there is something we can’t know directly, but we can gather some observations and based on those make an educated guess about what we want to know, and that’s particularly useful the more complex a problem gets.

We could use many example to illustrate all the necessary concepts of statistics and the usefulness of Bayesian statistics. But it just so happens that some of these examples are a bit easier to wrap ones head around. There is nothing special about such problems and any other problems could work, I simply find them more intuitive. I will build on these intuition to illustrate the math behind Bayesian inference and variational Laplace in a way that I hope remains intuitive from beginning to end.

To keep things simple, we will use the example of tossing a coin. Let’s say we are making a bet with a colleague that our coin is going to land on head, while our colleague bets that it is going to land on tail. Our colleague has trust issues, and he argues that we need to make sure that the coin is fair–meaning it has an equal chance of landing on heads or tail. If the coin is biased, say it lands more often on head than on tail, then the bet isn’t quite fair, and the person betting on head has an advantage. So before we get started, we need to make sure that the coin is fair.

Intuitions about probabilities using a coin toss example#

We are trying to answer a simple question: is the coin fair or not?

Running (simulated) experiments#

We don’t know how often the coin lands on head and how often it lands on tail (given a total number of tosses), yet we need to figure it out somehow. One way to do this is to run a little experiment. We can throw the coin in the air twice. If the coin is balanced, then we would expect it to land once on head, once on tail. This is not a very good experiment, because this is not what we mean by fair. If the coin would alternate between head and tail on every second try, then it wouldn’t be fair bet, beacuse if anyone knew what the coin landed on in the previous throw, they could know what it will land on in this throw and would win any bet. What we mean by fair is that the chances for the coin to land on head or tail are equal on every single throw.

So how can we know whether that is truly the case? It would seem intuitive to say “let’s throw the coin many times”. Then, we can count how often the coin lands on head and how often it lands on tail to see if across all these tosses, we get half head and half tail. We can conduct this experiment programatically. Even if you don’t understand exactly how the code works, you only need to understand the following:

n_throw: how often we throw our simulated coin
n_head: each time the coin lands on head, add 1, so that then we know the total number of head out of 10 tosses
n_tail: each time the coin lands on tail, add 1, so that then we know the total number of tail out of 10 tosses
p_head: Probability of getting head out of our 10 tosses, i.e. how often it lands on head out of 10
p_head: Probability of getting tail out of our 10 tosses, i.e. how often it lands on tail out of 10

Let’s say we do 10 tosses and see what happens:

import numpy as np
np.random.seed(0)
n_throw = 10 # We will throw the coin 10 times
n_head = 0 # Before we start, we have zero head
n_tail = 0 # And zero tails

for i in range(n_throw):  # Repeat the same thing 10 times (throwing the coin)
    rnd = np.random.uniform()  # Draw a random number between 0 and 1 (following a uniform distribution, so each value between 0 and 1 is equally likely)
    if rnd <= 0.5:  # If our random number is less than 0.5, we consider that our coin landed on head.
        print(f"Throw {i}: Head")
        n_head += 1
    else:   # If our random number is more than 0.5, we consider that our coin landed on tail
        print(f"Throw {i}: Tail")
        n_tail += 1

# Compute the probability of head and tail:
p_head = n_head/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
p_tail = n_tail/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
print(f"\nP(Head)={p_head}")
print(f"P(Tail)={p_tail}")

Throw 0: Tail
Throw 1: Tail
Throw 2: Tail
Throw 3: Tail
Throw 4: Head
Throw 5: Tail
Throw 6: Head
Throw 7: Tail
Throw 8: Tail
Throw 9: Head

P(Head)=0.3
P(Tail)=0.7

Interpreting the data#

After 10 throw, the coin didn’t land half of the time on head and half of the time on head. Instead, it landed 30% of the time on head and 70% of the time on tail. That doesn’t seem very balanced. At the same time, you might argue that maybe the coin is indeed balanced, it is just that in these 10 throws, we got “unlucky” because it landed more often on tail than on head. Indeed, if whether we get head or tail is random in each toss, then even if the coin is balanced, it is not impossible to get 7 times tail out of 10 throws. It is also not impossible to get 10 times head in a row.

That being said, you probably get the intuition that while it is possible to get 10 times head in a row, it is not very likely. And you are even less likely to get 1000 heads in a row if you were to do a 1000 throws.

Running a better experiment#

Back to our initial problem: we want to know if our coin is fair. Based on our initial experiment, it is unclear whether we should answer yes, because we don’t know if we just got unlucky and ended up in a case that is not representative of the true head/tail ratio of our coin. How can we do a better experiment? Well just as when we said that throwing the coin twice isn’t enough to tell whether the coin is balanced, it is quite intuitive to think that 10 times isn’t enough either. So we can try to increase the number of coin tosses we make: the more often we repeat our little experiment the more confident we can be in our final answer as to whether the coin is biased.

Let’s try it out (single toss results aren’t shown anymore, only the final proportion of head to tail):

# Same as before, but let's increase the number of throws:
n_throw = 20 # 20 throws instead of 10
n_head = 0 # Before we start, we have zero head
n_tail = 0 # And zero tails

for i in range(n_throw):  # Repeat the same thing 10 times (throwing the coin)
    rnd = np.random.uniform()  # Draw a random number between 0 and 1 (following a uniform distribution, so each value between 0 and 1 is equally likely)
    if rnd <= 0.5:  # If our random number is less than 0.5, we consider that our coin landed on head.
        n_head += 1
    else:   # If our random number is more than 0.5, we consider that our coin landed on tail
        n_tail += 1

# Compute the probability of head and tail:
p_head = n_head/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
p_tail = n_tail/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
print(f"\nP(Head)={p_head}")
print(f"P(Tail)={p_tail}")

P(Head)=0.35
P(Tail)=0.65

We can see that when we increase the number of throws, the final probability change, and it seems to get closer to 50/50. But still not quite 50/50. So the same question applies again: is our coin biased, or is it just that particular draw that didn’t land on 50/50? We can try to increase the number of throws to much more, say a 1000:

# Same as before, but let's increase the number of throws:
n_throw = 1000 # 20 throws instead of 10
n_head = 0 # Before we start, we have zero head
n_tail = 0 # And zero tails

for i in range(n_throw):  # Repeat the same thing 10 times (throwing the coin)
    rnd = np.random.uniform()  # Draw a random number between 0 and 1 (following a uniform distribution, so each value between 0 and 1 is equally likely)
    if rnd <= 0.5:  # If our random number is less than 0.5, we consider that our coin landed on head.
        n_head += 1
    else:   # If our random number is more than 0.5, we consider that our coin landed on tail
        n_tail += 1

# Compute the probability of head and tail:
p_head = n_head/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
p_tail = n_tail/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
print(f"\nP(Head)={p_head}")
print(f"P(Tail)={p_tail}")

P(Head)=0.523
P(Tail)=0.477

When we increase the number of iterations to a 1000, the probability of head and tail seems to get close to 50/50. And if our intuition that we should get more reliable answer if we throw the coin more often is true, then the result of this experiment would lead us to believe that our coin is probably not biased. There is one way in which we can show that increasing the number of throws yields a more reliable answer. We could perform the same experiment several times.

Let’s say we have a first experiment in which we throw the coin 10 times, and a second experiment in which we throw the coin 1000 times. To know which of these two experiments is most reliable, we can repeat each experiment 10 times. A more reliable experiment should gives similar results across repetitions.

# Same as before, but let's increase the number of throws:
n_iteration = 5

# ========================================================
# Experiment 1:
print("="*40)
print("Experiment 1")
for i in range(n_iteration):
    n_throw = 10 # 20 throws instead of 10
    n_head = 0 # Before we start, we have zero head
    n_tail = 0 # And zero tails

    for ii in range(n_throw):  # Repeat the same thing 10 times (throwing the coin)
        rnd = np.random.uniform()  # Draw a random number between 0 and 1 (following a uniform distribution, so each value between 0 and 1 is equally likely)
        if rnd <= 0.5:  # If our random number is less than 0.5, we consider that our coin landed on head.
            n_head += 1
        else:   # If our random number is more than 0.5, we consider that our coin landed on tail
            n_tail += 1

    # Compute the probability of head and tail:
    p_head = n_head/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
    p_tail = n_tail/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
    print(f"P(Head)={p_head}/P(Tail)={p_tail}")

# ========================================================
# Experiment 2:
print("="*40)
print("Experiment 2")
for i in range(n_iteration):
    n_throw = 1000 # 20 throws instead of 10
    n_head = 0 # Before we start, we have zero head
    n_tail = 0 # And zero tails

    for ii in range(n_throw):  # Repeat the same thing 10 times (throwing the coin)
        rnd = np.random.uniform()  # Draw a random number between 0 and 1 (following a uniform distribution, so each value between 0 and 1 is equally likely)
        if rnd <= 0.5:  # If our random number is less than 0.5, we consider that our coin landed on head.
            n_head += 1
        else:   # If our random number is more than 0.5, we consider that our coin landed on tail
            n_tail += 1

    # Compute the probability of head and tail:
    p_head = n_head/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
    p_tail = n_tail/n_throw   # The probability of head is simply how often we obtained head in our 10 throw, divided by the number throws
    print(f"P(Head)={p_head}/P(Tail)={p_tail}")

========================================
Experiment 1
P(Head)=0.6/P(Tail)=0.4
P(Head)=0.5/P(Tail)=0.5
P(Head)=0.6/P(Tail)=0.4
P(Head)=0.6/P(Tail)=0.4
P(Head)=0.5/P(Tail)=0.5
========================================
Experiment 2
P(Head)=0.481/P(Tail)=0.519
P(Head)=0.497/P(Tail)=0.503
P(Head)=0.531/P(Tail)=0.469
P(Head)=0.512/P(Tail)=0.488
P(Head)=0.507/P(Tail)=0.493

In the first experiment (where we throw the coin 10 times), the results we get differ quite a bit between repetition, sometimes we get 50/50, but sometimes we get 20% head, 80% tail. In comparison, the results of the second experiment (where we throw the coin 1000 times) vary much less across repeats: we get results between 40% and 60% probability for head and tail. So it would seem that our intuition that throwing the coin many times gets us more reliable results. So if we want to know if our coin is biased or not, we should throw the coin many many times.

And based on the 1000 throws experiment, it would seem that we get results that are close to 50/50, which seems to be what we would expect to happen if our coin isn’t biased

Answering the question based on our data#

Our intuition seems to tell us that our coin isn’t biased, because when we throw it a 1000 times, we get close to 50/50. So a resonable answer is our coin isn’t biased.

Along the way, we also noticed a few interesting things. First, we realized that we can’t trust a 100% the results of our experiment, because even if the coin is balanced, we might not get exactly 50/50 head and tail when we throw the coin several times. In a sense, this means that we can never know for sure whether or not our coin is biased:

If the coin is perfectly fair, we will not get exactly 50/50 every time
If the coin isn’t fair, we might still get 50/50 in some experiments

But then, how can we know based on the results of our experiment whether the best answer is Yes or no, if the same outcome can occur with a biased and with a fair coin? The answer is this: while we can not know with certainty whether or not our coin is biased, we can figure out how likely is it that the coin is not biased, given the results of my experiment. In other words, we can get the answer to the question: “How confident are you that the coin is not biased?”. If the answer is very, then the best answer is to say: the coin is not biased.

This may sound very familiar with the intuition of increasing the number of coin toss to get an answer that we felt we can trust more. This begs the question: why should we trust the results of the experiment where we throw the coin more often more than the one of the experiment where we throw the coin only a few times? A logical answer is that if the coin isn’t biased, we are less likely to observe something very different from the 50/50 ratio when we throw the coin many times. Another way to say that is: given that our coin is not biased, the probability of observing something close to 50/50 is larger when we throw the coin many times than when we throw the coin a few times

This sentence sounds a bit similar to the one above (how likely is it that my coin is not biased, given the results of my experiment), but it is phrased the other way around. In our experiment, because we felt confident that the results of our experiment accurately reflects the true ratio of the coin and because our final results was about 50/50 we deduce that we are confident that the true ratio is 50/50, given that we have observed a ratio of 50/50 in our experiment. It is crucial to understand that these are not the same. ThThe difference is the same as between saying “How confident am I that the sky is cloudy given that it is raining” vs. “How confident am I that it is raining, given that the sky is cloudy?”. While the sky is most often cloudy when it is raining, it is not always true that it is raining when the sky is cloudy. To decide whether or not to proceed with our bet, we need to know how confident we are that our coin isn’t biased, given the results of our experiment.

This is in a nutshell the goal of Bayesian statistics: figuring out the probability of a parameter (such as the true head/tail ratio), given some data (such as the result of an experiment in which we throw the coin a bunch of times). Knowing the probability of a parameter basically tells us how much we should trust the results of our experiment. In our example, we did it all based on intuition, but we will now dig into the actual math that enable to get that probability for any kinds of problems.