STATISTICS

What is Statistical Independence?

We will define a fundamental property in statistics that greatly simplifies building and computing statistical models.

Mathematical Jargon
Two events A and B are independent if the occurrence of event B provides no new information about the probability of A occurring. This can be expressed mathematically as:

P(A and B) = P(A) x P(B)

Example
If I flip two coins at the same time, the result of one coin does not affect the result of the other. Before doing the math, note that the possible outcomes of flipping 2 fair coins form the set {HH, HT, TH, TT} of size 4, with every outcome equally likely. So what is the probability that Coin 1 is Heads and Coin 2 is Tails, and are these events independent? Let's check:

1. P(Coin 1 is Heads AND Coin 2 is Tails) = size of {HT} / size of {HH, HT, TH, TT} = 1/4 = 25%
2. P(Coin 1 is Heads) x P(Coin 2 is Tails) = 50% x 50% = 25%

The results of #1 and #2 are equal, which satisfies the definition of independence.
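
The same check can be done by brute force. The sketch below is only an illustration, assuming two fair coins: it lists every outcome, counts the ones that match each event, and compares the joint probability with the product of the two individual probabilities. The helper name prob is made up for this example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coins: HH, HT, TH, TT, all equally likely (assumption).
sample_space = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event = matching outcomes / all outcomes."""
    matching = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(matching), len(sample_space))

# Joint event and the two individual events from the example.
p_joint = prob(lambda o: o[0] == "H" and o[1] == "T")
p_coin1_heads = prob(lambda o: o[0] == "H")
p_coin2_tails = prob(lambda o: o[1] == "T")

# Independence holds when the joint probability equals the product.
is_independent = p_joint == p_coin1_heads * p_coin2_tails
print(p_joint, p_coin1_heads * p_coin2_tails, is_independent)
```

Using exact fractions rather than floats keeps the equality test free of rounding issues.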

In Football Terms
One advantage of assuming independence is that it reduces the mathematical complexity of a problem. When events are not independent, the dependence between them has to be modelled and calculated, and before you know it these calculations balloon.
Let's take the Poisson model you see in a typical Excel football model. It multiplies the Poisson probability of the goals scored by the Home team by the Poisson probability of the goals scored by the Away team:

POI(Home Avg. Scored; Goals) x POI(Away Avg. Scored; Goals)


Look familiar? That is the independence assumption in action! The correct score heatmap above is a 5x5 matrix = 25 different possibilities.
If we had not assumed independence between Home and Away goals, we would have to work with conditional probabilities instead, and there are 2 x 25 = 50 of those because the direction of the conditioning matters. As an example, the probability of Home Scored = 1 given Away Scored = 0 is in general not the same as the probability of Away Scored = 0 given Home Scored = 1. Under the independence assumption the conditioning drops out (the probability of Home Scored = 1 given Away Scored = 0 is simply the probability of Home Scored = 1), so each cell is just a product of two numbers. Whether we can make this assumption is a whole other story.
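
To make the product formula concrete, here is a minimal Python sketch of the same calculation. The two averages are made-up numbers chosen only for illustration, and the 5x5 grid is assumed to cover 0 to 4 goals for each team. The function poisson_pmf is a hand-written version of the standard Poisson probability formula, not taken from any particular spreadsheet.

```python
from math import exp, factorial

def poisson_pmf(mean, goals):
    """Poisson probability of exactly `goals` goals given an average of `mean`."""
    return exp(-mean) * mean ** goals / factorial(goals)

# Hypothetical averages, for illustration only.
home_avg_scored = 1.5
away_avg_scored = 1.1
max_goals = 4  # assumption: the 5x5 grid covers 0..4 goals per team

home_probs = [poisson_pmf(home_avg_scored, g) for g in range(max_goals + 1)]
away_probs = [poisson_pmf(away_avg_scored, g) for g in range(max_goals + 1)]

# Independence assumption: each cell is P(Home = h) x P(Away = a).
score_matrix = [[ph * pa for pa in away_probs] for ph in home_probs]

# Rows are Home goals, columns are Away goals.
for row in score_matrix:
    print("  ".join(f"{p:6.2%}" for p in row))

# Under independence the conditioning drops out:
# P(Home = 1 given Away = 0) = P(Home = 1 and Away = 0) / P(Away = 0) = P(Home = 1)
p_home1_given_away0 = score_matrix[1][0] / away_probs[0]
```

Only the 5 + 5 individual probabilities are needed to fill all 25 cells. The cells sum to slightly less than 100% because scores above 4 goals are left out of the grid.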
