Total Probability Theorem Explained | Machine Learning

7.6 Theorem of Total Probability

The Theorem of Total Probability, also known as the Theorem of Elimination, is a fundamental concept in probability theory. It provides a method for calculating the probability of an event by considering its relationship to a set of mutually exclusive and exhaustive events that partition the sample space.

Definition and Formula

The theorem is applicable when we want to find the probability of an event $E$, and we know that $E$ can occur only in conjunction with one of several other events that are:

  1. Mutually Exclusive: No two of these events can occur simultaneously.
  2. Exhaustive: One of these events must occur.

Let $H_1, H_2, \ldots, H_n$ be a set of events that form a complete partition of the sample space. This means:

  • $H_i \cap H_j = \emptyset$ for all $i \neq j$ (mutually exclusive)
  • $\bigcup_{i=1}^n H_i = S$, where $S$ is the entire sample space (exhaustive)

If $E$ is any event in the sample space, the Theorem of Total Probability states that the probability of $E$ can be calculated as the sum of the probabilities of $E$ occurring with each of these partitioning events:

$$ P(E) = P(H_1 \cap E) + P(H_2 \cap E) + \ldots + P(H_n \cap E) $$

Using the multiplication rule of conditional probability, $P(H_i \cap E) = P(H_i) \times P(E \mid H_i)$, we can rewrite the formula as:

$$ P(E) = P(H_1) \times P(E | H_1) + P(H_2) \times P(E | H_2) + \ldots + P(H_n) \times P(E | H_n) $$

This can be more compactly written using summation notation:

$$ P(E) = \sum_{i=1}^n P(H_i) \times P(E | H_i) $$

This formulation is particularly useful when the probabilities of the individual partitioning events ($P(H_i)$) and the conditional probabilities of event $E$ given each partitioning event ($P(E | H_i)$) are known.
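The summation maps directly to code. The sketch below (the function name and argument order are my own) computes $P(E)$ from a list of priors $P(H_i)$ and a matching list of conditionals $P(E \mid H_i)$:

```python
def total_probability(priors, conditionals):
    """P(E) = sum_i P(H_i) * P(E | H_i).

    priors[i]       -- P(H_i), must sum to 1 (exhaustive partition)
    conditionals[i] -- P(E | H_i)
    """
    assert abs(sum(priors) - 1.0) < 1e-9, "partition must be exhaustive"
    return sum(p * c for p, c in zip(priors, conditionals))

# Illustrative values for a three-event partition:
p_e = total_probability([0.5, 0.3, 0.2], [0.01, 0.02, 0.05])
print(round(p_e, 3))
```

The assertion guards the exhaustiveness requirement: if the priors do not sum to 1, the events $H_i$ do not cover the sample space and the formula does not apply.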

Why is it called the "Theorem of Elimination"?

It's called the Theorem of Elimination because it allows us to "eliminate" the uncertainty about which intermediate event ($H_i$) has occurred by considering all possibilities and their associated probabilities. We effectively break down the probability of $E$ into mutually exclusive cases, sum them up, and arrive at the overall probability of $E$.

Real-World Example: Defective Machine Problem

Problem Statement:

A company manufactures a certain product using machines from three different factories: Factory X, Factory Y, and Factory Z.

  • The probability that a randomly selected machine comes from Factory X is $0.5$.
  • The probability that a randomly selected machine comes from Factory Y is $0.3$.
  • The probability that a randomly selected machine comes from Factory Z is $0.2$.

Each factory has a different rate of producing defective items:

  • The probability that a machine from Factory X produces a defective item is $0.01$.
  • The probability that a machine from Factory Y produces a defective item is $0.02$.
  • The probability that a machine from Factory Z produces a defective item is $0.05$.

Question: What is the overall probability that a randomly chosen machine produces a defective item?

Step-by-Step Solution:

Let $D$ be the event that a randomly chosen machine produces a defective item. Let $H_1$ be the event that the machine is from Factory X. Let $H_2$ be the event that the machine is from Factory Y. Let $H_3$ be the event that the machine is from Factory Z.

We are given:

  • $P(H_1) = 0.5$
  • $P(H_2) = 0.3$
  • $P(H_3) = 0.2$

And the conditional probabilities of a defective item given the factory:

  • $P(D | H_1) = 0.01$
  • $P(D | H_2) = 0.02$
  • $P(D | H_3) = 0.05$

The events $H_1, H_2, H_3$ are mutually exclusive (a machine cannot come from more than one factory) and exhaustive (all machines come from one of these three factories, as $0.5 + 0.3 + 0.2 = 1$).

We can now apply the Theorem of Total Probability to find $P(D)$:

$$ P(D) = P(H_1) \times P(D | H_1) + P(H_2) \times P(D | H_2) + P(H_3) \times P(D | H_3) $$

Substitute the given values:

$$ P(D) = (0.5 \times 0.01) + (0.3 \times 0.02) + (0.2 \times 0.05) $$

Calculate each term:

$$ P(D) = 0.005 + 0.006 + 0.01 $$

Add the results:

$$ P(D) = 0.021 $$

Final Answer:

The probability that a randomly chosen machine produces a defective item is $0.021$, or $2.1\%$.
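As a sanity check, the arithmetic above can be reproduced in a few lines of Python (the dictionary names are illustrative):

```python
# Defective-machine example: P(D) by the Theorem of Total Probability.
p_factory = {"X": 0.5, "Y": 0.3, "Z": 0.2}          # P(H_i): machine's factory
p_defect_given = {"X": 0.01, "Y": 0.02, "Z": 0.05}  # P(D | H_i): defect rate

# P(D) = sum over factories of P(H_i) * P(D | H_i)
p_defect = sum(p_factory[f] * p_defect_given[f] for f in p_factory)
print(round(p_defect, 3))
```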

Frequently Asked Questions

  • What is the Theorem of Total Probability? It's a rule used to calculate the probability of an event by summing the probabilities of that event occurring through a set of mutually exclusive and exhaustive intermediate events.

  • Explain the formula for the Theorem of Total Probability. The formula is $P(E) = \sum_{i=1}^n P(H_i) \times P(E | H_i)$, where $H_i$ are mutually exclusive and exhaustive events, and $E$ is the event whose probability we want to find. It means we consider each way $E$ can happen via an $H_i$, calculate the probability of that specific path, and sum them up.

  • When is it appropriate to use the Total Probability Theorem? Use it when you want to find the probability of an event $E$, but $E$ can only occur via a set of distinct, non-overlapping scenarios ($H_i$), and you know the probability of each scenario and the probability of $E$ within each scenario.

  • What are mutually exclusive and exhaustive events in the context of this theorem?

    • Mutually Exclusive: The intermediate events ($H_i$) cannot happen at the same time. If one occurs, the others cannot.
    • Exhaustive: One of the intermediate events ($H_i$) is guaranteed to happen; together, they cover all possibilities in the sample space.
  • Provide a real-world scenario where the Total Probability Theorem applies. Any situation where an outcome depends on a prior choice or classification, such as:

    • Disease diagnosis based on symptoms from different patient groups.
    • Investment returns based on different economic scenarios.
    • Product quality based on manufacturing processes from different plants.
  • How does conditional probability relate to the Total Probability Theorem? Conditional probability ($P(E|H_i)$) is crucial because the theorem uses the probability of event $E$ given that a specific intermediate event $H_i$ has occurred. It helps us quantify the likelihood of $E$ within each distinct path.

  • What is the difference between Total Probability and Bayes’ Theorem? The Theorem of Total Probability is used to find the forward probability of an event $E$ ($P(E)$), given prior probabilities of $H_i$ and conditional probabilities $P(E|H_i)$. Bayes' Theorem, on the other hand, is used to find the reverse conditional probability ($P(H_i|E)$) – the probability of a specific prior event $H_i$ given that event $E$ has occurred.
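To illustrate the contrast with the same factory numbers: total probability supplies the denominator $P(D)$, and Bayes' theorem then recovers each posterior $P(H_i \mid D)$. Variable names below are illustrative:

```python
# Bayes' theorem: P(H_i | D) = P(H_i) * P(D | H_i) / P(D),
# where P(D) comes from the Theorem of Total Probability.
priors = {"X": 0.5, "Y": 0.3, "Z": 0.2}        # P(H_i)
likelihoods = {"X": 0.01, "Y": 0.02, "Z": 0.05}  # P(D | H_i)

p_d = sum(priors[f] * likelihoods[f] for f in priors)  # forward: P(D)
posteriors = {f: priors[f] * likelihoods[f] / p_d for f in priors}  # reverse

for f, p in posteriors.items():
    print(f"P(factory {f} | defective) = {p:.4f}")
```

Note that although Factory Z supplies only 20% of machines, its higher defect rate makes it the most likely source of a defective item.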

  • Why is the Total Probability Theorem called the “Theorem of Elimination”? It's called this because it systematically "eliminates" the uncertainty about which of the $H_i$ events occurred by accounting for each possibility and its probability, thereby allowing for a direct calculation of $P(E)$.

  • How do you calculate the probability of an event when it depends on several prior events? You identify the mutually exclusive and exhaustive prior events ($H_i$). Then, you find the probability of each prior event ($P(H_i)$) and the probability of the target event $E$ given each prior event ($P(E|H_i)$). Finally, you sum the products of these probabilities: $\sum P(H_i) \times P(E|H_i)$.

  • Can you solve a problem involving machines and defective items using total probability? Yes, as demonstrated in the example above, the theorem is perfectly suited for such problems where a final outcome (defective item) depends on which source (factory) produced it, given the probabilities associated with each source.

Key Concepts

  • Partition: A set of events that are mutually exclusive and exhaustive.
  • Mutually Exclusive Events: Events that cannot occur at the same time.
  • Exhaustive Events: A set of events that cover all possible outcomes in the sample space.
  • Conditional Probability: The probability of an event occurring given that another event has already occurred.