Bernoulli Distribution: Mean & Variance Explained

11.3 Mean and Variance of the Bernoulli Distribution

The Bernoulli distribution models a random experiment with exactly two possible outcomes: success (typically represented as 1) and failure (typically represented as 0). It's a fundamental concept in probability theory, forming the basis for more complex distributions like the Binomial distribution. Understanding its mean and variance is crucial for analyzing binary events.


I. Mean (Expected Value) of the Bernoulli Distribution

The mean, denoted by $\mu$ or $E[X]$, represents the expected value or average outcome of a Bernoulli random variable over a large number of trials. For a Bernoulli distribution, the mean is simply the probability of success.

Formula

If $X$ is a Bernoulli random variable with:

  • $P(X = 1) = p$ (probability of success)
  • $P(X = 0) = q = 1 - p$ (probability of failure)

The mean is calculated as:

$$ \mu = E[X] = p $$

Explanation

The expected value is calculated by summing the product of each possible outcome and its corresponding probability:

$$ E[X] = (1 \times P(X = 1)) + (0 \times P(X = 0)) $$
$$ E[X] = (1 \times p) + (0 \times q) $$
$$ E[X] = p + 0 $$
$$ E[X] = p $$

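Intuitively, because every outcome is either 0 or 1, the average of many outcomes is just the fraction of trials that succeeded, and that fraction settles near $p$ as the number of trials grows. If an event has a 70% chance of success ($p=0.7$), then over many trials you would expect the average outcome to be close to 0.7, even though no single trial ever produces the value 0.7.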
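The long-run-average reading of $E[X] = p$ can be checked with a small simulation. The sketch below uses only Python's standard library; the helper names `bernoulli_sample` and `sample_mean` are illustrative choices made for this example, and the seed is fixed only so that repeated runs give the same numbers.

```python
import random


def bernoulli_sample(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0


def sample_mean(p, n_trials, seed=0):
    """Average of n_trials independent Bernoulli(p) draws."""
    rng = random.Random(seed)  # fixed seed: repeatable, not "more random"
    total = sum(bernoulli_sample(p, rng) for _ in range(n_trials))
    return total / n_trials


# The sample mean should drift toward p = 0.7 as the trial count grows.
for n in (100, 10_000, 100_000):
    print(f"n={n:>7}  sample mean={sample_mean(0.7, n):.4f}")
```

The exact values printed depend on the seed, but the gap between the sample mean and 0.7 should typically shrink as `n` increases.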


II. Variance of the Bernoulli Distribution

The variance, denoted by $\sigma^2$ or $Var[X]$, measures the spread or dispersion of the possible outcomes from the mean. For a Bernoulli distribution, the variance quantifies how much the results tend to deviate from the expected value ($p$).

Formula

The variance of a Bernoulli distribution is given by:

$$ \sigma^2 = Var[X] = p \times (1 - p) $$
$$ \sigma^2 = pq $$

Derivation

The variance can be derived using the formula: $Var[X] = E[X^2] - (E[X])^2$.

First, let's calculate $E[X^2]$:

$$ E[X^2] = (1^2 \times P(X = 1)) + (0^2 \times P(X = 0)) $$
$$ E[X^2] = (1 \times p) + (0 \times q) $$
$$ E[X^2] = p $$

Now, substitute this and $(E[X])^2$ into the variance formula:

$$ Var[X] = E[X^2] - (E[X])^2 $$
$$ Var[X] = p - (p)^2 $$
$$ Var[X] = p - p^2 $$
$$ Var[X] = p(1 - p) $$
$$ Var[X] = pq $$
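As a sanity check on the derivation, the variance can be computed two ways for a concrete $p$: with the shortcut $E[X^2] - (E[X])^2$ used above, and directly from the definition $E[(X - \mu)^2]$. The sketch below is a minimal illustration using exact fractions so that no rounding is involved; the function names are hypothetical helpers introduced for this example.

```python
from fractions import Fraction


def bernoulli_pmf(p):
    """Probability mass function as {outcome: probability}."""
    return {1: p, 0: 1 - p}


def expectation(pmf, g=lambda x: x):
    """E[g(X)]: sum of g(outcome) * probability over all outcomes."""
    return sum(g(x) * prob for x, prob in pmf.items())


def variance_by_shortcut(pmf):
    """Var[X] = E[X^2] - (E[X])^2."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: x * x) - mean**2


def variance_by_definition(pmf):
    """Var[X] = E[(X - mean)^2]."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: (x - mean) ** 2)


p = Fraction(7, 10)
pmf = bernoulli_pmf(p)
# All three values agree: 21/100, i.e. 0.7 * 0.3.
print(variance_by_shortcut(pmf), variance_by_definition(pmf), p * (1 - p))
```

Both routes give $0.7 \times 0.3 = 0.21$, matching $p(1-p)$.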

Interpretation of Variance

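The variance $p(1-p) = pq$ is the product of the probability of success and the probability of failure. Because $q = 1 - p$, it is determined by $p$ alone, and it is symmetric: swapping the roles of success and failure (replacing $p$ with $1 - p$) leaves the variance unchanged.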

  • Maximum Variance: The variance is maximized when $p = 0.5$. In this case, $Var[X] = 0.5 \times (1 - 0.5) = 0.25$. This occurs when success and failure are equally likely, leading to the greatest uncertainty in outcomes.
  • Minimum Variance: As $p$ approaches 0 or 1 (meaning the outcome is almost certain to be failure or success, respectively), the variance approaches 0. For example, if $p=0.99$, $Var[X] = 0.99 \times 0.01 = 0.0099$. If $p=0.01$, $Var[X] = 0.01 \times 0.99 = 0.0099$. This indicates very little variability when one outcome is highly probable.
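The behaviour of $p(1-p)$ across the range of $p$ can be tabulated directly. The following sketch evaluates the variance on a grid of values and picks out the maximizer; `bernoulli_variance` is a hypothetical helper name, and the grid spacing of 0.01 is an arbitrary choice for illustration.

```python
def bernoulli_variance(p):
    """Variance p * (1 - p) of a Bernoulli(p) variable."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * (1.0 - p)


# Evaluate on p = 0.00, 0.01, ..., 1.00 and find where the variance peaks.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=bernoulli_variance)
print(best_p, bernoulli_variance(best_p))  # 0.5 0.25

# The variance is symmetric about 0.5 and shrinks toward the endpoints.
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"p={p:.2f}  Var[X]={bernoulli_variance(p):.4f}")
```

The same maximum follows analytically: the derivative of $p - p^2$ is $1 - 2p$, which is zero only at $p = 0.5$.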

Summary of Properties

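| Measure  | Formula               | Interpretation                                          |
|----------|-----------------------|---------------------------------------------------------|
| Mean     | $\mu = E[X] = p$      | The average expected outcome (probability of success).  |
| Variance | $\sigma^2 = p(1-p)$   | The spread or variability of outcomes from the mean.    |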

This understanding of the mean and variance of the Bernoulli distribution is foundational for many statistical analyses, data science applications, and business decision-making processes involving binary outcomes.


Frequently Asked Questions

  • What are the key properties of the Bernoulli distribution? The key properties are that it has only two possible outcomes (success/failure), it's a discrete probability distribution, and its mean is $p$ and its variance is $p(1-p)$.

  • Why is the Bernoulli distribution used for binary outcomes? It's specifically designed to model a single trial of an experiment where there are only two mutually exclusive and exhaustive outcomes, such as a coin flip, a customer making a purchase, or a machine producing a defect.

  • How is the probability of failure calculated in a Bernoulli distribution? The probability of failure ($q$) is simply the complement of the probability of success ($p$). Therefore, $q = 1 - p$.

  • How do you calculate the expected value of a Bernoulli distribution? The expected value is equal to the probability of success, $p$.

  • Can you explain the variance formula of a Bernoulli distribution? The variance is $p(1-p)$, meaning it measures the spread of outcomes. It's highest when $p=0.5$ and approaches zero as $p$ gets closer to 0 or 1.

  • How does the Bernoulli distribution relate to the Binomial distribution? The Binomial distribution describes the sum of $n$ independent Bernoulli trials that all share the same success probability $p$. In other words, if you run $n$ identical and independent Bernoulli experiments, the number of successes follows a Binomial distribution, and a single Bernoulli trial is the special case $n = 1$.
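The link between the two distributions in the last answer can be illustrated by simulation: count the successes in $n$ Bernoulli trials many times over, then compare the observed frequencies with the Binomial probabilities $\binom{n}{k} p^k (1-p)^{n-k}$. This is a rough sketch under assumed values ($n = 5$, $p = 0.3$, 50,000 repetitions, fixed seed); the function names are illustrative and not taken from any particular library.

```python
import random
from collections import Counter
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def count_successes(n, p, rng):
    """Sum of n independent Bernoulli(p) draws."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))


def empirical_distribution(n, p, repetitions, seed=0):
    """Observed frequency of each success count from 0 to n."""
    rng = random.Random(seed)
    counts = Counter(count_successes(n, p, rng) for _ in range(repetitions))
    return {k: counts[k] / repetitions for k in range(n + 1)}


n, p = 5, 0.3
observed = empirical_distribution(n, p, repetitions=50_000)
# Columns: success count k, simulated frequency, Binomial probability.
for k in range(n + 1):
    print(k, round(observed[k], 4), round(binomial_pmf(k, n, p), 4))
```

The simulated frequencies should sit close to the Binomial probabilities, with small differences that come from sampling noise.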