11.1 Terminologies Associated with the Bernoulli Distribution
Understanding the Bernoulli distribution requires familiarity with several fundamental terms. These concepts form the bedrock for analyzing binary outcomes in probability and statistics.
Core Terminologies
A Bernoulli trial is a single experiment with exactly two possible outcomes. The Bernoulli distribution models the probability of one of these outcomes.
1. Success and Failure
In a Bernoulli trial, there are only two mutually exclusive outcomes:
- Success: Typically denoted by the value 1.
- Failure: Typically denoted by the value 0.
These binary outcomes are the basis for all calculations within the Bernoulli model.
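In ML practice, raw labels are usually mapped to this 0/1 coding before modeling. A minimal Python sketch, using a hypothetical spam/not-spam labeling, illustrates the convention:

```python
# Hypothetical binary labels mapped to the 0/1 Bernoulli coding.
labels = ["spam", "not spam", "spam", "spam"]
outcomes = [1 if label == "spam" else 0 for label in labels]  # success = 1, failure = 0
print(outcomes)  # [1, 0, 1, 1]
```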
2. Probability of Success ($p$)
The probability of success in a single Bernoulli trial is represented by the parameter $p$. This value is always between 0 and 1 (exclusive):
$0 < p < 1$
It quantifies the likelihood of observing a "success" (outcome 1) in any given trial.
3. Probability of Failure ($q$)
The probability of failure is the complement of the probability of success. It is calculated as:
$q = 1 - p$
This means that the sum of the probabilities of success and failure always equals 1. For example, if the probability of success ($p$) is 0.7, the probability of failure ($q$) will be 0.3.
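A quick check of this example in Python:

```python
p = 0.7       # probability of success (from the example above)
q = 1 - p     # probability of failure: 0.3 (up to floating-point rounding)
print(p + q)  # 1.0 -- success and failure probabilities always sum to 1
```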
4. Bernoulli Random Variable ($X$)
A Bernoulli random variable, typically denoted by $X$, is a variable that takes on only two possible values corresponding to the outcomes of a Bernoulli trial:
$X = \begin{cases} 1 & \text{if success} \\ 0 & \text{if failure} \end{cases}$
It mathematically represents the outcome of a single Bernoulli experiment.
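A minimal sketch of evaluating and sampling such a variable, assuming SciPy is available (`scipy.stats.bernoulli`):

```python
from scipy.stats import bernoulli

p = 0.7
X = bernoulli(p)      # Bernoulli random variable with parameter p

print(X.pmf(1))       # P(X = 1) = p       -> 0.7
print(X.pmf(0))       # P(X = 0) = 1 - p   -> 0.3 (up to rounding)
print(X.rvs(size=5))  # five simulated 0/1 outcomes, e.g. [1 0 1 1 0]
```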
5. Bernoulli Trial
A Bernoulli trial is a single experiment that meets the following criteria:
- It has exactly two possible outcomes (success or failure).
- The probability of success ($p$) is the same for every trial.
- Each trial is independent, meaning the outcome of one trial does not affect the outcome of any other trial.
Example: Flipping a fair coin once and defining "heads" as success and "tails" as failure is a Bernoulli trial.
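A minimal sketch of this coin-flip trial using only Python's standard library, with heads as success:

```python
import random

def bernoulli_trial(p=0.5):
    """One Bernoulli trial: return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

flip = bernoulli_trial()  # one fair coin flip: 1 = heads, 0 = tails
```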
6. Bernoulli Parameter ($p$)
The Bernoulli parameter, denoted by $p$, is the single value that completely defines the Bernoulli distribution. It represents the probability of success. Knowing $p$ allows us to determine the probability of both success and failure for any given trial.
7. Expected Value (Mean)
The expected value, often denoted as $E[X]$ or $\mu$, represents the long-run average outcome of a Bernoulli random variable over many repeated trials. For a Bernoulli distribution, the expected value is simply the probability of success:
$E[X] = p$
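This follows directly from the definition of expectation, since the only outcomes are 1 (with probability $p$) and 0 (with probability $1 - p$):
$E[X] = 1 \cdot p + 0 \cdot (1 - p) = p$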
Example: If a Bernoulli trial has a probability of success $p = 0.6$, then over many trials, we would expect, on average, 0.6 successes per trial.
8. Variance
The variance, denoted as $Var(X)$ or $\sigma^2$, measures the spread or dispersion of the outcomes around the expected value. For a Bernoulli distribution, the variance is calculated as:
$Var(X) = p(1 - p)$
Since $q = 1 - p$, this can also be written as:
$Var(X) = pq$
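This form follows from $Var(X) = E[X^2] - (E[X])^2$, using the fact that $X^2 = X$ when $X$ takes only the values 0 and 1:
$Var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p) = pq$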
This value indicates how much the outcomes are expected to deviate from the mean.
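A short simulation sketch, assuming NumPy is available, comparing empirical estimates against the theoretical mean $p$ and variance $p(1-p)$:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.6
samples = rng.binomial(n=1, p=p, size=100_000)  # 100,000 Bernoulli(0.6) draws

print(samples.mean())  # close to E[X] = p = 0.6
print(samples.var())   # close to Var(X) = p * (1 - p) = 0.24
```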
Related Distributions Derived from Bernoulli Trials
The Bernoulli distribution serves as a fundamental building block for several other important probability distributions.
9. Binomial Distribution
The binomial distribution is a direct generalization of the Bernoulli distribution. It models the probability of obtaining a specific number of successes in a fixed sequence of $n$ independent and identical Bernoulli trials.
- Notation: $X \sim \text{Binomial}(n, p)$
- Where:
  - $n$ is the total number of trials.
  - $p$ is the probability of success in each individual trial.
Example: The number of heads in 10 coin flips, where each flip is a Bernoulli trial with $p=0.5$, follows a binomial distribution.
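A brief sketch of this relationship, assuming NumPy is available: summing $n$ individual Bernoulli draws produces one binomial sample.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n, p = 10, 0.5

# A Binomial(n, p) outcome is the number of successes in n Bernoulli(p) trials.
flips = rng.binomial(n=1, p=p, size=n)  # 10 individual coin flips (0 or 1 each)
heads = flips.sum()                     # total successes: one binomial sample

heads_direct = rng.binomial(n=n, p=p)   # equivalently, sample the binomial directly
```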
10. Geometric Distribution
The geometric distribution is used when we are interested in the number of Bernoulli trials required to achieve the first success. It assumes a sequence of independent Bernoulli trials, each with the same probability of success $p$.
- Models: The number of trials until the first success.
- Key Assumption: Independent Bernoulli trials with a constant probability of success $p$.
Example: The number of times you need to roll a die until you get a '6' (where rolling a '6' is a success with $p=1/6$).
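A minimal simulation sketch of the die example; averaging many runs approaches the geometric mean $1/p = 6$:

```python
import random

def rolls_until_six():
    """Count die rolls (Bernoulli trials with p = 1/6) until the first '6'."""
    trials = 0
    while True:
        trials += 1
        if random.randint(1, 6) == 6:  # success with probability 1/6
            return trials

runs = [rolls_until_six() for _ in range(10_000)]
print(sum(runs) / len(runs))  # close to the geometric mean 1/p = 6
```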
11. Negative Binomial Distribution
The negative binomial distribution is a further extension of the geometric distribution. It models the number of Bernoulli trials needed to achieve a specific, predetermined number of successes, say $r$.
- Models: The number of trials until the $r$-th success occurs.
- Key Assumption: Independent Bernoulli trials with a constant probability of success $p$.
Example: The number of times you need to flip a coin until you get 3 heads.
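A minimal simulation sketch of the coin example with $r = 3$ and $p = 0.5$; the average number of flips approaches $r/p = 6$:

```python
import random

def flips_until_r_heads(r=3, p=0.5):
    """Count coin flips (Bernoulli trials) until the r-th head (success)."""
    flips, heads = 0, 0
    while heads < r:
        flips += 1
        if random.random() < p:  # a head with probability p
            heads += 1
    return flips

runs = [flips_until_r_heads() for _ in range(10_000)]
print(sum(runs) / len(runs))  # close to r / p = 6
```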
Interview Questions
- What are the two possible outcomes in a Bernoulli trial?
- How is the probability of success denoted, and what is its range in a Bernoulli distribution?
- How do you calculate the probability of failure in a Bernoulli trial?
- What is a Bernoulli random variable, and what values can it take?
- What does the Bernoulli parameter ($p$) represent?
- How do you calculate the expected value (mean) of a Bernoulli random variable?
- What is the formula for the variance of a Bernoulli distribution?
- How is the binomial distribution related to the Bernoulli distribution?
- What does the geometric distribution model in relation to Bernoulli trials?
- Explain the negative binomial distribution and how it extends the geometric distribution.