Hypergeometric vs Binomial Distribution: Key Differences in Stats


15.5 Differences Between Hypergeometric and Binomial Distributions

Understanding the distinction between the Hypergeometric and Binomial Distributions is essential in statistics. Both count the number of successes in a fixed number of draws or trials. They differ in the sampling method (without or with replacement) and, as a result, in whether the trials depend on one another.

Key Differences Summarized

The following table outlines the fundamental differences:

| Basis | Hypergeometric Distribution | Binomial Distribution |
| --- | --- | --- |
| Population and Sampling | Finite population, sampled without replacement. | Sampling with replacement, or a population so large relative to the sample that it is effectively infinite. |
| Dependency of Trials | Dependent trials: each selection changes the probabilities for the next. | Independent trials: no selection affects the others. |
| Formula | A ratio of binomial coefficients that depends on the population size. | One binomial coefficient multiplied by powers of the success and failure probabilities; the population size does not appear. |
| Typical Use Cases | Quality control, population sampling, genetics. | Coin tosses, number of defective products, customer responses. |
| Parameters Used | $N$ (population size), $K$ (number of successes in population), $n$ (sample size); $x$ is the observed number of successes. | $n$ (number of trials), $p$ (probability of success); $x$ is the observed number of successes. |

When to Use Which Distribution

  • Use the Hypergeometric Distribution when sampling without replacement from a finite population. The trials are then dependent: each draw removes an item, which alters the probability of success on subsequent draws.

  • Use the Binomial Distribution when trials are independent, meaning the outcome of one trial does not influence the outcome of others, so the probability of success remains constant across all trials. This holds exactly when sampling with replacement, and approximately when the population is very large relative to the sample, because removing a few items barely changes the remaining proportions.
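The effect of population size on this choice can be checked numerically. The sketch below is illustrative only: the helper names `hypergeom_pmf` and `binom_pmf` and the parameter values are assumptions made for the demonstration, not part of the original text. It keeps the success fraction $K/N$ fixed at $p$ and lets $N$ grow, so the gap between the two probabilities can be observed shrinking.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x) for n draws without replacement from N items, K of them successes."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0  # x is not attainable with these parameters
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """P(X = x) for n independent trials with constant success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical values: sample size, target count, and success fraction.
n, x, p = 5, 2, 0.2

for N in (50, 500, 5000, 50000):
    K = round(N * p)  # keep the success fraction K/N fixed at p
    h = hypergeom_pmf(x, N, K, n)
    b = binom_pmf(x, n, p)
    print(f"N={N:>6}  hypergeometric={h:.5f}  binomial={b:.5f}  gap={abs(h - b):.5f}")
```

With a sample of 5 from a population of 50 the two values differ noticeably; as $N$ grows with the same fraction of successes, the hypergeometric probability approaches the binomial one.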

Formulas

Hypergeometric Distribution Formula

The probability of getting exactly $x$ successes in $n$ draws from a finite population of size $N$ containing $K$ successes is given by:

$$ P(X=x) = \frac{\binom{K}{x} \binom{N-K}{n-x}}{\binom{N}{n}} $$

Where:

  • $\binom{a}{b}$ denotes the binomial coefficient, calculated as $\frac{a!}{b!(a-b)!}$.
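As a minimal sketch, the formula can be translated directly into Python using the standard library's `math.comb`. The function name `hypergeom_pmf` and the parameter values below are hypothetical, chosen only for illustration. The function returns 0 for values of $x$ that cannot occur, namely those outside $\max(0, n-(N-K)) \le x \le \min(n, K)$.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(X = x): exactly x successes in n draws without replacement
    from a population of N items that contains K successes."""
    lo, hi = max(0, n - (N - K)), min(n, K)  # feasible values of x
    if not lo <= x <= hi:
        return 0.0
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Hypothetical parameters chosen only for illustration.
N, K, n = 20, 7, 5
pmf = [hypergeom_pmf(x, N, K, n) for x in range(n + 1)]
print([round(p, 4) for p in pmf])
print(sum(pmf))  # should equal 1 up to floating-point rounding
```

Summing the probabilities over all possible values of $x$ is a quick sanity check that the function implements a valid distribution.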

Binomial Distribution Formula

The probability of getting exactly $x$ successes in $n$ independent trials, where the probability of success in each trial is $p$, is given by:

$$ P(X=x) = \binom{n}{x} p^x (1-p)^{n-x} $$
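The binomial formula can be sketched the same way. The function name `binom_pmf` and the values of `n` and `p` below are assumptions made for the example. Note that, unlike the hypergeometric case, no population size enters the calculation.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): exactly x successes in n independent trials,
    each with the same success probability p."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical parameters chosen only for illustration.
n, p = 4, 0.3
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
print([round(q, 4) for q in pmf])
print(sum(pmf))  # should equal 1 up to floating-point rounding
```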

Examples

Hypergeometric Distribution Example

Imagine a batch of 50 items, where 10 are defective. If you randomly select 5 items without replacement, what is the probability that exactly 2 of them are defective?

  • $N = 50$ (total items)
  • $K = 10$ (total defective items)
  • $n = 5$ (sample size)
  • $x = 2$ (desired number of defective items in sample)

Using the Hypergeometric formula: $$ P(X=2) = \frac{\binom{10}{2} \binom{50-10}{5-2}}{\binom{50}{5}} = \frac{\binom{10}{2} \binom{40}{3}}{\binom{50}{5}} = \frac{45 \times 9880}{2118760} \approx 0.2098 $$
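The result can be cross-checked by simulation. The sketch below is one possible approach, not a prescribed method: it evaluates the formula with `math.comb` and then estimates the same probability by repeatedly drawing 5 items without replacement using `random.sample`. The number of trials and the seed are arbitrary choices.

```python
import random
from math import comb

N, K, n, x = 50, 10, 5, 2  # values from the example above

# Exact probability from the hypergeometric formula.
exact = comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Monte Carlo estimate: 1 marks a defective item, 0 a good one.
random.seed(0)      # arbitrary seed, only for reproducibility
batch = [1] * K + [0] * (N - K)
trials = 100_000    # arbitrary number of simulated samples
hits = sum(sum(random.sample(batch, n)) == x for _ in range(trials))
estimate = hits / trials

print(f"exact     = {exact:.4f}")  # 0.2098
print(f"simulated = {estimate:.4f}")
```

Because `random.sample` draws without replacement, the simulated frequency should land close to the exact value, with the usual sampling noise.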

Binomial Distribution Example

If a fair coin is tossed 10 times, what is the probability of getting exactly 6 heads?

  • $n = 10$ (number of trials)
  • $p = 0.5$ (probability of getting a head on a single toss)
  • $x = 6$ (desired number of heads)

Using the Binomial formula: $$ P(X=6) = \binom{10}{6} (0.5)^6 (1-0.5)^{10-6} = \binom{10}{6} (0.5)^{10} = \frac{210}{1024} \approx 0.2051 $$
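For a quick check of this arithmetic, the calculation can be carried out exactly with Python's `fractions.Fraction`, which avoids floating-point rounding. This is a small illustrative sketch and the variable names are arbitrary.

```python
from fractions import Fraction
from math import comb

n, x = 10, 6
p = Fraction(1, 2)  # fair coin

# Binomial formula evaluated with exact rational arithmetic.
prob = comb(n, x) * p**x * (1 - p)**(n - x)

print(prob)         # 105/512
print(float(prob))  # 0.205078125
```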

Interview Questions

Here are some common interview questions related to the differences between these distributions:

  • What are the main differences between the Hypergeometric and Binomial Distributions?
  • When should you use the Hypergeometric Distribution instead of the Binomial Distribution?
  • Explain how dependency between trials affects the choice of distribution.
  • What are the key parameters used in the Hypergeometric Distribution?
  • How does the sampling method differ between the Hypergeometric and Binomial Distributions?
  • Describe a real-world example where the Hypergeometric Distribution is appropriate.
  • How do the formulas for the Hypergeometric and Binomial Distributions differ?
  • Why is the Binomial Distribution suitable for independent trials?
  • What is the significance of population size in choosing the distribution?
  • How can quality control processes be modeled using the Hypergeometric Distribution?