Hypergeometric Distribution: Business Stats & AI Use Cases
15. Hypergeometric Distribution in Business Statistics
The Hypergeometric Distribution is a discrete probability distribution that describes the probability of obtaining a specific number of successes in a sample drawn without replacement from a finite population. This is a crucial distinction from the Binomial Distribution, which assumes sampling with replacement or from an infinitely large population.
15.1 Probability Mass Function (PMF)
Because the Hypergeometric Distribution is discrete, its probability function is a probability mass function (PMF), although it is often loosely called a probability density function (PDF). It gives the probability of getting exactly $k$ successes in $n$ draws from a population of size $N$ that contains $K$ successes.
The formula is:
$$P(X=k) = \frac{{\binom{K}{k} \binom{N-K}{n-k}}}{{\binom{N}{n}}}$$
Where:
- $N$: The total population size.
- $K$: The total number of success states in the population.
- $n$: The number of draws (the sample size).
- $k$: The number of observed successes, with $\max(0, n-(N-K)) \le k \le \min(n, K)$.
- $\binom{a}{b}$ represents the binomial coefficient, "a choose b", calculated as $\frac{a!}{b!(a-b)!}$.
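As a minimal sketch, the formula can be evaluated directly with Python's standard library. The function name `hypergeom_pmf` and the small test population (10 items, 4 successes, 3 draws) are illustrative assumptions, not part of the original text.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for n draws without replacement from N items, K of them successes."""
    # Values of k outside the support have probability zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Illustrative population: 10 items, 4 successes, 3 draws.
probs = [hypergeom_pmf(k, N=10, K=4, n=3) for k in range(4)]
total = sum(probs)  # the probabilities over the whole support should sum to 1

for k, p in enumerate(probs):
    print(f"P(X = {k}) = {p:.4f}")
```

Summing the probabilities over every possible $k$ is a quick sanity check that the parameters were passed in the intended order.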
15.2 Mean and Variance
For a Hypergeometric Distribution:
- Mean (Expected Value), $E(X)$: The mean represents the average number of successes expected in $n$ draws.
  $$E(X) = n \frac{K}{N}$$
- Variance, $Var(X)$: The variance measures the spread of the distribution around the mean. It includes a finite population correction factor.
  $$Var(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \left(\frac{N-n}{N-1}\right)$$
  The term $\left(\frac{N-n}{N-1}\right)$ is the finite population correction factor.
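The closed-form mean and variance can be checked against moments computed directly from the probabilities. The sketch below does this for an assumed population ($N = 100$, $K = 5$, $n = 10$); the helper names are hypothetical.

```python
from math import comb

def hypergeom_mean_var(N: int, K: int, n: int) -> tuple[float, float]:
    """Closed-form mean and variance, including the finite population correction."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return mean, var

def moments_from_pmf(N: int, K: int, n: int) -> tuple[float, float]:
    """Mean and variance obtained by summing over the whole support."""
    ks = range(max(0, n - (N - K)), min(n, K) + 1)
    pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in ks}
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var

closed = hypergeom_mean_var(N=100, K=5, n=10)
summed = moments_from_pmf(N=100, K=5, n=10)
print("closed form:", closed)
print("from PMF:   ", summed)
```

The two results should agree up to floating-point rounding, which is a useful guard against dropping the correction factor.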
15.3 Examples of Hypergeometric Distribution
Here are some practical examples of where the Hypergeometric Distribution is applied in business statistics:
Example 1: Quality Control - Defective Parts
A manufacturing company produces batches of 100 widgets. Suppose a particular batch contains exactly 5 defective widgets, matching the historical defect rate of 5%. If a quality inspector randomly selects a sample of 10 widgets from this batch without replacement, what is the probability that exactly 2 of the selected widgets are defective?
- $N = 100$ (total widgets in the batch)
- $K = 5$ (total defective widgets in the batch, since 5% of 100 is 5)
- $n = 10$ (sample size)
- $k = 2$ (number of defective widgets we want to find the probability for)
Using the formula from Section 15.1:
$$P(X=2) = \frac{{\binom{5}{2} \binom{100-5}{10-2}}}{{\binom{100}{10}}} = \frac{{\binom{5}{2} \binom{95}{8}}}{{\binom{100}{10}}}$$
Evaluating this expression gives $P(X=2) \approx 0.0702$, so there is roughly a 7% chance of finding exactly 2 defective widgets in a sample of 10.
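The arithmetic for this example can be reproduced with a few lines of Python. This is a sketch under the example's assumption that the batch holds exactly 5 defective widgets; the helper `pmf` is a hypothetical name.

```python
from math import comb

N, K, n = 100, 5, 10  # batch size, defectives in the batch, sample size

def pmf(k: int) -> float:
    """Probability of exactly k defective widgets in the sample."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p_two = pmf(2)
print(f"P(X = 2) = {p_two:.4f}")  # prints P(X = 2) = 0.0702

# Full distribution of the defective count (at most K = 5 can appear).
distribution = {k: pmf(k) for k in range(min(n, K) + 1)}
for k, p in distribution.items():
    print(f"k = {k}: {p:.4f}")
```

Printing the whole distribution shows how the single answer fits among the other possible defect counts.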
Example 2: Marketing - Customer Surveys
A marketing team wants to survey customers who purchased a specific product last month. Out of 200 customers who bought the product, 30 were first-time buyers. If the team randomly selects 15 customers from this list without replacement, what is the probability that exactly 4 of them are first-time buyers?
- $N = 200$ (total customers who bought the product)
- $K = 30$ (total first-time buyers)
- $n = 15$ (sample size)
- $k = 4$ (number of first-time buyers in the sample)
Using the formula from Section 15.1:
$$P(X=4) = \frac{{\binom{30}{4} \binom{200-30}{15-4}}}{{\binom{200}{15}}} = \frac{{\binom{30}{4} \binom{170}{11}}}{{\binom{200}{15}}}$$
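The binomial coefficients here are too large to evaluate comfortably by hand, so a short script is the practical route. The sketch below uses only the numbers given in the example; the variable names are illustrative.

```python
from math import comb

N, K, n, k = 200, 30, 15, 4  # customers, first-time buyers, sample size, target count

# Probability of exactly 4 first-time buyers among the 15 surveyed customers.
p_four = comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Expected number of first-time buyers in the sample, n * K / N.
expected_first_timers = n * K / N

print(f"P(X = 4) = {p_four:.4f}")
print(f"E(X)     = {expected_first_timers}")
```

Python integers have arbitrary precision, so the large coefficients are computed exactly before the final division.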
15.4 When to Use the Hypergeometric Distribution?
The Hypergeometric Distribution is appropriate when the following conditions are met:
- Sampling Without Replacement: The draws are made from a finite population, and once an item is drawn, it is not returned to the population. This is the most critical condition.
- Two Outcomes: Each draw results in one of two mutually exclusive outcomes: "success" or "failure".
- Fixed Population Size: The total population size ($N$) is known and finite.
- Fixed Number of Successes: The total number of "success" items ($K$) in the population is known.
- Fixed Sample Size: The number of items drawn ($n$) is fixed.
Common scenarios include:
- Quality control inspections where items are removed from a batch.
- Lottery draws.
- Sampling for opinion polls or market research from a limited group without replacement.
- Any situation where drawing an item changes the composition of the remaining population for subsequent draws.
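One way to see these conditions at work is to simulate sampling without replacement and compare the observed frequencies with the formula. The sketch below assumes a small illustrative population ($N = 20$, $K = 6$, $n = 5$) and a fixed random seed; `simulate_successes` is a hypothetical helper.

```python
import random
from math import comb

def simulate_successes(N: int, K: int, n: int, trials: int, seed: int = 0) -> list[float]:
    """Empirical distribution of the success count over repeated samples."""
    rng = random.Random(seed)
    population = [1] * K + [0] * (N - K)  # 1 = success, 0 = failure
    counts = [0] * (n + 1)
    for _ in range(trials):
        sample = rng.sample(population, n)  # draws without replacement
        counts[sum(sample)] += 1
    return [c / trials for c in counts]

N, K, n = 20, 6, 5
empirical = simulate_successes(N, K, n, trials=50_000)
exact = [comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)]

for k in range(n + 1):
    print(f"k = {k}: simulated {empirical[k]:.4f}, exact {exact[k]:.4f}")
```

`random.sample` never returns the same item twice, which is what makes each draw depend on the ones before it.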
15.5 Difference Between Hypergeometric Distribution and Binomial Distribution
The primary difference lies in the sampling method:
Feature | Hypergeometric Distribution | Binomial Distribution |
---|---|---|
Sampling Method | Without Replacement | With Replacement (or from an infinite population) |
Population Size ($N$) | Finite and known | Can be finite, but often treated as effectively infinite |
Probability of Success | Changes with each draw (dependent events) | Remains constant for each draw (independent events) |
Underlying Assumption | The composition of the population changes after each draw. | The composition of the population does not change. |
Mean | $n \frac{K}{N}$ | $np$ (where $p = K/N$) |
Variance | $n \frac{K}{N} (1 - \frac{K}{N}) (\frac{N-n}{N-1})$ | $np(1-p)$ |
When the population size ($N$) is very large compared to the sample size ($n$), the probability of success ($p = K/N$) changes very little with each draw. In such cases, the Hypergeometric Distribution can be approximated by the Binomial Distribution. A common rule of thumb is that the approximation is good if $n/N < 0.1$ (i.e., the sample size is less than 10% of the population size).
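The effect of the sampling fraction $n/N$ on the approximation can be illustrated numerically. The sketch below compares the two distributions for the same $p = 0.15$ and $n = 15$ with a large and a small population; both populations and the helper names are assumptions chosen for illustration.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    # comb returns 0 when k exceeds K or n - k exceeds N - K, so the support is handled.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_abs_gap(N: int, K: int, n: int) -> float:
    """Largest absolute difference between the two distributions over k = 0..n."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p)) for k in range(n + 1))

small_fraction = max_abs_gap(N=10_000, K=1_500, n=15)  # n/N = 0.0015
large_fraction = max_abs_gap(N=40, K=6, n=15)          # n/N = 0.375

print(f"n/N = 0.0015: max gap = {small_fraction:.6f}")
print(f"n/N = 0.375:  max gap = {large_fraction:.6f}")
```

The gap should be much smaller when the sample is a tiny fraction of the population, in line with the rule of thumb.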
15.6 Conclusion
The Hypergeometric Distribution is an essential tool in business statistics for analyzing sampling without replacement from a finite population. Understanding its probability function, mean, and variance, together with its key differences from the Binomial Distribution, allows for accurate probability calculations in quality control, market analysis, and other business contexts where each draw noticeably changes the composition of the remaining population.