Hypergeometric PDF: Probability Density Function Explained
Understand the Hypergeometric Probability Density Function (PDF) for sampling without replacement. Learn how it calculates success probabilities in quality control & ML.
15.1 Probability Density Function (PDF) of the Hypergeometric Distribution
The Probability Density Function (PDF) of the Hypergeometric Distribution gives the probability of obtaining exactly $x$ successes in a sample of size $n$, drawn without replacement from a finite population of size $N$ that contains $k$ total successes. Because the distribution is discrete, this function is more precisely a probability mass function (PMF); this section keeps the PDF label for consistency.
The distribution applies whenever the outcome of each draw depends on the previous draws, which happens when items are sampled from a finite lot and not returned, as in quality-control inspection.
Probability Formula
The probability $P(X = x)$ of observing exactly $x$ successes, for integer $x$ with $\max(0,\, n-(N-k)) \le x \le \min(n, k)$, is given by the following formula:
$$ P(X = x) = \frac{\binom{k}{x} \binom{N-k}{n-x}}{\binom{N}{n}} $$
Where:
- $P(X = x)$: The probability of obtaining exactly $x$ successes in the sample.
- $\binom{k}{x}$: The number of ways to choose $x$ successes from the $k$ successful items available in the population. This is the binomial coefficient, often read as "k choose x".
- $\binom{N-k}{n-x}$: The number of ways to choose the remaining $(n-x)$ failures from the $(N-k)$ failures present in the population.
- $\binom{N}{n}$: The total number of ways to choose a sample of size $n$ from the entire population of size $N$.
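The formula translates almost term for term into code. The following is a minimal sketch using only Python's standard library (`math.comb`, available from Python 3.8); the function name `hypergeom_pmf` and its argument order are illustrative choices, not part of any library.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, k: int, n: int) -> float:
    """P(X = x): exactly x successes in n draws without replacement
    from a population of N items that contains k successes."""
    if not (0 <= k <= N and 0 <= n <= N):
        raise ValueError("require 0 <= k <= N and 0 <= n <= N")
    # Outside the attainable range of x the probability is zero.
    if x < max(0, n - (N - k)) or x > min(n, k):
        return 0.0
    # ways to pick x successes * ways to pick n - x failures / all samples
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Sanity check: the probabilities over all attainable x sum to 1.
total = sum(hypergeom_pmf(x, N=50, k=5, n=10) for x in range(0, 6))
```

Working with exact integer binomial coefficients and dividing only once at the end avoids accumulating floating-point error across the three terms.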
Explanation
The Hypergeometric PDF applies when the following conditions hold:
- Sampling without Replacement: Once an item is drawn from the population, it is not returned. This means the pool of available items changes with each draw.
- Finite Population: The total number of items in the population ($N$) is known and fixed.
- Two Categories: The population can be clearly divided into two distinct groups: "successes" (e.g., defective items, skilled workers) and "failures" (e.g., non-defective items, unskilled workers).
- Dependent Trials: The conditional probability of success on a given draw depends on the outcomes of the earlier draws, because each draw changes the composition of the remaining pool. This contrasts with the Binomial Distribution, where trials are independent and the success probability is constant.
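The dependence between draws can be made concrete with exact fractions. The sketch below assumes an illustrative population of 50 items with 5 successes (these numbers and all variable names are chosen for the example) and compares the chance of a success on the second draw under each possible outcome of the first draw.

```python
from fractions import Fraction

N, k = 50, 5  # illustrative population: 50 items, 5 of them successes

# First draw: nothing has been removed yet.
p_first = Fraction(k, N)

# Second draw, conditional on what the first draw removed.
p_second_given_success = Fraction(k - 1, N - 1)  # one success is gone
p_second_given_failure = Fraction(k, N - 1)      # one failure is gone

# Law of total probability: chance of success on the second draw
# when the first outcome is not known.
p_second = (p_first * p_second_given_success
            + (1 - p_first) * p_second_given_failure)

print(p_second_given_success, p_second_given_failure, p_second)
```

The two conditional probabilities differ, which is exactly what "dependent trials" means here. The unconditional probability for the second draw still equals $k/N$; it is the probability given the earlier outcomes that changes.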
Example Use Case
Consider a scenario where you are auditing a batch of 50 manufactured products ($N=50$). You know that 5 of these products are defective ($k=5$). If you randomly select a sample of 10 products ($n=10$) for inspection, the Hypergeometric PDF can be used to calculate the probability that exactly 2 of the selected products are defective ($x=2$).
$$ P(X = 2) = \frac{\binom{5}{2} \binom{50-5}{10-2}}{\binom{50}{10}} = \frac{\binom{5}{2} \binom{45}{8}}{\binom{50}{10}} $$
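Evaluating the binomial coefficients gives $P(X = 2) = \frac{10 \times 215{,}553{,}195}{10{,}272{,}278{,}170} \approx 0.2098$, so there is roughly a 21% chance of finding exactly two defective items in the sample. A figure like this feeds directly into quality control decisions.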
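The arithmetic for this example can be checked in a few lines. This is a small sketch using Python's standard `math.comb`; the variable names are illustrative.

```python
from math import comb

N, k, n, x = 50, 5, 10, 2  # batch size, defectives, sample size, target count

ways_defective = comb(k, x)       # choose 2 of the 5 defective items
ways_good = comb(N - k, n - x)    # choose 8 of the 45 non-defective items
ways_total = comb(N, n)           # choose any 10 of the 50 items

p = ways_defective * ways_good / ways_total
print(f"P(X = 2) = {p:.4f}")  # about 0.2098
```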
Key Assumptions of the Hypergeometric Distribution
The Hypergeometric Distribution relies on the following core assumptions:
- The population is finite.
- Each draw is made without replacement.
- The population is divided into two mutually exclusive categories (successes and failures).
- The sample size is fixed.
- The conditional probability of success changes from draw to draw, because each draw alters the remaining population.
Comparison with Binomial Distribution
The Hypergeometric Distribution is often contrasted with the Binomial Distribution. The key difference lies in the nature of sampling:
- Hypergeometric: Sampling is without replacement from a finite population. Trials are dependent.
- Binomial: Sampling is with replacement, or from a very large population where replacement has a negligible effect. Trials are independent.
When the sample size ($n$) is small relative to the population size ($N$), with a common rule of thumb being $n/N$ below about 5%, the Hypergeometric probabilities are very close to those of a Binomial distribution with success probability $p = k/N$. As $n$ approaches $N$, the two distributions diverge significantly.
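A quick numerical comparison illustrates how the gap between the two distributions grows with the sampling fraction. The sketch below uses only the standard library; the helper names (`hypergeom_pmf`, `binom_pmf`, `max_abs_gap`) and the two parameter sets are assumptions made for illustration.

```python
from math import comb


def hypergeom_pmf(x, N, k, n):
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so values of x outside the support give probability 0.
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_gap(N, k, n):
    """Largest pointwise difference between the two PMFs, with p = k / N."""
    p = k / N
    return max(
        abs(hypergeom_pmf(x, N, k, n) - binom_pmf(x, n, p))
        for x in range(n + 1)
    )


# Sample is 1% of the population: the distributions nearly coincide.
small_fraction_gap = max_abs_gap(N=1000, k=100, n=10)

# Sample is 80% of the population: the distributions differ markedly.
large_fraction_gap = max_abs_gap(N=50, k=5, n=40)

print(small_fraction_gap, large_fraction_gap)
```

Both cases share the same success proportion $k/N = 0.1$, so the difference in the gap comes from the sampling fraction $n/N$ alone.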
Interview Questions
Here are some common interview questions related to the Hypergeometric Distribution:
- What is the Probability Density Function (PDF) of the Hypergeometric Distribution?
- How is the formula for Hypergeometric probability derived?
- When should you use the Hypergeometric Distribution instead of the Binomial Distribution?
- Explain the role of combinations (binomial coefficients) in the Hypergeometric PDF formula.
- How does sampling without replacement affect the probability calculation compared to sampling with replacement?
- What are the key assumptions of the Hypergeometric Distribution?
- Give a real-world example where the Hypergeometric PDF is applicable.
- Why is the Hypergeometric Distribution considered to involve dependent trials?
- How do you interpret the term $\binom{k}{x}$ in the Hypergeometric formula?
- How would increasing the sample size ($n$) affect the probability output in the Hypergeometric Distribution, assuming other parameters remain constant?