Explore the Gamma Distribution, a key continuous probability model. Understand its statistical properties & applications in machine learning, like modeling event occurrences.

17. The Gamma Distribution in Statistics

This document provides a comprehensive overview of the Gamma distribution, a fundamental probability distribution in statistics with wide-ranging applications.

17.1 What is the Gamma Distribution?

The Gamma distribution is a continuous probability distribution that is often used to model the waiting time until a certain number of events occur in a Poisson process. It's a versatile distribution that can take on various shapes depending on its parameters, making it suitable for modeling a wide array of phenomena, including:

Lifetimes of electronic components: The time until a device fails.
Waiting times: The time until the k-th customer arrives at a service point.
Rainfall amounts: Modeling the amount of rain that falls over a period.
Insurance claims: The size of insurance payouts.

It is a flexible distribution because its parameters allow it to represent right-skewed data with a range of shapes, and it contains other well-known distributions, such as the Exponential and Chi-Square distributions, as special cases.

17.2 Gamma Distribution Function

The Gamma distribution is defined by two parameters:

Shape Parameter ($\alpha$ or $k$): Controls the shape of the distribution. For $\alpha \leq 1$ the density decreases monotonically; for $\alpha > 1$ it has a single peak, and as $\alpha$ increases it becomes more symmetric and bell-shaped.
Rate Parameter ($\beta$): Controls how strongly the distribution is compressed toward zero. It is the inverse of the scale parameter. A larger $\beta$ lowers both the mean and the variance, pulling the distribution toward zero. Alternatively, a Scale Parameter ($\theta = 1/\beta$) is sometimes used, where a larger $\theta$ stretches the distribution out. For consistency, we will primarily use the shape parameter $\alpha$ and the rate parameter $\beta$.

The Gamma distribution takes its name from the Gamma function, denoted by $\Gamma(z)$, which generalizes the factorial function to real and complex numbers: for a positive integer $n$, $\Gamma(n) = (n-1)!$. For a positive real number $z$, the Gamma function is defined as:

$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t} dt$
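As a quick numerical illustration of how the Gamma function extends the factorial, the following minimal sketch uses Python's standard `math.gamma`. The variable name `gamma_half` is chosen here purely for illustration.

```python
import math

# For positive integers n, Gamma(n) equals (n - 1)!.
for n in range(1, 6):
    print(n, math.gamma(n), math.factorial(n - 1))

# For non-integer arguments the function interpolates the factorial.
# A classic closed form is Gamma(1/2) = sqrt(pi).
gamma_half = math.gamma(0.5)
print(gamma_half, math.sqrt(math.pi))
```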

17.3 Gamma Distribution Formula – Probability Density Function (PDF)

The Probability Density Function (PDF) of the Gamma distribution is given by:

$f(x; \alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$

where:

$x > 0$ is the random variable.
$\alpha > 0$ is the shape parameter.
$\beta > 0$ is the rate parameter.
$\Gamma(\alpha)$ is the Gamma function.

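This formula gives the probability density at $x$ for a random variable following a Gamma distribution with parameters $\alpha$ and $\beta$. Because the distribution is continuous, the density is not itself a probability: the probability that $X$ falls in an interval is obtained by integrating $f$ over that interval.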
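The PDF above can be sketched in plain Python as follows. The helper names `gamma_pdf` and `integrate`, the parameter values $\alpha = 3$, $\beta = 2$, and the use of a simple midpoint rule are all assumptions made for this illustration, not part of the original text. The density is evaluated in log space (via `math.lgamma`) to avoid overflow for large $\alpha$.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta; zero for x <= 0."""
    if x <= 0:
        return 0.0
    # log f(x) = alpha*log(beta) + (alpha-1)*log(x) - beta*x - log(Gamma(alpha))
    log_pdf = (alpha * math.log(beta) + (alpha - 1) * math.log(x)
               - beta * x - math.lgamma(alpha))
    return math.exp(log_pdf)

def integrate(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

alpha, beta = 3.0, 2.0  # illustrative values

# The density should integrate to (approximately) 1 over its support.
total = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, 30.0)

# A probability is an integral of the density over an interval.
prob_1_to_2 = integrate(lambda x: gamma_pdf(x, alpha, beta), 1.0, 2.0)

print(total)
print(prob_1_to_2)  # P(1 < X < 2)
```

The upper limit of 30 stands in for infinity; for these parameter values the tail beyond it is negligible.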

17.4 Gamma Distribution Mean and Variance

For a Gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$:

Mean ($\text{E}[X]$): $\text{E}[X] = \frac{\alpha}{\beta}$
Variance ($\text{Var}[X]$): $\text{Var}[X] = \frac{\alpha}{\beta^2}$

These formulas provide a quick way to understand the central tendency and spread of the Gamma distribution based on its parameters.
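One way to sanity-check these formulas is by simulation. The sketch below draws samples with the standard library's `random.gammavariate` and compares the sample mean and variance to $\alpha/\beta$ and $\alpha/\beta^2$. Note that `random.gammavariate` takes a shape and a scale, so the rate is converted with $\theta = 1/\beta$. The parameter values, seed, and sample size are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

alpha, beta = 3.0, 2.0   # shape and rate (illustrative values)
scale = 1.0 / beta       # random.gammavariate expects a SCALE, not a rate

n_samples = 100_000
samples = [random.gammavariate(alpha, scale) for _ in range(n_samples)]

sample_mean = sum(samples) / n_samples
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples

print(sample_mean, alpha / beta)      # theoretical mean is 1.5
print(sample_var, alpha / beta ** 2)  # theoretical variance is 0.75
```

Passing the rate directly instead of the scale is a common mistake when switching between libraries, so it is worth checking which parameterization a given function uses.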

17.5 Special Case 1: Exponential Distribution

The Exponential distribution is a special case of the Gamma distribution when the shape parameter $\alpha = 1$.

If $X \sim \text{Gamma}(\alpha=1, \beta)$, then $X \sim \text{Exponential}(\beta)$.

The PDF for the Exponential distribution is:

$f(x; \beta) = \beta e^{-\beta x}$, for $x \geq 0$.

The Exponential distribution is commonly used to model the time until the first event occurs in a Poisson process or, equivalently, the time between consecutive events.
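Setting $\alpha = 1$ in the Gamma PDF gives $\frac{\beta^1 x^0 e^{-\beta x}}{\Gamma(1)} = \beta e^{-\beta x}$, since $\Gamma(1) = 1$. The following minimal sketch checks this numerically at a few points. The function names, the rate $\beta = 2$, and the evaluation points are assumptions made for illustration.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta, for x > 0."""
    log_pdf = (alpha * math.log(beta) + (alpha - 1) * math.log(x)
               - beta * x - math.lgamma(alpha))
    return math.exp(log_pdf)

def exponential_pdf(x, beta):
    """Exponential density with rate beta, for x >= 0."""
    return beta * math.exp(-beta * x)

beta = 2.0  # illustrative rate
points = (0.1, 0.5, 1.0, 2.5, 5.0)

# Largest absolute difference between the two densities at the test points.
max_gap = max(abs(gamma_pdf(x, 1.0, beta) - exponential_pdf(x, beta))
              for x in points)
print(max_gap)  # only floating-point rounding error remains
```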

17.6 Examples of Exponential Distribution

Call center: If customer arrivals follow a Poisson process with an average rate of 2 customers per minute, the time between consecutive customer arrivals follows an Exponential distribution with $\beta=2$ minutes$^{-1}$.
Reliability engineering: The lifetime of electronic components that have a constant failure rate is often modeled using the Exponential distribution. If a component has a failure rate of 0.01 per hour, its lifetime follows an Exponential distribution with $\beta=0.01$ hour$^{-1}$.
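To make the two examples above concrete, the sketch below uses the Exponential survival function $P(X > t) = e^{-\beta t}$ and the mean $1/\beta$ (the Gamma mean $\alpha/\beta$ with $\alpha = 1$). The helper name `exponential_survival` and the time horizons of 1 minute and 100 hours are chosen here for illustration only.

```python
import math

def exponential_survival(t, beta):
    """P(X > t) for X ~ Exponential with rate beta."""
    return math.exp(-beta * t)

# Call center: arrivals at a rate of 2 per minute.
call_rate = 2.0
mean_gap_minutes = 1.0 / call_rate
p_gap_over_1_min = exponential_survival(1.0, call_rate)

# Component: constant failure rate of 0.01 per hour.
failure_rate = 0.01
mean_lifetime_hours = 1.0 / failure_rate
p_survive_100_h = exponential_survival(100.0, failure_rate)

print(f"mean gap between arrivals: {mean_gap_minutes} min")
print(f"P(gap > 1 min) = {p_gap_over_1_min:.3f}")            # 0.135
print(f"mean lifetime: {mean_lifetime_hours:.0f} h")
print(f"P(lifetime > 100 h) = {p_survive_100_h:.3f}")        # 0.368
```

So with 2 arrivals per minute, a gap longer than one minute occurs only about 13.5% of the time, while a component with a 100-hour mean lifetime survives past that mean with probability $e^{-1}$, about 36.8%.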

17.7 Special Case 2: Chi-Square Distribution with Parameter “Degrees of Freedom”

The Chi-Square ($\chi^2$) distribution is another special case of the Gamma distribution. A Chi-Square distribution with $\nu$ degrees of freedom is equivalent to a Gamma distribution with:

Shape parameter: $\alpha = \frac{\nu}{2}$
Rate parameter: $\beta = \frac{1}{2}$

Therefore, if $X \sim \text{Gamma}(\alpha=\frac{\nu}{2}, \beta=\frac{1}{2})$, then $X \sim \chi^2(\nu)$.

The $\chi^2$ distribution arises in hypothesis testing, particularly in goodness-of-fit tests and tests for independence. It is also fundamental to confidence interval estimation for a population variance.
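The Gamma and Chi-Square equivalence can be checked by simulation. The sketch below relies on the standard fact (not stated above) that a $\chi^2(\nu)$ variable is the sum of $\nu$ squared independent standard normal variables, and compares such sums to draws from $\text{Gamma}(\alpha = \nu/2, \beta = 1/2)$. Both should have mean $\alpha/\beta = \nu$ and variance $\alpha/\beta^2 = 2\nu$. The value $\nu = 4$, the seed, and the helper `mean_var` are illustrative assumptions.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

nu = 4                       # degrees of freedom (illustrative)
alpha, beta = nu / 2, 0.5    # matching Gamma shape and rate
n_draws = 50_000

# Chi-square draws built as sums of nu squared standard normals.
chi2_draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(nu))
              for _ in range(n_draws)]

# Gamma draws; random.gammavariate takes a SCALE, so pass 1 / beta.
gamma_draws = [random.gammavariate(alpha, 1.0 / beta) for _ in range(n_draws)]

def mean_var(xs):
    """Return the mean and (population) variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

chi2_mean, chi2_var = mean_var(chi2_draws)
gamma_mean, gamma_var = mean_var(gamma_draws)

print(chi2_mean, gamma_mean, nu)      # all close to 4
print(chi2_var, gamma_var, 2 * nu)    # all close to 8
```

Matching the first two moments does not prove the distributions are identical, but it is a useful quick check that the parameter mapping is the right one.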

17.8 Examples of Chi-Square Distribution

Goodness-of-Fit Test: When testing whether observed frequencies in different categories match expected frequencies (e.g., in a survey), the test statistic approximately follows a Chi-Square distribution under the null hypothesis, provided the expected counts are large enough. The degrees of freedom depend on the number of categories and any constraints imposed.
Variance Estimation: In a sample from a normally distributed population, the quantity $\frac{(n-1)s^2}{\sigma^2}$ follows a Chi-Square distribution with $n-1$ degrees of freedom, where $s^2$ is the sample variance and $\sigma^2$ is the population variance. This is used for constructing confidence intervals for the population variance.
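The variance-estimation result can also be explored by simulation. The sketch below repeatedly draws normal samples, computes $\frac{(n-1)s^2}{\sigma^2}$ for each, and checks that the results have mean close to $n-1$ and variance close to $2(n-1)$, as a $\chi^2(n-1)$ variable should. The sample size $n = 10$, the value $\sigma = 2$, the seed, and the function name are assumptions made for illustration.

```python
import random

random.seed(2)  # fixed seed so the run is reproducible

n = 10          # sample size (illustrative)
sigma = 2.0     # true population standard deviation (illustrative)
reps = 20_000   # number of simulated samples

def scaled_sample_variance():
    """Draw one normal sample and return (n - 1) * s^2 / sigma^2."""
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return (n - 1) * s2 / sigma ** 2

stats = [scaled_sample_variance() for _ in range(reps)]

stat_mean = sum(stats) / reps
stat_var = sum((q - stat_mean) ** 2 for q in stats) / reps

print(stat_mean, n - 1)        # chi-square mean: degrees of freedom
print(stat_var, 2 * (n - 1))   # chi-square variance: twice the degrees of freedom
```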
