Learn what a Probability Density Function (PDF) is and how it describes continuous random variables. Essential for statistics, machine learning, and AI.

9.4 Probability Density Function (PDF)

A Probability Density Function (PDF) is a fundamental concept in statistics used to describe the distribution of continuous random variables. Unlike discrete variables, which can only take on specific, separate values, continuous random variables can assume any value within a given range. The PDF quantifies the relative likelihood of the variable falling near a given value, rather than the probability of it equaling that value exactly.

The core idea behind a PDF is that it represents the density of probability over a range. The probability that a continuous random variable falls within a specific interval is determined by calculating the area under the curve of its PDF over that interval.

Key Properties of a Valid PDF

For a function $f(x)$ to be a valid Probability Density Function, it must satisfy the first two properties below. The remaining three describe how a PDF is used and how it typically behaves.

1. Non-Negativity

The probability density function must always be non-negative for all possible values of the random variable $x$. This means the graph of the PDF never dips below the x-axis.

$$ f(x) \ge 0 \quad \text{for all } x $$

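This follows because probabilities cannot be negative: if $f$ were negative over some interval, the area under it there, and hence the probability of that interval, would be negative.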

2. Total Area Under the Curve Equals 1

The total probability across the entire domain (all possible values) of the continuous random variable must equal exactly 1. Because the variable is continuous, this total is an integral of the PDF over its entire range rather than a sum.

$$ \int_{-\infty}^{\infty} f(x) \, dx = 1 $$

This integral represents the entire probability space. In practice, the limits of integration are often adjusted to the actual range over which the random variable is defined.
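The two defining conditions above can be sanity-checked numerically. The sketch below is only a grid-based check, not a proof, and the helper names (`midpoint_integral`, `looks_like_pdf`) and the three candidate functions are illustrative assumptions rather than anything from a library.

```python
def midpoint_integral(f, lo, hi, n=10_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def looks_like_pdf(f, lo, hi, n=10_000, tol=1e-6):
    """Check non-negativity on a grid and normalization over [lo, hi]."""
    h = (hi - lo) / n
    non_negative = all(f(lo + (i + 0.5) * h) >= 0.0 for i in range(n))
    normalized = abs(midpoint_integral(f, lo, hi, n) - 1.0) < tol
    return non_negative and normalized


# Hypothetical candidates, each considered on the interval [0, 1].
candidate = lambda x: 2.0 * x            # non-negative, area 1
too_small = lambda x: x                  # non-negative, but area is only 0.5
negative_part = lambda x: 6.0 * x - 2.0  # area 1, but negative for x < 1/3

print(looks_like_pdf(candidate, 0.0, 1.0))
print(looks_like_pdf(too_small, 0.0, 1.0))
print(looks_like_pdf(negative_part, 0.0, 1.0))
```

The third candidate shows why both conditions are needed: it integrates to 1 and still fails, because part of the curve lies below the x-axis.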

3. Probability as Area Under the Curve

The probability that a continuous random variable $X$ falls within a specific interval $[a, b]$ is calculated by finding the area under the PDF curve between $a$ and $b$.

$$ P(a \le X \le b) = \int_{a}^{b} f(x) \, dx $$

It's crucial to understand that $f(x)$ itself is not a probability. Instead, it describes the density of probability at point $x$. Probability is only meaningful when considering an interval, and it's represented by the accumulated density (the area).
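As a small illustration of probability as area, the sketch below integrates an exponential density with rate 1 over $[0.5, 2]$ and compares the result with the closed form $e^{-a} - e^{-b}$. The choice of distribution, the interval, and the function names are assumptions made for the example.

```python
import math


def exponential_pdf(x, rate=1.0):
    """Density of an exponential distribution; zero for negative x."""
    return rate * math.exp(-rate * x) if x >= 0.0 else 0.0


def interval_probability(pdf, a, b, n=100_000):
    """Approximate P(a <= X <= b) as the area under pdf, via the midpoint rule."""
    h = (b - a) / n
    return h * sum(pdf(a + (i + 0.5) * h) for i in range(n))


a, b = 0.5, 2.0
numeric = interval_probability(exponential_pdf, a, b)
exact = math.exp(-a) - math.exp(-b)  # closed form for rate = 1

print(numeric, exact)
```

The two values should agree to many decimal places, which is the sense in which the area under the curve is the probability.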

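4. Usually a Smooth and Continuous Curve

A PDF is often drawn as a smooth and continuous curve, as with the bell shape of the normal distribution, but smoothness is not a requirement. The uniform density, for example, jumps at the endpoints of its interval. What is guaranteed is that the random variable can take any real value within its range, and that its cumulative distribution function is continuous.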

5. Defined Over a Continuous Domain

The PDF is defined over a continuous range of values. It describes how probability is distributed across an interval, rather than at isolated, discrete points. For any single specific point $x_0$, the probability of the random variable being exactly equal to $x_0$ is zero: $P(X = x_0) = 0$. This is because the probability of a single point is the integral of $f$ over an interval of zero width, $\int_{x_0}^{x_0} f(x) \, dx = 0$. Only intervals of positive length can carry positive probability.
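One way to see this is to shrink an interval around $x_0$ and watch its probability vanish. The sketch below assumes a standard normal variable and uses its CDF written in terms of `math.erf`; the point `x0 = 0.5` and the list of widths are arbitrary choices for illustration.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal distribution, expressed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


x0 = 0.5
widths = [1e-1, 1e-2, 1e-3, 1e-4]

# Probability of landing in an interval of the given width centred on x0.
probs = [normal_cdf(x0 + w / 2) - normal_cdf(x0 - w / 2) for w in widths]

for w, p in zip(widths, probs):
    # The ratio p / w approaches the density f(x0) as the width shrinks.
    print(w, p, p / w)
```

The probabilities shrink toward zero while the ratio of probability to width settles near the density at `x0`, which is why $f(x_0)$ is best read as probability per unit length.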

Summary of PDF Conditions

To be a valid PDF, a function $f(x)$ must satisfy the first two conditions below; the third states how probabilities are then computed from it:

Non-negativity: $f(x) \ge 0$ for all $x$.
Normalization: The total integral over the domain is 1: $\int_{-\infty}^{\infty} f(x) \, dx = 1$.
Probability of an Interval: The probability $P(a \le X \le b)$ is the integral of $f(x)$ from $a$ to $b$: $P(a \le X \le b) = \int_{a}^{b} f(x) \, dx$.

Examples of PDFs

Common examples of probability density functions include:

Uniform Distribution: A constant PDF over a specific interval $[a, b]$, equal to $1/(b - a)$ on the interval and zero outside it.
Normal Distribution (Gaussian Distribution): A bell-shaped curve, commonly used to model many natural phenomena.
Exponential Distribution: Often used to model the time until an event occurs in a Poisson process.
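The three densities listed above can be written directly from their standard formulas. This is a minimal sketch using only the standard library; the function names and default parameters are choices made for the example.

```python
import math


def uniform_pdf(x, a=0.0, b=1.0):
    """Constant density 1 / (b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Bell-shaped density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(x, rate=1.0):
    """Density rate * exp(-rate * x) for x >= 0, zero for negative x."""
    return rate * math.exp(-rate * x) if x >= 0.0 else 0.0


print(uniform_pdf(0.5, a=0.0, b=2.0))
print(normal_pdf(0.0))
print(exponential_pdf(0.0, rate=2.0))
```

Each call returns a density value, not a probability; probabilities come only from integrating these functions over an interval.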

PDF vs. PMF

It's important to distinguish a PDF from a Probability Mass Function (PMF).

PDF: Used for continuous random variables. Probabilities are calculated by integrating the function over an interval. $P(X=x) = 0$ for any specific $x$.
PMF: Used for discrete random variables. Probabilities are assigned to specific values (e.g., $P(X=k)$). The sum of probabilities for all possible discrete values equals 1.
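The contrast can be made concrete with a fair six-sided die (a PMF) next to a variable that is uniform on $[0, 6]$ (a PDF). Both examples are assumptions chosen for illustration, as is the helper `uniform_interval_probability`.

```python
from fractions import Fraction

# Discrete case: a PMF assigns probability directly to each value.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
pmf_total = sum(die_pmf.values())  # probabilities sum to 1
p_three = die_pmf[3]               # P(X = 3) is positive


def uniform_interval_probability(a, b, lo=0.0, hi=6.0):
    """P(a <= X <= b) for X uniform on [lo, hi]: overlap length times 1 / (hi - lo)."""
    overlap = max(0.0, min(b, hi) - max(a, lo))
    return overlap / (hi - lo)


# Continuous case: a single point has probability 0; an interval does not.
p_point = uniform_interval_probability(3.0, 3.0)
p_band = uniform_interval_probability(2.5, 3.5)

print(pmf_total, p_three, p_point, p_band)
```

Here the die gives $P(X = 3) = 1/6$, while the continuous variable gives $P(X = 3) = 0$ and needs an interval of width 1 to reach the same probability.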

Interview Questions on PDFs

What is a probability density function (PDF)?
How is a PDF different from a probability mass function (PMF)?
Can a PDF value exceed 1? Why or why not? (Yes, the value of $f(x)$ can exceed 1, but the area under the curve between any two points must be between 0 and 1. This is because $f(x)$ represents density, not probability itself.)
Why must the total area under a PDF curve equal 1?
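What does it mean when the PDF of a variable is zero at a point? (It means the density is zero there: if the density is continuous at that point, small intervals around it carry negligible probability compared with equally small intervals where the density is positive. This is not the same as $P(X = x_0) = 0$, which holds at every point for a continuous random variable, whatever the value of the density.)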
How do you calculate the probability that a random variable falls within a certain interval using a PDF? (By integrating the PDF over that interval.)
What are the key conditions a function must satisfy to be a valid PDF? (Non-negativity and total area under the curve equals 1).
Explain the significance of the non-negativity condition in PDFs. (Probabilities cannot be negative.)
In what types of problems would you use a PDF instead of a PMF? (Problems involving measurements, time, height, weight, temperature, etc., where values can take any value within a range.)
Give an example of a real-world situation where a PDF is used. (Modeling the distribution of heights of adults in a population, the time between customer arrivals at a store, or the lifespan of a light bulb.)
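For the question on whether a PDF value can exceed 1, a quick numerical check may help. The sketch below assumes a normal density with a small standard deviation of 0.1 (an arbitrary choice) and confirms that its peak is well above 1 while the total area stays at 1.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


sigma = 0.1
peak_density = normal_pdf(0.0, sigma=sigma)  # height of the curve at its mode

# Midpoint-rule area over [-1, 1], which spans ten standard deviations each side.
lo, hi, n = -1.0, 1.0, 100_000
h = (hi - lo) / n
area = h * sum(normal_pdf(lo + (i + 0.5) * h, sigma=sigma) for i in range(n))

print(peak_density, area)
```

A tall, narrow curve and a short, wide one can enclose the same area, so a large density value says nothing by itself about probability.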
