Explore the essential properties of Probability Density Functions (PDFs) for discrete and continuous random variables in statistics and machine learning.

9.6 Properties of Probability Density Functions (PDFs)

A Probability Density Function (PDF) describes how probability is distributed over the possible values of a random variable. How the function is interpreted differs significantly depending on whether the random variable is discrete or continuous.

PDF for Discrete Random Variables

For a discrete random variable $X$, the PDF $f_X(x)$, more precisely called a probability mass function (PMF), gives the probability that $X$ takes on a specific value $x$. This is expressed as:

$P(X = x) = f_X(x)$

For $f_X(x)$ to be a valid probability mass function (PMF) for a discrete random variable, it must satisfy the following two conditions:

Non-negativity: $f_X(x) \geq 0$ for all possible values of $x$.
Normalization: The sum of probabilities over all possible values of $X$ must equal 1: $\sum_{x} f_X(x) = 1$.
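The two conditions translate directly into a check. The sketch below is a minimal illustration, assuming the PMF is stored as a Python dict that maps each value $x$ to $f_X(x)$. The function name `is_valid_pmf` and its `tol` parameter are hypothetical choices; the tolerance exists only because floating-point sums (for example, six copies of 1/6) may not equal 1 exactly.

```python
from math import isclose


def is_valid_pmf(pmf, tol=1e-9):
    """Return True if pmf (a dict value -> probability) satisfies both PMF conditions.

    Hypothetical helper for illustration; tol absorbs floating-point rounding.
    """
    # Condition 1: non-negativity, f_X(x) >= 0 for every listed value x.
    if any(p < 0 for p in pmf.values()):
        return False
    # Condition 2: normalization, the probabilities must sum to 1.
    return isclose(sum(pmf.values()), 1.0, abs_tol=tol)


fair_die = {x: 1 / 6 for x in range(1, 7)}
print(is_valid_pmf(fair_die))            # True
print(is_valid_pmf({0: 0.7, 1: 0.5}))    # False: probabilities sum to 1.2
print(is_valid_pmf({0: 1.2, 1: -0.2}))   # False: one probability is negative
```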

Example: Discrete PDF

Consider a bag containing 6 balls with the following weights (in kg):

Balls 1, 2, 3: 0.5 kg
Balls 4, 5: 0.25 kg
Ball 6: 0.3 kg

Let the random variable $X$ represent the weight of a ball selected at random, with each of the 6 balls equally likely to be chosen. The discrete PDF for $X$ is the fraction of balls that have each weight:

$f_X(0.5) = P(X = 0.5) = \frac{3}{6} = \frac{1}{2}$
$f_X(0.25) = P(X = 0.25) = \frac{2}{6} = \frac{1}{3}$
$f_X(0.3) = P(X = 0.3) = \frac{1}{6}$

Verification of PDF conditions:

Non-negativity: All calculated probabilities (1/2, 1/3, 1/6) are greater than or equal to 0.
Normalization: The sum of probabilities is $\frac{1}{2} + \frac{1}{3} + \frac{1}{6} = \frac{3}{6} + \frac{2}{6} + \frac{1}{6} = \frac{6}{6} = 1$.

Thus, the function satisfies the conditions for a valid discrete PDF.
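The example can be reproduced with exact rational arithmetic, which avoids floating-point rounding when checking that the probabilities sum to 1. This sketch assumes, as the example does implicitly, that every ball is equally likely to be drawn; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Weights (kg) of the six balls in the bag.
weights = [0.5, 0.5, 0.5, 0.25, 0.25, 0.3]

# Assuming every ball is equally likely to be drawn:
# f_X(x) = (number of balls weighing x) / (total number of balls).
counts = Counter(weights)
n = len(weights)
pmf = {w: Fraction(c, n) for w, c in counts.items()}

for w, p in sorted(pmf.items()):
    print(f"f_X({w}) = {p}")
# f_X(0.25) = 1/3
# f_X(0.3) = 1/6
# f_X(0.5) = 1/2

# Check both conditions exactly: every probability is non-negative,
# and the rational probabilities sum to exactly 1.
print(all(p >= 0 for p in pmf.values()), sum(pmf.values()) == 1)  # True True
```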

PDF for Continuous Random Variables

For a continuous random variable $Y$, the PDF $f_Y(y)$ describes the relative likelihood of $Y$ taking values near $y$. Unlike the discrete case, the probability that a continuous variable takes on any exact value is zero, so $f_Y(y)$ is not itself a probability and can even exceed 1. Instead, the PDF is used to calculate the probability that the variable falls within an interval.
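To see why a density value is not a probability, consider a uniform distribution on $(0, 0.5)$, an illustrative choice that is not part of the original text. Its density equals 2 everywhere on that interval, yet the probability of a shrinking interval starting at a fixed point goes to zero. The helper names `uniform_pdf` and `uniform_cdf` are hypothetical.

```python
def uniform_pdf(y, a=0.0, b=0.5):
    """Density of a uniform random variable on (a, b): 1 / (b - a) inside, 0 outside."""
    return 1.0 / (b - a) if a < y < b else 0.0


def uniform_cdf(y, a=0.0, b=0.5):
    """P(Y <= y) for a uniform random variable on (a, b)."""
    if y <= a:
        return 0.0
    if y >= b:
        return 1.0
    return (y - a) / (b - a)


y0 = 0.2
print(uniform_pdf(y0))  # 2.0: a density value, larger than 1
for h in (0.1, 0.01, 0.001):
    # P(y0 < Y < y0 + h) equals f(y0) * h here, so it shrinks to 0 as h -> 0.
    print(h, uniform_cdf(y0 + h) - uniform_cdf(y0))
```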

The PDF for a continuous random variable must satisfy the following conditions:

Non-negativity: $f_Y(y) \geq 0$ for all values of $y$ in its domain.
Normalization: The total area under the PDF curve over its entire domain (from $-\infty$ to $+\infty$) must equal 1: $\int_{-\infty}^{\infty} f_Y(y)\,dy = 1$.
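Both conditions can also be checked numerically when a closed-form integral is inconvenient. The sketch below assumes the density is zero outside a finite interval `[lo, hi]` and approximates the area with composite Simpson's rule, a standard quadrature method chosen here for illustration (a library routine such as `scipy.integrate.quad` would also work). It checks non-negativity only at grid points, so it is a sanity check rather than a proof. The names `simpson` and `check_pdf` are hypothetical.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points get weight 4 at odd i and weight 2 at even i.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def check_pdf(f, lo, hi, n=1000, tol=1e-6):
    """Sanity-check a density assumed to be zero outside [lo, hi].

    Returns (passed, area): non-negativity is tested only at n + 1 grid points,
    and area is a Simpson's-rule approximation of the integral over [lo, hi].
    """
    grid = [lo + i * (hi - lo) / n for i in range(n + 1)]
    nonnegative = all(f(y) >= 0 for y in grid)
    area = simpson(f, lo, hi, n)
    return nonnegative and abs(area - 1.0) < tol, area


print(check_pdf(lambda y: 2 * y, 0.0, 1.0))  # passes: area is 1 (up to rounding)
print(check_pdf(lambda y: y, 0.0, 1.0))      # fails: area is only 0.5
```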

The probability that a continuous random variable $Y$ lies between two values $a$ and $b$ is calculated by integrating the PDF from $a$ to $b$:

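$P(a < Y < b) = P(a \leq Y \leq b) = \int_{a}^{b} f_Y(y)\,dy$

Including or excluding the endpoints makes no difference, because single points carry zero probability for a continuous variable.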
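The interval formula can be illustrated with an exponential distribution, $f_Y(y) = \lambda e^{-\lambda y}$ for $y \geq 0$, chosen here because its integral has the closed form $e^{-\lambda a} - e^{-\lambda b}$. The rate $\lambda = 2$ and the interval $(0.5, 1.5)$ are arbitrary illustrative values, and `simpson` is a hypothetical helper implementing composite Simpson's rule; the sketch compares the numerical integral of the PDF with the closed form.

```python
import math


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points get weight 4 at odd i and weight 2 at even i.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


rate = 2.0  # illustrative rate parameter (lambda) of the exponential distribution


def exp_pdf(y):
    """f_Y(y) = rate * exp(-rate * y) for y >= 0, and 0 otherwise."""
    return rate * math.exp(-rate * y) if y >= 0 else 0.0


a, b = 0.5, 1.5
numeric = simpson(exp_pdf, a, b)  # P(a < Y < b) by integrating the PDF
# Closed form of the same integral: exp(-rate * a) - exp(-rate * b).
exact = math.exp(-rate * a) - math.exp(-rate * b)
print(numeric, exact)
```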

Example: Continuous PDF

Let a random variable $Y$ have the PDF:

$f_Y(y) = 12y^2(1 - y)$ for $0 < y < 1$, and $f_Y(y) = 0$ otherwise.

This is a valid PDF: $12y^2(1 - y) \geq 0$ on $0 < y < 1$, and $\int_{0}^{1} 12y^2(1 - y)\,dy = \left[ 4y^3 - 3y^4 \right]_{0}^{1} = 4 - 3 = 1$.

We want to find the probability $P(Y < 0.2)$.

Solution:

Because $f_Y(y) = 0$ for $y \leq 0$, we integrate the PDF from the lower bound of its domain (0) up to 0.2:

$P(Y < 0.2) = \int_{0}^{0.2} 12y^2(1 - y)\,dy$

First, expand the integrand: $12y^2 - 12y^3$

Now, find the antiderivative: $\int (12y^2 - 12y^3)\,dy = \frac{12y^3}{3} - \frac{12y^4}{4} + C = 4y^3 - 3y^4 + C$

Evaluate the definite integral from 0 to 0.2: $P(Y < 0.2) = \left[ 4y^3 - 3y^4 \right]_{0}^{0.2}$

$P(Y < 0.2) = \left( 4(0.2)^3 - 3(0.2)^4 \right) - \left( 4(0)^3 - 3(0)^4 \right)$

$P(Y < 0.2) = \left( 4(0.008) - 3(0.0016) \right) - 0$

$P(Y < 0.2) = 0.032 - 0.0048$

$P(Y < 0.2) = 0.0272$
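A worked example like this one can be cross-checked numerically. The sketch below assumes a density of the form $c\,y^2(1 - y)$ on $0 < y < 1$ (zero elsewhere), solves for the constant $c$ that makes the total area 1 instead of taking it as given, and then integrates from 0 to 0.2 with composite Simpson's rule. The helper names `simpson`, `shape`, and `pdf` are illustrative.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points get weight 4 at odd i and weight 2 at even i.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def shape(y):
    """Unnormalized shape y^2 (1 - y), assumed to be supported on 0 < y < 1."""
    return y**2 * (1 - y)


# Normalization: choose c so that c * shape(y) integrates to 1 over (0, 1).
c = 1.0 / simpson(shape, 0.0, 1.0)


def pdf(y):
    """Normalized density f_Y(y) = c * y^2 (1 - y) on 0 < y < 1."""
    return c * shape(y)


# P(Y < 0.2): the density is 0 for y <= 0, so integrate from 0 to 0.2.
p = simpson(pdf, 0.0, 0.2)
print(c, p)
```

Simpson's rule is exact for polynomials of degree three or lower, so up to floating-point rounding this gives $c = 12$ and $P(Y < 0.2) = 4(0.2)^3 - 3(0.2)^4 = 0.0272$.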

Key Takeaways

Discrete PDF (PMF): The sum of probabilities for all possible values must equal 1 ($\sum f_X(x) = 1$). Probabilities are assigned to specific, distinct values.
Continuous PDF: The integral of the PDF over its entire domain must equal 1 ($\int f_X(x) dx = 1$). Probabilities are associated with intervals, not single points.
Non-negativity: In both cases, the PDF (or PMF) must always be non-negative ($f(x) \geq 0$) over its entire domain.
Calculation Method: Use summation for discrete random variables and integration for continuous random variables to find probabilities.


Interview Questions

What is a Probability Density Function (PDF)?
How does a discrete PDF differ from a continuous PDF?
What conditions must a valid discrete PDF (PMF) satisfy?
Explain how to verify if a function is a valid PDF.
Why is the area under the PDF curve equal to 1 for continuous variables?
Can the value of a PDF at a single point be interpreted as probability in continuous variables? Explain why or why not.
How do you calculate the probability of an interval in a continuous PDF?
Give an example of a discrete PDF in a real-life scenario.
What is the importance of non-negativity in a PDF?
How do you use integration in PDF-based probability calculations for continuous variables?
