Properties of Probability Density Functions (PDFs)
Explore the essential properties of Probability Density Functions (PDFs) for discrete and continuous random variables in statistics and machine learning.
9.6 Properties of Probability Density Functions (PDFs)
A Probability Density Function (PDF) describes how probability is distributed over the values a random variable can take. Its form and interpretation differ depending on whether the random variable is discrete or continuous.
PDF for Discrete Random Variables
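For a discrete random variable $X$, the function $f_X(x)$ gives the probability that $X$ takes on a specific value $x$. Strictly speaking, this function is a probability mass function (PMF); this section also refers to it as a discrete PDF. It is expressed as: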
$P(X = x) = f_X(x)$
For $f_X(x)$ to be a valid probability mass function (PMF) for a discrete random variable, it must satisfy the following two conditions:
- Non-negativity: $f_X(x) \geq 0$ for all possible values of $x$.
- Normalization: The sum of probabilities over all possible values of $X$ must equal 1. $\sum_{x} f_X(x) = 1$
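The two conditions above can be checked mechanically. The following is a minimal sketch, assuming the PMF is stored as a Python dictionary that maps each value to its probability; the function name `is_valid_pmf`, the tolerance `tol`, and the fair-die example are illustrative choices rather than part of the original material.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if `pmf` (dict: value -> probability) is a valid PMF."""
    probs = list(pmf.values())
    # Condition 1: every probability is non-negative.
    non_negative = all(p >= 0 for p in probs)
    # Condition 2: probabilities sum to 1 (within a floating-point tolerance).
    normalized = math.isclose(math.fsum(probs), 1.0, abs_tol=tol)
    return non_negative and normalized

# Hypothetical examples: a fair six-sided die and two invalid assignments.
fair_die = {face: 1 / 6 for face in range(1, 7)}
too_much_mass = {0: 0.7, 1: 0.4}     # sums to 1.1
negative_mass = {0: 1.2, 1: -0.2}    # sums to 1 but has a negative entry

print(is_valid_pmf(fair_die))
print(is_valid_pmf(too_much_mass))
print(is_valid_pmf(negative_mass))
```

The tolerance is there because floating-point sums rarely equal 1 exactly; with exact rational probabilities a strict equality check would also work.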
Example: Discrete PDF
Consider a bag containing 6 balls with the following weights (in kg):
- Balls 1, 2, 3: 0.5 kg
- Balls 4, 5: 0.25 kg
- Ball 6: 0.3 kg
Let the random variable $X$ represent the weight of a randomly selected ball. The discrete PDF for $X$ is calculated as follows:
- $f_X(0.5) = P(X = 0.5) = \frac{3}{6} = \frac{1}{2}$
- $f_X(0.25) = P(X = 0.25) = \frac{2}{6} = \frac{1}{3}$
- $f_X(0.3) = P(X = 0.3) = \frac{1}{6}$
Verification of PDF conditions:
- Non-negativity: All calculated probabilities (1/2, 1/3, 1/6) are greater than or equal to 0.
- Normalization: The sum of probabilities is $\frac{1}{2} + \frac{1}{3} + \frac{1}{6} = \frac{3}{6} + \frac{2}{6} + \frac{1}{6} = \frac{6}{6} = 1$.
Thus, the function satisfies the conditions for a valid discrete PDF.
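The ball example can also be reproduced in a few lines of code. This is a sketch that assumes each ball is equally likely to be selected; the variable names are illustrative, and `Fraction` is used so the probabilities stay exact instead of being rounded.

```python
from collections import Counter
from fractions import Fraction

# Weights (in kg) of the six balls from the example, kept as strings
# so that Fraction can represent them exactly.
ball_weights = ["0.5", "0.5", "0.5", "0.25", "0.25", "0.3"]
weights = [Fraction(w) for w in ball_weights]

# Each ball is assumed equally likely, so P(X = x) = (count of x) / (number of balls).
counts = Counter(weights)
n = len(weights)
pmf = {w: Fraction(c, n) for w, c in counts.items()}

for w, p in sorted(pmf.items()):
    print(f"f_X({float(w)}) = {p}")

# Check the two PMF conditions.
all_non_negative = all(p >= 0 for p in pmf.values())
total = sum(pmf.values())
print("non-negative:", all_non_negative, "| sum:", total)
```

Counting occurrences and dividing by the number of balls is the same calculation as the fractions above, carried out programmatically.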
PDF for Continuous Random Variables
For a continuous random variable $Y$, a PDF, denoted as $f_Y(y)$, describes the relative likelihood for the variable to take on a given value. Unlike discrete variables, the probability of a continuous variable taking on an exact value is zero. Instead, the PDF is used to calculate the probability that the variable falls within a specific interval.
The PDF for a continuous random variable must satisfy the following conditions:
- Non-negativity: $f_Y(y) \geq 0$ for all values of $y$ in its domain.
- Normalization: The total area under the PDF curve over its entire domain (from $-\infty$ to $+\infty$) must equal 1. $\int_{-\infty}^{\infty} f_Y(y) dy = 1$
The probability that a continuous random variable $Y$ lies between two values $a$ and $b$ is calculated by integrating the PDF from $a$ to $b$:
$P(a < Y < b) = \int_{a}^{b} f_Y(y) dy$
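When an antiderivative is not convenient, both the normalization condition and an interval probability can be approximated numerically. The sketch below uses a simple midpoint rule and a hypothetical density $f(y) = 2y$ on $(0, 1)$, which is not taken from this section; the helper names `integrate` and `density` are likewise assumptions made for illustration.

```python
def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def density(y):
    """Hypothetical example density: f(y) = 2y on (0, 1), zero elsewhere."""
    return 2.0 * y if 0.0 < y < 1.0 else 0.0

# Normalization: the density is zero outside (0, 1), so integrating
# over (0, 1) is equivalent to integrating over the whole real line.
total_area = integrate(density, 0.0, 1.0)

# Interval probability: P(0.25 < Y < 0.5) is the area under the curve
# between 0.25 and 0.5.
prob_interval = integrate(density, 0.25, 0.5)

print(f"total area     = {total_area:.6f}")
print(f"P(0.25<Y<0.5)  = {prob_interval:.6f}")
```

For this particular density the exact answers are available in closed form ($1$ and $0.5^2 - 0.25^2 = 0.1875$), which makes it easy to confirm that the numerical approximation behaves as expected.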
Example: Continuous PDF
Let a random variable $Y$ have the PDF:
$f_Y(y) = 10y^2(1 - y)$, for $0 < y < 1$
We want to find the probability $P(Y < 0.2)$.
Solution:
To find $P(Y < 0.2)$, we integrate the PDF from the lower bound of its domain (0) up to 0.2:
$P(Y < 0.2) = \int_{0}^{0.2} 10y^2(1 - y) dy$
First, expand the integrand: $10y^2 - 10y^3$
Now, perform the integration: $\int (10y^2 - 10y^3) dy = \frac{10y^3}{3} - \frac{10y^4}{4} + C = \frac{10}{3}y^3 - \frac{5}{2}y^4 + C$
Evaluate the definite integral from 0 to 0.2: $P(Y < 0.2) = \left[ \frac{10}{3}y^3 - \frac{5}{2}y^4 \right]_{0}^{0.2}$
$P(Y < 0.2) = \left( \frac{10}{3}(0.2)^3 - \frac{5}{2}(0.2)^4 \right) - \left( \frac{10}{3}(0)^3 - \frac{5}{2}(0)^4 \right)$
$P(Y < 0.2) = \left( \frac{10}{3}(0.008) - \frac{5}{2}(0.0016) \right) - (0)$
$P(Y < 0.2) = \left( \frac{0.08}{3} - \frac{0.008}{2} \right)$
$P(Y < 0.2) = 0.02666... - 0.004$
$P(Y < 0.2) = 0.02266...$
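(Note: Rounded to three decimal places, this is 0.023. The function $10y^2(1 - y)$ is non-negative only for $0 \leq y \leq 1$ and integrates to $\frac{5}{6}$ over that interval, so as stated it does not satisfy the normalization condition; the constant that normalizes $y^2(1 - y)$ on $(0, 1)$ is 12, not 10. The calculation above illustrates the integration steps for the function as given.)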
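The arithmetic in this example can be double-checked with exact fractions. The following sketch evaluates the antiderivative $\frac{10}{3}y^3 - \frac{5}{2}y^4$ derived above at the integration limits; the function name `antiderivative` is an illustrative choice, and the last lines additionally compute the area of the stated function over $(0, 1)$ as a sanity check.

```python
from fractions import Fraction

def antiderivative(y):
    """F(y) = (10/3) y^3 - (5/2) y^4, an antiderivative of 10 y^2 (1 - y)."""
    y = Fraction(y)
    return Fraction(10, 3) * y**3 - Fraction(5, 2) * y**4

# P(Y < 0.2) = F(0.2) - F(0), with 0.2 written exactly as 1/5.
prob = antiderivative(Fraction(1, 5)) - antiderivative(0)
print("P(Y < 0.2) =", prob, "=", float(prob))

# Area under 10 y^2 (1 - y) between 0 and 1, for comparison with the
# normalization condition.
area_0_to_1 = antiderivative(1) - antiderivative(0)
print("area over (0, 1) =", area_0_to_1)
```

Working with `Fraction` avoids rounding in the intermediate steps, so the result can be compared directly with the hand calculation.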
Key Takeaways
- Discrete PDF (PMF): The sum of probabilities for all possible values must equal 1 ($\sum f_X(x) = 1$). Probabilities are assigned to specific, distinct values.
- Continuous PDF: The integral of the PDF over its entire domain must equal 1 ($\int_{-\infty}^{\infty} f_Y(y) dy = 1$). Probabilities are associated with intervals, not single points.
- Non-negativity: In both cases, the PDF (or PMF) must always be non-negative ($f(x) \geq 0$) over its entire domain.
- Calculation Method: Use summation for discrete random variables and integration for continuous random variables to find probabilities.
Related Concepts
- Probability Distribution Function (PDF)
- Probability Mass Function (PMF) - used for discrete variables
- Cumulative Distribution Function (CDF)
Interview Questions
- What is a Probability Density Function (PDF)?
- How does a discrete PDF differ from a continuous PDF?
- What conditions must a valid discrete PDF (PMF) satisfy?
- Explain how to verify if a function is a valid PDF.
- Why is the area under the PDF curve equal to 1 for continuous variables?
- Can the value of a PDF at a single point be interpreted as probability in continuous variables? Explain why or why not.
- How do you calculate the probability of an interval in a continuous PDF?
- Give an example of a discrete PDF in a real-life scenario.
- What is the importance of non-negativity in a PDF?
- How do you use integration in PDF-based probability calculations for continuous variables?