Continuous Probability Distributions with SciPy in Python

Explore continuous probability distributions in Python using SciPy. Learn to model continuous random variables for AI, ML, and statistical analysis.

Continuous probability distributions are fundamental tools in statistics used to model random variables that can assume any value within a defined range or interval. These distributions are vital in numerous real-world applications across physics, engineering, economics, and other scientific fields, particularly when dealing with measurements, time intervals, or any continuously varying phenomena.

SciPy's scipy.stats module provides utilities for working with continuous probability distributions, including evaluation of probability density functions (PDFs), cumulative distribution functions (CDFs), quantiles, moments, and random samples.

What Are Continuous Probability Distributions?

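A continuous distribution models random variables that can take any value in an interval, that is, uncountably many values. Unlike discrete distributions, which deal with countable outcomes, continuous distributions apply to variables such as height, weight, temperature, or time. Because the probability of any single exact value is zero, probabilities are assigned to intervals: the probability that the variable falls between $a$ and $b$ is the area under the PDF between those points, which equals the difference of the CDF at $b$ and at $a$.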

The scipy.stats module in SciPy supports a wide array of these distributions.
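For a continuous variable, probabilities attach to intervals rather than to single points. The following is a minimal sketch of that idea, assuming a standard normal variable and the interval [-1, 1], both chosen purely for illustration. It computes the same probability twice, once from the CDF and once by numerically integrating the PDF with scipy.integrate.quad.

```python
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative assumption: X follows a standard normal distribution
a, b = -1.0, 1.0

# P(a <= X <= b) as a difference of CDF values
prob_cdf = norm.cdf(b) - norm.cdf(a)

# The same probability as the area under the PDF between a and b
prob_integral, _abs_error = quad(norm.pdf, a, b)

print(f"P({a} <= X <= {b}) via CDF:      {prob_cdf:.4f}")
print(f"P({a} <= X <= {b}) via integral: {prob_integral:.4f}")
```

Both lines should print roughly 0.6827, the familiar share of a normal distribution lying within one standard deviation of the mean.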

Key Continuous Distributions in SciPy

The scipy.stats module provides implementations for many common continuous probability distributions. Here are some of the most frequently used ones:

1. Normal Distribution (Gaussian Distribution)

The normal distribution is characterized by its symmetrical, bell-shaped curve, defined by its mean ($\mu$) and standard deviation ($\sigma$). It is widely employed in natural and social sciences to represent real-valued random variables.

  • SciPy Object: scipy.stats.norm

Example:

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt

# Define parameters
mean, std_dev = 0, 1

# Generate values for the x-axis
x_values = np.linspace(-5, 5, 100)

# Calculate PDF and CDF values
pdf_values = norm.pdf(x_values, mean, std_dev)
cdf_values = norm.cdf(x_values, mean, std_dev)

# Plotting PDF and CDF
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Normal Distribution - PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Normal Distribution - CDF')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.legend()

plt.tight_layout()
plt.show()
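Beyond plotting, the CDF can be used to read off probabilities directly. The sketch below assumes hypothetical parameters (mean 170, standard deviation 10) that do not come from the example above. It "freezes" the distribution by calling norm(loc=..., scale=...) so the parameters need not be repeated, then checks how much probability lies within 1, 2, and 3 standard deviations of the mean.

```python
from scipy.stats import norm

# Hypothetical parameters, chosen only for illustration
mean, std_dev = 170.0, 10.0

# A frozen distribution remembers its loc and scale
frozen_normal = norm(loc=mean, scale=std_dev)

probs = {}
for k in (1, 2, 3):
    lower = mean - k * std_dev
    upper = mean + k * std_dev
    # Probability mass between lower and upper
    probs[k] = frozen_normal.cdf(upper) - frozen_normal.cdf(lower)
    print(f"P(within {k} std dev of the mean) = {probs[k]:.4f}")
```

The three values come out near 0.6827, 0.9545, and 0.9973 regardless of which mean and standard deviation are chosen.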

2. Exponential Distribution

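The exponential distribution models the time between events in a Poisson process. It is defined by a rate parameter ($\lambda$) and is commonly used in reliability engineering and queuing theory. Note that SciPy parameterizes it by scale, which equals $1/\lambda$, rather than by the rate itself.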

  • SciPy Object: scipy.stats.expon

Example:

from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt

# Define rate parameter
rate = 1

# Generate values for the x-axis
x_values = np.linspace(0, 10, 100)

# Calculate PDF and CDF values
# The scale parameter for expon is 1/rate
pdf_values = expon.pdf(x_values, scale=1/rate)
cdf_values = expon.cdf(x_values, scale=1/rate)

# Plotting PDF and CDF
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Exponential Distribution - PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Exponential Distribution - CDF')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.legend()

plt.tight_layout()
plt.show()
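Because SciPy takes a scale rather than a rate, it can help to confirm the mapping against the textbook formulas $f(x) = \lambda e^{-\lambda x}$ and $F(x) = 1 - e^{-\lambda x}$. This is a small sanity-check sketch; the rate of 0.5 and the sample points are arbitrary values picked for illustration.

```python
import numpy as np
from scipy.stats import expon

rate = 0.5  # hypothetical rate (events per unit time)
x = np.array([0.5, 1.0, 2.0, 4.0])

# SciPy's expon is parameterized by scale = 1 / rate
pdf_scipy = expon.pdf(x, scale=1 / rate)
cdf_scipy = expon.cdf(x, scale=1 / rate)

# Closed-form expressions written in terms of the rate
pdf_manual = rate * np.exp(-rate * x)
cdf_manual = 1 - np.exp(-rate * x)

pdf_matches = np.allclose(pdf_scipy, pdf_manual)
cdf_matches = np.allclose(cdf_scipy, cdf_manual)
mean_wait = expon.mean(scale=1 / rate)  # mean waiting time equals 1 / rate

print(pdf_matches, cdf_matches, mean_wait)
```

Passing the rate itself as scale is a common mistake; a check like this catches it quickly.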

3. Gamma Distribution

The gamma distribution generalizes the exponential distribution. It is characterized by a shape parameter ($k$, passed as a in SciPy) and a scale parameter ($\theta$, passed as scale in SciPy); with $k = 1$ it reduces to the exponential distribution with the same scale. It is useful for modeling waiting times and failure rates.

  • SciPy Object: scipy.stats.gamma

Example:

from scipy.stats import gamma
import numpy as np
import matplotlib.pyplot as plt

# Define parameters
shape_param = 2  # Corresponds to 'a' in SciPy
scale_param = 1  # Corresponds to 'scale' in SciPy

# Generate values for the x-axis
x_values = np.linspace(0, 10, 100)

# Calculate PDF and CDF values
pdf_values = gamma.pdf(x_values, a=shape_param, scale=scale_param)
cdf_values = gamma.cdf(x_values, a=shape_param, scale=scale_param)

# Plotting PDF and CDF
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Gamma Distribution - PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Gamma Distribution - CDF')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.legend()

plt.tight_layout()
plt.show()
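The relationship between the gamma and exponential distributions can be checked numerically. The sketch below uses arbitrary illustrative parameter values: it verifies that gamma with a=1 matches expon with the same scale, and that the mean and variance follow the standard formulas $k\theta$ and $k\theta^2$.

```python
import numpy as np
from scipy.stats import expon, gamma

x = np.linspace(0.1, 10, 50)
scale_param = 2.0  # hypothetical scale (theta)

# With shape a = 1, the gamma PDF coincides with the exponential PDF
reduces_to_expon = np.allclose(
    gamma.pdf(x, a=1, scale=scale_param),
    expon.pdf(x, scale=scale_param),
)

# Mean and variance for a hypothetical shape k = 3
shape_param = 3
mean_val = gamma.mean(a=shape_param, scale=scale_param)  # k * theta
var_val = gamma.var(a=shape_param, scale=scale_param)    # k * theta**2

print(reduces_to_expon, mean_val, var_val)
```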

4. Beta Distribution

The beta distribution is defined on the interval [0, 1] and is frequently used for modeling probabilities and proportions, particularly in Bayesian statistics. It is parameterized by two positive shape parameters, $\alpha$ and $\beta$.

  • SciPy Object: scipy.stats.beta

Example:

from scipy.stats import beta
import numpy as np
import matplotlib.pyplot as plt

# Define parameters
alpha_param = 2
beta_param = 5

# Generate values for the x-axis
x_values = np.linspace(0, 1, 100)

# Calculate PDF and CDF values
pdf_values = beta.pdf(x_values, a=alpha_param, b=beta_param)
cdf_values = beta.cdf(x_values, a=alpha_param, b=beta_param)

# Plotting PDF and CDF
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.plot(x_values, pdf_values, label='PDF')
plt.title('Beta Distribution - PDF')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(x_values, cdf_values, label='CDF', color='red')
plt.title('Beta Distribution - CDF')
plt.xlabel('x')
plt.ylabel('Cumulative Probability')
plt.legend()

plt.tight_layout()
plt.show()
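As one illustration of the Bayesian use mentioned above, a beta prior on a success probability combines with binomial data to give another beta distribution (the standard conjugate update: add successes to $\alpha$ and failures to $\beta$). The sketch below reuses the parameters 2 and 5 as a prior; the observed counts are invented for the example and are not from any real data set.

```python
from scipy.stats import beta

# Prior belief about a success probability
alpha_prior, beta_prior = 2, 5

# Hypothetical observations, invented for illustration
successes, failures = 7, 3

# Conjugate update for binomial data
alpha_post = alpha_prior + successes
beta_post = beta_prior + failures

prior_mean = beta.mean(a=alpha_prior, b=beta_prior)
post_mean = beta.mean(a=alpha_post, b=beta_post)

# Central 95% credible interval from the posterior quantiles
lower, upper = beta.ppf([0.025, 0.975], a=alpha_post, b=beta_post)

print(f"Prior mean:     {prior_mean:.3f}")
print(f"Posterior mean: {post_mean:.3f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

The posterior mean moves from the prior mean of 2/7 toward the observed success rate, ending at 9/17.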

Essential Statistical Functions for Continuous Distributions in SciPy

The scipy.stats module provides a comprehensive suite of functions to interact with continuous distributions efficiently. For any continuous distribution object (e.g., norm, expon), you can typically use the following methods:

| Function | Description |
| --- | --- |
| pdf(x, *args, **kwds) | Evaluates the Probability Density Function at x. |
| cdf(x, *args, **kwds) | Evaluates the Cumulative Distribution Function at x, i.e. P(X ≤ x). |
| ppf(q, *args, **kwds) | Evaluates the Percent Point Function (the inverse of the CDF) at probability q. |
| rvs(*args, loc=0, scale=1, size=1, random_state=None) | Generates random variates from the distribution. |
| mean(*args, **kwds) | Computes the mean of the distribution. |
| var(*args, **kwds) | Computes the variance of the distribution. |
| std(*args, **kwds) | Computes the standard deviation of the distribution. |
| interval(confidence, *args, **kwds) | Returns the two endpoints of the central interval, with equal tail areas around the median, that contains the given fraction of the probability mass. Older SciPy releases name the first parameter alpha. |

  • *args and **kwds carry the shape parameters specific to each distribution (e.g., a for gamma, a and b for beta), along with the loc and scale parameters that every continuous distribution accepts.
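A few of the methods listed above are easiest to understand side by side. The following sketch uses the standard normal distribution; the probability 0.975, the sample size, and the seed are arbitrary choices for illustration. The confidence level is passed to interval positionally so the call does not depend on the parameter's name.

```python
from scipy.stats import norm

# ppf is the inverse of cdf: it maps a probability back to a quantile
q = 0.975
x_q = norm.ppf(q)
round_trip = norm.cdf(x_q)  # recovers q

# interval returns (lower, upper) endpoints, not a length
lower, upper = norm.interval(0.95)

# rvs draws random samples; random_state makes the draw reproducible
samples = norm.rvs(loc=0, scale=1, size=10_000, random_state=42)

print(f"ppf({q}) = {x_q:.4f}, cdf(ppf({q})) = {round_trip:.4f}")
print(f"Central 95% interval: ({lower:.4f}, {upper:.4f})")
print(f"Sample mean: {samples.mean():.3f}, sample std: {samples.std():.3f}")
```

For the standard normal, the 0.975 quantile and the upper end of the central 95% interval are the same number, about 1.96.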

Example: Calculating Mean and Variance of a Normal Distribution

from scipy.stats import norm

# Mean and variance for a standard normal distribution (mean=0, std_dev=1)
mean_val = norm.mean(loc=0, scale=1)
variance_val = norm.var(loc=0, scale=1)

print(f"Mean of Standard Normal Distribution: {mean_val}")
print(f"Variance of Standard Normal Distribution: {variance_val}")

Output:

Mean of Standard Normal Distribution: 0.0
Variance of Standard Normal Distribution: 1.0

Conclusion

The scipy.stats module covers the core tasks of working with continuous probability distributions. For the Normal, Exponential, Gamma, and Beta distributions shown here, and for the many others it implements, the same small set of methods (pdf, cdf, ppf, rvs, mean, var, std, interval) lets you evaluate densities and probabilities, compute summary statistics, visualize a distribution's shape, and draw samples for simulation.