19.5 Lognormal Distribution Examples in AI & ML

Explore worked examples of the Lognormal Distribution, which is used to model skewed data in AI, ML, finance, biology, and network traffic.

19.5 Examples of the Lognormal Distribution

The Lognormal Distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. It is particularly useful for modeling data that exhibits positive skewness and arises from multiplicative processes. Common applications include modeling financial asset returns, biological measurements (like cell sizes or protein concentrations), and network traffic.

Understanding the Lognormal Distribution

A random variable $X$ follows a lognormal distribution if its logarithm, $\ln(X)$, follows a normal distribution. That underlying normal distribution is defined by its mean ($\mu'$) and standard deviation ($\sigma$). However, the lognormal distribution is often parameterized instead by its geometric mean and geometric standard deviation.

  • Geometric Mean ($e^{\mu'}$): This is the median of the lognormal distribution. It represents the central tendency when dealing with multiplicative effects.
  • Geometric Standard Deviation ($e^{\sigma}$): This measures the spread or dispersion of the distribution on a multiplicative scale.

It's important to distinguish these parameters from the mean ($\mu'$) and standard deviation ($\sigma$) of the underlying normal distribution of the logarithm. A geometric mean or geometric standard deviation must be log-transformed first ($\mu' = \ln(\text{geometric mean})$, $\sigma = \ln(\text{geometric standard deviation})$). The formulas provided below take $\mu'$ and $\sigma$ as inputs.
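The conversion between the two parameterizations can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the function names `log_params_from_geometric` and `geometric_from_log_params` are hypothetical, and the input values (1000 and 1.2) are chosen only as sample numbers.

```python
import math

def log_params_from_geometric(geo_mean, geo_sd):
    """Return (mu', sigma) of ln(X) from a geometric mean and geometric SD."""
    if geo_mean <= 0:
        raise ValueError("geometric mean must be positive")
    if geo_sd < 1:
        raise ValueError("geometric standard deviation must be at least 1")
    # Geometric mean = e^{mu'} and geometric SD = e^{sigma}, so take logs.
    return math.log(geo_mean), math.log(geo_sd)

def geometric_from_log_params(mu_log, sigma_log):
    """Return (geometric mean, geometric SD) from (mu', sigma) of ln(X)."""
    return math.exp(mu_log), math.exp(sigma_log)

# Hypothetical sample values: geometric mean 1000, geometric SD 1.2.
mu_log, sigma_log = log_params_from_geometric(1000.0, 1.2)
print(f"mu' = {mu_log:.4f}, sigma = {sigma_log:.4f}")
```

Keeping the conversion in one place makes it harder to plug a geometric standard deviation directly into a formula that expects $\sigma$.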

Key Lognormal Distribution Formulas

Let $X$ be a random variable following a lognormal distribution. The parameters often used are $\mu'$ (the mean of $\ln(X)$) and $\sigma$ (the standard deviation of $\ln(X)$).

  • Mean ($\mathbb{E}[X]$): $$ \mathbb{E}[X] = e^{\mu' + \frac{\sigma^2}{2}} $$ Note: Some sources write the parameters as $ \mu $ and $ \sigma $, where $ \mu $ is the mean of the log-transformed variable and $ \sigma $ is its standard deviation. Here $\mu'$ and $\sigma$ always denote the mean and standard deviation of $\ln(X)$, for consistency with common statistical software and texts. If a problem statement provides a "geometric mean" and a "geometric standard deviation," these refer to $e^{\mu'}$ and $e^{\sigma}$ respectively.

  • Variance ($\text{Var}[X]$): $$ \text{Var}[X] = (e^{\sigma^2} - 1) e^{2\mu' + \sigma^2} $$

  • Median: $$ \text{Median}(X) = e^{\mu'} $$

  • Relationship between Geometric Mean and Median: The geometric mean of a lognormal distribution is its median.
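The formulas above translate directly into code. The following is a small sketch using only the Python standard library; the function name `lognormal_moments` and the sample parameters ($\mu' = 0$, $\sigma = 1$) are illustrative assumptions, not part of the original text.

```python
import math

def lognormal_moments(mu_log, sigma_log):
    """Mean, variance, and median of X when ln(X) ~ Normal(mu_log, sigma_log^2)."""
    sigma_sq = sigma_log ** 2
    mean = math.exp(mu_log + sigma_sq / 2)
    # math.expm1(x) computes e^x - 1 accurately, even for small x.
    variance = math.expm1(sigma_sq) * math.exp(2 * mu_log + sigma_sq)
    median = math.exp(mu_log)  # also the geometric mean
    return {"mean": mean, "variance": variance, "median": median}

# Illustrative parameters: mu' = 0, sigma = 1.
moments = lognormal_moments(0.0, 1.0)
for name, value in moments.items():
    print(f"{name}: {value:.4f}")
```

For any $\sigma > 0$ the mean exceeds the median, which is the positive skew described earlier.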

Example 1: Variance of Daily Website Visitors

Problem: The daily website visitors of a blog follow a lognormal distribution. The logarithm of the daily visitors has a mean ($\mu'$) of 50 and a standard deviation ($\sigma$) of 1.1. Calculate the variance of the daily visitors.

Given:

  • Mean of the log-transformed data ($\mu'$) = 50
  • Standard deviation of the log-transformed data ($\sigma$) = 1.1

Solution:

  1. Calculate $\sigma^2$: $$ \sigma^2 = (1.1)^2 = 1.21 $$

  2. Plug values into the variance formula: $$ \text{Var}[X] = (e^{\sigma^2} - 1) e^{2\mu' + \sigma^2} $$ $$ \text{Var}[X] = (e^{1.21} - 1) e^{(2 \times 50) + 1.21} $$ $$ \text{Var}[X] = (e^{1.21} - 1) e^{100 + 1.21} $$ $$ \text{Var}[X] = (e^{1.21} - 1) e^{101.21} $$

  3. Compute the exponential terms:

    • $e^{1.21} \approx 3.3535$
    • $e^{101.21}$ is a very large number. Using a calculator: $e^{101.21} \approx 9.015 \times 10^{43}$

  4. Calculate the variance: $$ \text{Var}[X] \approx (3.3535 - 1) \times 9.015 \times 10^{43} $$ $$ \text{Var}[X] \approx 2.3535 \times 9.015 \times 10^{43} $$ $$ \text{Var}[X] \approx 21.22 \times 10^{43} $$ $$ \text{Var}[X] \approx 2.122 \times 10^{44} $$

Answer: The variance of the daily website visitors is approximately $2.122 \times 10^{44}$.

Note: This example treats 50 and 1.1 as $\mu'$ and $\sigma$, the mean and standard deviation of $\ln(X)$, which is the standard parameterization for the mean and variance formulas. With $\mu' = 50$ the median is $e^{50} \approx 5.2 \times 10^{21}$ visitors, which is why the variance is so large. If 50 and 1.1 were instead meant as the geometric mean and geometric standard deviation, then $\mu' = \ln(50) \approx 3.912$ and $\sigma = \ln(1.1) \approx 0.0953$, and the variance would be only about 23.0.
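The arithmetic in Example 1 is easy to get wrong by hand because of the size of the exponent, so a quick numerical check is useful. The sketch below assumes, as the example does, that 50 and 1.1 are the mean and standard deviation of $\ln(X)$; the variable names are illustrative.

```python
import math

mu_log = 50.0      # mean of ln(X), as given in Example 1
sigma_log = 1.1    # standard deviation of ln(X)

sigma_sq = sigma_log ** 2
spread_factor = math.expm1(sigma_sq)             # e^{sigma^2} - 1
scale_factor = math.exp(2 * mu_log + sigma_sq)   # e^{2 mu' + sigma^2}
variance = spread_factor * scale_factor

print(f"e^(sigma^2) - 1     = {spread_factor:.4f}")
print(f"e^(2 mu' + sigma^2) = {scale_factor:.4e}")
print(f"Var[X]              = {variance:.4e}")
```

Double-precision floats handle $e^{101.21}$ without overflow; for much larger exponents it would be safer to work with the logarithm of the variance instead.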

Example 2: Mean Population of a Village

Problem: The population of a village follows a lognormal distribution. The median population is 1,000, and the geometric standard deviation ($e^{\sigma}$) is 1.2. Find the mean (average) population.

Given:

  • Median population = 1,000
  • Geometric standard deviation ($e^{\sigma}$) = 1.2

Solution:

  1. Relate median to $\mu'$: The median of a lognormal distribution is $e^{\mu'}$. $$ \text{Median} = e^{\mu'} = 1000 $$ To find $\mu'$, we take the natural logarithm of both sides: $$ \mu' = \ln(1000) $$ $$ \mu' \approx 6.9078 $$

  2. Convert the geometric standard deviation to $\sigma$: The geometric standard deviation is $e^{\sigma}$, so $$ \sigma = \ln(1.2) \approx 0.1823 $$ $$ \sigma^2 \approx (0.1823)^2 \approx 0.0332 $$

  3. Use the formula for the mean: The formula for the mean of a lognormal distribution is: $$ \mathbb{E}[X] = e^{\mu' + \frac{\sigma^2}{2}} $$ $$ \mathbb{E}[X] = e^{6.9078 + \frac{0.0332}{2}} $$ $$ \mathbb{E}[X] = e^{6.9078 + 0.0166} $$ $$ \mathbb{E}[X] = e^{6.9244} $$

  4. Compute the mean: $$ \mathbb{E}[X] \approx 1016.8 $$

Answer: The mean population of the village is approximately 1,017. The mean sits only slightly above the median of 1,000 because a geometric standard deviation of 1.2 corresponds to a small $\sigma$.
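As a cross-check, the mean can be computed in closed form and compared with a simulation. This sketch assumes the geometric standard deviation is $e^{\sigma}$, as defined earlier in this section, so that $\sigma = \ln(1.2)$; the sample size and seed are arbitrary choices made for reproducibility.

```python
import math
import random
import statistics

median_population = 1000.0
geo_sd = 1.2

# Convert the given quantities to the parameters of ln(X).
mu_log = math.log(median_population)
sigma_log = math.log(geo_sd)

# Closed-form mean: e^{mu' + sigma^2 / 2}
closed_form_mean = math.exp(mu_log + sigma_log ** 2 / 2)

# Monte Carlo check with a fixed seed (arbitrary) for reproducibility.
rng = random.Random(0)
samples = [rng.lognormvariate(mu_log, sigma_log) for _ in range(200_000)]
sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)

print(f"closed-form mean: {closed_form_mean:.2f}")
print(f"simulated mean:   {sample_mean:.2f}")
print(f"simulated median: {sample_median:.2f}")
```

The simulated median should land near 1,000 and the simulated mean near the closed-form value, with small differences due to sampling noise.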


Summary of Key Lognormal Parameters and Formulas

  • Parameterization: A lognormal distribution is typically defined by the mean ($\mu'$) and standard deviation ($\sigma$) of the associated normal distribution of $\ln(X)$.
  • Mean: $\mathbb{E}[X] = e^{\mu' + \frac{\sigma^2}{2}}$
  • Variance: $\text{Var}[X] = (e^{\sigma^2} - 1) e^{2\mu' + \sigma^2}$
  • Median: $\text{Median}(X) = e^{\mu'}$
  • Relationship: The geometric mean of a lognormal distribution is its median ($e^{\mu'}$).

Applications of the Lognormal Distribution

The lognormal distribution is widely used in various fields due to its ability to model positively skewed data and phenomena driven by multiplicative factors:

  • Finance: Modeling asset prices, investment returns, and option pricing.
  • Biology and Medicine: Analyzing sizes of cells, bacteria, blood cells, and concentrations of biological substances.
  • Environmental Science: Modeling pollutant concentrations, rainfall amounts, and seismic magnitudes.
  • Reliability Engineering: Describing failure times of components.
  • Computer Science: Modeling network traffic, file sizes, and queuing system performance.
  • Economics: Income distributions, business revenue.

Interview Questions on Lognormal Distribution

  1. How do you calculate the variance of a lognormal distribution? The variance is calculated using the formula $\text{Var}[X] = (e^{\sigma^2} - 1) e^{2\mu' + \sigma^2}$, where $\mu'$ is the mean and $\sigma$ is the standard deviation of the underlying normal distribution of $\ln(X)$.

  2. What is the formula for the mean of a lognormal distribution? The mean is given by $\mathbb{E}[X] = e^{\mu' + \frac{\sigma^2}{2}}$, where $\mu'$ is the mean and $\sigma$ is the standard deviation of the underlying normal distribution of $\ln(X)$.

  3. How is the geometric mean related to the median in a lognormal distribution? The geometric mean of a lognormal distribution is equal to its median. Both are calculated as $e^{\mu'}$, where $\mu'$ is the mean of the logarithm of the variable.

  4. Why is the lognormal distribution suitable for modeling positively skewed data? The logarithm of the variable is symmetric (normal), and exponentiating a symmetric variable compresses values below the median while stretching values above it, which produces a long right tail (positive skewness). Multiplicative processes, which often lead to positive skewness, are naturally modeled by this distribution.

  5. Can you explain the steps to compute the mean population using lognormal parameters? First, find $\mu'$ by taking the natural logarithm of the median population ($\mu' = \ln(\text{Median})$). Then, find $\sigma$ from the given geometric standard deviation ($e^{\sigma}$) as $\sigma = \ln(\text{Geometric Standard Deviation})$, so that $\sigma^2 = (\ln(\text{Geometric Standard Deviation}))^2$. Finally, plug $\mu'$ and $\sigma^2$ into the mean formula: $\mathbb{E}[X] = e^{\mu' + \frac{\sigma^2}{2}}$.

  6. How does the variance formula of a lognormal distribution differ from that of a normal distribution? The variance of a normal distribution is simply $\sigma^2$. The variance of a lognormal distribution is $\text{Var}[X] = (e^{\sigma^2} - 1) e^{2\mu' + \sigma^2}$. The lognormal variance depends on both the mean ($\mu'$) and standard deviation ($\sigma$) of the log-transformed variable, and it grows much faster with increasing $\sigma$ due to the exponential terms.

  7. In what fields is the lognormal distribution commonly applied? It's applied in finance (asset returns), biology (cell sizes), environmental science (pollutant levels), reliability engineering (failure times), and economics (income distributions).

  8. What does the geometric standard deviation represent in a lognormal distribution? The geometric standard deviation ($e^\sigma$) represents the multiplicative spread of the distribution. It indicates by what factor the data typically deviates from the median. Roughly 68% of the data falls between Median/$e^\sigma$ and Median $\times e^\sigma$; for example, with $e^\sigma = 2$, about 68% of values lie between half the median and twice the median.

  9. How do you interpret the parameters $\mu'$ and $\sigma$ in the lognormal mean and variance formulas? $\mu'$ is the mean of the natural logarithm of the random variable ($\ln(X)$), and it sets the median $e^{\mu'}$ and therefore the overall scale of the distribution. $\sigma$ is the standard deviation of $\ln(X)$, and it controls the spread and skewness of the distribution. A larger $\sigma$ leads to greater skewness and a wider spread.

  10. Can you provide a real-world example where the lognormal distribution is used and explain the calculation of its mean or variance? Consider the income distribution of a country. If incomes are found to follow a lognormal distribution, and we determine that the mean of the log of incomes ($\mu'$) is 10 (natural log units) and the standard deviation of the log of incomes ($\sigma$) is 0.5, then:

    • The median income is $e^{10}$, approximately \$22,026.
    • The average (mean) income is $e^{10 + 0.5^2/2} = e^{10 + 0.125} = e^{10.125}$, approximately \$24,959.
    • The variance of income is $(e^{0.5^2} - 1)e^{2(10) + 0.5^2} = (e^{0.25} - 1)e^{20.25} \approx (1.284 - 1) \times 6.230 \times 10^8 \approx 0.284 \times 6.230 \times 10^8 \approx 1.77 \times 10^8$. This illustrates how even a moderate $\sigma$ can lead to a large variance when $\mu'$ is large.
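The income example in question 10, together with the 68% rule from question 8, can be checked numerically. The sketch below uses only the Python standard library and assumes the parameters stated in question 10 ($\mu' = 10$, $\sigma = 0.5$); the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu_log = 10.0     # mean of ln(income), from question 10
sigma_log = 0.5   # standard deviation of ln(income)

median_income = math.exp(mu_log)
mean_income = math.exp(mu_log + sigma_log ** 2 / 2)
variance_income = math.expm1(sigma_log ** 2) * math.exp(2 * mu_log + sigma_log ** 2)

# Share of incomes within one geometric SD of the median:
# P(median / e^sigma <= X <= median * e^sigma), evaluated on the log scale.
geo_sd = math.exp(sigma_log)
log_income = NormalDist(mu_log, sigma_log)
share_within = (log_income.cdf(math.log(median_income * geo_sd))
                - log_income.cdf(math.log(median_income / geo_sd)))

print(f"median:   {median_income:,.0f}")
print(f"mean:     {mean_income:,.0f}")
print(f"variance: {variance_income:.3e}")
print(f"share within one geometric SD of the median: {share_within:.4f}")
```

Working on the log scale turns the multiplicative interval into the familiar one-standard-deviation interval of a normal distribution, which is where the figure of roughly 68% comes from.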