Kelly's Skewness: Robust Measure of Data Asymmetry
Discover Kelly's Skewness, a percentile-based statistic for data asymmetry. Ideal for skewed distributions, it's a robust alternative to mean-based measures in ML.
4.11 Kelly's Measure of Skewness
Kelly's Coefficient of Skewness is a robust, percentile-based statistical measure used to quantify the asymmetry (skewness) in a dataset. Unlike traditional skewness measures that rely on the mean and standard deviation, Kelly's method utilizes specific percentiles, making it less sensitive to extreme values and more suitable for skewed or non-normal distributions.
What is Kelly's Coefficient of Skewness?
Kelly's Skewness focuses on the 10th, 50th (median), and 90th percentiles to determine how data is distributed around its center. It is particularly valuable for analyzing distributions where outliers or irregularities might distort other skewness metrics. This makes it a preferred choice when dealing with financial data, income distributions, or other datasets prone to extreme values.
Formula
The formula for calculating Kelly's Skewness Coefficient ($SKL$) is:
$$ SKL = \frac{P_{90} + P_{10} - 2 \times P_{50}}{P_{90} - P_{10}} $$
Where:
- $P_{10}$ = 10th percentile
- $P_{50}$ = 50th percentile (Median)
- $P_{90}$ = 90th percentile
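The formula translates directly into a few lines of code. The following is a minimal sketch, assuming NumPy is available; the function name `kelly_skewness` is illustrative, and `numpy.percentile` uses linear interpolation by default, which is only one of several percentile conventions.

```python
import numpy as np

def kelly_skewness(data):
    """Kelly's coefficient of skewness from the 10th, 50th and 90th percentiles."""
    # np.percentile interpolates linearly between order statistics by default.
    p10, p50, p90 = np.percentile(data, [10, 50, 90])
    spread = p90 - p10
    if spread == 0:
        # The denominator vanishes when P90 equals P10 (e.g. constant data).
        raise ValueError("P90 equals P10; Kelly's skewness is undefined")
    return float((p90 + p10 - 2 * p50) / spread)

symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print(kelly_skewness(symmetric))      # approximately zero
print(kelly_skewness(right_skewed))   # positive
```

The explicit check on the denominator is a design choice for this sketch: without it, constant data would produce a division by zero rather than a clear error.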
Interpretation of Kelly's Skewness
The calculated value of $SKL$ provides insight into the symmetry of the distribution:
- $SKL > 0$: Positively skewed (right-skewed). The right tail of the distribution is longer or fatter than the left tail. The bulk of the data is concentrated on the left.
- $SKL < 0$: Negatively skewed (left-skewed). The left tail of the distribution is longer or fatter than the right tail. The bulk of the data is concentrated on the right.
- $SKL \approx 0$: Approximately symmetric. The median lies about midway between $P_{10}$ and $P_{90}$, so the two tails carry roughly equal weight.
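One way to see why the sign of $SKL$ tracks the direction of skew is to regroup the numerator around the median. This is only an algebraic rearrangement of the formula above, not a different measure:

$$ SKL = \frac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{(P_{90} - P_{50}) + (P_{50} - P_{10})} $$

The numerator compares the distance from the median to the upper percentile with the distance from the median to the lower percentile. Both distances are non-negative, so $-1 \le SKL \le 1$ whenever $P_{90} > P_{10}$, with the extremes reached when the median coincides with $P_{10}$ ($SKL = 1$) or with $P_{90}$ ($SKL = -1$).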
Example Calculation
Let's calculate Kelly's Coefficient of Skewness for the following dataset:
Dataset: 4, 6, 8, 9, 10, 13, 14, 17, 19, 22
Step 1: Sort the Data
The dataset is already sorted in ascending order. Number of values ($n$) = 10.
Step 2: Determine Percentile Positions
To locate each percentile, this example uses a simple rank rule: the position is the percentile fraction multiplied by $n$, counted from the first value. The median uses the usual rule for an even number of values. This is a simplification, and other percentile conventions can give different values.
- 10th Percentile ($P_{10}$): The position is $0.10 \times n = 0.10 \times 10 = 1$, which points to the 1st value.
- 50th Percentile ($P_{50}$): The position is $0.50 \times n = 0.50 \times 10 = 5$. Because $n$ is even, the median is the average of the 5th and 6th values.
- 90th Percentile ($P_{90}$): The position is $0.90 \times n = 0.90 \times 10 = 9$, which points to the 9th value.
Step 3: Extract Percentile Values
From the sorted dataset: 4, 6, 8, 9, 10, 13, 14, 17, 19, 22
- $P_{10}$ = The 1st value = 4
- $P_{50}$ = Average of the 5th (10) and 6th (13) values = $(10 + 13) / 2$ = 11.5
- $P_{90}$ = The 9th value = 19
Step 4: Apply the Formula
$$ SKL = \frac{P_{90} + P_{10} - 2 \times P_{50}}{P_{90} - P_{10}} $$
Substitute the values:
$$ SKL = \frac{19 + 4 - 2 \times 11.5}{19 - 4} = \frac{23 - 23}{15} = \frac{0}{15} = 0.0 $$
Result:
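Kelly's Coefficient of Skewness is 0.0 for the percentile values chosen in Step 3, which indicates a symmetric distribution between the 10th and 90th percentiles. The result depends on the percentile convention: methods that interpolate between neighboring values, such as linear interpolation, give a small positive coefficient for the same dataset.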
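The hand calculation can be reproduced with plain Python. This sketch follows the positions used in the steps above (1st value, average of the 5th and 6th values, 9th value); it is not a general-purpose percentile routine, and the variable names are illustrative.

```python
data = sorted([4, 6, 8, 9, 10, 13, 14, 17, 19, 22])
n = len(data)

# Positions follow the worked example (1-based): 0.10 * n and 0.90 * n.
p10 = data[round(0.10 * n) - 1]   # 1st value
p90 = data[round(0.90 * n) - 1]   # 9th value

# n is even, so the median averages the two middle values (5th and 6th).
p50 = (data[n // 2 - 1] + data[n // 2]) / 2

skl = (p90 + p10 - 2 * p50) / (p90 - p10)
print(p10, p50, p90, skl)   # 4 11.5 19 0.0
```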
When to Use Kelly's Skewness
- Outlier Resistance: When you need a skewness measure that is less affected by extreme values.
- Non-Normal Distributions: Ideal for analyzing data that deviates significantly from a normal distribution.
- Percentile-Based Analysis: When percentile-based summaries are preferred or more meaningful for the data.
- Financial and Economic Data: Frequently used in fields where skewed distributions are common, such as income, stock returns, or housing prices.
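The outlier resistance mentioned above can be illustrated with a small experiment. The sketch below assumes NumPy and uses two hypothetical helper functions: `kelly_skewness` (percentile-based, with NumPy's default linear interpolation) and `moment_skewness` (the standardized third central moment). It replaces the largest value of a symmetric sample with an extreme outlier and compares how each measure reacts.

```python
import numpy as np

def kelly_skewness(x):
    # Percentile-based skewness using NumPy's default linear interpolation.
    p10, p50, p90 = np.percentile(x, [10, 50, 90])
    return float((p90 + p10 - 2 * p50) / (p90 - p10))

def moment_skewness(x):
    # Standardized third central moment (population form).
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d**3).mean() / (d**2).mean() ** 1.5)

clean = np.arange(1, 101)     # 1, 2, ..., 100: a symmetric sample
dirty = clean.copy()
dirty[-1] = 10_000            # replace the largest value with an extreme outlier

for name, sample in (("clean", clean), ("dirty", dirty)):
    print(f"{name}: Kelly = {kelly_skewness(sample):.3f}, "
          f"moment = {moment_skewness(sample):.3f}")
```

In this setup the outlier lies beyond the 90th percentile, so the three percentiles, and therefore Kelly's coefficient, do not move, while the moment-based value becomes strongly positive. Outliers that fall inside the 10th to 90th percentile range would still affect the result.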
When Not to Rely on Kelly's Skewness Alone
- Approximately Normal Distributions: When the data is known to be close to normal and free of outliers, traditional moment-based measures (such as Pearson's moment coefficient of skewness) usually suffice and are more widely understood.
- Small Sample Sizes: Although the measure is robust, percentile estimates are unstable in very small samples, because $P_{10}$ and $P_{90}$ are then determined by only one or two observations each.
Limitations
- Percentile Calculation Method: The exact values of percentiles can vary slightly depending on the specific method used for calculation (e.g., linear interpolation, nearest rank).
- Data Requirements: Requires a dataset where 10th, 50th, and 90th percentiles can be meaningfully calculated. It's generally not suitable for nominal or strictly categorical data.
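The dependence on the percentile method can be checked directly. This is a minimal sketch, assuming NumPy 1.22 or later, where `numpy.percentile` accepts a `method` argument; the helper name `kelly_skewness` is illustrative, and the dataset is the one from the example calculation.

```python
import numpy as np

data = [4, 6, 8, 9, 10, 13, 14, 17, 19, 22]

def kelly_skewness(x, method="linear"):
    # `method` selects how np.percentile resolves positions between data points.
    p10, p50, p90 = np.percentile(x, [10, 50, 90], method=method)
    return float((p90 + p10 - 2 * p50) / (p90 - p10))

for method in ("linear", "lower", "higher", "midpoint"):
    print(f"{method:>8}: {kelly_skewness(data, method):.3f}")
```

With only ten observations the coefficient differs noticeably between methods, which is why it helps to report the percentile convention alongside the result.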
Interview Questions
- What is Kelly's Coefficient of Skewness?
- How does Kelly's skewness differ from Pearson's and Bowley's skewness measures?
- Why is Kelly's coefficient considered robust?
- What percentiles are used in Kelly's skewness calculation?
- In which situations would you prefer Kelly's skewness over other skewness measures?
- Explain the formula for Kelly's Coefficient of Skewness.
- What does a Kelly's skewness value of 0 indicate?
- How does Kelly's method handle outliers in a dataset?
- Can Kelly's skewness be applied to ordinal data? If so, with what considerations?
- What are the main limitations of using Kelly's Coefficient of Skewness?