Measures of Central Tendency: Mean, Median, Mode Explained
Understand measures of central tendency like mean, median, and mode. Learn how these statistical values summarize data for AI & machine learning.
Measures of Central Tendency
Measures of central tendency are statistical values that represent the center or typical value of a dataset. They provide a single value that summarizes the data, indicating where most of the data points tend to cluster. The most common measures of central tendency are the mean, median, and mode.
Mean
The mean, often referred to as the average, is calculated by summing all the values in a dataset and then dividing by the total number of values.
Formula
$$ \text{Mean} (\bar{x}) = \frac{\sum_{i=1}^{n} x_i}{n} $$
Where:
- $\sum_{i=1}^{n} x_i$ is the sum of all values in the dataset.
- $n$ is the total number of values in the dataset.
Example
Consider the following dataset: $\{10, 12, 15, 18, 20\}$
The sum of the values is $10 + 12 + 15 + 18 + 20 = 75$. The number of values is $n=5$.
Therefore, the mean is: $$ \bar{x} = \frac{75}{5} = 15 $$
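The mean is sensitive to outliers (extreme values). For example, replacing 20 with 200 in the dataset above raises the mean from 15 to $\frac{255}{5} = 51$, even though four of the five values are unchanged.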
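The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name `arithmetic_mean` and the choice to raise an error on empty input are assumptions made for this example.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        # The formula divides by n, so n = 0 is undefined (assumed handling).
        raise ValueError("arithmetic_mean requires at least one value")
    return sum(values) / len(values)


data = [10, 12, 15, 18, 20]
print(arithmetic_mean(data))  # 75 / 5 = 15.0
```

In practice, Python's standard library offers `statistics.mean`, which computes the same quantity.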
Median
The median is the middle value in a dataset that has been ordered from least to greatest. If the dataset has an even number of values, the median is the average of the two middle values.
Calculation
- Order the data: Arrange the dataset in ascending or descending order.
- Find the middle value:
  - If the number of data points ($n$) is odd, the median is the value at the $\frac{n+1}{2}$ position.
  - If the number of data points ($n$) is even, the median is the average of the values at the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions.
Example
Consider the following dataset: $\{10, 12, 15, 18, 20\}$
- The data is already ordered.
- The number of values is $n=5$ (odd). The middle position is $\frac{5+1}{2} = 3$.
- The value at the 3rd position is 15.
Therefore, the median is 15.
Consider another dataset: $\{10, 12, 15, 18, 20, 22\}$
- The data is already ordered.
- The number of values is $n=6$ (even). The middle positions are $\frac{6}{2} = 3$ and $\frac{6}{2} + 1 = 4$.
- The values at the 3rd and 4th positions are 15 and 18.
- The median is the average of these two values: $\frac{15 + 18}{2} = \frac{33}{2} = 16.5$.
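The median is less affected by outliers than the mean because it depends only on the order of the values, not on their magnitudes. Replacing 20 with 200 in the first dataset leaves the median at 15.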
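The calculation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `median_value` is hypothetical, and the positions in the text are 1-based while Python list indices are 0-based, so the indices are shifted by one.

```python
def median_value(values):
    """Return the middle value of the ordered data (average of two if n is even)."""
    if not values:
        raise ValueError("median_value requires at least one value")
    ordered = sorted(values)  # Step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based indices mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_value([10, 12, 15, 18, 20]))      # 15
print(median_value([10, 12, 15, 18, 20, 22]))  # 16.5
```

Python's standard library provides `statistics.median`, which follows the same odd/even rule.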
Mode
The mode is the value that appears most frequently in a dataset. A dataset can have one mode (unimodal), more than one mode (multimodal), or no mode if all values appear with the same frequency.
Example
Consider the following dataset: $\{10, 12, 15, 15, 18, 20, 20, 20, 22\}$
The value 20 appears three times, which is more frequent than any other value.
Therefore, the mode is 20.
Consider the dataset: $\{10, 12, 15, 18, 20\}$
All values appear only once, so there is no mode.
Consider the dataset: $\{10, 12, 12, 15, 15, 18, 20\}$
Both 12 and 15 appear twice.
Therefore, this dataset is bimodal with modes 12 and 15.
The mode is useful for identifying the most common outcome in a categorical or discrete dataset.
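The definition above can be sketched with a frequency count. This is a minimal illustration, not a definitive implementation: the function name `find_modes` is hypothetical, and returning an empty list when every distinct value ties is an assumption that mirrors the "no mode" convention used in this article.

```python
from collections import Counter


def find_modes(values):
    """Return a sorted list of the most frequent values, or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if several distinct values all share the same
    # frequency, the dataset has no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(find_modes([10, 12, 15, 15, 18, 20, 20, 20, 22]))  # [20]
print(find_modes([10, 12, 15, 18, 20]))                  # []
print(find_modes([10, 12, 12, 15, 15, 18, 20]))          # [12, 15]
```

Conventions differ between tools. For instance, Python's `statistics.multimode` returns every value when all of them tie, rather than reporting no mode.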