Learn how the Chi-Square Goodness of Fit test evaluates observed categorical data against expected distributions in AI and Machine Learning.

22.4.2 Chi-Square Goodness of Fit Test

The Chi-Square Goodness of Fit Test is a statistical hypothesis test that determines whether the observed frequencies of a single categorical variable are consistent with a theoretical or hypothesized distribution.

When to Use the Chi-Square Goodness of Fit Test

This test is appropriate when:

You are working with one categorical variable.
You want to compare observed frequencies of this variable against expected frequencies derived from a specific hypothesis or known distribution.

Common scenarios include:

Testing if a categorical variable is evenly distributed across its categories (e.g., assuming each outcome is equally likely).
Verifying if the observed frequencies of a categorical variable in a sample align with known population proportions.

Example: Checking if a six-sided die is fair by comparing the observed counts of each face appearing in a series of rolls against the expected count (where each face should appear with equal probability).
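
Both scenarios come down to turning hypothesized proportions into expected counts. A minimal Python sketch, where the helper name `expected_counts` and the 50/30/20 population proportions are assumptions made purely for illustration:

```python
def expected_counts(total, proportions):
    """Scale hypothesized category proportions to expected counts."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]


# Scenario 1: uniform hypothesis, e.g. a fair six-sided die rolled 120 times.
uniform = expected_counts(120, [1 / 6] * 6)

# Scenario 2: hypothetical known population proportions for three categories,
# applied to a sample of 200 observations.
known = expected_counts(200, [0.5, 0.3, 0.2])

print([round(e, 2) for e in uniform])
print([round(e, 2) for e in known])
```

The expected counts do not have to be whole numbers; only the observed counts are integers.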

Chi-Square Goodness of Fit Formula

The core of the test involves calculating a chi-square ($\chi^2$) statistic, which quantifies the discrepancy between observed and expected frequencies.

The formula is:

$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$

Where:

$\chi^2$: The calculated chi-square statistic.
$k$: The total number of categories in the variable.
$O_i$: The observed frequency (actual count) in category $i$.
$E_i$: The expected frequency (count) in category $i$, based on the hypothesized distribution.
$\sum$: The summation symbol, indicating that the calculation is performed for each category and then summed up.
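
As a minimal sketch, the formula translates directly into a few lines of Python. The function name `chi_square_statistic` and the two-category counts are hypothetical, chosen only to keep the arithmetic easy to follow:

```python
def chi_square_statistic(observed, expected):
    """Sum (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 40 observations, two categories expected to be equally likely.
observed = [30, 10]
expected = [20, 20]

statistic = chi_square_statistic(observed, expected)
print(statistic)  # (30-20)^2/20 + (10-20)^2/20 = 5.0 + 5.0 = 10.0
```

Because each squared difference is divided by its own expected count, a gap of 10 matters more in a category where 20 were expected than in one where 2,000 were expected.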

Steps to Perform the Chi-Square Goodness of Fit Test

State the Hypotheses:
- Null Hypothesis ($H_0$): The observed data fits the expected distribution.
- Alternative Hypothesis ($H_a$): The observed data does not fit the expected distribution.
Define Expected Frequencies: Determine the expected count for each category based on your hypothesis. For example, if you hypothesize a fair six-sided die, the expected frequency for each face is the total number of rolls divided by 6.
Collect Observed Frequencies: Gather the actual counts (observed frequencies) for each category from your sample data.
Calculate the Chi-Square Statistic: Use the formula above to compute the $\chi^2$ value.
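Determine Degrees of Freedom (df): The degrees of freedom are calculated as $$ df = k - 1 $$ where $k$ is the number of categories. If any parameters of the hypothesized distribution are estimated from the sample, subtract one additional degree of freedom for each estimated parameter.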
Compare and Decide:
- Using a Critical Value: Compare the calculated $\chi^2$ statistic to a critical value from a chi-square distribution table (based on your chosen significance level, $\alpha$, and the degrees of freedom).
- Using a p-value: Calculate the p-value associated with your calculated $\chi^2$ statistic and degrees of freedom.
- Decision Rule:
  - Reject $H_0$ if the calculated $\chi^2$ is greater than the critical value or, equivalently, if the p-value is less than your chosen significance level (commonly $\alpha = 0.05$).
  - Otherwise (the calculated $\chi^2$ is less than or equal to the critical value, or the p-value is greater than or equal to $\alpha$), fail to reject $H_0$.
Interpret the Results:
- Rejecting $H_0$ means there is statistically significant evidence that the observed data does not fit the hypothesized distribution.
- Failing to reject $H_0$ means there is not enough statistical evidence to conclude that the observed data deviates from the hypothesized distribution.
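
The steps above can be sketched in Python using the critical-value route. This is an illustration under stated assumptions, not a definitive implementation: the function name `goodness_of_fit_decision` and the three-category counts are hypothetical, and the lookup table holds only the standard $\alpha = 0.05$ critical values for 1 to 5 degrees of freedom.

```python
# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# as found in a standard chi-square table (df = 1..5 only).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit_decision(observed, expected):
    """Return (statistic, df, critical value, reject H0?) at alpha = 0.05."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Calculate the chi-square statistic.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom: number of categories minus one.
    df = len(observed) - 1
    # Compare with the critical value and decide.
    critical = CRITICAL_05[df]
    return statistic, df, critical, statistic > critical


# Hypothetical sample of 100 observations, hypothesized proportions 50/30/20.
observed = [60, 25, 15]
expected = [50, 30, 20]

statistic, df, critical, reject = goodness_of_fit_decision(observed, expected)
print(f"chi2 = {statistic:.3f}, df = {df}, critical = {critical}, reject H0: {reject}")
```

Here the statistic is about 4.083, below the critical value 5.991 for 2 degrees of freedom, so the sketch fails to reject $H_0$ for this made-up sample.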

Example: Testing Dice Fairness

Let's test if a six-sided die is fair.

Hypothesis:
- $H_0$: The die is fair (each side has a probability of 1/6).
- $H_a$: The die is not fair.
Data: Suppose you roll the die 120 times and observe the following counts:
- 1: 15 times
- 2: 25 times
- 3: 18 times
- 4: 22 times
- 5: 17 times
- 6: 23 times
Expected Frequencies: If the die is fair, the expected count for each side is $120 / 6 = 20$.
Calculate $\chi^2$:
- Category 1: $\frac{(15 - 20)^2}{20} = \frac{(-5)^2}{20} = \frac{25}{20} = 1.25$
- Category 2: $\frac{(25 - 20)^2}{20} = \frac{(5)^2}{20} = \frac{25}{20} = 1.25$
- Category 3: $\frac{(18 - 20)^2}{20} = \frac{(-2)^2}{20} = \frac{4}{20} = 0.20$
- Category 4: $\frac{(22 - 20)^2}{20} = \frac{(2)^2}{20} = \frac{4}{20} = 0.20$
- Category 5: $\frac{(17 - 20)^2}{20} = \frac{(-3)^2}{20} = \frac{9}{20} = 0.45$
- Category 6: $\frac{(23 - 20)^2}{20} = \frac{(3)^2}{20} = \frac{9}{20} = 0.45$
- Total $\chi^2 = 1.25 + 1.25 + 0.20 + 0.20 + 0.45 + 0.45 = 3.80$
Degrees of Freedom: $df = 6 \text{ categories} - 1 = 5$.
Decision: Using a significance level of $\alpha = 0.05$, the critical value for $\chi^2$ with 5 degrees of freedom is approximately 11.070.
- Our calculated $\chi^2$ (3.80) is less than the critical value (11.070), so we fail to reject the null hypothesis.
- Equivalently, the p-value for $\chi^2 = 3.80$ with $df = 5$ is approximately 0.579. Since 0.579 > 0.05, we fail to reject the null hypothesis.
Conclusion: There is not enough evidence to conclude that the die is biased. The observed deviations from the expected frequencies are consistent with random chance, although this does not prove that the die is fair.
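
The numbers in this example can be reproduced with the Python standard library alone. The helper `chi2_sf` below is a hypothetical name for a small series approximation of the chi-square upper-tail probability, written for this sketch; in practice a statistics library such as SciPy (`scipy.stats.chisquare`) would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


observed = [15, 25, 18, 22, 17, 23]            # counts for faces 1..6
n = sum(observed)                              # 120 rolls
expected = [n / len(observed)] * len(observed) # 20 per face under H0

contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(contributions)
df = len(observed) - 1
p_value = chi2_sf(statistic, df)

# Sanity check on the tabulated critical value: its tail area should be close to alpha.
tail_at_critical = chi2_sf(11.070, df)

print([round(c, 2) for c in contributions])
print(f"chi2 = {statistic:.2f}, df = {df}, p = {p_value:.3f}")
print(f"P(X >= 11.070) = {tail_at_critical:.3f}")
```

The per-category contributions match the hand calculation, and the tail area at 11.070 comes out close to 0.05, which confirms that the critical value and the p-value lead to the same decision.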

Key Considerations and Assumptions

Independence: Each observation must be independent of the others and must fall into exactly one category.
Sample Size: The sample size should be large enough. A common rule of thumb is that the expected frequency for each category should be at least 5. If some expected frequencies are less than 5, consider combining adjacent categories (if meaningful) or using an alternative such as an exact multinomial test.
Categorical Data: The test operates on raw counts of a categorical variable, not on percentages, proportions, or continuous measurements.
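
The sample-size rule of thumb is easy to check before running the test. A minimal sketch, assuming hypothetical helper names (`small_expected_categories`, `merge_last_two`) and made-up counts for five ordered categories:

```python
def small_expected_categories(expected, minimum=5):
    """Return the indices of categories whose expected count is below the minimum."""
    return [i for i, e in enumerate(expected) if e < minimum]


def merge_last_two(counts):
    """Combine the last two adjacent categories into one."""
    return counts[:-2] + [counts[-2] + counts[-1]]


# Hypothetical counts for five ordered categories (100 observations).
observed = [38, 33, 19, 6, 4]
expected = [40, 30, 20, 7, 3]

flagged = small_expected_categories(expected)
print(flagged)  # the last category has an expected count of 3

# Merge observed and expected in the same way so the categories stay aligned.
merged_observed = merge_last_two(observed)
merged_expected = merge_last_two(expected)
print(merged_observed, merged_expected)
```

Merging reduces the number of categories $k$, so the degrees of freedom must be recomputed from the merged table.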

Common Interview Questions

What is the Chi-Square Goodness of Fit Test?
When do you use a Chi-Square Goodness of Fit Test?
How do you calculate the chi-square statistic for the goodness of fit test?
What is the formula for the chi-square goodness of fit test?
How do you determine the degrees of freedom in a goodness of fit test?
How do you interpret the p-value in a chi-square goodness of fit test?
What are the assumptions behind the chi-square goodness of fit test?
Can you explain the steps to perform a chi-square goodness of fit test?
Give an example of how to use the chi-square goodness of fit test in real life.
What does it mean to reject or fail to reject the null hypothesis in a goodness of fit test?
