20.5 Test Statistics: Z-Test & Formulas for ML
This document outlines common statistical test statistics, their formulas, and their applications. Understanding these statistics is crucial for hypothesis testing and drawing meaningful conclusions from data.
Common Test Statistics and Their Formulas
1. Z-Test Statistic
The Z-test is used for large sample sizes or when the population standard deviation ($\sigma$) is known. It compares the sample mean ($\bar{X}$) to the population mean ($\mu$) under the null hypothesis.
Formula:
Z = (X̄ - μ) / (σ / √n)
Where:
- $\bar{X}$ = Sample mean
- $\mu$ = Population mean (under the null hypothesis)
- $\sigma$ = Population standard deviation
- $n$ = Sample size
When to Use:
- When the sample size is large (typically $n \ge 30$).
- When the population standard deviation ($\sigma$) is known.
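The Z formula above can be sketched directly in Python; the sample values below are made up for illustration:

```python
import math

def z_statistic(sample_mean, pop_mean, pop_std, n):
    """Z = (X-bar - mu) / (sigma / sqrt(n))."""
    standard_error = pop_std / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical example: a sample of 50 observations with mean 103,
# tested against mu = 100 with known sigma = 10.
z = z_statistic(103, 100, 10, 50)  # about 2.12
```

A Z value of about 2.12 exceeds the common 1.96 cutoff for a two-tailed test at the 5% level, so this sample mean would be considered significantly different from 100.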
2. One-Sample t-Test Statistic
The one-sample t-test is used for small sample sizes when the population standard deviation is unknown. It estimates the population standard deviation using the sample standard deviation ($s$).
Formula:
t = (X̄ - μ) / (s / √n)
Where:
- $\bar{X}$ = Sample mean
- $\mu$ = Population mean (under the null hypothesis)
- $s$ = Sample standard deviation
- $n$ = Sample size
When to Use:
- When the sample size is small (typically $n < 30$).
- When the population standard deviation ($\sigma$) is unknown and must be estimated by the sample standard deviation ($s$).
- Assumes the underlying population is approximately normally distributed.
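A minimal sketch of the one-sample t formula, using Python's standard library (the data values are invented for illustration):

```python
import math
import statistics

def one_sample_t(sample, pop_mean):
    """t = (X-bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # uses the n - 1 denominator
    return (x_bar - pop_mean) / (s / math.sqrt(n))

# Hypothetical example: five measurements tested against mu = 5.0
data = [5.1, 4.9, 5.3, 5.0, 5.2]
t = one_sample_t(data, 5.0)  # about 1.41
```

The resulting t value would be compared against a t-distribution with $n - 1 = 4$ degrees of freedom rather than the standard normal.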
3. Two-Sample t-Test Statistic (Independent Samples)
This t-test is used to compare the means of two independent groups. It is used when the population standard deviations are unknown and the samples are small. There are two main versions: assuming equal variances and assuming unequal variances (Welch's t-test). The formula below assumes unequal variances, which is generally more robust.
Formula (Assuming Unequal Variances):
t = (X̄₁ - X̄₂) / √( (s₁² / n₁) + (s₂² / n₂) )
Where:
- $\bar{X}_1$, $\bar{X}_2$ = Sample means for group 1 and group 2, respectively.
- $s_1^2$, $s_2^2$ = Sample variances for group 1 and group 2, respectively.
- $n_1$, $n_2$ = Sample sizes for group 1 and group 2, respectively.
When to Use:
- To compare the means of two independent groups.
- When population standard deviations are unknown.
- When sample sizes are small.
- Assumes the underlying populations are approximately normally distributed.
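The Welch (unequal-variances) formula above translates directly into Python; the two groups below are invented for illustration:

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Welch's t: (X-bar1 - X-bar2) / sqrt(s1^2/n1 + s2^2/n2)."""
    m1, m2 = statistics.mean(sample1), statistics.mean(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    standard_error = math.sqrt(v1 / len(sample1) + v2 / len(sample2))
    return (m1 - m2) / standard_error

# Hypothetical example: two independent groups of four observations
group_a = [10, 12, 11, 13]
group_b = [8, 9, 7, 10]
t = welch_t(group_a, group_b)  # about 3.29
```

In practice one would typically call `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)`, which also computes the Welch-Satterthwaite degrees of freedom and a p-value.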
4. Chi-Square ($\chi^2$) Test Statistic
The Chi-Square test statistic is primarily used for categorical data to assess the association between categorical variables or to compare observed frequencies with expected frequencies.
Formula (for Goodness-of-Fit or Test of Independence):
χ² = Σ [ (Observed - Expected)² / Expected ]
Where:
- Observed = The observed frequency in each category.
- Expected = The expected frequency in each category (calculated based on the null hypothesis).
- $\Sigma$ = Summation over all categories.
When to Use:
- To test if there is a statistically significant difference between observed frequencies and expected frequencies (Goodness-of-Fit test).
- To test for independence between two categorical variables (Test of Independence).
- Data must be in frequencies or counts.
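The summation above is a one-liner in Python. As a sketch, consider a goodness-of-fit test of whether a die is fair, given 60 (hypothetical) rolls:

```python
def chi_square(observed, expected):
    """Chi-square = sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1-6 over 60 rolls of a die;
# a fair die is expected to show each face 10 times.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
chi2 = chi_square(observed, expected)  # 4.2
```

The statistic (4.2 here) would then be compared against a chi-square distribution with $k - 1 = 5$ degrees of freedom to obtain a p-value.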
5. F-Test Statistic (ANOVA)
The F-test statistic is used in Analysis of Variance (ANOVA) to compare the means of three or more groups. It tests whether there are any statistically significant differences between the means of independent groups.
Formula:
F = Variance Between Groups / Variance Within Groups
More formally, it's the ratio of the Mean Square Between (MSB) to the Mean Square Within (MSW):
F = MSB / MSW
Where:
- Variance Between Groups (MSB): Measures the variability of the means of the different groups around the overall mean.
- Variance Within Groups (MSW): Measures the average variability of the data within each group, pooled across all groups.
When to Use:
- To compare the means of three or more groups simultaneously.
- To determine if at least one group mean is significantly different from the others.
- Assumes that the data in each group are normally distributed and that the variances of the groups are equal (homoscedasticity).
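The MSB/MSW ratio can be computed by hand as a sketch, using three small made-up groups:

```python
import statistics

def f_statistic(groups):
    """One-way ANOVA F = MSB / MSW."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares, over k - 1 degrees of freedom
    ssb = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    msb = ssb / (k - 1)
    # Within-group sum of squares, over N - k degrees of freedom
    ssw = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    msw = ssw / (n_total - k)
    return msb / msw

# Hypothetical example: three groups of three observations each
f = f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])  # 13.0
```

A large F (13.0 here, on 2 and 6 degrees of freedom) indicates that the spread between group means is large relative to the spread within groups; `scipy.stats.f_oneway` performs the same computation and returns a p-value.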
Frequently Asked Questions
- What is the formula for the Z-test statistic and when is it used? The Z-test statistic is calculated as $Z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$. It's used for large samples ($n \ge 30$) or when the population standard deviation ($\sigma$) is known.
- Explain the difference between a one-sample t-test and a two-sample t-test. A one-sample t-test compares the mean of a single sample to a known or hypothesized population mean. A two-sample t-test compares the means of two independent samples to determine if they are significantly different from each other.
- How do you calculate the two-sample t-test statistic? For independent samples with unequal variances, the formula is $t = (\bar{X}_1 - \bar{X}_2) / \sqrt{(s_1^2 / n_1) + (s_2^2 / n_2)}$.
- What is the chi-square test statistic used for and how is it calculated? The chi-square ($\chi^2$) statistic is used for categorical data to assess goodness-of-fit or independence. It's calculated by summing the squared differences between observed and expected frequencies, divided by the expected frequencies: $\chi^2 = \Sigma [ (\text{Observed} - \text{Expected})^2 / \text{Expected} ]$.
- Describe the F-test statistic and its role in ANOVA. The F-test statistic is the ratio of the variance between groups to the variance within groups ($F = \text{Variance Between Groups} / \text{Variance Within Groups}$). In ANOVA, it's used to test if the means of three or more groups are significantly different.
- When should you use a Z-test instead of a t-test? You should use a Z-test when the population standard deviation ($\sigma$) is known, or when the sample size is sufficiently large ($n \ge 30$) so that the sample standard deviation is a reliable estimate of the population standard deviation. Otherwise, a t-test is preferred.
- What assumptions are made when performing a t-test? Key assumptions for t-tests include:
- The data are continuous (interval or ratio scale).
- The data are randomly sampled from the population.
- The data are approximately normally distributed (especially important for small sample sizes).
- For two-sample t-tests, the independence of the two samples is assumed. For the standard two-sample t-test, equal variances between groups are also assumed (though Welch's t-test relaxes this).
- How do sample size and variance affect test statistics?
- Sample Size ($n$): A larger sample size generally leads to a more precise estimate of the population parameters. For Z-tests and t-tests, increasing $n$ decreases the standard error ($ \sigma/\sqrt{n} $ or $ s/\sqrt{n} $) in the denominator, making the test statistic larger (more extreme) for a given difference in means. This increases the power to detect a significant effect.
- Variance ($s^2$ or $\sigma^2$): Higher variance within the sample or population means greater variability in the data. This increases the standard error in the denominator of Z and t-test statistics, leading to smaller (less extreme) test statistics. Consequently, higher variance reduces the power to detect significant differences.
- Explain how the chi-square statistic tests independence in categorical data. In a test of independence, the chi-square statistic compares the observed counts in each cell of a contingency table with the counts that would be expected if the two categorical variables were truly independent. A large $\chi^2$ value indicates a substantial difference between observed and expected counts, suggesting that the variables are likely dependent. The calculation quantifies this discrepancy across all cells.
- Can you provide an example where an F-test would be appropriate? An F-test would be appropriate when comparing the effectiveness of three different teaching methods (say, Methods A, B, and C) on student test scores. You would use ANOVA (with the F-test) to determine whether there is a statistically significant difference in average test scores across the three methods.
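The test of independence described in the FAQ above can be sketched in Python. The key step is computing each cell's expected count as (row total × column total) / grand total; the 2×2 table below is hypothetical:

```python
def chi2_independence(table):
    """Chi-square for a contingency table (list of rows of counts).

    Under independence, expected[i][j] = row_i total * col_j total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical 2x2 table: preference (yes/no) by group (A/B)
chi2 = chi2_independence([[30, 20], [20, 30]])  # 4.0
```

With 1 degree of freedom ($(r-1)(c-1)$), a value of 4.0 exceeds the 5% critical value of about 3.84, so the two variables would be judged dependent at that level; `scipy.stats.chi2_contingency` wraps this same computation.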