Statistics for AI & Machine Learning: Concepts & Applications
This document provides a comprehensive overview of key statistical concepts, scales of measurement, probability theorems, distributions, and inferential statistics, particularly relevant to business applications.
1. Types of Data
1.1 Qualitative Data / Categorical Data
- Nominal Data: Categories without any inherent order.
- Example: Gender (Male, Female, Other), Colors (Red, Blue, Green).
- Ordinal Data: Categories with a meaningful order, but the difference between categories is not quantifiable.
- Example: Education Level (High School, Bachelor's, Master's, PhD), Customer Satisfaction (Poor, Fair, Good, Excellent).
- Binary (Dichotomous) Data: A special case of nominal data with only two categories.
- Example: Yes/No, True/False, Pass/Fail.
1.2 Quantitative Data
- Interval Data: Ordered data where the difference between values is meaningful and constant, but there is no true zero point.
- Example: Temperature in Celsius or Fahrenheit.
- Ratio Data: Ordered data with a meaningful and constant difference between values, and a true zero point.
- Example: Height, Weight, Income, Age.
1.3 Data by Number of Variables
- Univariate Data: Data involving a single variable.
- Bivariate Data: Data involving two variables.
- Multivariate Data: Data involving more than two variables.
1.4 Data by Time Dimension
- Time Series Data: Data collected over a period of time, where the order of observations matters.
- Example: Monthly sales figures, Daily stock prices.
- Cross-Sectional Data: Data collected at a specific point in time from multiple entities.
- Example: Survey data collected from different individuals in a single day.
2. Scales of Measurement in Business Statistics
- 2.1 Nominal Scale: Categorical data without order.
- 2.2 Ordinal Scale: Categorical data with order.
- 2.3 Interval Scale: Numerical data with order and equal intervals, but no true zero.
- 2.4 Ratio Scale: Numerical data with order, equal intervals, and a true zero.
3. Measures of Central Tendency
These measures describe the typical or central value of a dataset.
- Mean: The average of all values. $$ \text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} $$
- Median: The middle value in an ordered dataset.
- Mode: The most frequently occurring value in a dataset.
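As a quick illustration, here is a minimal Python sketch of all three measures using the standard `statistics` module (the sample values are made up):

```python
import statistics

data = [12, 15, 15, 18, 20, 22, 90]     # small invented sample with one outlier
print(statistics.mean(data))            # arithmetic mean ~27.4, pulled up by 90
print(statistics.median(data))          # middle value 18, robust to the outlier
print(statistics.mode(data))            # most frequent value, 15
```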
4. Measures of Dispersion
These measures describe the spread or variability of data.
- Range: The difference between the maximum and minimum values.
- Variance: The average squared deviation from the mean.
- Standard Deviation: The square root of the variance, expressed in the same units as the data.
- Interquartile Range (IQR): The spread of the middle 50% of the data, $Q_3 - Q_1$.
- Box Plot: A graphical representation of the distribution of data through quartiles. It shows the median, quartiles, and potential outliers.
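The numeric measures of spread, and the quartiles a box plot is drawn from, can be computed the same way (same invented sample as above):

```python
import statistics

data = [12, 15, 15, 18, 20, 22, 90]
print(max(data) - min(data))                   # range
print(statistics.variance(data))               # sample variance (n-1 denominator)
print(statistics.stdev(data))                  # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)   # the quartiles a box plot shows
print(q3 - q1)                                 # interquartile range (IQR)
```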
5. Relationship between AM, GM, and HM
- 5.1 Arithmetic Mean (AM): The sum of values divided by the number of values. $$ \text{AM} = \frac{\sum_{i=1}^{n} x_i}{n} $$
- 5.2 Geometric Mean (GM): The nth root of the product of n values, typically used for average growth rates. $$ \text{GM} = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n} $$
- 5.3 Harmonic Mean (HM): The reciprocal of the arithmetic mean of the reciprocals of the values, used for average rates. $$ \text{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} $$
- 5.4 Relationship between AM, GM, and HM: For a set of positive numbers, $AM \ge GM \ge HM$. Equality holds only when all numbers are equal.
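A short numeric check of the $AM \ge GM \ge HM$ inequality; the yearly growth factors are hypothetical:

```python
import statistics

growth = [1.10, 1.25, 0.95, 1.05]        # invented yearly growth factors
am = statistics.mean(growth)
gm = statistics.geometric_mean(growth)   # the right average for growth rates
hm = statistics.harmonic_mean(growth)    # the right average for rates (e.g. speeds)
print(am, gm, hm)
print(am >= gm >= hm)                    # True for any positive data
```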
6. Skewness – Measures and Interpretation
Skewness measures the asymmetry of the probability distribution of a real-valued random variable about its mean.
- 6.1 What is Skewness? A measure of the asymmetry of the probability distribution of a random variable.
- 6.2 Tests of Skewness: Various statistical tests can determine if a distribution is significantly skewed.
- 6.3 Karl Pearson’s Measure of Skewness: A common method to quantify skewness.
- 6.4 Positive and Negative Skewness:
- 6.5 Positive Skewness (Right Skew): The tail on the right side of the distribution is longer or fatter. The mean is typically greater than the median.
- 6.6 Negative Skewness (Left Skew): The tail on the left side of the distribution is longer or fatter. The mean is typically less than the median.
- 6.7 Zero Skewness (Symmetrical Distribution): The distribution is perfectly symmetrical. The mean, median, and mode are equal.
- 6.8 Measurement of Skewness: Methods include Pearson's coefficients, Bowley's coefficient, and moment-based measures.
- 6.9 Karl Pearson’s Measure: $$ \text{Pearson's Coefficient of Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} $$ or $$ \text{Pearson's Coefficient of Skewness} = 3 \times \frac{\text{Mean} - \text{Median}}{\text{Standard Deviation}} $$
- 6.10 Bowley’s Measure: Based on quartiles. $$ \text{Bowley's Coefficient of Skewness} = \frac{Q_3 + Q_1 - 2 \times Q_2}{Q_3 - Q_1} $$ where $Q_1$, $Q_2$ (Median), and $Q_3$ are the first, second, and third quartiles.
- 6.11 Kelly’s Measure: A quantile-based measure using the extreme deciles or percentiles, e.g. $\frac{P_{90} + P_{10} - 2 \times P_{50}}{P_{90} - P_{10}}$.
- 6.12 Interpretation of Skewness: Indicates the direction and degree of asymmetry.
- 6.13 Difference between Dispersion and Skewness: Dispersion measures spread, while skewness measures asymmetry.
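A minimal sketch applying Pearson's and Bowley's coefficients from above to an invented right-skewed sample:

```python
import statistics

data = [1, 2, 2, 3, 3, 3, 4, 5, 9, 15]       # invented, right-skewed
mean = statistics.mean(data)
median = statistics.median(data)
stdev = statistics.stdev(data)
pearson = 3 * (mean - median) / stdev        # Pearson's second coefficient
q1, q2, q3 = statistics.quantiles(data, n=4)
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)      # Bowley's quartile coefficient
print(pearson, bowley)                       # both positive: right tail is longer
```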
7. What is a Regression Line?
A regression line is a line that best fits the data points on a scatter plot, typically used to model the relationship between two variables.
- 7.1 What is a Regression Line? A line that represents the best linear approximation of the relationship between dependent and independent variables.
- 7.2 Equation of Regression Line:
$$ y = a + bx $$
where:
- $y$ is the dependent variable.
- $x$ is the independent variable.
- $a$ is the y-intercept.
- $b$ is the slope of the line.
- 7.3 Graphical Representation of Regression Line: Plotted on a scatter plot showing the observed data points and the fitted line.
- 7.4 Examples of Regression Line: Predicting sales based on advertising spend (fitted in code below).
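A least-squares fit of $y = a + bx$ can be sketched with NumPy; the advertising-spend and sales figures here are invented:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # advertising spend (invented)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # observed sales (invented)
b, a = np.polyfit(x, y, deg=1)             # least squares: slope b, intercept a
print(f"y = {a:.2f} + {b:.2f}x")
print(a + b * 6.0)                         # predicted sales at spend = 6
```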
8. Types of Regression Lines
- 8.1 Linear Regression Line: Assumes a linear relationship between variables.
- 8.2 Logistic Regression Line: Used for binary classification problems, modeling the probability of an event.
- 8.3 Polynomial Regression Line: Models a curved relationship between variables using polynomial functions.
- 8.4 Ridge and Lasso Regression: Regularization techniques used to prevent overfitting in linear regression models.
- 8.5 Non-Linear Regression Line: Models relationships that are not linear.
- 8.6 Multiple Regression Line: Extends linear regression to model relationships with more than one independent variable.
- 8.7 Exponential Regression Line: Models exponential growth or decay.
- 8.8 Piecewise Regression Line: Fits different linear segments to different parts of the data.
- 8.9 Time Series Regression Line: Regression models specifically applied to time series data.
- 8.10 Power Regression Line: Models relationships where one variable is a power function of another.
- 8.11 Applications of Regression Line: Forecasting, understanding relationships, prediction.
- 8.12 Importance of Regression Line: Quantifies relationships and allows for predictions.
- 8.13 Statistical Significance of Regression Line: Testing whether the relationship between variables is statistically significant.
- 8.14 Practice Questions on Regression Line: Exercises to reinforce understanding.
9. Probability Theorems | Theorems and Examples
- 9.1 What is Probability? The likelihood of an event occurring.
- 9.2 Probability Theorems: Fundamental rules governing probability calculations.
- 9.3 Theorem of Complementary Events: $P(A') = 1 - P(A)$, where $A'$ is the complement of event $A$.
- 9.4 Theorem of Addition: For mutually exclusive events, $P(A \cup B) = P(A) + P(B)$. For non-mutually exclusive events, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
- 9.5 Theorem of Multiplication (Statistical Independence): For independent events, $P(A \cap B) = P(A) \times P(B)$.
- 9.6 Theorem of Total Probability: Used to calculate the probability of an event that can occur through various mutually exclusive intermediate events.
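These theorems can be verified numerically by enumerating a small sample space, here two independent dice:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 equally likely rolls
A = {o for o in outcomes if o[0] == 6}            # first die shows 6
B = {o for o in outcomes if o[1] == 6}            # second die shows 6
p = lambda event: len(event) / len(outcomes)

print(p(A | B), p(A) + p(B) - p(A & B))   # addition theorem: both 11/36
print(p(A & B), p(A) * p(B))              # multiplication (independence): both 1/36
print(p(set(outcomes) - A), 1 - p(A))     # complement theorem: both 5/6
```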
10. Tree Diagram: Meaning, Features, Conditional Probability and Examples
A tree diagram is a visual tool used to represent probabilities of sequential events.
- 10.1 What is a Tree Diagram? A graphical representation of outcomes of a sequence of events.
- 10.2 Features of Tree Diagram: Nodes mark the stages where events occur, and branches represent their possible outcomes, each labeled with its probability.
- 10.3 How to Draw a Tree Diagram? Start with an initial node, branch out for each possible outcome of the first event, and continue branching for subsequent events.
- 10.4 Tree Diagram for Conditional Probability: Illustrates how the probability of an event changes based on the occurrence of a previous event.
- 10.5 Tree Diagram in Probability Theory: Useful for calculating probabilities of compound events and understanding conditional probabilities.
- 10.6 Examples of Tree Diagram: Calculating probabilities in games of chance, sequential decision-making (see the worked example below).
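The worked example: multiplying probabilities along tree branches for two cards drawn without replacement from a standard deck:

```python
# Branch 1: first card is an ace; Branch 2 (conditional): second is also an ace.
p_first_ace = 4 / 52
p_second_ace_given_first = 3 / 51          # one ace already removed from the deck
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces)                         # ~0.0045, the ace-ace leaf of the tree
```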
11. Joint Probability | Concept, Formula, and Examples
Joint probability is the probability of two or more events occurring simultaneously.
- 11.1 What is Joint Probability in Business Statistics? The probability of two or more variables taking specific values or falling into specific categories at the same time.
- 11.2 Difference between Joint Probability and Conditional Probability: Joint probability is $P(A \cap B)$, the probability of both A and B occurring. Conditional probability is $P(A|B)$, the probability of A given that B has occurred.
- 11.3 Probability Density Function (PDF): A function describing the likelihood of a continuous random variable taking on a given value.
- 11.4 What is the Probability Density Function? A function that defines the probability distribution for a continuous random variable.
- 11.5 Probability Density Function Formula: Varies by distribution type.
- 11.6 Properties of Probability Density Function: Non-negative, the total area under the curve is 1.
- 11.7 Probability Distribution Function of Discrete Distribution: Typically represented by a probability mass function (PMF).
- 11.8 Probability Distribution Function of Continuous Distribution: Represented by a probability density function (PDF).
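A small sketch contrasting joint and conditional probability, using an invented click/purchase contingency table:

```python
# Invented counts of 500 site visitors by click and purchase behaviour.
counts = {("buy", "click"): 30, ("buy", "no_click"): 10,
          ("no_buy", "click"): 70, ("no_buy", "no_click"): 390}
n = sum(counts.values())

p_joint = counts[("buy", "click")] / n          # P(buy AND click) = 0.06
p_click = (counts[("buy", "click")] + counts[("no_buy", "click")]) / n
p_conditional = p_joint / p_click               # P(buy | click) = 0.30
print(p_joint, p_conditional)
```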
12. Bivariate Frequency Distribution | Calculation, Advantages, and Disadvantages
A bivariate frequency distribution shows the frequencies of occurrences for combinations of two variables.
- 12.1 Definition: A table that summarizes the relationship between two variables by showing the frequency of each combination of their values or categories.
- 12.2 Components:
- Joint Frequencies: Frequencies for specific combinations of values of the two variables.
- Marginal Frequencies: Frequencies for each value of a single variable, ignoring the other.
- Conditional Frequencies: Frequencies of one variable given a specific value of the other.
- 12.3 Construction: Create a grid with classes/values of one variable on rows and the other on columns. Tally the occurrences for each combination.
- 12.4 Graphical Representation: Scatter plots or heatmaps.
- 12.5 Calculation: Involves tallying observations into appropriate cells and then calculating marginal and conditional frequencies. Used for correlation or association analysis.
- 12.6 Advantages: Helps visualize and understand the relationship between two variables, identify patterns, and check for independence.
- 12.7 Disadvantages: Can become complex with many categories or variables; requires sufficient data to be meaningful.
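One way to build such a table in Python is `pandas.crosstab`; the survey responses below are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "education": ["HS", "BSc", "BSc", "MSc", "HS", "MSc", "BSc"],
    "satisfied": ["yes", "yes", "no", "yes", "no", "yes", "yes"],
})
# Joint frequencies in the cells, marginal frequencies in the "All" row/column.
table = pd.crosstab(df["education"], df["satisfied"], margins=True)
print(table)
```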
13. Bernoulli Distribution in Business Statistics – Mean and Variance
The Bernoulli distribution describes the probability of success or failure in a single trial.
- 13.1 Terminologies:
- Bernoulli Trial: An experiment with only two possible outcomes (success or failure).
- Success: The desired outcome.
- Failure: The outcome other than success.
- 13.2 Formula of Bernoulli Distribution: $$ P(X=k) = p^k (1-p)^{1-k}, \quad \text{for } k \in \{0, 1\} $$ where $p$ is the probability of success.
- 13.3 Mean and Variance of Bernoulli Distribution:
- Mean (Expected Value): $E(X) = p$
- Variance: $Var(X) = p(1-p)$
- 13.4 Properties: Single trial, two outcomes, constant probability of success.
- 13.5 Bernoulli Distribution Graph: A simple bar chart showing probabilities for $X=0$ and $X=1$.
- 13.6 Bernoulli Trial: (See 13.1)
- 13.7 Examples: Flipping a coin (Heads/Tails), Pass/Fail on a test.
- 13.8 Applications: Modeling single events like a customer clicking an ad, a machine failing on a production line.
- 13.9 Bernoulli Distribution and Binomial Distribution: The Binomial distribution is a sum of independent Bernoulli trials.
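A minimal SciPy sketch, with an assumed click-through probability of $p = 0.3$:

```python
from scipy.stats import bernoulli

p = 0.3                                       # assumed probability of success
print(bernoulli.pmf([0, 1], p))               # P(X=0)=0.7, P(X=1)=0.3
print(bernoulli.mean(p), bernoulli.var(p))    # p and p(1-p)
```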
14. Binomial Distribution in Business Statistics – Definition, Formula & Examples
The Binomial distribution describes the number of successes in a fixed number of independent Bernoulli trials.
- 14.1 Formula of Binomial Distribution:
$$ P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad \text{for } k = 0, 1, 2, \dots, n $$
where:
- $n$ is the number of trials.
- $k$ is the number of successes.
- $p$ is the probability of success in a single trial.
- $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ is the binomial coefficient.
- 14.2 Properties: Fixed number of trials, independent trials, two outcomes per trial, constant probability of success.
- 14.3 Negative Binomial Distribution: Describes the number of trials needed to achieve a fixed number of successes.
- 14.4 Mean and Variance of Binomial Distribution:
- Mean: $E(X) = np$
- Variance: $Var(X) = np(1-p)$
- 14.5 Shape of Binomial Distribution: Bell-shaped and symmetric if $p=0.5$. Skewed if $p \neq 0.5$.
- 14.6 Solved Examples: Calculating the probability of getting exactly 3 heads in 5 coin flips (worked in code below).
- 14.7 Uses in Business Statistics: Quality control, marketing response rates, customer purchasing behavior.
- 14.8 Real-Life Scenarios: Number of defective items in a batch, number of customers who respond to a campaign.
- 14.9 Difference Between Binomial Distribution and Normal Distribution: Binomial is for discrete trials, Normal is for continuous data. Normal can approximate Binomial for large $n$.
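The solved example from 14.6 (exactly 3 heads in 5 fair flips), sketched with SciPy:

```python
from scipy.stats import binom

n, p = 5, 0.5
print(binom.pmf(3, n, p))                 # C(5,3) * 0.5^5 = 0.3125
print(binom.mean(n, p), binom.var(n, p))  # np = 2.5, np(1-p) = 1.25
```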
15. Geometric Mean in Business Statistics | Concept, Properties, and Uses
The Geometric Mean is used for averaging rates of change or ratios.
- 15.1 Weighted Geometric Mean: Geometric mean where each observation is assigned a weight.
- 15.2 Properties: Defined only for positive values; dampens the effect of extreme large values, so it is always less than or equal to the arithmetic mean.
- 15.3 Uses: Calculating average investment returns, average growth rates, index numbers.
16. Negative Binomial Distribution: Properties, Applications, and Examples
The Negative Binomial distribution models the number of trials required to achieve a fixed number of successes.
- 16.1 Properties: Defined by parameters $r$ (number of successes) and $p$ (probability of success).
- 16.2 Probability Mass Function (PMF): $$ P(X=k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad \text{for } k = r, r+1, \dots $$ where $X$ is the number of trials.
- 16.3 Mean and Variance:
- Mean: $E(X) = \frac{r}{p}$
- Variance: $Var(X) = \frac{r(1-p)}{p^2}$
- 16.4 Applications in Business Statistics: Modeling customer purchasing frequency, duration of service calls.
- 16.5 Examples: The number of times a salesperson needs to make calls to achieve 5 sales, assuming a constant probability of sale per call.
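A SciPy sketch of the sales-call example. Note that `scipy.stats.nbinom` counts the failures before the $r$-th success rather than total trials, so the two formulations differ by a shift of $r$:

```python
from scipy.stats import nbinom

r, p = 5, 0.2                      # 5 sales needed; 20% chance per call (assumed)
print(nbinom.mean(r, p) + r)       # expected trials = r(1-p)/p + r = r/p = 25
print(nbinom.pmf(20 - r, r, p))    # P(exactly 20 calls) = P(15 failures)
```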
17. Hypergeometric Distribution in Business Statistics: Meaning, Examples & Uses
The Hypergeometric distribution is used for sampling without replacement from a finite population.
- 17.1 Probability Mass Function (PMF):
$$ P(X=k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} $$
where:
- $N$ is the population size.
- $K$ is the number of success states in the population.
- $n$ is the number of draws (sample size).
- $k$ is the number of observed successes.
- 17.2 Mean and Variance:
- Mean: $E(X) = n \frac{K}{N}$
- Variance: $Var(X) = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \left(\frac{N-n}{N-1}\right)$
- 17.3 Examples: The probability of drawing a certain number of defective items from a batch without putting them back.
- 17.4 When to Use: When sampling without replacement from a finite population, and the probability of success changes with each draw.
- 17.5 Difference Between Hypergeometric Distribution and Binomial Distribution: Binomial assumes replacement or infinite population (constant probability of success), while Hypergeometric does not.
- 17.6 Conclusion: Crucial for scenarios where sampling affects probabilities.
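A SciPy sketch of the defective-items example, assuming a batch of 50 items containing 5 defectives and a sample of 10. SciPy's argument order is `(M, n, N)` = (population size, success states, draws):

```python
from scipy.stats import hypergeom

N_pop, K, n_draws = 50, 5, 10
rv = hypergeom(N_pop, K, n_draws)
print(rv.pmf(1))       # P(exactly 1 defective in the sample)
print(rv.mean())       # n*K/N = 10 * 5/50 = 1.0
```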
18. Poisson Distribution: Meaning, Characteristics, Shape, Mean, and Variance
The Poisson distribution models the number of events occurring in a fixed interval of time or space.
- 18.1 Probability Mass Function (PMF) of Poisson Distribution: $$ P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad \text{for } k = 0, 1, 2, \dots $$ where $\lambda$ is the average rate of events.
- 18.2 Characteristics: Events occur independently, the average rate is constant.
- 18.3 Shape of Poisson Distribution: Skewed to the right, becomes more symmetric as $\lambda$ increases.
- 18.4 Mean and Variance of Poisson Distribution:
- Mean: $E(X) = \lambda$
- Variance: $Var(X) = \lambda$
- 18.5 Fitting a Poisson Distribution: Estimating $\lambda$ from observed data and using the formula to calculate probabilities.
- 18.6 Poisson Distribution as an Approximation to Binomial Distribution: When $n$ is large and $p$ is small, Poisson can approximate Binomial with $\lambda = np$.
- 18.7 Examples: Number of customer calls per hour, number of defects per square meter of fabric.
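A SciPy sketch of Poisson probabilities, with a numerical check of the Binomial approximation for large $n$ and small $p$:

```python
from scipy.stats import binom, poisson

lam = 4.0                                   # assumed average of 4 calls per hour
print(poisson.pmf(2, lam))                  # P(exactly 2 calls in an hour)
print(poisson.mean(lam), poisson.var(lam))  # both equal lambda

n, p = 1000, 0.004                          # large n, small p
print(binom.pmf(2, n, p), poisson.pmf(2, n * p))  # nearly identical
```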
19. Gamma Distribution in Statistics
The Gamma distribution is a flexible, continuous probability distribution often used to model waiting times or the sum of exponential random variables.
- 19.1 What is Gamma Distribution: A continuous probability distribution characterized by a shape parameter and a rate or scale parameter.
- 19.2 The Gamma Function: The distribution's normalizing constant is the Gamma function, $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1}e^{-t}dt$, a continuous generalization of the factorial.
- 19.3 Gamma Distribution Formula – Probability Density Function (PDF): $$ f(x; \alpha, \beta) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \quad \text{for } x > 0 $$ where $\alpha$ is the shape parameter and $\beta$ is the rate parameter.
- 19.4 Gamma Distribution Mean and Variance:
- Mean: $E(X) = \frac{\alpha}{\beta}$
- Variance: $Var(X) = \frac{\alpha}{\beta^2}$
- 19.5 Special Case 1: Exponential Distribution: When $\alpha = 1$, Gamma becomes Exponential, modeling time until the first event.
- 19.6 Examples of Exponential Distribution: Time until a machine breaks down, time between customer arrivals.
- 19.7 Special Case 2: Chi-Square Distribution: When $\alpha = \nu/2$ and $\beta = 1/2$ (or scale parameter $2$), Gamma becomes Chi-Square, used in hypothesis testing.
- 19.8 Examples of Chi-Square Distribution: Goodness-of-fit tests, tests of independence.
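A SciPy sketch. `scipy.stats.gamma` is parameterized by shape `a` and a *scale* parameter, so the rate parameterization above corresponds to `scale = 1/beta`:

```python
from scipy.stats import gamma

alpha, beta = 3.0, 2.0                 # assumed shape and rate
rv = gamma(a=alpha, scale=1.0 / beta)  # convert rate to SciPy's scale
print(rv.mean(), rv.var())             # alpha/beta = 1.5, alpha/beta^2 = 0.75
print(rv.pdf(1.0))                     # density at x = 1
```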
20. Normal Distribution in Business Statistics
The Normal distribution, or Gaussian distribution, is a fundamental continuous probability distribution characterized by its bell shape.
- 20.1 Probability Density Function (PDF) of Normal Distribution: $$ f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} $$ where $\mu$ is the mean and $\sigma$ is the standard deviation.
- 20.2 Standard Normal Distribution: A normal distribution with mean $\mu=0$ and standard deviation $\sigma=1$, often denoted by $Z$.
- 20.3 Properties: Symmetric, unimodal, mean=median=mode, tails extend infinitely.
- 20.4 The Empirical Rule (68-95-99.7 Rule):
- Approximately 68% of data falls within 1 standard deviation of the mean.
- Approximately 95% falls within 2 standard deviations.
- Approximately 99.7% falls within 3 standard deviations.
- 20.5 Parameters of Normal Distribution: Mean ($\mu$) and Standard Deviation ($\sigma$).
- 20.6 Curve of Normal Distribution: A bell-shaped curve, symmetric around the mean.
- 20.7 Examples: Heights of people, measurement errors, test scores.
- 20.8 Applications in Business Statistics: Statistical inference, modeling errors, financial modeling, quality control.
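The empirical rule can be checked directly against the standard normal CDF:

```python
from scipy.stats import norm

for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)   # mass within k standard deviations
    print(k, round(prob, 4))            # ~0.6827, ~0.9545, ~0.9973
```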
21. Lognormal Distribution in Business Statistics
The Lognormal distribution describes variables whose logarithms are normally distributed.
- 21.1 Probability Density Function (PDF) of Lognormal Distribution: $$ f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}, \quad \text{for } x > 0 $$ where $\mu$ and $\sigma$ are the mean and standard deviation of the logarithm of the variable.
- 21.2 Lognormal Distribution Curve: Skewed to the right.
- 21.3 Mean and Variance of Lognormal Distribution:
- Mean: $E(X) = e^{\mu + \sigma^2/2}$
- Variance: $Var(X) = (e^{\sigma^2} - 1)e^{2\mu + \sigma^2}$
- 21.4 Applications: Modeling incomes, stock prices, response times, lifetimes of equipment.
- 21.5 Examples: Income distribution in a population, size of companies.
- 21.6 Difference Between Normal Distribution and Lognormal Distribution: Normal can take negative values, Lognormal is strictly positive. Lognormal is right-skewed.
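A SciPy sketch; `scipy.stats.lognorm` takes `s` = $\sigma$ of $\ln X$ and `scale` = $e^\mu$:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.8                    # assumed parameters of ln(X)
rv = lognorm(s=sigma, scale=np.exp(mu))
print(rv.mean())                        # should equal e^(mu + sigma^2/2)
print(np.exp(mu + sigma**2 / 2))        # matches the formula above
```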
22. Inferential Statistics
Inferential statistics uses sample data to draw conclusions about a population.
- 22.1 Overview of Inferential Statistics: Making inferences and predictions about a population based on sample data.
- 22.2 Degrees of Freedom: The number of values in a calculation that are free to vary.
- 22.3 Central Limit Theorem: States that the distribution of sample means will approximate a normal distribution as the sample size becomes large, regardless of the population's distribution.
- 22.4 Parameters vs. Statistics: Parameters are population characteristics (e.g., population mean $\mu$), while statistics are calculated from sample data (e.g., sample mean $\bar{x}$).
- 22.5 Test Statistics: Values calculated from sample data used to test hypotheses (e.g., t-statistic, z-statistic, F-statistic).
- 22.6 Estimation: The process of estimating population parameters from sample statistics (point estimation and interval estimation).
- 22.7 Standard Error: The standard deviation of the sampling distribution of a statistic.
- 22.8 Confidence Interval: A range of values, derived from sample statistics, that is likely to contain the value of an unknown population parameter.
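A minimal sketch of a t-based 95% confidence interval for a population mean, using invented sample values:

```python
import numpy as np
from scipy import stats

sample = np.array([102, 98, 110, 105, 97, 103, 108, 100])  # invented data
mean = sample.mean()
sem = stats.sem(sample)                 # standard error of the mean
ci = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(mean, ci)                         # interval likely to contain mu
```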
23. Hypothesis Testing
Hypothesis testing is a statistical method to make decisions based on data, testing a claim about a population.
- 23.1 Hypothesis Testing Guide: A structured approach to testing hypotheses.
- 23.2 Null and Alternative Hypothesis:
- Null Hypothesis ($H_0$): A statement of no effect or no difference.
- Alternative Hypothesis ($H_a$ or $H_1$): A statement of an effect or difference that contradicts the null hypothesis.
- 23.3 Statistical Significance: A result is statistically significant when it is unlikely to have occurred by chance alone under the null hypothesis, i.e., when the p-value falls below the chosen significance level ($\alpha$).
- 23.4 P-Value: The probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.
- 23.5 Type I and Type II Errors:
- Type I Error (False Positive): Rejecting a true null hypothesis.
- Type II Error (False Negative): Failing to reject a false null hypothesis.
- 23.6 Statistical Power: The probability of correctly rejecting a false null hypothesis.
- Decisions: Based on comparing the p-value to a significance level ($\alpha$), or comparing a test statistic to a critical value.
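A one-sample t-test in SciPy as a minimal illustration; the sample and the hypothesized mean of 100 are invented:

```python
import numpy as np
from scipy import stats

sample = np.array([102, 98, 110, 105, 97, 103, 108, 100])  # invented data
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)   # H0: mu = 100
alpha = 0.05
print(t_stat, p_value)
print("reject H0" if p_value < alpha else "fail to reject H0")
```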
24. Choosing the Right Statistical Test
Selecting the appropriate statistical test depends on the data type, research question, and assumptions.
- 24.1 Assumptions of Hypothesis Testing: Conditions that must be met for a test to be valid.
- 24.1.1 Skewness: Some tests assume symmetry or normality.
- 24.1.2 Kurtosis: Relates to the "tailedness" of the distribution, also relevant for normality assumptions.
- 24.2 Correlation: Measures the strength and direction of the linear relationship between two quantitative variables.
- Correlation Coefficient: A numerical measure of correlation (e.g., Pearson's $r$).
- Correlation vs. Causation: Correlation does not imply causation.
- Pearson Correlation: Measures linear association between two continuous variables.
- Covariance vs. Correlation: Covariance indicates the direction of a linear relationship, while correlation standardizes it to a unitless scale.
- 24.3 Regression Analysis: Used to model the relationship between dependent and independent variables.
- 24.3.1 t-Test: Used to compare means of two groups or test the significance of regression coefficients.
- 24.3.2 ANOVAs (Analysis of Variance): Used to compare means of three or more groups.
- 24.3.2.1 One-Way ANOVA: Compares means of groups based on one factor.
- 24.3.2.2 Two-Way ANOVA: Compares means based on two factors and their interaction.
- 24.3.2.3 ANOVA in R: Implementation of ANOVA using the R programming language.
- 24.4 Chi-Square Test: Used for categorical data.
- 24.4.1 Overview of Chi-Square Test: Tests for association between categorical variables or goodness-of-fit.
- 24.4.2 Chi-Square Goodness of Fit Test: Tests if sample data fits a hypothesized distribution.
- 24.4.3 Chi-Square Test of Independence: Tests if there is a significant association between two categorical variables (see the SciPy example below).
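The SciPy example referenced in 24.4.3: a chi-square test of independence on an invented 2×2 contingency table:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 70],      # e.g. purchases vs. none, by customer segment
                  [20, 180]])    # counts are invented
chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value, dof)        # association is significant if p_value < alpha
```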
25. Graphical Representation of Variables
Visualizing data is crucial for understanding patterns, trends, and relationships.
- Graphs and Tables: Various graphical forms (histograms, scatter plots, bar charts, line graphs) and tabular formats are used to represent data.
This document is a compilation of common statistical topics. Specific applications and detailed formulas may vary based on context.