Hypothesis Testing

  • A technique for making inferences about a population from a sample.
  • Helps make data-driven decisions by testing assumptions and quantifying uncertainty.
  • Key steps: - Formulate the hypotheses (H0 and H1) - Select a test (Z-test, T-test, chi-square test) - Choose a significance level α (commonly 1% or 5%) - Calculate the test statistic and p-value - Make a decision: reject H0 if p-value < α, otherwise fail to reject H0

Null Hypothesis and P-value Interpretation

  • Null Hypothesis (H0): No effect, no difference; the status quo holds.
  • Alternative Hypothesis (H1): Contradicts the null; an effect or difference is present.
  • p-value: Probability of observing data at least as extreme as the sample, assuming H0 is true. - High p-value (> 0.05): Data is consistent with H0; fail to reject H0. - Low p-value (< 0.05): Data is inconsistent with H0; reject H0. - The 0.05 cutoff is the chosen significance level α, not a fixed rule.
  • P-value does not measure probability of H0 being true or false.
  • P-value does not measure effect size or practical significance.
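
The point that a p-value says nothing about effect size can be illustrated with a small sketch. The numbers below are hypothetical (a fixed difference of 0.1 with an assumed known σ of 1), and the helper names are illustrative only. The same tiny difference gives a large p-value at n = 100 and a vanishingly small one at n = 10,000.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal statistic.

    For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical setup: the observed difference stays at 0.1 while n grows.
for n in (100, 10_000):
    z = z_statistic(sample_mean=0.1, mu0=0.0, sigma=1.0, n=n)
    p = two_sided_p_from_z(z)
    print(f"n={n}: z={z:.2f}, p={p:.3g}")
```

The effect (0.1) is identical in both runs; only the sample size changes the p-value, which is why effect size should be reported separately.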

Z-test

  • Checks whether the mean of a group is significantly different from a known value.
  • Used when sample size is large (> 30) or population variance is known.
  • Assumes data is normally distributed; otherwise, use a non-parametric test.
  • Non-parametric alternatives: Mann-Whitney U test, Wilcoxon signed-rank test, etc.
  • Variants: One-sample, Two-sample (compare two groups), Paired (before vs after).
  • z-score = (X̄ - μ) / (σ / √n) ; X̄ => sample mean, μ => hypothesized mean, σ => population standard deviation, n => sample size.
  • p-value is determined from standard normal distribution using z-score.
  • Example: - Is average age of stroke patients greater than 30 years? - Is average revenue of our stores different from $5000 per month? - Is average total cholesterol totChol less than 150 mg/dL?
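
A minimal sketch of the one-sample z-test described above, using only the Python standard library. The revenue figures, the known σ of 300, and the function name `one_sample_z_test` with its `alternative` parameter are assumptions made for illustration, not part of the original notes.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alternative="two-sided"):
    """Return (z, p) for a one-sample z-test with known population sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # standard normal CDF at z
    if alternative == "greater":
        p = 1 - cdf
    elif alternative == "less":
        p = cdf
    elif alternative == "two-sided":
        p = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p

# Hypothetical monthly store revenues; sigma = 300 is assumed to be known.
revenues = [5200, 4900, 5100, 5300, 5000, 5150, 4950, 5250, 5050]

z, p = one_sample_z_test(revenues, mu0=5000, sigma=300)
print(f"z = {z:.2f}, p = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

Use `alternative="greater"` or `alternative="less"` for the one-sided questions in the examples (for instance, "greater than 30 years").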

T-test

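  • Checks whether the means of two groups are significantly different; one-sample and paired variants also exist.
  • Used for small samples (< 30) and unknown population variance.
  • One-sample t-stat = (X̄ - μ) / (s / √n) ; s => sample standard deviation. For two groups, the numerator is the difference between the group means and the denominator is the standard error of that difference.
  • p-value is derived from the t-distribution using t-stat and the degrees of freedom (n - 1 for one sample).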
  • Example: - Do males have higher average BMI than females? - Is average sysBP different between stroke vs non-stroke patients? - Do people with diabetes have higher glucose levels than non-diabetics?
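
The two-group comparison in these examples could be sketched as follows. This assumes SciPy is installed and uses Welch's t-test (`equal_var=False`), which does not require equal variances; that is a choice made here, not something the notes specify. The blood pressure readings are made up, and `welch_t` is a hypothetical helper that recomputes the statistic by hand for comparison.

```python
from math import sqrt
from statistics import mean, variance

from scipy import stats

def welch_t(a, b):
    """t = (mean(a) - mean(b)) / sqrt(s_a^2 / n_a + s_b^2 / n_b)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical sysBP readings for the two groups.
stroke = [150, 142, 160, 155, 148, 158]
no_stroke = [130, 128, 135, 140, 125, 132]

t_manual = welch_t(stroke, no_stroke)
result = stats.ttest_ind(stroke, no_stroke, equal_var=False)

print(f"manual t = {t_manual:.3f}")
print(f"scipy  t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

For a paired design (before vs after on the same subjects), `scipy.stats.ttest_rel` would be the analogous call.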

Chi-Square Test

  • Used to test association between two categorical variables.
  • test-stat = Σ((O - E)² / E); O => observed freq, E => Expected freq.
  • P-value is derived from the chi-square distribution.
  • Example: - Is heart stroke related to diabetes? - Is heart stroke related to gender?
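
As a sketch of the Σ((O - E)² / E) formula above, the following computes the statistic for a 2x2 table with the standard library only. The counts are hypothetical, and `chi_square_2x2` is an illustrative name. The p-value shortcut used here is valid only for 1 degree of freedom (a 2x2 table); larger tables need the general chi-square distribution.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = diabetes (yes, no), columns = stroke (yes, no).
observed = [[30, 70], [20, 180]]

stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

With SciPy available, `scipy.stats.chi2_contingency(observed, correction=False)` should give the same statistic and also handles larger tables.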

ANOVA

  • Used to compare means across three or more groups.
  • Tests if at least one group mean is different from others.
  • Assumes normality and homogeneity of variance.
  • test-stat = MS_between / MS_within ; MS => Mean Square.
  • P-value is derived from the F-distribution.
  • Example: - Is average sysBP different across age groups (young, middle-aged, old)? - Is average glucose different across BMI categories?
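
The MS_between / MS_within ratio above can be sketched by hand and checked against SciPy's one-way ANOVA. This assumes SciPy is installed; the three groups of sysBP readings are made up, and `one_way_anova_f` is a hypothetical helper name.

```python
from statistics import mean

from scipy import stats

def one_way_anova_f(groups):
    """F = MS_between / MS_within for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)            # number of groups
    n_total = len(all_values)  # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical sysBP readings by age group.
young = [118, 122, 120, 125, 119]
middle_aged = [128, 132, 130, 127, 135]
old = [140, 145, 138, 150, 142]

f_manual = one_way_anova_f([young, middle_aged, old])
f_stat, p_value = stats.f_oneway(young, middle_aged, old)

print(f"manual F = {f_manual:.2f}")
print(f"scipy  F = {f_stat:.2f}, p = {p_value:.2g}")
```

A significant result only says that at least one group mean differs; identifying which groups differ requires a follow-up (post hoc) comparison.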

Summary of Test Selection

Statistical Test Selection Guide:
Test | Use Case | Assumptions | Example
Z-test | Compare sample mean to known value or between two groups with large samples. | Normality, Large sample size (>30) or known population variance. | Is average age of stroke patient greater than 30 years?
T-test | Compare means between two groups with small samples. | Normality, Small sample size (<30), Unknown population variance. | Do males have higher average BMI than females?
Chi-Square Test | Test association between two categorical variables. | Independence of observations, Expected frequency > 5. | Is heart stroke related to diabetes?
ANOVA | Compare means across three or more groups. | Normality, Homogeneity of variance, Independence. | Is average sysBP different across age groups (young, middle-aged, old)?
Guide for selecting appropriate statistical test based on use case, assumptions, and example scenarios

Error Types and Power

  • Type I Error (False Positive): - Rejecting null hypothesis when it is actually true. - Probability of Type I error is denoted by α (significance level).
  • Type II Error (False Negative): - Failing to reject null hypothesis when it is actually false. - Probability of Type II error is denoted by β.
  • Power: - Probability of correctly rejecting null hypothesis when it is false. - Power = 1 - β.
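
The relationship between α, β, and power can be made concrete with a small sketch for a one-sided (upper-tail) one-sample z-test with known σ. The function name `z_test_power` and the effect sizes are assumptions chosen for illustration. When the true effect is zero, the rejection probability equals α (the Type I error rate); for a non-zero effect it is the power, 1 - β.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Probability of rejecting H0 in an upper-tail one-sample z-test.

    effect: true mean minus the hypothesized mean
    sigma:  known population standard deviation
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when z > z_crit
    shift = effect * sqrt(n) / sigma        # mean of z under the true effect
    return 1 - std_normal.cdf(z_crit - shift)

# With no true effect, the rejection rate is the Type I error rate alpha.
print(f"effect=0.0, n=25: {z_test_power(0.0, 1.0, 25):.3f}")

# With a true effect, power grows with the sample size.
for n in (10, 25, 50):
    power = z_test_power(0.5, 1.0, n)
    print(f"effect=0.5, n={n}: power={power:.3f}, beta={1 - power:.3f}")
```

Running the calculation in reverse, by increasing n until the power reaches a target such as 0.8, is a common way to plan a sample size.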