Hypothesis Testing
- A technique for making inferences about a population from a sample.
- Helps make data-driven decisions by testing assumptions and quantifying uncertainty.
- Key steps:
  - Formulate hypotheses (`H0` and `H1`)
  - Select a test (`Z-test`, `T-test`, `chi-square test`)
  - Choose a significance level (`1%` or `5%`)
  - Calculate the test statistic and p-value
  - Make a decision
Null Hypothesis and P-value Interpretation
- Null Hypothesis (H0): No effect or no difference; the status quo holds.
- Alternative Hypothesis (H1): Contradicts the null; an effect or difference is present.
- p-value: Probability of observing data at least as extreme as the sample, assuming H0 is true. It indicates how consistent the data is with the null hypothesis.
  - High p-value (`> 0.05`): Data is consistent with H0, so we fail to reject H0.
  - Low p-value (`< 0.05`): Data is inconsistent with H0, so we reject H0.
- The p-value does not measure the probability of H0 being true or false.
- The p-value does not measure effect size or practical significance.
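The last point can be illustrated with a minimal sketch. It assumes a one-sample z-test with a known population standard deviation; the helper `two_sided_p` and all numbers are hypothetical and chosen only for illustration. The same 0.5-unit difference gives a large p-value with a small sample and a tiny p-value with a very large one, although the effect size never changes.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(sample_mean, mu0, sigma, n):
    """Two-sided p-value of a one-sample z-test (sigma assumed known)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: the same 0.5-unit difference at two sample sizes.
p_small_n = two_sided_p(sample_mean=100.5, mu0=100, sigma=10, n=100)
p_large_n = two_sided_p(sample_mean=100.5, mu0=100, sigma=10, n=10_000)

print(f"n = 100:   p = {p_small_n:.3f}")   # well above 0.05
print(f"n = 10000: p = {p_large_n:.2e}")   # far below 0.05
```

A low p-value here says only that the difference is unlikely to be noise, not that a 0.5-unit difference matters in practice.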
Z-test
- Checks whether the mean of a group differs significantly from a known value.
- Used when the population variance is known, or when the sample is large (`n > 30`) so that the sample standard deviation is a good stand-in for σ.
- Assumes the data is normally distributed; otherwise, use a non-parametric test.
  - Non-parametric alternatives: Mann-Whitney U test, Wilcoxon signed-rank test, etc.
- Variants: one-sample, two-sample (compare two groups), paired (before vs. after).
- z-score = `(X̄ - μ) / (σ / √n)`; X̄ => sample mean, μ => hypothesized mean, σ => population standard deviation, n => sample size.
- The p-value is determined from the standard normal distribution using the z-score.
- Example:
  - Is the average age of stroke patients greater than 30 years?
  - Is the average revenue of our stores different from $5000 per month?
  - Is the average total cholesterol (`totChol`) less than 150 mg/dL?
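As a minimal sketch of the z-score formula, the store-revenue question could be tested as follows. The function name `one_sample_z_test` and the numbers (100 stores, sample mean 5200, population standard deviation 1000) are hypothetical and not taken from any real dataset.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    # z = (X̄ - μ) / (σ / √n)
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: H0 says average monthly revenue is $5000.
z, p = one_sample_z_test(sample_mean=5200, pop_mean=5000, pop_sd=1000, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

With these assumed numbers the p-value falls just below 0.05, so H0 would be rejected at the 5% level but not at the 1% level. For a one-sided question such as "greater than 30 years", the p-value would instead be `1 - NormalDist().cdf(z)`.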
T-test
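- Checks whether the means of two groups differ significantly; the one-sample variant compares one group mean to a known value.
- Used when the population variance is unknown, typically with small samples (`n < 30`).
- One-sample t-stat = `(X̄ - μ) / (s / √n)`; s => sample standard deviation.
- Two-sample t-stat (Welch's form) = `(X̄₁ - X̄₂) / √(s₁²/n₁ + s₂²/n₂)`.
- The p-value is derived from the t-distribution using the t-stat and the degrees of freedom (`n - 1` in the one-sample case).
- Example:
  - Do males have a higher average BMI than females?
  - Is average `sysBP` different between stroke and non-stroke patients?
  - Do people with diabetes have higher glucose levels than non-diabetics?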
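The t-statistic can be sketched with the standard library alone. The helpers `one_sample_t` and `welch_t` and the BMI values below are hypothetical. The two-sample version uses Welch's form, which does not assume equal variances. Python's standard library has no t-distribution, so this sketch stops at the statistic; in practice a library routine such as SciPy's `scipy.stats.ttest_ind` would also supply the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """t = (X̄ - μ) / (s / √n), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))


def welch_t(a, b):
    """Two-sample t-statistic without assuming equal variances."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / sqrt(var_a + var_b)


# Hypothetical BMI measurements for two small groups.
males = [27.1, 25.4, 28.3, 26.0, 29.2]
females = [24.8, 25.1, 23.9, 26.2, 24.0]

print(f"one-sample t (males vs. 25): {one_sample_t(males, 25):.3f}")
print(f"Welch t (males vs. females): {welch_t(males, females):.3f}")
```

A larger absolute t-statistic means the observed difference is large relative to the sampling noise.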
Chi-Square Test
- Used to test association between two categorical variables.
- test-stat = `Σ((O - E)² / E)`; O => observed frequency, E => expected frequency.
- The p-value is derived from the chi-square distribution.
- Example:
  - Is stroke related to diabetes?
  - Is stroke related to gender?
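For a 2x2 table, the statistic and its p-value can be sketched without external libraries. The function `chi_square_2x2` and the counts are hypothetical. The p-value line relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper-tail probability equals `erfc(√(x/2))`. No continuity correction is applied in this sketch.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Return (statistic, p-value) for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # Upper-tail probability of chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = diabetes (yes, no), columns = stroke (yes, no).
stat, p_value = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```

Tables larger than 2x2 have more degrees of freedom, so the `erfc` shortcut no longer applies and a general chi-square distribution is needed.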
ANOVA
- Used to compare means across three or more groups.
- Tests if at least one group mean is different from others.
- Assumes normality and homogeneity of variance.
- test-stat = `MS_between / MS_within`; MS => Mean Square.
- The p-value is derived from the F-distribution.
- Example:
  - Is average `sysBP` different across age groups (`young`, `middle-aged`, `old`)?
  - Is average `glucose` different across BMI categories?
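The F-statistic can be sketched as below. The function `one_way_anova_f` and the tiny groups are hypothetical, chosen so the arithmetic is easy to follow by hand. The p-value is omitted because the standard library has no F-distribution.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return F = MS_between / MS_within for a list of groups."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical toy data for three groups (young, middle-aged, old).
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_stat:.2f}")  # F = 3.00
```

Here `SS_between = 6` with 2 degrees of freedom and `SS_within = 6` with 6 degrees of freedom, giving `F = 3 / 1 = 3`.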
Summary of Test Selection
Statistical Test Selection Guide:
| Test | Use Case | Assumptions | Example |
|---|---|---|---|
| Z-test | Compare a sample mean to a known value, or compare two groups with large samples. | Normality; large sample size (>30) or known population variance. | Is the average age of stroke patients greater than 30 years? |
| T-test | Compare means between two groups with small samples. | Normality; small sample size (<30); unknown population variance. | Do males have a higher average BMI than females? |
| Chi-Square Test | Test association between two categorical variables. | Independence of observations; expected frequency of at least 5 per cell. | Is stroke related to diabetes? |
| ANOVA | Compare means across three or more groups. | Normality; homogeneity of variance; independence. | Is average sysBP different across age groups (young, middle-aged, old)? |
Guide for selecting appropriate statistical test based on use case, assumptions, and example scenarios
Error Types and Power
- Type I Error (False Positive):
  - Rejecting the null hypothesis when it is actually true.
  - Probability of a Type I error is denoted by α (the significance level).
- Type II Error (False Negative):
  - Failing to reject the null hypothesis when it is actually false.
  - Probability of a Type II error is denoted by β.
- Power:
  - Probability of correctly rejecting the null hypothesis when it is false.
  - Power = 1 - β.
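The relationship between α, β, and power can be sketched for a one-sided (upper-tail) one-sample z-test. The function `z_test_power` and its inputs are hypothetical, and the sketch assumes a known population standard deviation and a true mean that exceeds the null value by `effect`.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of an upper-tail one-sample z-test for a true mean shift."""
    std_normal = NormalDist()
    # Reject H0 when the z-score exceeds this critical value.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under H1 the z-score is centered at effect / (sigma / sqrt(n)).
    shift = effect / (sigma / sqrt(n))
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical scenario: true mean is 200 above the null value, sigma = 1000.
for n in (25, 100, 400):
    power = z_test_power(effect=200, sigma=1000, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With no true effect (`effect=0`) the function returns α, the Type I error rate. For a fixed effect, power rises and β falls as the sample size grows.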
