Probability Distributions

Probability Distribution

  • Mathematical function that describes the likelihood of different outcomes in a random process.
  • Types: Discrete (Binomial, Poisson) and Continuous (Normal, Exponential).
  • Understanding distributions helps in selecting proper statistical methods.
  • Common Distributions:
    1. Normal Distribution: Bell-shaped curve, defined by mean and standard deviation.
    2. Bernoulli Distribution: Models a single trial with two possible outcomes: success (1) and failure (0).
    3. Binomial Distribution: Models the number of successes in a fixed number of independent trials.
    4. Poisson Distribution: Models the number of events in a fixed interval of time or space.

Normal Distribution

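  • Bell-shaped curve, symmetric around the mean (μ), with its spread set by the standard deviation (σ).
  • 68-95-99.7 rule: about 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean.
  • Many statistical tests assume the data are normally distributed. Examples of data that are often approximately normal: height, age, etc.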
Figure: Normal distribution curves with different means and standard deviations
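The 68-95-99.7 rule can be checked numerically. This is a minimal sketch that computes the normal CDF from the error function (`math.erf`). The helper names `normal_cdf` and `within_k_sigma` are hypothetical and chosen for illustration only.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def within_k_sigma(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {within_k_sigma(k):.4f}")
```

The printed values should come out close to 0.68, 0.95, and 0.997. The result does not depend on μ or σ, which is why the rule is stated in units of standard deviations.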

Bernoulli Distribution

  • Models a single trial with two possible outcomes: success (1) or failure (0).
  • Parameter p is the probability of success; failure has probability 1 - p.
  • Scenarios: coin flip (p = 0.5), customer purchase (p = 0.2), email open (p = 0.3).
Figure: Bernoulli distribution with probability p
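A Bernoulli variable needs only its PMF and a way to simulate one trial. The sketch below uses the customer-purchase scenario with p = 0.2 and a seeded `random.Random` so runs are reproducible. The function names are hypothetical illustrations.

```python
import random

def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for a Bernoulli(p) variable; x must be 0 (failure) or 1 (success)."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1.0 - p

def bernoulli_trial(p: float, rng: random.Random) -> int:
    """Simulate one trial: returns 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

p = 0.2  # customer purchase probability from the scenarios above
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # 0.2 0.8

rng = random.Random(42)  # fixed seed so the simulation is reproducible
trials = [bernoulli_trial(p, rng) for _ in range(10_000)]
print(sum(trials) / len(trials))  # close to 0.2
```

The observed fraction of successes approaches p as the number of trials grows. The same idea, run in reverse, is used to estimate p from historical data.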

Binomial Distribution

  • Models the number of successes in a fixed number of independent Bernoulli trials, each with the same success probability.
  • Parameter n represents the number of trials, and p represents the probability of success in each trial.
  • Scenarios: Number of heads in 10 coin flips, number of customers who purchase out of 100 visitors.
Figure: Binomial distribution with parameters n and p
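The binomial PMF is P(X = k) = C(n, k) · p^k · (1 - p)^(n - k). This minimal sketch (hypothetical helper name) applies it to the two scenarios above. The 100-visitor case reuses the purchase probability p = 0.2 from the Bernoulli section.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 5 heads in 10 fair coin flips
print(binomial_pmf(5, 10, 0.5))  # 0.24609375 (= 252 / 1024)

# Expected number of buyers out of 100 visitors when each buys with p = 0.2
n, p = 100, 0.2
expected_buyers = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
print(round(expected_buyers, 6))  # 20.0, i.e. n * p
```

Summing k · P(X = k) confirms the binomial mean n · p. Note that `math.comb` requires Python 3.8 or newer.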

Poisson Distribution

  • Models the number of events occurring in a fixed interval of time or space.
  • Parameter λ (lambda) is the average number of events per interval (the rate).
  • Assumes events occur independently at a constant average rate; commonly used for rare events or counts in fixed intervals.
  • Scenarios: Number of customer arrivals at a store in an hour, number of emails received in a day.
Figure: Poisson distribution with parameter λ
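The Poisson PMF is P(X = k) = λ^k · e^(-λ) / k!. The rate λ = 4 arrivals per hour below is an assumed, illustrative value rather than data from the notes, and the helper name is hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 4.0  # hypothetical: a store averages 4 customer arrivals per hour
for k in range(7):
    print(k, round(poisson_pmf(k, lam), 4))
```

For an integer λ the two most likely counts are λ - 1 and λ (here 3 and 4 arrivals), and probabilities shrink quickly for counts far above λ.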

Estimating p and λ (Lambda)

  • Collect historical data.
  • Probabilities aren't guessed; they are estimated from observed real-world frequencies.
  • Formulae:
    - p = number of successes / total number of trials
    - λ (lambda) = total number of events / number of intervals (average events per interval)
    - Mean (μ) and standard deviation (σ): calculate from past data.
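The formulae above translate directly into a few lines of code. The records below are made-up illustrative numbers for a cake shop, not real data, and the variable names are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical historical records (illustrative numbers only)
purchases = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]        # 1 = visitor bought a cake
arrivals_per_hour = [3, 5, 4, 2, 6, 4, 4, 3]       # customers counted in each hour
daily_sales = [410.0, 395.5, 430.0, 388.0, 402.5]  # sales amount per day

p_hat = sum(purchases) / len(purchases)                    # successes / total trials
lam_hat = sum(arrivals_per_hour) / len(arrivals_per_hour)  # average events per interval
mu_hat = mean(daily_sales)                                 # sample mean
sigma_hat = stdev(daily_sales)                             # sample standard deviation (n - 1 divisor)

print(p_hat, lam_hat)  # 0.3 3.875
print(mu_hat, sigma_hat)
```

`statistics.stdev` uses the sample (n - 1) formula. `statistics.pstdev` is the population version if the data cover the whole population.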

When to use which distribution?

If you are running a cake shop, this guide shows which distribution fits each question:
| Distribution | Use Case | Example |
|---|---|---|
| Normal | Continuous data, symmetric | Average daily sales amount |
| Bernoulli | Single binary outcome | Probability of a customer buying a cake |
| Binomial | Fixed number of independent trials | Number of customers who buy a cake out of 100 visitors |
| Poisson | Count data, rare events | Number of customers arriving at the shop per hour |
Distribution selection guide based on use case and example scenarios for a cake shop

PMF, PDF and CDF

Key Functions for Probability Distributions:
| Function | Description | Applicable To |
|---|---|---|
| PMF (Probability Mass Function) | Gives the probability of each exact outcome, P(X = x). | Discrete distributions (e.g., Bernoulli, Binomial, Poisson) |
| PDF (Probability Density Function) | Describes the relative likelihood (density) of values; probabilities are areas under the curve, so any single exact value has probability 0. | Continuous distributions (e.g., Normal, Exponential) |
| CDF (Cumulative Distribution Function) | Gives the probability that the random variable is less than or equal to a value, F(x) = P(X ≤ x). | All distributions |
Figure: PMF for a discrete distribution and PDF for a continuous distribution
Figure: CDF for a distribution, showing cumulative probability
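The relationship between the three functions can be shown concretely. For a discrete distribution the CDF is a running sum of the PMF. For a continuous one it is the area under the PDF. This is a sketch with hypothetical helper names. The trapezoidal integration is an illustrative numerical check, not how libraries compute the normal CDF.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """PMF: probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Discrete CDF: sum of the PMF over all outcomes <= k."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """PDF: density at x; this is a relative likelihood, not a probability."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Continuous CDF in closed form via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def area_under_pdf(a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the area under the standard normal PDF from a to b (trapezoidal rule)."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    total += sum(normal_pdf(a + i * h) for i in range(1, steps))
    return total * h

print(binomial_cdf(5, 10, 0.5))                     # 0.623046875 (= 638 / 1024)
print(normal_cdf(1.0), area_under_pdf(-10.0, 1.0))  # both about 0.8413
print(normal_pdf(0.0, sigma=0.1))                   # about 3.99: a density can exceed 1
```

The last line shows why a PDF value is not a probability. A narrow normal curve has a peak density well above 1, yet the total area under it is still 1.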

Confidence Intervals (CI)

  • A range of values that is likely to contain the true population parameter.
  • Calculated from sample data; its width expresses the uncertainty of the estimate.
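  • Analogy: suppose the true population mean is a fish in a lake.
    - Point estimate: throwing a single spear at the fish; it will likely miss.
    - Confidence interval: casting a net around where the fish probably is; the confidence level (e.g., 95%) says how often nets cast this way catch the fish.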
  • To calculate a 95% CI for the mean of a normal distribution (population σ known, or a large sample), we need:
    - Sample mean (X̄)
    - Standard error, SE = σ/√n
    - Z-score for 95% confidence (1.96)
    - Then CI = X̄ ± Z × SE
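The CI = X̄ ± Z × SE recipe can be written as a small function. This is a minimal sketch that assumes σ is known (a z-interval). When σ must be estimated from a small sample, a t critical value is normally used instead of 1.96. The function name, the sample values, and σ = 10 are hypothetical.

```python
import math
from statistics import mean

def z_confidence_interval(sample: list[float], sigma: float, z: float = 1.96) -> tuple[float, float]:
    """CI for the population mean with known sigma: X-bar +/- z * sigma / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)        # sample mean
    se = sigma / math.sqrt(n)   # standard error
    return x_bar - z * se, x_bar + z * se

# Hypothetical data: 4 observations, population sigma assumed to be 10
sales = [100.0, 104.0, 96.0, 100.0]
low, high = z_confidence_interval(sales, sigma=10.0)
print(low, high)  # about 90.2 and 109.8
```

Here X̄ = 100 and SE = 10/√4 = 5, so the margin is 1.96 × 5 = 9.8. Collecting more data shrinks SE by a factor of √n, which makes the net narrower.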