Model Evaluation Techniques

Evaluating ML Models

  • It is critical to evaluate how a model performs on unseen data.
  • Use metrics appropriate to the problem type & business goals.
  • Evaluation Steps:
    1. Split data into training and testing sets.
    2. Train the model on the training set.
    3. Evaluate the model on the testing set.
    4. Select appropriate metrics based on problem type & business goals.
    5. Compare results across different models.
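
The five steps can be sketched in plain Python. This is a minimal illustration under stated assumptions: the toy data, the 75/25 split ratio and the helper names (train_test_split, MeanModel, LinearModel, mae) are hypothetical choices for this example, not a particular library's API. Libraries such as scikit-learn provide ready-made equivalents (e.g. sklearn.model_selection.train_test_split).

```python
import random


def train_test_split(xs, ys, test_ratio=0.25, seed=42):
    """Hypothetical helper: shuffle row indices, then split features and targets."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)            # fixed seed -> reproducible split
    n_test = int(len(xs) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def pick(seq, ids):
        return [seq[i] for i in ids]

    return pick(xs, train_idx), pick(xs, test_idx), pick(ys, train_idx), pick(ys, test_idx)


class MeanModel:
    """Baseline that always predicts the mean of the training targets."""
    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
        return self

    def predict(self, xs):
        return [self.mean for _ in xs]


class LinearModel:
    """One-feature least-squares line: y = a * x + b."""
    def fit(self, xs, ys):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = sxy / sxx
        self.b = my - self.a * mx
        return self

    def predict(self, xs):
        return [self.a * x + self.b for x in xs]


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: y = 3x + 2 with a deterministic +/-1 wobble
xs = list(range(20))
ys = [3 * x + 2 + (1 if x % 2 else -1) for x in xs]

X_train, X_test, y_train, y_test = train_test_split(xs, ys)      # step 1: split
results = {}
for name, model in [("mean baseline", MeanModel()), ("linear", LinearModel())]:
    model.fit(X_train, y_train)                                    # step 2: train on training set
    preds = model.predict(X_test)                                  # step 3: predict on test set
    results[name] = mae(y_test, preds)                             # step 4: chosen metric (MAE)

for name, score in sorted(results.items(), key=lambda kv: kv[1]):  # step 5: compare models
    print(f"{name}: test MAE = {score:.2f}")
```

The trivial mean baseline is included on purpose: comparing against it shows whether a real model adds anything on the held-out data.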

Regression Metrics

Common Regression Metrics:
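Metric | Description | Formula
MAE (Mean Absolute Error) | Average of the absolute errors. | Σ abs(ya - yp) / n
MSE (Mean Squared Error) | Average of the squared errors. | Σ(ya - yp)² / n
RMSE (Root Mean Squared Error) | Square root of MSE (same units as the target). | √MSE
R² (R-squared) | Proportion of the variance in the dependent variable explained by the independent variables. | 1 - Σ(ya - yp)² / Σ(ya - y_mean)²
Common regression metrics used to evaluate model performance.
Notation: ya = actual value, yp = predicted value, y_mean = mean of the actual values, n = number of samples.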

Calculating Regression Metrics

Calculating Regression Metrics Example:
Years | Company | Position | Salary (actual) | Predicted | Abs. error | Squared error
5 | Google | Developer | 1000 | 1100 | 100 | 10000
8 | Microsoft | Data Engineer | 1500 | 1300 | 200 | 40000
1 | Microsoft | Data Engineer | 1100 | 1150 | 50 | 2500
2 | Google | Developer | 800 | 900 | 100 | 10000
Dataset for regression task: predicting salary based on features.

  • MAE = (100 + 200 + 50 + 100)/4 = 112.5
  • Median Absolute Error = median(50, 100, 100, 200) = (100 + 100)/2 = 100
  • MSE = (10000 + 40000 + 2500 + 10000)/4 = 15625
  • RMSE = √15625 = 125
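
The hand calculations above can be checked with a short script that recomputes them from the salary table. A minimal sketch: the function name regression_metrics is hypothetical, and it follows the formulas in the regression metrics table. It also reports R², which the worked example does not compute by hand: the mean salary is 1100, so Σ(ya - y_mean)² = 260000 and R² = 1 - 62500/260000 ≈ 0.76.

```python
import math
import statistics

# Actual and predicted salaries from the example table
actual = [1000, 1500, 1100, 800]
predicted = [1100, 1300, 1150, 900]


def regression_metrics(y_true, y_pred):
    """Return MAE, median absolute error, MSE, RMSE and R² for equal-length lists."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e ** 2 for e in errors)                   # Σ(ya - yp)²
    y_mean = sum(y_true) / n
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)      # Σ(ya - y_mean)²
    mse = ss_res / n
    return {
        "MAE": sum(abs_errors) / n,
        "MedAE": statistics.median(abs_errors),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1 - ss_res / ss_tot,
    }


for name, value in regression_metrics(actual, predicted).items():
    print(f"{name}: {round(value, 4)}")
# MAE: 112.5
# MedAE: 100.0
# MSE: 15625.0
# RMSE: 125.0
# R2: 0.7596
```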

Confusion Matrix

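  • Summarizes classification performance as counts of four outcomes:
    • TP (True Positive): the model predicts positive and the actual class is positive.
    • TN (True Negative): the model predicts negative and the actual class is negative.
    • FP (False Positive): the model predicts positive but the actual class is negative (Type I error).
    • FN (False Negative): the model predicts negative but the actual class is positive (Type II error).
  • From these counts we can calculate metrics such as accuracy, precision, recall and F1-score.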
Confusion Matrix Example:
Actual \ Predicted | Positive | Negative
Positive | TP: 50 | FN: 10
Negative | FP: 5 | TN: 100
Confusion matrix for a binary classification problem.
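
Building a confusion matrix is a tally over (actual, predicted) pairs. A minimal sketch, assuming binary labels encoded as 1 (positive) and 0 (negative); the function name confusion_counts is hypothetical, and the label lists are one constructed set chosen only so that the counts reproduce the example matrix above.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count TP, FN, FP, TN for a binary problem; `positive` marks the positive class."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1   # predicted positive
        else:
            counts["FN" if a == positive else "TN"] += 1   # predicted negative
    return {k: counts[k] for k in ("TP", "FN", "FP", "TN")}


# 60 actual positives (50 caught, 10 missed), 105 actual negatives (5 false alarms)
actual = [1] * 60 + [0] * 105
predicted = [1] * 50 + [0] * 10 + [1] * 5 + [0] * 100
print(confusion_counts(actual, predicted))
# {'TP': 50, 'FN': 10, 'FP': 5, 'TN': 100}
```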

Classification Metrics

Common Classification Metrics:
Metric | Description | Formula
Accuracy | Proportion of all predictions that are correct. | (TP + TN) / (TP + TN + FP + FN)
Precision | Of the predicted positives, how many were actually positive? | TP / (TP + FP)
Recall (Sensitivity) | Of the actual positives, how many did the model catch? | TP / (TP + FN)
F1-Score | Harmonic mean of precision and recall. | 2 * Precision * Recall / (Precision + Recall)
Common classification metrics used to evaluate model performance.
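
The formulas in the table translate directly into code. A minimal sketch: the function name classification_metrics is hypothetical, and returning 0.0 when a denominator is zero is one common convention, assumed here because the formulas are undefined in that case. Applied to the confusion matrix example (TP = 50, FN = 10, FP = 5, TN = 100), it gives accuracy ≈ 0.909, precision ≈ 0.909, recall ≈ 0.833 and F1 ≈ 0.870.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed convention: no predicted positives -> 0
    recall = tp / (tp + fn) if tp + fn else 0.0      # assumed convention: no actual positives -> 0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the confusion matrix example
for name, value in classification_metrics(tp=50, fn=10, fp=5, tn=100).items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.909
# precision: 0.909
# recall: 0.833
# f1: 0.870
```

Note that precision and accuracy happen to coincide here (both 50/55 and 150/165 equal 10/11); recall is lower because 10 of the 60 actual positives were missed.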

Calculating Classification Metrics

Calculating Classification Metrics Example:
Temperature | Wind | Humidity | rain_actual | rain_predicted | predict_type
Hot | Strong | High | No | No | TN
Mild | Weak | High | Yes | Yes | TP
Cool | Weak | Normal | Yes | No | FN
Mild | Strong | High | No | Yes | FP
Cool | Strong | Normal | Yes | Yes | TP
Hot | Weak | High | No | No | TN
Dataset for classification task: predicting rain from weather features (positive class: rain = Yes).
Confusion Matrix for Classification Example:
Actual \ Predicted | Positive | Negative
Positive | TP: 2 | FN: 1
Negative | FP: 1 | TN: 2
Confusion matrix derived from the classification dataset.
Calculated Classification Metrics:
Metric | Value
Accuracy | (2 + 2) / (2 + 2 + 1 + 1) = 0.67
Precision | 2 / (2 + 1) = 0.67
Recall (Sensitivity) | 2 / (2 + 1) = 0.67
F1-Score | 2 * 0.67 * 0.67 / (0.67 + 0.67) = 0.67
Calculated classification metrics based on the confusion matrix.
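
The same numbers can be reproduced from the rain table. In this sketch the helper name predict_type is borrowed from the table's column header, but the code itself is illustrative: it labels each row, counts the outcomes and applies the formulas. Because it uses exact fractions (2/3) rather than the rounded 0.67, it also confirms that F1 is exactly 2/3 here.

```python
# (rain_actual, rain_predicted) for each row of the rain table; positive class = "Yes"
rows = [("No", "No"), ("Yes", "Yes"), ("Yes", "No"),
        ("No", "Yes"), ("Yes", "Yes"), ("No", "No")]


def predict_type(actual, predicted, positive="Yes"):
    """Label a single prediction as TP, TN, FP or FN."""
    if predicted == positive:
        return "TP" if actual == positive else "FP"
    return "FN" if actual == positive else "TN"


types = [predict_type(a, p) for a, p in rows]
tp, tn, fp, fn = (types.count(t) for t in ("TP", "TN", "FP", "FN"))

accuracy = (tp + tn) / len(types)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(types)            # ['TN', 'TP', 'FN', 'FP', 'TP', 'TN']
print(tp, fn, fp, tn)   # 2 1 1 2
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.67 precision=0.67 recall=0.67 f1=0.67
```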

How to Select an Evaluation Metric

  • Accuracy can be misleading for imbalanced datasets.
  • Example: on a blood-test dataset with 99% healthy and 1% sick patients, a model that always predicts "healthy" achieves 99% accuracy, yet it detects none of the sick patients, so it is useless.
  • Choice of metric should align with business goals and problem context.
  • Examples of metric selection:
    1. Medical Diagnostics: FN is critical, i.e. don't miss sick patients. [Recall]
    2. Spam Detection: FP is critical, i.e. don't block genuine emails. [Precision]
    3. Fraud Detection: both FP and FN are costly. [F1-Score]
    4. General Classification: balanced classes, FP and FN equally important. [Accuracy]
    5. Terrorist Detection: FN is critical, i.e. don't miss potential threats. [Recall]
  • The same task can call for different metrics depending on the domain context.
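
The accuracy trap from the blood-test example is easy to demonstrate. A sketch assuming a hypothetical sample of 1,000 patients (990 healthy, 10 sick, matching the 99% / 1% split), with sick = 1 as the positive class, and a "model" that always predicts healthy: accuracy is 99% while recall is 0%, because every sick patient becomes a false negative.

```python
# Hypothetical blood-test labels: 990 healthy (0) and 10 sick (1), i.e. 99% / 1%
actual = [0] * 990 + [1] * 10
predicted = [0] * len(actual)          # a "model" that always predicts healthy

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
correct = sum(1 for a, p in zip(actual, predicted) if a == p)

accuracy = correct / len(actual)
recall = tp / (tp + fn)                # share of sick patients actually detected

print(f"accuracy = {accuracy:.2%}")    # accuracy = 99.00%
print(f"recall   = {recall:.2%}")      # recall   = 0.00%
```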