Model Evaluation Techniques

Evaluating an ML Model

Evaluation Steps:

Common Regression Metrics:

Metric	Description	Formulae
MAE: Mean Absolute Error	Average of absolute errors.	`Σ|ya - yp| / n`
MSE: Mean Squared Error	Average of squared errors.	`Σ(ya - yp)²/n`
RMSE: Root Mean Squared Error	Square root of MSE.	`√MSE`
R² (R-squared)	Proportion of the variance in the dependent variable that is explained by the independent variables.	`1 - (Σ(ya - yp)² / Σ(ya - y_mean)²)`

Common regression metrics used to evaluate model performance.

Calculating Regression Metrics Example:

Years	Company	Position	Salary	Predicted	|err|	err²
5	Google	Developer	1000	1100	100	10000
8	Microsoft	Data Engineer	1500	1300	200	40000
1	Microsoft	Data Engineer	1100	1150	50	2500
2	Google	Developer	800	900	100	10000

Dataset for regression task: Predicting Salary based on features
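
The four regression metrics can be checked against the salary table with a short script. This is a minimal sketch in plain Python rather than a reference implementation; the function name `regression_metrics` and the variable names are chosen here for illustration, and only the Salary and Predicted columns are taken from the table.

```python
import math

# Salary (actual) and Predicted columns from the four rows of the table.
actual = [1000, 1500, 1100, 800]
predicted = [1100, 1300, 1150, 900]


def regression_metrics(y_actual, y_pred):
    """Return MAE, MSE, RMSE and R² for two equal-length sequences."""
    n = len(y_actual)
    errors = [ya - yp for ya, yp in zip(y_actual, y_pred)]

    mae = sum(abs(e) for e in errors) / n        # Σ|ya - yp| / n
    ss_residual = sum(e ** 2 for e in errors)    # Σ(ya - yp)²
    mse = ss_residual / n                        # Σ(ya - yp)² / n
    rmse = math.sqrt(mse)                        # √MSE

    y_mean = sum(y_actual) / n
    ss_total = sum((ya - y_mean) ** 2 for ya in y_actual)  # Σ(ya - y_mean)²
    r2 = 1 - ss_residual / ss_total

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these rows the sum of |err| is 450 and the sum of err² is 62500, so the script gives MAE = 112.5, MSE = 15625, RMSE = 125, and R² ≈ 0.76 (the mean salary is 1100, so Σ(ya - y_mean)² = 260000).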

A confusion matrix summarizes classification performance with counts of:
TP (true positive) => Model correctly predicts positive.
TN (true negative) => Model correctly predicts negative.
FP (false positive) => Model predicts positive but it was negative (Type-I error).
FN (false negative) => Model predicts negative but it was positive (Type-II error).
From these counts, we can calculate accuracy, precision, recall, and F1-score.

Confusion Matrix Example:

TP: 50	FN: 10
FP: 5	TN: 100

Confusion matrix for a binary classification problem.

Common Classification Metrics:

Metric	Description	Formulae
Accuracy	Proportion of correct predictions.	`(TP + TN) / (TP + TN + FP + FN)`
Precision	Out of predicted positives, how many were correct?	`TP / (TP + FP)`
Recall (Sensitivity)	Out of all actual positives, how many did the model catch?	`TP / (TP + FN)`
F1-Score	Harmonic mean of precision and recall.	`2 * precision * recall / (precision + recall)`

Common classification metrics used to evaluate model performance.
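
As a quick check of the formulas, the sketch below applies them to the earlier confusion matrix example (TP 50, FN 10, FP 5, TN 100). The function name `classification_metrics` is illustrative, and returning 0.0 when a denominator is zero is an assumption made here to keep the example safe to run, not a rule stated in these notes.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # Assumption: fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


example = classification_metrics(tp=50, fn=10, fp=5, tn=100)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 150/165 ≈ 0.909, precision is 50/55 ≈ 0.909, recall is 50/60 ≈ 0.833, and F1 is 100/115 ≈ 0.870.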

Calculating Classification Metrics Example:

Temperature	Wind	Humidity	rain_actual	rain_predicted	predict_type
Hot	Strong	High	No	No	TN
Mild	Weak	High	Yes	Yes	TP
Cool	Weak	Normal	Yes	No	FN
Mild	Strong	High	No	Yes	FP
Cool	Strong	Normal	Yes	Yes	TP
Hot	Weak	High	No	No	TN

Dataset for classification task: Predicting Rain based on weather features

Confusion Matrix for Classification Example:

TP: 2	FN: 1
FP: 1	TN: 2

Confusion matrix derived from the classification dataset.
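
The counts in this matrix can be reproduced by labelling each row of the rain dataset and tallying the labels. The sketch below assumes "Yes" is the positive class; the helper name `predict_type` simply mirrors the table column and is not taken from any library.

```python
from collections import Counter

# rain_actual and rain_predicted columns, in the same row order as the table.
rain_actual = ["No", "Yes", "Yes", "No", "Yes", "No"]
rain_predicted = ["No", "Yes", "No", "Yes", "Yes", "No"]


def predict_type(actual, predicted, positive="Yes"):
    """Label one prediction as TP, FP, FN or TN (assumes 'Yes' is positive)."""
    if predicted == positive:
        return "TP" if actual == positive else "FP"
    return "FN" if actual == positive else "TN"


labels = [predict_type(a, p) for a, p in zip(rain_actual, rain_predicted)]
counts = Counter(labels)

print(labels)
print(f"TP: {counts['TP']}\tFN: {counts['FN']}")
print(f"FP: {counts['FP']}\tTN: {counts['TN']}")
```

The per-row labels match the predict_type column of the dataset, and the tallies come out as TP 2, FN 1, FP 1, TN 2.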

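Calculated Classification Metrics:

Metric	Calculation	Value
Accuracy	`(2 + 2) / (2 + 2 + 1 + 1)`	4/6 ≈ 0.67
Precision	`2 / (2 + 1)`	2/3 ≈ 0.67
Recall	`2 / (2 + 1)`	2/3 ≈ 0.67
F1-Score	`2 * (2/3) * (2/3) / (2/3 + 2/3)`	2/3 ≈ 0.67

Calculated classification metrics based on the confusion matrix.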

Accuracy can be misleading for imbalanced datasets.
Example: On a blood-test dataset with 99% healthy and 1% sick patients, a model that predicts everyone as healthy reaches 99% accuracy, yet it never detects a sick patient, so it is useless.
The choice of metric should align with business goals and the problem context.
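
The blood-test example can be reproduced in a few lines. The sketch below uses a hypothetical sample of 1000 patients (990 healthy, 10 sick) to match the 99% / 1% split; the sample size and the 0/1 encoding are assumptions made for illustration.

```python
# Hypothetical data: 0 = healthy, 1 = sick, with a 99% / 1% class split.
actual = [0] * 990 + [1] * 10

# A "model" that predicts healthy for every patient.
predicted = [0] * len(actual)

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)  # safe here: there are 10 actual positives

print(f"accuracy: {accuracy:.2f}")
print(f"recall:   {recall:.2f}")
```

Accuracy comes out at 0.99 while recall is 0.00: all 10 sick patients are false negatives, which is exactly the failure that accuracy alone hides.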

Example on Metric Selection:

Medical Diagnostics: FN critical, i.e., don't miss sick patients. [Recall]
Spam Detection: FP critical, i.e., don't block genuine emails. [Precision]
Fraud Detection: Both FP and FN are costly. [F1-Score]
General Classification: Balanced importance. [Accuracy]
Terrorist Detection: FN critical, i.e., don't miss potential threats. [Recall]
The same task can have different optimal metrics depending on the domain context.