Linear Regression

Linear Regression Introduction

  • Predicts continuous target variable based on linear relationship with features.
  • Model is represented as y = β0 + β1x1 + β2x2 + ... + βnxn.
  • Coefficients (β) are estimated using least squares method to minimize the sum of squared residuals.
Linear Regression
Linear Regression: Fitting a line to data points.

Deriving Coefficients in Linear Regression (2D)

  • For simple linear regression with one feature (x) and one target variable (y). The model is y = β0 + β1x.
  • Assume we have n data points (x1, y1), (x2, y2), ..., (xn, yn).
  • The goal is to find β0 and β1 that minimize the sum of squared residuals.
  • For x1, the predicted value by model is y_pred = β0 + β1x1.
  • The residual for x1 is e1 = y1 - y_pred = y1 - (β0 + β1x1) = y1 - β0 - β1x1.
  • The sum of squared residuals is RSS = Σ(ei²) = Σ(yi - β0 - β1xi)².
  • To minimize RSS, we take partial derivatives w.r.t β0 and β1 and set them to zero.
  • Partial Derivatives w.r.t. β0:
    1. ∂RSS/∂β0 = 2*Σ(yi - β0 - β1xi) * (-1) = 0 or Σ(yi - β0 - β1xi) = 0 or Σyi - n*β0 - β1Σxi = 0 or nβ0 = (Σyi - β1Σxi) or β0 = (Σyi - β1Σxi) / n => β0 = Ȳ - β1X̄
    Partial Derivatives w.r.t. β1:
    1. ∂RSS/∂β1 = 2*Σ(yi - β0 - β1xi) * (-xi) = 0 or Σ(yi - β0 - β1xi) * xi = 0 or Σ(xi*yi) - β0Σxi - β1Σ(xi)² = 0 => β1Σ(xi)² = Σ(xi*yi) - β0Σxi
  • On solving these two equations, we get: β1 = COV(X, Y) / VAR(X) β0 = Ȳ - β1X̄

Hyperparameters in Linear Regression

Linear Regression Hyperparameters and their Effects:
HyperparameterDescriptionEffect on Model
fit_interceptWhether to calculate the intercept for this model.True => model includes intercept (β0), False => model does not include intercept.
positiveWhen set to True, forces the coefficients to be positive.True => coefficients constrained to be positive (better interpretability), False => coefficients can take any value (allows for more flexibility).
Linear Regression has no hyperparameters to tune, but regularization techniques can be applied to prevent overfitting.

Regularization in Linear Regression

  • Regularization adds a penalty term to the loss function to prevent overfitting.
Regularization Techniques for Linear Regression:
TechniqueDescriptionEffect on Model
Lasso Regression (L1)Adds a penalty equal to the absolute value of the magnitude of coefficients.Encourages sparsity, can perform feature selection by shrinking some coefficients to zero.
Ridge Regression (L2)Adds a penalty equal to the square of the magnitude of coefficients.Encourages smaller coefficients, reduces model complexity, can handle multicollinearity.
Elastic NetCombines L1 and L2 penalties.Balances between Ridge and Lasso, can handle correlated features and perform feature selection.
Common regularization techniques for linear regression include Ridge (L2), Lasso (L1), and Elastic Net.