
Advanced Machine Learning

Recall that in ordinary multiple linear regression, we have a set of $p$ predictor variables used to model some response variable $Y$, and we fit a model of the form:

$$ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon $$

Here $\beta_j$ represents the average effect on $Y$ of a one-unit increase in $X_j$ (the $j$th predictor variable), and $\epsilon$ is the error term. The values of the $\beta$ coefficients are chosen by the least squares method, which minimizes the residual sum of squares (RSS):

$$ \mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
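
As a concrete illustration, here is a minimal sketch of a least squares fit with NumPy; the synthetic data, coefficient values, and noise level are arbitrary choices, not part of the notes.

```python
# A minimal sketch of least squares fitting on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                       # n observations, p predictors
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ true_beta + rng.normal(scale=0.1, size=n)  # beta_0 = 1.0

# Add an intercept column and solve min_beta ||y - X beta||^2.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

rss = np.sum((y - X1 @ beta_hat) ** 2)
print("estimated coefficients:", beta_hat)
print("RSS:", rss)
```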

Least Absolute Shrinkage and Selection Operator (LASSO)

Lasso regression is a regularization technique for linear regression models. Regularization is a statistical method for reducing the errors caused by overfitting to the training data.

The process:

  • Start with the full model (all candidate features)
  • "Shrink" some coefficients to 0 (exactly)
  • Non-zero coefficients indicate the "selected" features (see the sketch after this list)
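
The sketch below shows this selection behavior using scikit-learn's Lasso on synthetic data where only two of ten predictors actually matter; the dataset and the alpha value are arbitrary illustrations.

```python
# A minimal sketch of lasso's feature-selection behavior.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Only the first two predictors carry signal.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# alpha is the regularization strength (lambda above); larger alpha
# drives more coefficients exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
print("coefficients:", np.round(model.coef_, 3))
print("selected features:", np.flatnonzero(model.coef_ != 0))
```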

Traditional ridge regression adds a penalty on the magnitude of the coefficients to the least squares objective:

$$ \text{Total Cost} = \underbrace{\mathrm{RSS}(w)}_{\text{measure of fit}} + \lambda \underbrace{\|w\|_2^2}_{\text{measure of coefficient magnitude}} $$

Lasso keeps the same structure but swaps the squared L2 penalty for an L1 penalty, $\lambda \|w\|_1$, which is what allows it to drive some coefficients exactly to zero.
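
A minimal sketch contrasting the two penalties with scikit-learn's Ridge and Lasso estimators; the data and alpha values are arbitrary choices for illustration.

```python
# Ridge (L2 penalty) shrinks coefficients; lasso (L1 penalty) can zero them.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=100)  # only feature 0 matters

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("ridge:", np.round(ridge.coef_, 3))   # all nonzero, just smaller
print("lasso:", np.round(lasso.coef_, 3))   # irrelevant features set to 0
```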

XGBoost

Support Vector Machines (SVM)