Advanced Machine Learning

Recall that in ordinary multiple linear regression, we have a set of p predictor variables used to predict some response variable (Y), and we fit a model of the form:

$$ Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \epsilon $$

Here $\beta_j$ represents the average effect on Y of a one-unit increase in $X_j$ (the jth predictor variable), holding the other predictors fixed, and $\epsilon$ is the error term. The values of these beta coefficients are chosen using the least squares method, which minimizes the residual sum of squares (RSS):

$$ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
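
As a quick illustration of this fit criterion, here is a minimal sketch in NumPy that estimates the coefficients by least squares and reports the RSS at the solution. The data is synthetic and all names and values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3

X = rng.normal(size=(n, p))                   # n observations, p predictors
true_beta = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ true_beta + rng.normal(scale=0.3, size=n)  # intercept beta_0 = 1

# Add a column of ones so the intercept beta_0 is estimated too.
X1 = np.column_stack([np.ones(n), X])

# np.linalg.lstsq finds the coefficients minimizing the sum of squared residuals.
beta_hat, rss, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("estimated coefficients:", beta_hat)
print("RSS at the minimizer:", rss)
```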

Least Absolute Shrinkage and Selection Operator (LASSO)

Lasso regression is a regularization technique for linear regression models. Regularization reduces errors caused by overfitting to the training data by adding a penalty on the size of the model's coefficients.

The process:

  • Start with full model (all possible features)
  • "Shrink" some coefficients to 0 (exactly)
  • Non-zero coefficients indicate "selected" features (see the sketch below)
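
To make the selection behavior concrete, here is a minimal sketch using scikit-learn's Lasso on synthetic data where only the first three features carry signal. The alpha value (the penalty strength) is an illustrative choice, not a recommendation:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 10

X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]          # only the first 3 features matter
y = X @ beta + rng.normal(scale=0.5, size=n)

# alpha controls the strength of the penalty (the lambda in the formulas).
model = Lasso(alpha=0.1).fit(X, y)

print("coefficients:", np.round(model.coef_, 3))
print("selected features:", np.flatnonzero(model.coef_))  # non-zero => selected
```

Increasing alpha drives more coefficients to exactly 0; as alpha approaches 0 the fit approaches ordinary least squares.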

Traditional ridge regression minimizes a total cost made of two parts, a measure of fit (the RSS(w) from above) plus a measure of the magnitude of the coefficients:

$$ \text{Total Cost} = \underbrace{RSS(w)}_{\text{measure of fit}} + \underbrace{\lambda \|w\|_2^2}_{\text{measure of coefficient magnitude}} $$

The lasso replaces ridge's squared L2 penalty with an L1 penalty, $\lambda \|w\|_1$, and it is this change that shrinks some coefficients to exactly 0.
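
For contrast, a minimal sketch fitting ridge and lasso on the same kind of synthetic data as above; again, the alpha values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty

# Ridge shrinks all coefficients toward 0 but rarely to exactly 0;
# the lasso zeroes some out, performing feature selection.
print("ridge exact zeros:", np.sum(ridge.coef_ == 0))
print("lasso exact zeros:", np.sum(lasso.coef_ == 0))
```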

XGBoost

SVM