
Advanced Machine Learning

Recall that in ordinary multiple linear regression, we have a set of $p$ predictor variables used to model some response variable $Y$, and we fit a model of the form:

$$ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon $$

Here $\beta_j$ represents the average effect on $Y$ of a one-unit increase in $X_j$ (the $j$th predictor variable), and $\epsilon$ is the error term. The values of the $\beta$ coefficients are chosen by the least squares method, which minimizes the residual sum of squares (RSS):

$$ \mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
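
As a concrete illustration, here is a minimal sketch of a least squares fit with NumPy; the synthetic data, coefficient values, and noise level are arbitrary choices, not part of the notes.

```python
# A minimal sketch of least squares fitting on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                       # n observations, p predictors
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ true_beta + rng.normal(scale=0.1, size=n)  # beta_0 = 1.0

# Add an intercept column and solve min_beta ||y - X beta||^2.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

rss = np.sum((y - X1 @ beta_hat) ** 2)
print("estimated coefficients:", beta_hat)
print("RSS:", rss)
```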

Least Absolute Shrinkage and Selection Operator (LASSO)

Lasso regression is a regularization technique for linear regression models. Regularization is a statistical method for reducing the errors caused by overfitting to the training data.

The process:

  • Start with the full model (all candidate features)
  • "Shrink" some coefficients to 0 (exactly)
  • Non-zero coefficients indicate the "selected" features (see the sketch after this list)
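
The sketch below shows this selection behavior using scikit-learn's Lasso on synthetic data where only two of ten predictors actually matter; the dataset and the alpha value are arbitrary illustrations.

```python
# A minimal sketch of lasso's feature-selection behavior.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Only the first two predictors carry signal.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# alpha is the regularization strength (lambda above); larger alpha
# drives more coefficients exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
print("coefficients:", np.round(model.coef_, 3))
print("selected features:", np.flatnonzero(model.coef_ != 0))
```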

Traditional ridge regression adds a penalty on the magnitude of the coefficients to the least squares objective:

$$ \text{Total Cost} = \underbrace{\mathrm{RSS}(w)}_{\text{measure of fit}} + \lambda \underbrace{\|w\|_2^2}_{\text{measure of coefficient magnitude}} $$

Lasso keeps the same structure but swaps the squared L2 penalty for an L1 penalty, $\lambda \|w\|_1$, which is what allows it to drive some coefficients exactly to zero.
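
A minimal sketch contrasting the two penalties with scikit-learn's Ridge and Lasso estimators; the data and alpha values are arbitrary choices for illustration.

```python
# Ridge (L2 penalty) shrinks coefficients; lasso (L1 penalty) can zero them.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.5, size=100)  # only feature 0 matters

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("ridge:", np.round(ridge.coef_, 3))   # all nonzero, just smaller
print("lasso:", np.round(lasso.coef_, 3))   # irrelevant features set to 0
```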

XGBoost

Support Vector Machines (SVM)