Advanced Machine Learning

Recall that in ordinary multiple linear regression, we have a set of p predictor variables used to model some response variable Y with a model of the form:

$$ Y = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p + \epsilon $$

where $\beta_n$ represents the average effect of a one-unit increase in the nth predictor $X_n$ and $\epsilon$ is the error term. The values of these beta coefficients are chosen by the method of least squares, which minimizes the residual sum of squares (RSS), i.e. the sum of squared differences between the observed and predicted response values.

$$ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
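A minimal sketch of this, assuming scikit-learn and purely synthetic data: fit an ordinary least-squares model and compute RSS from its residuals. The coefficients and noise level below are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # 100 observations, p = 3 predictors
beta_true = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ beta_true + rng.normal(scale=0.5, size=100)  # Y = b0 + b1*X1 + ... + noise

ols = LinearRegression().fit(X, y)         # least-squares fit
residuals = y - ols.predict(X)
rss = np.sum(residuals ** 2)               # RSS = sum of squared residuals
print(ols.intercept_, ols.coef_, rss)
```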

Least Absolute Shrinkage and Selection Operator (LASSO)

When predictor variables are highly correlated, the coefficient estimates can have large variances, leading to poor predictive accuracy.

Lasso regression is a regularization technique for linear regression models. Regularization is a statistical approach for reducing errors caused by overfitting to the training data. Instead of minimizing RSS alone, lasso minimizes the penalized objective:

$$ RSS + \lambda \sum_{n=1}^{p} | \beta_n | $$

The process:

  • Start with full model (all possible features)
  • "Shrink" some coefficients to 0 (exactly)
  • Non-zero coefficients indicate "selected" features (see the sketch after this list)
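A minimal sketch of this shrink-and-select process, again assuming scikit-learn and synthetic data. In scikit-learn the penalty weight $\lambda$ is called `alpha`, and the value used here is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                 # start with the "full model": 10 candidate features
beta_true = np.zeros(10)
beta_true[[0, 3]] = [3.0, -2.0]                # only two features actually matter
y = X @ beta_true + rng.normal(scale=0.5, size=200)

X_std = StandardScaler().fit_transform(X)      # the penalty assumes comparable feature scales
lasso = Lasso(alpha=0.1).fit(X_std, y)         # alpha plays the role of lambda

selected = np.flatnonzero(lasso.coef_)         # non-zero coefficients = "selected" features
print(lasso.coef_)
print("selected feature indices:", selected)
```

Most coefficients come out exactly zero; only the informative features survive the penalty.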

Traditional ridge regression, by contrast, minimizes:

$$ \text{Total Cost} = \text{Measure of Fit } [RSS(w)] + \lambda \cdot \text{Measure of Magnitude of Coefficients } [\, \|w\|_2^2 \,] $$

The squared (L2) penalty shrinks coefficients toward zero but never sets them exactly to zero, so ridge regression does not perform feature selection the way lasso does.
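A short sketch of that contrast, reusing the synthetic data from the lasso example above (scikit-learn assumed, `alpha` again standing in for $\lambda$):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
beta_true = np.zeros(10)
beta_true[[0, 3]] = [3.0, -2.0]
y = X @ beta_true + rng.normal(scale=0.5, size=200)
X_std = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=10.0).fit(X_std, y)        # L2 penalty: shrinks, never zeroes
lasso = Lasso(alpha=0.1).fit(X_std, y)         # L1 penalty: shrinks some coefficients to exactly 0

print("ridge coefficients exactly zero:", np.sum(ridge.coef_ == 0))
print("lasso coefficients exactly zero:", np.sum(lasso.coef_ == 0))
```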

xgboost

SVM