# Midterm Cheat Sheet
**Linear Regression** [![image-1666104229251.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104229251.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104229251.png) [![image-1666104239088.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104239088.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104239088.png) [![image-1666104245811.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104245811.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104245811.png) [![image-1666104252727.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104252727.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104252727.png) [![image-1666104260540.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104260540.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104260540.png) [![image-1666104271056.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104271056.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104271056.png) [![image-1666104276418.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104276418.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104276418.png) [![image-1666104282605.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104282605.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104282605.png) 
[![image-1666104287023.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104287023.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104287023.png) A prediction interval for a new observation adds a 1 inside se(ŷ): 𝛽0 + 𝛽1x ± t\*[![image-1666104297248.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104297248.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104297248.png) **Multiple Linear Regression and Estimation** [![image-1666104323618.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104323618.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104323618.png) [![image-1666104327905.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104327905.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104327905.png) [![image-1666104332246.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104332246.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104332246.png) [![image-1666104336362.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104336362.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104336362.png) [![image-1666104345652.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104345652.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104345652.png) [![image-1666104350789.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104350789.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104350789.png) 
[![image-1666104355280.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104355280.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104355280.png) [![image-1666104359234.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104359234.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104359234.png) [![image-1666104363437.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104363437.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104363437.png) [![image-1666290673808.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666290673808.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666290673808.png) [![image-1666290691336.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666290691336.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666290691336.png) 𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = ⋯ = 𝛽𝑝 = 0 vs. 
𝐻1 : not all 𝛽𝑘 = 0, 𝑘 = 1, … , 𝑝 [![image-1666104389032.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104389032.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104389032.png)[![image-1666104395117.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104395117.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104395117.png) Rejection rule: |𝑡| ≥ t(1 − α/2; 𝑛 − 𝑝 − 1) [![image-1666104407108.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104407108.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104407108.png) [![image-1666104410593.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104410593.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104410593.png) [![image-1666104415116.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104415116.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104415116.png) [![image-1666104423017.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104423017.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104423017.png)
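The interval formulas above can be checked in base R; a minimal sketch using the built-in `cars` data (a stand-in, not the course datasets). `predict()` gives both interval types, and the prediction interval is wider because of the extra 1 inside se(ŷ):

```r
# Fit a simple linear regression on built-in data
fit <- lm(dist ~ speed, data = cars)

new_x <- data.frame(speed = 21)               # hypothetical new x-value
predict(fit, new_x, interval = "confidence")  # CI for the mean response
predict(fit, new_x, interval = "prediction")  # wider PI for a new observation
confint(fit)                                  # CIs for beta_0 and beta_1
```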
**Model Fitting: Inference** [![image-1666104576887.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104576887.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104576887.png) dfΩ = n − p, and dfω = n − q [![image-1666138531298.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666138531298.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666138531298.png) Reject the null hypothesis if F > F(α; p − q, n − p) [![image-1666104643517.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104643517.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104643517.png) [![image-1666138340542.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666138340542.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666138340542.png) **Dummy Variables and Analysis of Covariance** Consider a dummy variable Xi2 that is 0 for one group (−) and 1 for the other (+): [![image-1666104607845.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104607845.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104607845.png) An interaction between Xi1 and Xi2: [![image-1666104615922.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104615922.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104615922.png) A model with multiple categorical variables: [![image-1666104625980.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104625980.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104625980.png) 
[![image-1666104633030.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104633030.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104633030.png)
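The nested-model F-test and the dummy-variable interaction above can be sketched in base R with `anova()`; here `mtcars` and its 0/1 transmission factor `am` are assumed stand-ins for the course data:

```r
# omega: the smaller model with q parameters
small <- lm(mpg ~ wt, data = mtcars)
# Omega: adds a 0/1 dummy and its interaction with the covariate
big   <- lm(mpg ~ wt * factor(am), data = mtcars)

# F-test on (p - q, n - p) df; a small p-value rejects the smaller model
anova(small, big)
```

The second row of the `anova()` table reports the F statistic and p-value for the comparison.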
**Regression Diagnostics** Assumptions: • Errors: ε ~ N(0, σ²I); ◦ independent ◦ equal variance ◦ normally distributed • Model: E\[y\] = Xβ is correct • No unusual observations Leverage points: data points with an unusual x-value [![image-1666104773344.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104773344.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104773344.png) The hat matrix is n\*n; hii is the leverage of the ith case, and leverage > 2p'/n should be looked at closely Outliers: observations with an unusual y-value given their x [![image-1666104790374.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104790374.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104790374.png) Compute the studentized residual and compare its absolute value with the limit: abs(qt(.05/(n\*2), df = n - pprime - 1, lower.tail = T)) Influential points: observations that noticeably change the fitted regression Difference in fits (DFFITS): [![image-1666104815733.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104815733.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104815733.png) with a threshold of [![image-1666104825747.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104825747.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104825747.png) where p' is the number of parameters Cook's distance: [![image-1666104834542.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666104834542.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666104834542.png) with thresholds: Di > 4/n should be looked at; Di > 0.5 possibly influential; Di >= 1 very influential Errors: a plot of ê should • have constant variance • have no clear pattern • be consistent with normal
residuals Shapiro–Wilk normality test H0: residuals are normally distributed Bonferroni correction: divide alpha by n
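The DFFITS influence check described above, sketched on the built-in `cars` data (`dffits()` ships with base R's stats package); the 2\*sqrt(p'/n) cutoff is the cheat sheet's rule of thumb:

```r
fit    <- lm(dist ~ speed, data = cars)
n      <- nrow(cars)
pprime <- length(coef(fit))        # p' = number of parameters

d    <- dffits(fit)
flag <- which(abs(d) > 2 * sqrt(pprime / n))
d[flag]                            # cases worth a closer look
```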
**Variable Selection** Backward elimination: 1. Start with the model containing all predictors 2. Remove the predictor with the highest p-value greater than alpha 3. Refit the model 4. Remove the remaining least significant predictor provided its p-value is greater than alpha 5. Repeat 3 and 4 until all "non-significant" predictors are removed The significance cutoff can be relaxed to 15–20% for this purpose Forward selection: 1. Start with the model containing no predictors 2. For each predictor not in the model, check its p-value if added; add the one with the lowest p-value less than alpha 3. Continue until no new predictors can be added Stepwise regression: a combination of the two Selection criteria: Akaike Information Criterion (AIC): • -2 max log-likelihood + 2p' • n\*log(RSS/n) + 2p' Bayes Information Criterion (BIC): • -2 max log-likelihood + p'\*log(n) • n\*log(RSS/n) + p'\*log(n) Adjusted R2: since R2 = 1 − RSS/SYY never decreases as predictors are added, use [![image-1666202623504.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666202623504.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666202623504.png) Mallows' Cp statistic: estimates the average MSE of prediction [![image-1666202639730.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666202639730.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666202639730.png)If a p-predictor model fits, then: [![image-1666202661358.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666202661358.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666202661358.png) We want models with small p and Cp around or less than p
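The adjusted-R² formula can be verified against what `summary.lm` reports; a base-R sketch on the built-in `cars` data (an assumed stand-in):

```r
fit    <- lm(dist ~ speed, data = cars)
n      <- nrow(cars)
pprime <- length(coef(fit))
rss    <- deviance(fit)
syy    <- sum((cars$dist - mean(cars$dist))^2)

# adjusted R^2 = 1 - (RSS/(n - p')) / (SYY/(n - 1))
adj_r2 <- 1 - (rss / (n - pprime)) / (syy / (n - 1))
adj_r2                      # matches summary(fit)$adj.r.squared
```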
**R Code Snippets**
```r
# Model with only beta_0
sr_lm0 <- lm(sr ~ 1, data = savings)
# Full model
sr_lm1 <- lm(sr ~ ., data = savings)
sr_syy <- sum((savings$sr - mean(savings$sr))^2)
sr_rss <- deviance(sr_lm1)
# F = ((SYY - RSS) / (df0 - df1)) / (RSS / df1)
sr_num <- (sr_syy - sr_rss) / (df.residual(sr_lm0) - df.residual(sr_lm1))
sr_den <- sr_rss / df.residual(sr_lm1)
sr_f <- sr_num / sr_den
# df_Omega = n - p, and df_omega = n - q
pf(sr_f, df.residual(sr_lm0) - df.residual(sr_lm1),
   df.residual(sr_lm1), lower.tail = F)

# beta = (X'X)^(-1) X'Y
beta <- solve(t(x) %*% x) %*% (t(x) %*% y)

# Pearson's correlation
cor(lin_reg$fitted.values, lin_reg$residuals, method = "pearson")

# Stratify variables by a factor
by(depress, depress$publicassist, summary)

# Welch's two-sample t-test for a difference in means
t.test(assist$cesd, noassist$cesd)  # or t.test(data.y ~ factor)

# CI of LS means based on covariates
library(lsmeans)
lsmeans(reg, ~ Type)

# Apply a mean function to an array, split on a factor
tapply(assist$cesd, assist$assist, mean)

# When a regression factor has more than two categories
reg <- lm(Pulse1 ~ Height + Sex + Smokes + as.factor(Exercise))

# Cook's distance
cook <- cooks.distance(reg)
cook[cook > 4/n]

# Shapiro test for normality
shapiro.test(reg$residuals)

# Studentized residuals
stud <- rstudent(reg)
# Threshold for studentized resids with Bonferroni correction
lim <- abs(qt(.05 / (n * 2), df = n - pprime - 1, lower.tail = T))
stud[which(abs(stud) > lim)]

# Hat values
hat <- hatvalues(reg)
lev <- 2 * pprime / n
hat[hat > lev]

# Forward selection
forward <- ~ year + unemployed + femlab + marriage + birth + military
m0 <- lm(divorce ~ 1, data = usa)
reg.forward.AIC <- step(m0, scope = forward, direction = "forward", k = 2)
n <- nrow(usa)
# AIC = n*log(RSS/n) + 2p'
n * log(162.1228 / n) + 2 * 6
extractAIC(reg.forward.AIC, k = 2)

# BIC
reg.forward.BIC <- step(m0, scope = forward,
                        direction = "forward", k = log(n))
extractAIC(reg.forward.BIC, k = log(n))
# BIC = n*log(RSS/n) + p'*log(n)
n * log(162.1228 / n) + 6 * log(n)

# All-subsets selection with Mallows' Cp
library(leaps)
leaps <- regsubsets(divorce ~ ., data = usa)
rs <- summary(leaps)
par(mfrow = c(1, 2))
plot(2:7, rs$cp, xlab = "No. of parameters", ylab = "Cp Statistic")
abline(0, 1)
```
[![image-1666105013557.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666105013557.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666105013557.png) [![image-1666105036225.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666105036225.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666105036225.png)[![image-1666105020734.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/scaled-1680-/image-1666105020734.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-10/image-1666105020734.png)