# Mutiple Linear Regression and Estimation

Multiple Linear Regression analysis can be looked upon as an extension of simple linear regression analysis to the situation in which more than one independent variables must be considered.

The general model with response Y and regressors X<sub>1</sub>, X<sub>2</sub>,... X<sub>p</sub>:

[![image-1663279551985.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663279551985.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663279551985.png)

Suppose we observe data for *n* subjects with *p* variables. The data might be presented in a matrix or table like:

[![image-1663279646413.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663279646413.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663279646413.png)

We could then write the model as:

[![image-1663279874599.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663279874599.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663279874599.png)

We can think of Y and E as (n x 1) vectors when transposed, and β as a ((p + 1) x 1) vector. p +1 is number of predictors + the intercept. Thus X would be:

[![image-1663279974357.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663279974357.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663279974357.png)

The general linear regression may be written as:

[![image-1663280050486.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663280050486.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663280050486.png)

Or **y<sub>i</sub> = x<sub>i</sub><sup>|</sup>\*β + ∈<sub>i</sub>**

The model is represented as the systematic structure plus the random variation with n dimensions = (p + 1) + { n - (p + 1 ) }

#### Ordinal Least Squares Estimators

The least squares estimate β\_hat of β is chosen by minimizing the residual sum of squares function:

[![image-1663281298522.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663281298522.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663281298522.png)

By differentiating with respect to β<sub>i</sub> and solve by setting equal to 0:

[![image-1663281366843.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663281366843.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663281366843.png)

The least squares estimate of β\_hat of β is given by:

[![image-1663281452357.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663281452357.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663281452357.png)

and if the inverse exists:

[![image-1663281474240.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663281474240.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663281474240.png)

#### Fitted Values and Residuals  


The fitted values are represetned by Y\_hat = X\*β\_hat

[![image-1663282777062.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663282777062.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663282777062.png)

where the **hat matrix**  is defined as H = X(X<sup>|</sup>X)<sup>-1</sup>X<sup>|</sup>

The residual sum of squares (RSS):

[![image-1663282850648.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663282850648.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663282850648.png)

#### Gauss-Markov Conditions

In order for estimates of β to have some desirable statistical properties, we need a set of assumptions referred to as the Gauss-Markov conditions; for all i, j = 1... n

1. E\[∈<sub>i</sub>\] = 0
2. E\[∈<sub>i</sub><sup>2</sup>\] = 𝜎<sup>2</sup>
3. E\[∈<sub>i</sub>∈<sub>j</sub>\] = 0, where i != j

Or we can write these in matrix notation as: E\[∈\] = 0, E\[∈∈<sup>|</sup>\] = 𝜎<sup>2</sup>\*I

The GM conditions imply that E\[Y\] = Xβ and cov(Y) = E\[(Y-Xβ)(Y-Xβ)<sup>|</sup>\] = E\[∈∈<sup>|</sup>\] = ∈

Under the GM assumptions, the LSE are the Best Linear Unbiased Estimators (BLUE). In this expression, "best" means <span style="text-decoration: underline;">minimum variance</span> and linear indicates that the estimators are <span style="text-decoration: underline;">linear functions of y.</span>

The LSE is a good choice but it does require the errors are uncorrelated and have equal variance. Even if the errors behave, then nonlinear or biased estimates may work better.

#### Estimating Variance

By definition:

[![image-1663284680441.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663284680441.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663284680441.png)

We can estimate variance by an average from the sample:

[![image-1663284717348.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663284717348.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663284717348.png)

Under GM conditions, s<sup>2</sup> is an <span style="text-decoration: underline;">**unbiased estimate** </span>of variance.

#### Total Sum of Squares

Total sum of squares is Syy = SSreg + RSS

The corrected total sum of squares with n-1 degrees of freedom:

[![image-1663284931783.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663284931783.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663284931783.png)

where J is an n x n matrix of 1s and H = X(X<sup>|</sup>X)<sup>-1</sup>X<sup>|</sup>

#### Regression and Residual Sum of Sqaures

**Regression** sum of squares represent the number of X variables:

[![image-1663285018389.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663285018389.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663285018389.png)

**Residual** Sum of Squares with n - (p + 1) degrees of freedom:

[![image-1663285075431.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663285075431.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663285075431.png)

#### F Test for Regression Relation

To test whether there is a regression relation between Y and a set of variables X, use the hypothesis test:

𝐻0 : 𝛽1 = 𝛽2 = 𝛽3 = ⋯ = 𝛽𝑝 = 0 v.s. 𝐻1 : not all 𝛽𝑘 = 0, 𝑘 = 1, … , 𝑝

We use the test statistic:

[![image-1663285544018.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663285544018.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663285544018.png)

The chances of a Type I error is alpha, our degrees of freedom is n - p -1

#### The Coefficient of Determination

Recall this measures the model of fit by the proportionate reduction of total variation in Y associated with the use of the set of X variables.

R<sup>2</sup> = SSreg / Syy = 1 - RSS / Syy

In a multi-linear regression we must adjust the coefficient by the associated degrees of freedom:

[![image-1663285802285.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663285802285.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663285802285.png)

Add more independent variables to the model can only increase R<sup>2</sup>, but R<sup>2</sup><sub>alpha</sub> may become smaller when more independent variables are introduced, as the decrease in RSS may be offset by the loss of degrees of freedom.

#### T Tests and Intervals  


Tests for β<sub>k</sub> are pretty standard:

[![image-1663286153070.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663286153070.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663286153070.png)

With the rejection rule of 𝑡 &gt;= t(1 − 𝛼/2; 𝑛 − 𝑝 − 1)

And likewise confidence limits for β<sub>k</sub> and 1 - alpha confidence

[![image-1663286260560.png](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/scaled-1680-/image-1663286260560.png)](https://bookstack.mitchellhenschel.com/uploads/images/gallery/2022-09/image-1663286260560.png)