
Multiple Linear Regression and Estimation

Multiple linear regression analysis can be viewed as an extension of simple linear regression analysis to the situation in which more than one independent variable must be considered.

The general model with response Y and regressors X1, X2, …, Xp is:

Y = β0 + β1X1 + β2X2 + … + βpXp + ε
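As a quick illustration, here is a minimal NumPy sketch that simulates data from this model; the coefficient values β = (1, 2, −0.5) and the choice p = 2 are hypothetical, chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 2                        # n subjects, p regressors
beta = np.array([1.0, 2.0, -0.5])    # hypothetical (β0, β1, β2)

X1 = rng.normal(size=n)              # illustrative regressor values
X2 = rng.normal(size=n)
eps = rng.normal(size=n)             # random errors ε

# Y = β0 + β1·X1 + β2·X2 + ε
Y = beta[0] + beta[1] * X1 + beta[2] * X2 + eps
```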

Suppose we observe data for n subjects with p variables. The data might be presented in a matrix or table like:

Subject   Y     X1     X2     …     Xp
1         y1    x11    x12    …     x1p
2         y2    x21    x22    …     x2p
⋮         ⋮     ⋮      ⋮            ⋮
n         yn    xn1    xn2    …     xnp

We could then write the model as:

yi = β0 + β1xi1 + β2xi2 + … + βpxip + εi,   i = 1, …, n

We can think of Y = (y1, …, yn)ᵀ and ε = (ε1, …, εn)ᵀ as (n × 1) vectors, and β = (β0, β1, …, βp)ᵀ as a ((p + 1) × 1) vector, where p + 1 is the number of predictors plus the intercept. Thus X would be:

X = [ 1  x11  x12  …  x1p ]
    [ 1  x21  x22  …  x2p ]
    [ ⋮   ⋮    ⋮        ⋮  ]
    [ 1  xn1  xn2  …  xnp ]

an n × (p + 1) matrix whose first column is all ones.
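As a sketch, this design matrix can be assembled in NumPy by prepending a column of ones to the observed regressor values; the data here are simulated placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2
Z = rng.normal(size=(n, p))             # observed regressor values x_ij

# Prepend a column of ones for the intercept, giving the
# n × (p + 1) design matrix X described above.
X = np.column_stack([np.ones(n), Z])
print(X.shape)                          # (100, 3)
```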

The general linear regression model may then be written as:

Y = Xβ + ε

or, row by row,  yi = xiᵀβ + εi,  where xiᵀ = (1, xi1, …, xip) is the i-th row of X.

The model thus decomposes the response into systematic structure plus random variation; in terms of dimensions, n = (p + 1) + {n − (p + 1)}, the degrees of freedom used by the fit and those left for the residuals, respectively.

Ordinary Least Squares Estimators

The least squares estimate β̂ of β is chosen by minimizing the residual sum of squares:

RSS(β) = Σᵢ (yi − xiᵀβ)² = (Y − Xβ)ᵀ(Y − Xβ)

Differentiating with respect to β and setting the result equal to 0 gives:

∂RSS/∂β = −2Xᵀ(Y − Xβ) = 0

The least squares estimate β̂ of β therefore satisfies the normal equations:

XᵀX β̂ = XᵀY

and, provided the inverse (XᵀX)⁻¹ exists:

β̂ = (XᵀX)⁻¹XᵀY
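A minimal sketch of this estimator on simulated data; rather than forming (XᵀX)⁻¹ explicitly, it solves the normal equations with np.linalg.solve (the numerically safer route) and cross-checks against np.linalg.lstsq. The coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2
beta = np.array([1.0, 2.0, -0.5])           # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ beta + rng.normal(size=n)           # Y = Xβ + ε

# Normal equations: (XᵀX) β̂ = XᵀY
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Equivalent, numerically preferred least-squares solver
beta_hat_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(beta_hat)                             # close to (1, 2, -0.5)
assert np.allclose(beta_hat, beta_hat_lstsq)

# The gradient condition −2Xᵀ(Y − Xβ̂) = 0 holds at the solution
assert np.allclose(X.T @ (Y - X @ beta_hat), 0, atol=1e-8)
```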

Fitted Values and Residuals

The fitted values are represented by Ŷ = Xβ̂:

Ŷ = Xβ̂ = X(XᵀX)⁻¹XᵀY = HY

where the hat matrix is defined as H = X(XᵀX)⁻¹Xᵀ.

The residual sum of squares (RSS) is then:

RSS = (Y − Ŷ)ᵀ(Y − Ŷ) = (Y − Xβ̂)ᵀ(Y − Xβ̂) = Yᵀ(I − H)Y
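Continuing with simulated data, here is a sketch computing the hat matrix, fitted values, residuals, and RSS; H is formed explicitly only for illustration, since it is n × n and never needed in full for fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X(XᵀX)⁻¹Xᵀ (explicit inverse only for illustration)
H = X @ np.linalg.inv(X.T @ X) @ X.T

Y_hat = X @ beta_hat          # fitted values, equal to H @ Y
resid = Y - Y_hat             # residuals
rss = resid @ resid           # residual sum of squares

assert np.allclose(Y_hat, H @ Y)
# H is an orthogonal projection: symmetric and idempotent
assert np.allclose(H, H.T) and np.allclose(H @ H, H)
```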

Gauss-Markov Conditions

In order for the estimates of β to have desirable statistical properties, we need a set of assumptions referred to as the Gauss-Markov conditions: for all i, j = 1, …, n,

  1. E[εi] = 0
  2. E[εi²] = σ²
  3. E[εiεj] = 0, where i ≠ j

Or we can write these in matrix notation as:  E[ε] = 0,  E[εεᵀ] = σ²I

The GM conditions imply that E[Y] = Xβ and cov(Y) = E[(Y − Xβ)(Y − Xβ)ᵀ] = E[εεᵀ] = σ²I.
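A standard consequence of these conditions is that β̂ is unbiased with cov(β̂) = σ²(XᵀX)⁻¹. The Monte Carlo sketch below, assuming i.i.d. normal errors (which satisfy the GM conditions) and a fixed design, illustrates both facts; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 40, 2, 1.5
beta = np.array([1.0, 2.0, -0.5])                 # hypothetical true β
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # fixed design

reps = 20_000
estimates = np.empty((reps, p + 1))
for r in range(reps):
    eps = rng.normal(scale=sigma, size=n)         # E[ε] = 0, E[εεᵀ] = σ²I
    Y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

print(estimates.mean(axis=0))                     # ≈ β: OLS is unbiased
print(np.cov(estimates.T))                        # ≈ σ²(XᵀX)⁻¹
print(sigma**2 * np.linalg.inv(X.T @ X))
```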