Module 14: Topics in Linear Regression
Assumptions of Linear Regression:
- Independence between observations
- Normally distributed Y for any fixed value of X
- Homoscedasticity - the variance of Y is the same for any value of X
- Linearity - the mean of Y is a linear function of X
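To make the four assumptions concrete, here is a minimal sketch that simulates data satisfying all of them and fits a least-squares line with NumPy's `np.polyfit`. The true parameters (`beta0`, `beta1`, `sigma`), the sample size, and the range of X are arbitrary illustrative choices, not values from these notes.

```python
import numpy as np

# Illustrative simulation: every setting below is an assumption chosen for the demo.
rng = np.random.default_rng(seed=0)
n = 200
x = rng.uniform(0, 10, size=n)          # observed X values lie within [0, 10]
beta0, beta1, sigma = 2.0, 0.5, 1.0     # assumed true intercept, slope, error SD

# Errors are drawn independently from one normal distribution with constant
# variance sigma**2, so Y is normal with the same variance for every fixed X.
errors = rng.normal(loc=0.0, scale=sigma, size=n)
y = beta0 + beta1 * x + errors          # the mean of Y is linear in X

# Ordinary least squares fit; np.polyfit returns the slope first for deg=1.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
residuals = y - fitted
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

With this many observations the estimates should land near the assumed values of 2.0 and 0.5, and the residuals average to zero because the model includes an intercept.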
The model is also only generalizable to the population within the range of observed values of X; predicting Y for X values outside that range (extrapolation) is not supported by the data.
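One way to respect the observed range in practice is to refuse predictions outside it. The helper below, `predict_within_range`, is a hypothetical illustration (not from any library) that takes an already fitted intercept and slope.

```python
import numpy as np

def predict_within_range(x_new, intercept, slope, x_observed):
    """Predict Y for x_new, but raise an error for any value outside the
    observed range of X, since the fitted line is only trusted there.

    Hypothetical helper written for illustration.
    """
    x_new = np.asarray(x_new, dtype=float)
    lo, hi = np.min(x_observed), np.max(x_observed)
    if np.any((x_new < lo) | (x_new > hi)):
        raise ValueError(f"x_new outside observed range [{lo}, {hi}]: extrapolation")
    return intercept + slope * x_new

x_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(predict_within_range([2.5, 4.0], intercept=1.0, slope=2.0, x_observed=x_obs))  # [6. 9.]
```

Calling it with `x_new=[10.0]` raises a `ValueError`, because 10 lies beyond the largest observed X value of 5.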
How we collect data, and which data we collect, determines whether observations are independent. Examples of violated independence:
- Data come from closely related individuals (e.g., members of the same family)
- Multiple observations are collected over time for the same subject
To check linearity, plot the residuals against the fitted values (or against X). If the relationship is linear, the residuals should be randomly scattered around the horizontal line at zero. If the residuals show a systematic pattern, such as a curve, the trend cannot be assumed to be linear.
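A residual plot is usually inspected by eye. As a rough numeric stand-in for that visual check (our own simplification, not a formal test), the sketch below fits a straight line, splits X into equal-width bins, and averages the residuals in each bin. The function name `binned_residual_means` and the simulated data are hypothetical.

```python
import numpy as np

def binned_residual_means(x, y, n_bins=3):
    """Fit a straight line, then average the residuals within equal-width
    bins of X. For a linear trend every bin mean should be near 0; a
    systematic sign pattern (e.g. +, -, +) mirrors a curved residual plot."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Digitizing against the interior edges assigns each x to bin 0..n_bins-1.
    bins = np.digitize(x, edges[1:-1])
    return np.array([residuals[bins == b].mean() for b in range(n_bins)])

rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 10, size=300)
noise = rng.normal(0, 1, size=300)
linear_y = 1 + 2 * x + noise           # truly linear relationship
curved_y = 1 + 0.5 * x**2 + noise      # truly quadratic relationship

lin_means = binned_residual_means(x, linear_y)
curved_means = binned_residual_means(x, curved_y)
print(lin_means)     # expect values near 0 in every bin
print(curved_means)  # expect a positive, negative, positive pattern
```

Fitting a straight line to the quadratic data leaves positive residuals at both ends and negative residuals in the middle, which is exactly the curved pattern that rules out a linear trend.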
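Homoscedasticity requires the variance of the error terms to be similar across all values of the independent variable. In a residual plot, the vertical spread of the residuals should stay roughly constant; a funnel shape, where the spread widens or narrows as X changes, indicates heteroscedasticity.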
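Using the same binning idea, this rough sketch compares the residual standard deviation across bins of X instead of the mean. It is an informal illustration rather than a formal test. The function name `binned_residual_sd` and both simulated datasets are hypothetical: one has constant error variance, and in the other the error SD grows in proportion to X.

```python
import numpy as np

def binned_residual_sd(x, y, n_bins=3):
    """Fit a straight line, then compute the residual standard deviation
    within equal-width bins of X. Similar values are consistent with
    homoscedasticity; values that grow or shrink across bins suggest a
    funnel shape (heteroscedasticity)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.digitize(x, edges[1:-1])
    return np.array([residuals[bins == b].std(ddof=1) for b in range(n_bins)])

rng = np.random.default_rng(seed=2)
x = rng.uniform(1, 10, size=300)
constant_sd_y = 3 + 2 * x + rng.normal(0, 1, size=300)       # same variance for every X
growing_sd_y = 3 + 2 * x + rng.normal(0, 0.3 * x, size=300)  # spread grows with X

const_sds = binned_residual_sd(x, constant_sd_y)
growing_sds = binned_residual_sd(x, growing_sd_y)
print(const_sds)    # roughly equal across bins
print(growing_sds)  # increasing from the first bin to the last
```

The second dataset satisfies linearity but violates homoscedasticity, so its residual spread in the highest-X bin is several times larger than in the lowest.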