Models for Two-Way Contingency Tables
Recall that in the last section, Generalized Linear Models (GLMs) were introduced as an extension of the traditional linear model. A GLM relaxes the assumptions of that model in the following ways:
- Drops the normality assumption
- The response variable is allowed to follow any distribution in the exponential family (binomial, Poisson, negative binomial, gamma, multinomial, etc.)
- The variance of the response is allowed to depend on the mean through a function called the variance function, $V(\mu)$
- The population mean $\mu$ is related to a linear combination of the predictors through a link function $g$, so that $g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
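To see how the link and variance functions fit together, here is a minimal Python sketch for three common families. The dictionary `FAMILIES` and the helper `mean_and_variance` are hypothetical names chosen for illustration, not part of SAS or any library. The binomial entry assumes a single binary (Bernoulli) response, so $V(\mu) = \mu(1 - \mu)$; the variance of the response is taken to be $\phi V(\mu)$ for a dispersion parameter $\phi$.

```python
import math

# Hypothetical lookup table: each family pairs a canonical link g(mu),
# its inverse, and the variance function V(mu), where Var(Y) = phi * V(mu).
FAMILIES = {
    "binomial": {  # binary response, mean mu = P(Y = 1)
        "link": lambda mu: math.log(mu / (1 - mu)),  # logit
        "inverse_link": lambda eta: 1 / (1 + math.exp(-eta)),
        "variance": lambda mu: mu * (1 - mu),
    },
    "poisson": {
        "link": lambda mu: math.log(mu),  # log
        "inverse_link": lambda eta: math.exp(eta),
        "variance": lambda mu: mu,
    },
    "normal": {
        "link": lambda mu: mu,  # identity
        "inverse_link": lambda eta: eta,
        "variance": lambda mu: 1.0,  # constant; Var(Y) = sigma^2
    },
}


def mean_and_variance(family, beta, x):
    """Return (mu, V(mu)) for the linear predictor
    eta = beta[0] + beta[1]*x[0] + ... + beta[k]*x[k-1]."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    fam = FAMILIES[family]
    mu = fam["inverse_link"](eta)  # invert g to get the mean
    return mu, fam["variance"](mu)  # variance follows from the mean
```

For example, `mean_and_variance("binomial", [0.0], [])` gives a mean of 0.5 and a variance function value of 0.25: once the link fixes the mean, the variance is determined by it.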
In SAS:
- PROC GENMOD: a procedure for analyzing generalized linear models.
- PROC LOGISTIC: a procedure built for logistic regression that provides useful information such as diagnostic plots, odds ratios, and other measures specific to logistic regression models.
- PROC CATMOD: a procedure designed to fit models to functions of categorical response variables.
All of these procedures report the deviance. PROC LOGISTIC also reports AIC and BIC (BIC is labeled SC, the Schwarz criterion), and both criteria can be calculated from the information PROC CATMOD reports.
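AIC and BIC follow directly from the maximized log-likelihood, which is how they can be computed by hand when a procedure does not print them. The sketch below uses the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log(n), assuming k is the number of estimated parameters and n the number of observations used in the fit; the function name `information_criteria` is hypothetical.

```python
import math


def information_criteria(neg2_log_lik, k, n):
    """Return (AIC, BIC) from -2 log L, the number of parameters k,
    and the sample size n. Smaller values indicate a better
    trade-off between fit and complexity."""
    aic = neg2_log_lik + 2 * k  # penalty of 2 per parameter
    bic = neg2_log_lik + k * math.log(n)  # penalty grows with log(n)
    return aic, bic
```

With illustrative (made-up) inputs `information_criteria(100.0, 3, 50)`, AIC is 106.0 and BIC is 100 + 3 log(50). Because log(n) exceeds 2 once n > 7, BIC penalizes extra parameters more heavily than AIC in all but tiny samples.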
Estimation in Generalized Linear Models
GLMs are estimated by the method of Maximum Likelihood (ML), which chooses the parameter values that make the observed data most probable. For the normal linear model, ML is equivalent to the least squares method.
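The link between ML and least squares holds for the normal linear model, and a short derivation shows why. Assuming independent responses $y_i \sim N(x_i^\top \beta, \sigma^2)$, $i = 1, \dots, n$:

```latex
\log L(\beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - x_i^\top \beta\right)^2
```

For any fixed $\sigma^2$, the first term does not involve $\beta$, so maximizing $\log L$ over $\beta$ is the same as minimizing $\sum_i (y_i - x_i^\top \beta)^2$, the least squares criterion. For other exponential-family responses (binomial, Poisson) the log-likelihood is not a sum of squares, so ML and ordinary least squares generally give different estimates.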
Example: Let τ be the prevalence of a disease in some population. Suppose that a random sample of size 100 is selected and we observe Y = 40 individuals with the disease.
We use the data Y to obtain an estimate $\hat{\tau}(Y)$ of τ that has good statistical properties. By "good" we mean that the estimator has little or no bias and small variance.
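To make "little or no bias and small variance" concrete, here is a sketch that evaluates both quantities exactly for the natural candidate estimator, the sample proportion $\hat{\tau} = Y/n$ (which turns out to be the ML estimate in this example), assuming $Y \sim \text{Binomial}(n, \tau)$. The function name `exact_bias_and_variance` is hypothetical; it simply averages over every possible outcome y = 0, ..., n.

```python
from math import comb


def exact_bias_and_variance(tau, n):
    """Exact bias and variance of Y/n when Y ~ Binomial(n, tau),
    computed by summing over all outcomes y = 0, ..., n."""
    pmf = [comb(n, y) * tau**y * (1 - tau) ** (n - y) for y in range(n + 1)]
    mean = sum(p * y / n for y, p in enumerate(pmf))  # E[Y/n]
    var = sum(p * (y / n - mean) ** 2 for y, p in enumerate(pmf))  # Var(Y/n)
    return mean - tau, var
```

For τ = 0.4 and n = 100, the bias is zero up to floating-point rounding and the variance is τ(1 − τ)/n = 0.0024, so the sample proportion is unbiased and its variance shrinks as n grows.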
In this example, if τ = 0.5, the probability of the observed outcome Y = 40 is:
$$P_{\tau}(Y = 40) = \binom{100}{40}(0.5)^{40}(1 - 0.5)^{100 - 40}$$
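Evaluating this probability numerically shows how plausible the observed data are under τ = 0.5. The helper `binomial_prob` below is a hypothetical name for the binomial probability mass function used in the formula; the comparison with τ = 0.4 is added for illustration.

```python
from math import comb


def binomial_prob(y, n, tau):
    """P_tau(Y = y) for Y ~ Binomial(n, tau)."""
    return comb(n, y) * tau**y * (1 - tau) ** (n - y)


p_half = binomial_prob(40, 100, 0.5)  # roughly 0.011
p_04 = binomial_prob(40, 100, 0.4)  # larger: tau = 0.4 fits Y = 40 better
```

The data Y = 40 are considerably more probable when τ = 0.4 than when τ = 0.5, which is the intuition that maximum likelihood formalizes.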
Viewed as a function of τ, with the observed data Y = 40 held fixed, $L(\tau) = P_{\tau}(Y = 40) = \binom{100}{40}\tau^{40}(1 - \tau)^{60}$ is called the likelihood function.
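The ML estimate is the value of τ that maximizes the likelihood. Working with the log-likelihood $\ell(\tau) = \log\binom{100}{40} + 40\log\tau + 60\log(1 - \tau)$, setting $\ell'(\tau) = 40/\tau - 60/(1 - \tau)$ to zero gives $\hat{\tau} = 40/100 = 0.4$. The sketch below confirms this with a crude grid search over (0, 1); the grid step of 0.001 is an arbitrary choice, and a real fit would use a numerical optimizer instead.

```python
import math


def log_likelihood(tau, y=40, n=100):
    """Binomial log-likelihood: log C(n, y) + y log(tau) + (n - y) log(1 - tau)."""
    return math.log(math.comb(n, y)) + y * math.log(tau) + (n - y) * math.log(1 - tau)


# Evaluate l(tau) on the grid 0.001, 0.002, ..., 0.999 and keep the maximizer.
grid = [i / 1000 for i in range(1, 1000)]
tau_hat = max(grid, key=log_likelihood)
```

The grid maximizer agrees with the closed form Y/n = 0.4. Maximizing the log-likelihood rather than the likelihood itself gives the same answer, since the logarithm is increasing, and avoids multiplying many small probabilities.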