Features of GLM and Marginal Methods
In many biomedical applications the outcome is binary, ordinal, or a count. In such cases we consider extensions of generalized linear models (GLMs) for analyzing discrete longitudinal data. These models are non-linear: they require that a known, typically non-linear, transformation of the mean response be modeled as a linear function of the covariates. This non-linearity complicates the interpretation of the regression coefficients, because they describe effects on the transformed mean (for example, on the log odds) rather than on the mean response itself.
Let $Y_i$ denote the response variable for the $i$th subject, and let

$$X_i = (X_{i1}, X_{i2}, \ldots, X_{ip})'$$

be a $p \times 1$ vector of covariates. A generalized linear model for $Y_i$ requires the following three-part specification:
1. A Distributional Assumption
Generalized linear models assume that the response variable has a probability distribution belonging to the exponential family (for example, the normal, Bernoulli, binomial, or Poisson distribution). A feature of the exponential family is that the variance can be expressed in terms of the mean:

$$\operatorname{Var}(Y_i) = \phi\, v(\mu_i),$$

where $\mu_i = E(Y_i)$, $\phi$ is a dispersion (scale) parameter, and $v(\mu_i)$ is the variance function. For example:
- Variance function of the normal distribution: v(μ) = 1 (so the variance is φ = σ², whatever the mean)
- Variance function of the Bernoulli distribution: v(μ) = μ(1 - μ) (with φ = 1)
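As a quick numerical check of $\operatorname{Var}(Y_i) = \phi\, v(\mu_i)$, the sketch below encodes the variance functions listed above, plus the Poisson variance function $v(\mu) = \mu$ (not listed in the text, included for comparison), and compares the Bernoulli formula with the sample variance of simulated draws. The helper names are hypothetical and only the Python standard library is used.

```python
import random

# Variance functions v(mu) for a few exponential-family members.
# Var(Y) = phi * v(mu): phi = sigma^2 for the normal, phi = 1 for Bernoulli and Poisson.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "bernoulli": lambda mu: mu * (1.0 - mu),
    "poisson": lambda mu: mu,  # not in the text's list; added for comparison
}


def glm_variance(family: str, mu: float, phi: float = 1.0) -> float:
    """Return Var(Y) = phi * v(mu) for the named family (hypothetical helper)."""
    return phi * VARIANCE_FUNCTIONS[family](mu)


def simulated_bernoulli_variance(mu: float, n: int = 100_000, seed: int = 1) -> float:
    """Population-style variance of n simulated Bernoulli(mu) draws."""
    rng = random.Random(seed)
    draws = [1.0 if rng.random() < mu else 0.0 for _ in range(n)]
    mean = sum(draws) / n
    return sum((d - mean) ** 2 for d in draws) / n


if __name__ == "__main__":
    mu = 0.3
    print(glm_variance("bernoulli", mu))        # mu * (1 - mu), approximately 0.21
    print(simulated_bernoulli_variance(mu))     # should be close to 0.21
    print(glm_variance("normal", mu, phi=4.0))  # phi * 1 = 4.0, regardless of mu
```

For the normal family the mean drops out of the variance entirely, while for the Bernoulli family the variance is fully determined by the mean, which is why no separate dispersion parameter is needed there.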
2. A Link Function
The link function $g(\cdot)$ is a known, monotone function that transforms the mean $\mu_i$ and relates it to the covariates through the linear predictor $\eta_i$:

$$g(\mu_i) = \eta_i.$$
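For example, the canonical link functions for some common distributions are:
- Normal: identity link, g(μ) = μ
- Bernoulli: logit link, g(μ) = log{μ/(1 - μ)}
- Binomial: logit link, applied to the success probability
- Poisson: log link, g(μ) = log(μ)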
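For the normal, Bernoulli and Poisson families the canonical links are the identity, logit and log functions. A minimal sketch of each link $g$ and its inverse $g^{-1}$, which maps a linear predictor back to the scale of the mean, follows; the function names are hypothetical and not taken from any library.

```python
import math

# Canonical links g(mu) and their inverses g^{-1}(eta).


def identity_link(mu: float) -> float:
    return mu


def identity_inverse(eta: float) -> float:
    return eta


def logit_link(mu: float) -> float:
    """g(mu) = log(mu / (1 - mu)); requires 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))


def logit_inverse(eta: float) -> float:
    """Inverse logit (expit), arranged so exp() never overflows."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def log_link(mu: float) -> float:
    """g(mu) = log(mu); requires mu > 0."""
    return math.log(mu)


def log_inverse(eta: float) -> float:
    return math.exp(eta)


CANONICAL_LINKS = {
    "normal": (identity_link, identity_inverse),
    "bernoulli": (logit_link, logit_inverse),
    "poisson": (log_link, log_inverse),
}

if __name__ == "__main__":
    mu = 0.25
    for family, (g, g_inv) in CANONICAL_LINKS.items():
        eta = g(mu)
        print(family, eta, g_inv(eta))  # g_inv(g(mu)) recovers mu
```

The link maps a mean with a restricted range onto an unrestricted $\eta$ on the whole real line, and the inverse link maps any value of $\eta$ back into the allowable range: $(0, 1)$ for the logit and $(0, \infty)$ for the log.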
3. A Systematic Component
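The systematic component specifies the effects of the covariates $X_i$ on the mean of $Y_i$, expressed through the linear predictor

$$\eta_i = \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} = X_i'\beta,$$

where $X_{i1} = 1$ is typically included so that $\beta_1$ is an intercept. Note that the term 'linear' refers to linearity in the regression parameters $\beta$, not in the covariates: including a covariate such as $X_{i2}^2$ still gives a linear predictor.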
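To make the three parts concrete, the sketch below computes the linear predictor for one subject and maps it to the mean with the inverse logit. The covariates and coefficients are made up, and the logit link is an assumed choice for this illustration; the age-squared column shows that $\eta_i$ can be non-linear in a covariate while remaining linear in $\beta$.

```python
import math
from typing import Sequence


def linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    """eta_i = beta_1 X_i1 + ... + beta_p X_ip = X_i' beta."""
    if len(x) != len(beta):
        raise ValueError("x and beta must both have length p")
    return sum(b * xk for b, xk in zip(beta, x))


def expit(eta: float) -> float:
    """Inverse of the logit link: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical subject with p = 4: intercept (X_i1 = 1), age, age squared, smoker.
age = 50.0
x_i = [1.0, age, age ** 2, 1.0]
beta = [-4.0, 0.05, 0.0002, 0.7]  # made-up coefficients, for illustration only

eta_i = linear_predictor(x_i, beta)  # -4 + 2.5 + 0.5 + 0.7 = -0.3, up to rounding
mu_i = expit(eta_i)                  # mean of Y_i on the probability scale
print(eta_i, mu_i)
```

The same `linear_predictor` would serve any family; only the inverse link applied afterwards changes.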
Binary response
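Let $Y_i$ denote a binary response variable with two categories, such as presence or absence of a disease. Its distribution is Bernoulli with $\Pr(Y_i = 1) = \mu_i$ and $\Pr(Y_i = 0) = 1 - \mu_i$. Using the logit as the link function gives

$$\operatorname{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right) = \eta_i = \beta_1 X_{i1} + \cdots + \beta_p X_{ip},$$

where $\mu_i/(1 - \mu_i)$ is the odds of success, that is, of $Y_i = 1$.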
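A one-unit increase in $X_{ik}$, with the other covariates held fixed, multiplies the odds of success by $\exp(\beta_k)$; equivalently, $\beta_k$ is the change in the log odds of success per unit change in $X_{ik}$.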
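The multiplicative effect on the odds can be checked numerically. The sketch below uses made-up coefficients, raises one covariate by one unit while holding the other fixed, and compares the resulting odds ratio with $\exp(\beta_k)$. Indices in the code are zero-based, so `beta[1]` plays the role of $\beta_2$.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(mu: float) -> float:
    """Odds of success, mu / (1 - mu)."""
    return mu / (1.0 - mu)


def success_probability(x, beta):
    """Logistic model: logit(mu) = sum_k beta_k * x_k."""
    return expit(sum(b * xk for b, xk in zip(beta, x)))


# Hypothetical coefficients: intercept, beta_2 for X_i2, beta_3 for X_i3.
beta = [-1.0, 0.4, 0.8]

x = [1.0, 2.0, 1.0]
x_plus = [1.0, 3.0, 1.0]  # X_i2 increased by one unit, X_i3 held fixed

odds_ratio = odds(success_probability(x_plus, beta)) / odds(success_probability(x, beta))
print(odds_ratio, math.exp(beta[1]))  # both approximately 1.4918
```

The odds ratio is the same whatever the starting value of $X_{i2}$ or the value of $X_{i3}$, which is why $\exp(\beta_k)$ can be reported as a single number; the change in the probability $\mu_i$ itself, by contrast, depends on where you start.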
The logistic regression model can also be derived from a latent variable model. Suppose that $L_i$ is a latent (unobserved) continuous variable that follows a standard logistic distribution, with mean 0 and variance $\pi^2/3$, and that a positive response is observed only when $L_i$ exceeds some threshold $\tau$, so that

$$Y_i = \begin{cases} 1 & \text{if } L_i > \tau, \\ 0 & \text{if } L_i \le \tau. \end{cases}$$
It can be shown that:
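One common way to make this precise, assumed here because the text does not state how the covariates enter, is to let the latent variable follow a linear regression, $L_i = X_i'\beta^* + \epsilon_i$, where $\epsilon_i$ has the standard logistic distribution with cumulative distribution function $F(t) = e^{t}/(1 + e^{t})$. Using the symmetry $1 - F(-t) = F(t)$ and the identity $\operatorname{logit}\{F(t)\} = t$:

```latex
\begin{aligned}
\Pr(Y_i = 1) &= \Pr(L_i > \tau) = \Pr(\epsilon_i > \tau - X_i'\beta^*) \\
             &= 1 - F(\tau - X_i'\beta^*) = F(X_i'\beta^* - \tau), \\
\log\left\{\frac{\Pr(Y_i = 1)}{\Pr(Y_i = 0)}\right\}
             &= X_i'\beta^* - \tau = X_i'\beta .
\end{aligned}
```

With $X_{i1} = 1$, the threshold is absorbed into the intercept: $\beta_1 = \beta_1^* - \tau$ and $\beta_k = \beta_k^*$ for $k \ge 2$. Only the difference $\beta_1^* - \tau$ can be estimated from the observed binary responses.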