Gamma Regression
Consider a continuous, positive-valued dependent variable, such as the length of a hospital stay, a waiting time, or the cost of a bill. Data of this type are often skewed, so a normal approximation does not hold.
Data of this type typically exhibit an approximately constant Coefficient of Variation (CV), that is:
$$ \frac{\sqrt{\mathrm{var}(Y_i)}}{E(Y_i)} = \sigma $$
This identity induces a quadratic variance function:
$$ \mathrm{var}(Y_i) = \sigma^2 E(Y_i)^2 $$
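As a quick illustration of the constant-CV property, the following sketch simulates Gamma-distributed outcomes at several hypothetical mean levels (none of these numbers come from the lecture) and checks that the empirical CV stays roughly the same as the mean grows.

```python
# Minimal sketch (hypothetical data): under the quadratic variance function
# var(Y) = sigma^2 * E(Y)^2, the coefficient of variation sqrt(var(Y)) / E(Y)
# stays the same no matter how large the mean is.
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5                      # assumed common CV
for mu in (10.0, 100.0, 1000.0): # three very different mean levels
    shape = 1.0 / sigma**2       # Gamma(shape, scale) has CV = 1 / sqrt(shape)
    scale = mu / shape           # mean = shape * scale = mu
    y = rng.gamma(shape, scale, size=100_000)
    print(f"mu={mu:7.1f}  empirical CV={y.std() / y.mean():.3f}")
# Each line should print a CV close to 0.5 regardless of the mean.
```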
To model such data, a number of approaches proposed in the literature have proved useful, including log-normal models and Gamma regression models:
- Log-normal models - a log transformation followed by a classical linear model is fairly successful at modeling this type of data. The approximation works best when the scale parameter (sigma) is small. Indeed, a first-order expansion shows that the log-transformed outcome has approximately constant variance:
$$ \log(Y) \approx \log(\mu) + (Y - \mu)\left.\frac{d\log(y)}{dy}\right|_{y=\mu} = \log(\mu) + \frac{Y - \mu}{\mu} $$
$$ \mathrm{Var}(\log(Y)) \approx \frac{\mathrm{Var}(Y)}{\mu^2} = \frac{\sigma^2\mu^2}{\mu^2} = \sigma^2 $$
- Gamma regression - Gamma regression keeps the outcome on the original scale. If one wants to work on the original scale, the framework of the generalized linear model proves very fruitful; a short sketch comparing the two approaches follows this list.
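The sketch below is a rough illustration of the two approaches listed above. It uses simulated data and the Python statsmodels package (an assumption of this example; the lecture itself refers to SAS) to fit an ordinary linear model on the log-transformed outcome and a Gamma GLM with a log link on the original scale.

```python
# Minimal sketch, assuming simulated data and the statsmodels API: compare
# OLS on log(Y) with a Gamma GLM (log link) that keeps Y on its original scale.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 2, size=n)
mu = np.exp(1.0 + 0.5 * x)           # true mean on the original scale
shape = 1.0 / 0.3**2                 # constant CV of 0.3
y = rng.gamma(shape, mu / shape)     # positive, skewed outcome

X = sm.add_constant(x)

# Log-normal approach: classical linear model on the log-transformed outcome.
lognormal_fit = sm.OLS(np.log(y), X).fit()

# Gamma regression: GLM with Gamma family and log link, outcome untransformed.
# (Older statsmodels versions spell the link class as links.log() instead.)
gamma_fit = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()

print(lognormal_fit.params)   # coefficients on the log scale
print(gamma_fit.params)       # coefficients for log E(Y); similar to the above
```

Both fits return coefficients that describe multiplicative effects on the outcome; the difference is that the Gamma GLM models the mean of Y directly rather than the mean of log(Y).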
Gamma Distributions
The Gamma family is a very flexible family of distributions with support on the positive axis. The family is indexed by two parameters. One way to parameterize it, which is used in SAS and is the focus of this lecture, is called the mean parameterization.
A variable is said to follow the Gamma distribution if its density has the form:
$$ f_Y(y) = \frac{1}{\Gamma(\nu)\, y} \left(\frac{y\nu}{\mu}\right)^{\nu} \exp\!\left(-\frac{y\nu}{\mu}\right), \qquad 0 < y < \infty $$
where
$$ \Gamma(\nu) = \int_0^\infty x^{\nu-1} \exp(-x)\, dx $$
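As a numerical check (not part of the lecture notes), the mean-parameterized density above coincides with scipy's gamma distribution when the shape is set to nu and the scale to mu/nu; under that parameterization the mean is mu and the variance is mu^2/nu, i.e. the CV is the constant 1/sqrt(nu).

```python
# Minimal sketch: the mean parameterization above corresponds to scipy's gamma
# distribution with shape a = nu and scale = mu / nu (an assumption of this
# illustration, used only to verify the formula numerically).
import numpy as np
from scipy.stats import gamma
from scipy.special import gamma as gamma_fn

mu, nu = 3.0, 2.5

def density_mean_param(y, mu, nu):
    """Density f_Y(y) written exactly as in the formula above."""
    return (1.0 / (gamma_fn(nu) * y)) * (y * nu / mu) ** nu * np.exp(-y * nu / mu)

y = np.linspace(0.1, 10, 5)
print(density_mean_param(y, mu, nu))
print(gamma.pdf(y, a=nu, scale=mu / nu))   # should match the line above

# Under this parameterization E(Y) = mu and var(Y) = mu^2 / nu,
# so the CV is 1 / sqrt(nu), constant in the mean.
print(gamma.mean(a=nu, scale=mu / nu), gamma.var(a=nu, scale=mu / nu))
```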