Multiple Imputation and Weighting Methods
When no data are missing, our statistical methods provide valid inference only if the following assumptions are met:
- For Generalized Estimating Equations, the mean function is correctly specified
- For likelihood-based methods, the probability density function, including the mean and variance, is correctly specified
Missing data can seriously compromise inferences from randomized clinical trials, especially when handled incorrectly, but inference is still possible with the correct methods.
Missing values in longitudinal studies may occur intermittently, when individuals miss one or more planned visits, or through dropout, when individuals leave the study early.
Types of Missing Data
- Missing Completely at Random (MCAR) - Missingness is independent of both observed and unobserved data. More formally, the probability of missing data in Y is unrelated to the value of Y itself or to the value of any other variable X. However, it does allow for the possibility that missingness in Y is related to missingness in some other variable X.
- Ex. In determining predictors of income, the MCAR assumption would be violated if people who did not report income were on average younger than the people who did report it.
- Missing at Random (MAR) - Missingness is independent of the unobserved responses after controlling for the observed variables X. Formally: P(Y missing | Y, X) = P(Y missing | X)
- Ex. The MAR assumption is satisfied if the probability of missing data on income depends on a person's age, but within each age group the probability of missing income is unrelated to income. This cannot be tested from the observed data, because we do not know the missing values.
- Missing Not at Random (MNAR) - Missingness depends on the unobserved values themselves, even after controlling for the observed variables.
- Ex. High-income people are less likely to report their income.
- Also referred to as non-ignorable missingness or informative dropout
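
The three mechanisms above can be illustrated with a small simulation based on the income and age example. This is only a sketch under assumed values: the sample size, the linear relationship between age and income, the logistic missingness probabilities, and the helper name simulate_missingness are all hypothetical choices made for illustration, not part of any real study. It compares the complete-case mean of income, and the complete-case regression slope of income on age, with the values from the full data.

```python
import numpy as np

def simulate_missingness(n=20000, seed=0):
    """Generate age, income, and three missingness masks (all values assumed)."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 70, size=n)
    # Hypothetical model: income (in thousands) rises linearly with age.
    income = 20 + 0.8 * age + rng.normal(0, 10, size=n)

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    masks = {
        # MCAR: every person has the same 30% chance of missing income.
        "MCAR": rng.random(n) < 0.3,
        # MAR: the chance of missing income depends only on observed age.
        "MAR": rng.random(n) < logistic((age - 45) / 10),
        # MNAR: the chance of missing income depends on income itself.
        "MNAR": rng.random(n) < logistic((income - income.mean()) / 10),
    }
    return age, income, masks

age, income, masks = simulate_missingness()
true_mean = income.mean()

# Complete-case mean of income under each mechanism (True in a mask = missing).
observed_means = {name: income[~m].mean() for name, m in masks.items()}

# Complete-case slope of income on age under each mechanism.
slopes = {name: np.polyfit(age[~m], income[~m], 1)[0] for name, m in masks.items()}

print("full-data mean:", round(true_mean, 2))
for name in masks:
    print(name, "mean:", round(observed_means[name], 2),
          "slope:", round(slopes[name], 3))
```

In this setup the complete-case mean should stay close to the full-data mean under MCAR and fall below it under MAR and MNAR. The complete-case slope of income on age should remain close to the assumed value of 0.8 under MAR, since missingness is unrelated to income once age is controlled for, whereas under MNAR it is generally biased.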