Introduction to Longitudinal and Clustered Data
Correlated data occurs in a variety of situations. The four basic types are:
- Repeated measurements data
- Clustered data designs
- Spatially correlated data
- Multivariate data
Repeated Measurements
Longitudinal data consists of a response variable measured on the same individuals over a period of time. Special cases include cross-over designs and parallel-group repeated measures designs; for example, a two-period, two-treatment cross-over design in which each individual receives the two treatments on 2 different occasions.
- Repeated observations of the response variable on individuals over multiple occasions or under different experimental conditions allow direct study of the change in the outcome
- The most common case of repeated measurements is longitudinal data
- Longitudinal data requires special statistical techniques because repeated observations on the same individual are correlated
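The correlation between repeated observations can be illustrated with a small simulation. This is a minimal sketch, not a method taken from these notes: it assumes each individual has a persistent personal level (a random intercept) plus independent measurement noise on each occasion. The function names and the standard deviations are hypothetical choices made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_two_occasions(n_subjects=2000, sd_subject=1.0, sd_noise=0.5, seed=0):
    """Simulate one response per individual on two occasions.

    Assumption (illustrative only): response = subject level + occasion noise.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_subjects):
        subject_level = rng.gauss(0.0, sd_subject)  # shared by both occasions
        first.append(subject_level + rng.gauss(0.0, sd_noise))
        second.append(subject_level + rng.gauss(0.0, sd_noise))
    return first, second


first, second = simulate_two_occasions()
r = pearson(first, second)
print(round(r, 2))
```

Under these assumptions the theoretical correlation between the two occasions is sd_subject^2 / (sd_subject^2 + sd_noise^2) = 1 / 1.25 = 0.8, so the printed estimate should land close to that value. Methods that treat the two occasions as independent samples would ignore this shared component.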
Clustered Data
Clustered data occurs when observations are grouped in clusters based on a common factor (location, ancestry, clinical factor, etc.).
Examples of clustered data include:
- Paired data:
- Ex. Studies on twins, where each pair serves as a natural cluster
- Familial studies:
- Ex. Study of cancer with families as clusters
- Randomized clustered clinical trials:
- Ex. In a rural area with an endemic disease, randomize whole villages, rather than individuals, to receive the intervention
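The cluster-randomized design in the last example can be sketched in a few lines. This is an illustrative sketch only: the village names, resident identifiers, and the `randomize_clusters` helper are hypothetical, and a simple half-and-half split is assumed. The point is that the treatment arm is drawn once per village and every resident inherits it.

```python
import random


def randomize_clusters(cluster_ids, seed=0):
    """Assign half of the clusters to intervention and the rest to control."""
    rng = random.Random(seed)
    shuffled = list(cluster_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        cluster: ("intervention" if position < half else "control")
        for position, cluster in enumerate(shuffled)
    }


# Hypothetical villages and residents (person id, village).
villages = ["village_A", "village_B", "village_C", "village_D"]
residents = [
    ("p1", "village_A"), ("p2", "village_A"),
    ("p3", "village_B"), ("p4", "village_C"),
    ("p5", "village_D"), ("p6", "village_D"),
]

arm_by_village = randomize_clusters(villages)
# Individuals are never randomized directly: each takes the arm of their village.
arm_by_person = {person: arm_by_village[village] for person, village in residents}
print(arm_by_person)
```

Because residents of the same village share both an arm and a local environment, their outcomes are expected to be correlated, which is why the village, not the individual, is the unit of randomization.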
Spatially Correlated Data
Examples of spatially correlated data:
- Epidemiological studies
- Studies aimed at describing the incidence and prevalence of a particular disease use spatial correlation models in an attempt to smooth out region-specific counts so as to better assess potential environmental determinants and patterns associated with the disease
- Image analysis
- Image segmentation studies where the goal is to extract information about a particular region of interest from a given image
Multivariate Data
Multivariate data occurs when two or more response variables are measured per experimental unit or individual. There are several methods that deal with multivariate data, such as discriminant analysis, principal component analysis, or factor analysis.
- Multivariate repeated measurements
- Any study where we have two or more outcome variables measured repeatedly over time
- Joint modeling of repeated measurements and event-times data
- Studies where we draw joint inferences on patient outcomes and any serial trends in a potential biomarker
Explanatory Variables
- Within-unit covariates (time-dependent covariates)
- Something that changes over time as the outcome changes
- Between-unit covariates (time-independent covariates)
- Something that stays constant over time within a unit but can differ between units
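The distinction between the two kinds of covariates can be checked mechanically on data stored in long format (one row per unit per occasion). The sketch below is illustrative: the records, the column names, and the `classify_covariate` helper are hypothetical. A covariate is labeled within-unit if its value varies across occasions for at least one unit, and between-unit otherwise.

```python
from collections import defaultdict


def classify_covariate(records, unit_key, covariate):
    """Return 'within-unit' if the covariate varies inside any unit, else 'between-unit'."""
    values_by_unit = defaultdict(set)
    for row in records:
        values_by_unit[row[unit_key]].add(row[covariate])
    varies_within = any(len(values) > 1 for values in values_by_unit.values())
    return "within-unit" if varies_within else "between-unit"


# Hypothetical long-format data: two individuals, two visits each.
records = [
    {"id": 1, "visit": 1, "weight_kg": 70.2, "treatment": "A"},
    {"id": 1, "visit": 2, "weight_kg": 71.0, "treatment": "A"},
    {"id": 2, "visit": 1, "weight_kg": 82.5, "treatment": "B"},
    {"id": 2, "visit": 2, "weight_kg": 81.9, "treatment": "B"},
]

print(classify_covariate(records, "id", "weight_kg"))   # within-unit
print(classify_covariate(records, "id", "treatment"))   # between-unit
```

Here weight changes from visit to visit for the same individual, so it is time-dependent, while the treatment label is fixed per individual and only differs between individuals.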