Introduction to Longitudinal and Clustered Data

Correlated data arise in a variety of situations. The four basic types are:

  1. Repeated measurements data
  2. Clustered data designs
  3. Spatially correlated data
  4. Multivariate data

Repeated Measurements

Longitudinal data consist of a response variable measured repeatedly on the same individuals over a period of time. Special cases include cross-over designs and parallel-group repeated measures designs; for example, a two-period, two-treatment cross-over design in which each individual receives both treatments, one on each of two occasions. Observations obtained from the same person or cluster are usually positively correlated.

Clustered Data

Clustered data occur when observations are grouped into clusters based on a common factor (location, ancestry, clinical factor, etc.).

Examples of clustered data include:

Spatially Correlated Data

Spatially correlated data occur when observations are associated with a specific location. The proximity of two locations determines the extent to which their observations are correlated.

Examples of spatially correlated data:

Multivariate Data

Multivariate data occurs when two or more response variables are measured per experimental unit or individual. There are several methods that deal with multivariate data, such as discriminant analysis, principal component analysis, or factor analysis.

Explanatory Variable

In correlated data, the explanatory variables (covariates) used to model the mean response can be broadly classified into two categories:

Dependence and Correlation

Two random variables X and Y with marginal density functions f_X(x) and f_Y(y) are said to be independent if and only if their joint density function can be written as the product of the two marginals:
    f_{X,Y}(x, y) = f_X(x) * f_Y(y)

Equivalently, X and Y are independent if the conditional distribution of Y given X does not depend on X:
    f_{Y|X}(y | x) = f_Y(y)

Two variables are uncorrelated if:
    E[(Y - μY)(X - μX)] = 0

E[(Y - μY)(X - μX)] is called the covariance of X and Y. It can take any positive or negative value, and its magnitude depends on the units of measurement. To remove the dependence on units and obtain the correlation, we divide the covariance by the standard deviations of the two variables:

    Corr(X, Y) = E[(Y - μY)(X - μX)] / (σX σY)

The correlation always lies between -1 and 1.
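
To illustrate why dividing by the standard deviations removes the units, here is a small Python sketch (not part of the course's SAS code). The height and weight values are made up for illustration, and the helper names are hypothetical. It computes the covariance and correlation by hand, then changes the units of one variable.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Average of (x - mean_x) * (y - mean_y) over the paired values."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance divided by the two standard deviations."""
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

# Hypothetical measurements (illustrative values only)
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weight_kg = [55.0, 68.0, 70.0, 80.0, 85.0]
height_cm = [100 * h for h in height_m]  # same heights, different units

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
corr_m = correlation(height_m, weight_kg)
corr_cm = correlation(height_cm, weight_kg)

print("covariance (m, cm):", cov_m, cov_cm)
print("correlation (m, cm):", corr_m, corr_cm)
```

Switching from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged.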

Note that independent variables are uncorrelated but variables can be uncorrelated without being independent.
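
A standard illustration of the second half of this statement takes X to be -1, 0, or 1 with equal probability and sets Y = X². The Python sketch below (an illustrative example, not from the course materials) uses exact fractions to show that the covariance is zero even though Y is completely determined by X.

```python
from fractions import Fraction

# Hypothetical discrete example: X takes -1, 0, 1 with probability 1/3 each, Y = X^2
support = [-1, 0, 1]
p = Fraction(1, 3)

# Joint probability mass function: all mass sits on the points (x, x^2)
joint = {(x, x * x): p for x in support}

def expect(g):
    """Expectation of g(X, Y) under the joint distribution."""
    return sum(prob * g(x, y) for (x, y), prob in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0)
p_x0 = sum(prob for (x, y), prob in joint.items() if x == 0)
p_y0 = sum(prob for (x, y), prob in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))

print("Cov(X, Y) =", cov_xy)
print("P(X=0, Y=0) =", p_x0_y0, "but P(X=0) * P(Y=0) =", p_x0 * p_y0)
```

The joint probability does not factor into the product of the marginals, so X and Y are dependent despite being uncorrelated.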

Covariance Matrix

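Let Yij be the jth measurement on the ith subject. Collecting all p observations on subject i in a vector Yi = (Yi1, Yi2, ..., Yip), we define the covariance matrix as the following array of variances and covariances:

    Cov(Yi) = | σ1²  σ12  ...  σ1p |
              | σ21  σ2²  ...  σ2p |
              | ...  ...  ...  ... |
              | σp1  σp2  ...  σp² |

The diagonal entries are the variances. Off the diagonal, for example, Cov(Yi1, Yi2) = σ12 is the covariance between the first and second repeated measures of the ith subject.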
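
As a rough illustration of how such a matrix is estimated from data, the Python sketch below (not part of the course's SAS code) computes a sample covariance matrix from wide-format data, with one row per subject and one column per occasion. The data values and the function name are hypothetical. It uses the usual n - 1 divisor.

```python
# Hypothetical wide-format data: one row per subject, one column per occasion (p = 3)
data = [
    [10.0, 12.0, 11.0],
    [14.0, 15.0, 16.0],
    [9.0, 10.0, 12.0],
    [13.0, 13.0, 15.0],
]

def sample_covariance_matrix(rows):
    """Return the p x p sample covariance matrix (divisor n - 1)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            cov[j][k] = sum(
                (r[j] - means[j]) * (r[k] - means[k]) for r in rows
            ) / (n - 1)
    return cov

S = sample_covariance_matrix(data)
for row in S:
    print(["%.3f" % value for value in row])
```

Entry S[0][1] estimates σ12, the diagonal entries estimate the variances, and the matrix is symmetric because Cov(Yij, Yik) = Cov(Yik, Yij).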

SAS Code

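/* Library reference for the course datasets */
libname S857 'C:\Users\yorghos\Dropbox\Courses\BS857\2021\Datasets';

/* Reshape from wide (y0 y1 y4 y6 on one row) to long (one row per subject and time) */
data lead;
  set s857.tlc;
  y=y0; time=0; output;
  y=y1; time=1; output;
  y=y4; time=4; output;
  y=y6; time=6; output;
  drop y0 y1 y4 y6;
run;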
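
For readers less familiar with the DATA step, the wide-to-long reshape performed above can be sketched in Python as follows. This is only an illustration: the column names y0, y1, y4, y6, id, and trt follow the SAS code, while the records themselves and the function name are made up.

```python
# Hypothetical wide-format records (values are illustrative, not from the course dataset)
wide = [
    {"id": 1, "trt": "P", "y0": 30.0, "y1": 27.0, "y4": 26.0, "y6": 24.0},
    {"id": 2, "trt": "A", "y0": 26.0, "y1": 15.0, "y4": 19.0, "y6": 21.0},
]

# Map each wide column to the time at which it was measured
time_columns = {"y0": 0, "y1": 1, "y4": 4, "y6": 6}

def wide_to_long(rows, columns):
    """Emit one output record per subject and time, like the repeated OUTPUT statements."""
    long_rows = []
    for row in rows:
        for column, time in columns.items():
            long_rows.append(
                {"id": row["id"], "trt": row["trt"], "time": time, "y": row[column]}
            )
    return long_rows

long_data = wide_to_long(wide, time_columns)
for record in long_data:
    print(record)
```

Each subject contributes four rows to the long dataset, which is the layout the modelling procedures below expect.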

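ods graphics on;
/* Mean response profiles: plot the time-by-treatment least-squares means, one line per treatment */
proc glimmix data=lead;
  class time trt;
  model y = time trt time*trt;
  lsmeans time*trt / plots=(meanplot(join sliceby=trt));
run;
ods graphics off;
/* Only needed if an RTF destination was opened earlier in the session */
ods rtf close;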

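/* Sample covariance and correlation matrices of the four measurements (wide format) */
proc corr data=s857.tlc cov;
  var y0 y1 y4 y6;
run;

/* Repeated Measures MANOVA: unstructured covariance across time within subject */
proc mixed data=lead;
  class id trt time;
  model y = trt time trt*time / s chisq;
  /* r and rcorr print the estimated within-subject covariance and correlation matrices */
  repeated time / type=un subject=id r rcorr;
run;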


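/* Same model fitted by maximum likelihood, with P and time 0 as reference levels.      */
/* REF= places the reference level last, so time is ordered 1 4 6 0 and trt*time runs  */
/* over the non-reference treatment at times 1 4 6 0, then P at times 1 4 6 0.         */
proc mixed data=lead method=ML;
  class id trt(ref='P') time(ref="0");
  model y = trt time trt*time / s;
  repeated time / type=un subject=id r rcorr;
  /* Mean response in the non-reference treatment group at each time */
  estimate 'TRT A time 0' int 1 trt 1 0 time 0 0 0 1 trt*time 0 0 0 1 0 0 0 0;
  estimate 'TRT A time 6' int 1 trt 1 0 time 0 0 1 0 trt*time 0 0 1 0 0 0 0 0;
  estimate 'TRT A time 4' int 1 trt 1 0 time 0 1 0 0 trt*time 0 1 0 0 0 0 0 0;
  estimate 'TRT A time 1' int 1 trt 1 0 time 1 0 0 0 trt*time 1 0 0 0 0 0 0 0;
  /* Change from time 0 to time 1 in that group */
  estimate 'TRT Change Time 1 - Time 0' time 1 0 0 -1 trt*time 1 0 0 -1 0 0 0 0;
run;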
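
The coefficient vectors in the estimate statements can be hard to read. The Python sketch below (an illustration, not part of the course code) rebuilds them under two assumptions. First, the treatment levels are taken to be A and P, inferred from ref='P' and the estimate labels. Second, the reference level is assumed to be placed last in each class variable, with time varying fastest within the trt*time interaction. The function names are hypothetical.

```python
# Assumed level order for the CLASS statement: the REF= level is listed last
TRT_LEVELS = ["A", "P"]
TIME_LEVELS = [1, 4, 6, 0]

def cell_mean_coefficients(trt, time):
    """Coefficients that pick out the mean response for one treatment at one time."""
    return {
        "int": [1],
        "trt": [1 if g == trt else 0 for g in TRT_LEVELS],
        "time": [1 if t == time else 0 for t in TIME_LEVELS],
        # Interaction columns: treatment varies slowest, time fastest
        "trt*time": [
            1 if (g == trt and t == time) else 0
            for g in TRT_LEVELS
            for t in TIME_LEVELS
        ],
    }

def difference(first, second):
    """Coefficients for (first cell mean) minus (second cell mean)."""
    return {
        effect: [a - b for a, b in zip(first[effect], second[effect])]
        for effect in first
    }

a_time0 = cell_mean_coefficients("A", 0)
a_time1 = cell_mean_coefficients("A", 1)
change = difference(a_time1, a_time0)

print("A at time 0:", a_time0)
print("A, time 1 minus time 0:", change)
```

Under these assumptions the cell mean vectors reproduce the first four estimate statements. In the difference, the intercept and treatment coefficients cancel to zero, which is why the last estimate statement lists only time and trt*time.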

Revision #11
Created 19 January 2023 19:02:24 by Elkip
Updated 19 January 2023 21:21:05 by Elkip