Association Testing in Related Individuals

Many genetic studies include related individuals. Data from family members are correlated, which can inflate test statistics if the correlation is not accounted for. Family-based designs were developed to avoid bias due to population structure: biological family members are genetically matched on ancestry.

Most behavioral and environmental factors do not alter DNA, so SNP associations are unlikely to be confounded by them. The exception is confounding by ancestry.

image-1667394044217.png

Population Stratification Bias

Designs for Family-Based Studies

We'll mostly focus on case-parent trios and the TDT.

Case-Parent Trios

In this design we collect samples from trios: two parents and one affected child (the affection status of the parents is not considered). The idea is that, under Mendel's laws, a heterozygous parent transmits each allele with equal probability (1/2) regardless of population structure. If a specific allele is preferentially transmitted from parents to affected offspring, this indicates that the allele is associated with the disease.
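As an illustration of how trio genotypes turn into transmission counts, here is a minimal sketch. The genotype coding (0, 1, or 2 copies of the allele of interest), the function name `count_transmissions`, and the example trios are all assumptions made for this example, not part of the original notes.

```python
from typing import Iterable, Tuple


def count_transmissions(trios: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Count alleles transmitted by heterozygous parents.

    Each trio is (father, mother, child), coded as the number of copies of
    the allele of interest (0, 1 or 2). This coding is an assumption of the
    sketch. Returns (transmitted, not_transmitted), where both counts are
    taken over heterozygous parents only.
    """
    transmitted = 0
    not_transmitted = 0
    for father, mother, child in trios:
        if any(g not in (0, 1, 2) for g in (father, mother, child)):
            raise ValueError(f"invalid genotype in trio {(father, mother, child)}")
        parents = (father, mother)
        n_het = sum(1 for g in parents if g == 1)
        n_hom_carrier = sum(1 for g in parents if g == 2)
        # A homozygous parent always passes on the same allele, so any
        # remaining copies in the child came from heterozygous parents.
        from_het = child - n_hom_carrier
        if not 0 <= from_het <= n_het:
            raise ValueError(f"Mendelian inconsistency in trio {(father, mother, child)}")
        transmitted += from_het
        not_transmitted += n_het - from_het
    return transmitted, not_transmitted


# Hypothetical example data: five trios.
example_trios = [(1, 0, 1), (1, 1, 2), (1, 2, 1), (0, 0, 0), (1, 1, 1)]
print(count_transmissions(example_trios))  # (4, 2)
```

Trios in which both parents are homozygous, such as (0, 0, 0), add nothing to either count.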

Transmission Disequilibrium Test (TDT)

Tests whether a particular marker allele is transmitted to affected offspring more frequently than expected. Non-transmitted alleles act as matched controls.

H0: No association or no linkage
Ha: Variant is associated with disease/trait (and is not due to population structure)

Notation:

image-1667395369592.png

Under the null we would expect:
image-1667395321311.png
image-1667395351114.png

The test statistic has the form of McNemar's test and is compared with a chi-square distribution with 1 degree of freedom:
image-1667395577062.png

Note that this is a function of the heterozygous parents only; homozygous parents provide no information.
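The calculation can be sketched in a few lines. The notation here is an assumption, since the original notation appears only in an image: `b` is the number of heterozygous parents who transmit the allele of interest to the affected child, and `c` is the number who transmit the other allele. With that notation the McNemar-type statistic is (b - c)^2 / (b + c), referred to a chi-square distribution with 1 degree of freedom.

```python
import math
from typing import Tuple


def tdt(b: int, c: int) -> Tuple[float, float]:
    """Return the TDT chi-square statistic and its asymptotic p-value.

    b: heterozygous parents transmitting the allele of interest (assumed notation)
    c: heterozygous parents transmitting the other allele (assumed notation)
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        raise ValueError("no heterozygous parents, so the test is undefined")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
stat, p = tdt(60, 40)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Only `b` and `c` enter the function, which mirrors the point that homozygous parents carry no information for this test.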

Discordant Siblings

General Family Structures

Unconditional Tests

Review: For unrelated (independent) subjects we can use standard linear or logistic regression. Including related individuals in the analysis violates the assumption of independence, causing inflation in the test statistics.

image-1667397203470.png
Here Xi1 is the genotype of the ith person and beta1 is the fixed effect of the SNP. Xi2 and Xi3 are covariates.

image-1667397383331.png
The mixed-effects model adds a random effect G to account for the genetic variance component.
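For reference, one common way to write such a model for a quantitative trait is given below. This is the usual textbook form, not a transcription of the image: the symbols y_i, beta_0, epsilon_i, sigma_G^2, sigma_e^2 and the kinship matrix Phi are assumed notation.

```latex
y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + G_i + \varepsilon_i,
\qquad
\mathbf{G} \sim N\!\left(\mathbf{0},\; 2\,\sigma_G^2\,\boldsymbol{\Phi}\right),
\qquad
\boldsymbol{\varepsilon} \sim N\!\left(\mathbf{0},\; \sigma_e^2\,\mathbf{I}\right)
```

Under this form, relatives share part of G through the off-diagonal entries of Phi, which is how the model captures the correlation between family members.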

For association analysis of quantitative traits, if X1 is the SNP, then the test of H0: beta1 = 0 is a test of association, just as in the linear model for unrelated subjects. The regression coefficient beta1 is interpreted exactly as in simple linear regression.

In either case, G is not tested for significance. If we want to assess heritability, we estimate the variance of G.
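As a sketch of how the variance of G relates to heritability, assume the trait variance splits into an additive genetic component sigma_G^2 (the variance of G) and a residual component sigma_e^2. Both symbols are assumed notation. Narrow-sense heritability is then the genetic share of the total:

```latex
h^2 = \frac{\sigma_G^2}{\sigma_G^2 + \sigma_e^2}
```

An estimate of sigma_G^2 near zero therefore corresponds to a heritability estimate near zero.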

Common software: GCTA, SOLAR, the GMMAT R package, and the GENESIS R/Bioconductor package.

Dichotomous Traits

In this course we focus on quantitative traits to keep things simple.



Revision #4
Created 2 November 2022 12:47:04 by Elkip
Updated 6 November 2022 15:05:09 by Elkip