Association Testing in Unrelated Individuals
In association testing we are interested in the effect of a specific allele in the population. We ask the question: "Is allele X1 more common in affected individuals than in unaffected individuals?" We do not need family data to answer this, but we can use it if we have it.
Recall that linkage disequilibrium (LD) is the tendency for alleles at two loci closely linked on a chromosome to be associated in a population. Linkage is assessed within families (alleles inherited together from parent to offspring), whereas LD is assessed within a population (alleles at two loci found together on the same haplotype more often than expected by chance). Two loci are linked when the recombination fraction θ is less than 0.5; this does not necessarily mean they are in LD.
Over many generations, recombination erodes the LD between a disease-causing mutation and the markers around it.
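"More often than expected" can be made concrete with the usual LD coefficient D = p_AB - p_A * p_B and its normalized form r², neither of which is named in the notes above. The sketch below is a minimal illustration that assumes the four haplotype frequencies for two hypothetical biallelic loci (alleles A/a and B/b) are known. `ld_stats` is a hypothetical helper, and the frequencies are made-up example values.

```python
# Sketch: LD between two biallelic loci with alleles A/a and B/b, computed
# from the four haplotype frequencies (hypothetical example values below).

def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r2) for haplotype frequencies that sum to 1."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at the first locus
    p_B = p_AB + p_aB  # frequency of allele B at the second locus
    # D: excess of the AB haplotype over what independence of the loci predicts
    D = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r^2: D rescaled to lie between 0 and 1 (set to 0 if a locus is monomorphic)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, r2


print(ld_stats(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium: (0.0, 0.0)
print(ld_stats(0.45, 0.05, 0.05, 0.45))  # A and B usually travel together
```

In the second call both allele frequencies are 0.5, so independence would predict p_AB = 0.25. The observed 0.45 gives D ≈ 0.2 and r² ≈ 0.64.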

The above graphic represents a mutation on the blue chromosome. Over time, recombination during meiosis slowly decreases the LD between the mutation and the surrounding chromosome.
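Under the standard textbook assumptions (a large, randomly mating population with no selection, mutation, or drift), LD decays geometrically with the recombination fraction θ: D_t = (1 - θ)^t D_0 after t generations. The sketch below shows how quickly LD fades for tightly versus loosely linked loci. The function name is hypothetical.

```python
# Sketch: expected decay of LD under random mating, D_t = (1 - theta)^t * D_0.
# Assumes a large randomly mating population with no selection, mutation, or drift.

def ld_after_generations(d0, theta, generations):
    """Expected LD remaining after `generations` rounds of meiosis."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction theta must lie in [0, 0.5]")
    return d0 * (1.0 - theta) ** generations


for theta in (0.001, 0.01, 0.1, 0.5):
    remaining = ld_after_generations(1.0, theta, 100)
    print(f"theta = {theta}: fraction of initial D left after 100 generations = {remaining:.3g}")
```

The results differ sharply by θ:
- With θ = 0.001, roughly 90% of the initial D remains after 100 generations.
- With θ = 0.1, essentially none remains.
- Unlinked loci (θ = 0.5) lose half their LD every generation.

This is why markers very close to a disease mutation tend to stay associated with it.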
Specific alleles may be functional or in LD with functional mutations.
    Functional = causal -> the mutation that actually causes the increased/decreased risk or phenotype.
Genetic Association Testing
In genetic association analysis, we are trying to identify associations between genotypes (SNPs) and phenotypes. An associated allele may itself be functional, or it may only be in LD with a functional mutation, in which case it is not the "true cause" of the phenotype.
functional/causal - the mutation that actually causes the increased/decreased phenotype
H0: No association between marker genotypes and phenotype
When the null hypothesis is rejected, we conclude a marker is associated with a phenotype.
The marker may be a functional mutation or in LD with a functional mutation
- Non-synonymous, truncating, and UTR SNPs may be more likely to be causal ("functional SNPs")
- Synonymous, intronic, and IVS SNPs may be less likely to be causal
- Functional studies are usually required to determine whether an associated SNP is functional/causal
- Databases such as ENCODE make it easier to predict whether variants have a function, but it is still difficult to be sure for non-exonic variants
Association Testing for Qualitative Traits
- Can be performed within case-control or cohort/population studies
- Case-control commonly used for genetic studies of rare traits
- H0: there is no association between the alleles (or genotypes) at the marker and disease status
- Testing methods:
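- Chi-square (or Fisher's exact) test comparing allelic or genotypic frequencies between cases and controls
- $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ and $E_i$ are the observed and expected counts in cell $i$
- DF = (number of rows - 1) × (number of columns - 1); with cases and controls as the two rows, this is the number of allele or genotype categories compared minus 1 (1 df for the allelic test, 2 df for the genotypic test)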
 
- Logistic regression
- Allows incorporation of covariates
- Convenient framework for exploration of genetic models
 
- For matched case-control studies, the Mantel-Haenszel (MH) test or conditional logistic regression can be used
 
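The code below sketches the allelic (2 x 2, 1 df) and genotypic (2 x 3, 2 df) chi-square tests by computing the Pearson statistic from made-up genotype counts. It uses only the standard library, so p-values come from closed forms that are valid for 1 and 2 df only. All helper names are hypothetical. The allelic test treats each of the 2N alleles as an independent observation, which relies on approximate Hardy-Weinberg proportions in the combined sample. For real analyses, `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` are common choices; note that `chi2_contingency` applies Yates' continuity correction to 2 x 2 tables by default.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_pvalue(stat, df):
    """Upper-tail p-value; closed forms are used for 1 and 2 df only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise NotImplementedError("use a general chi-square survival function for other df")


def allele_counts(n11, n12, n22):
    """Convert genotype counts to counts of allele 1 and allele 2."""
    return [2 * n11 + n12, n12 + 2 * n22]


# Hypothetical genotype counts (11, 12, 22) for 200 cases and 200 controls
cases, controls = (30, 90, 80), (15, 70, 115)

genotypic = [list(cases), list(controls)]                    # 2 x 3 table, 2 df
allelic = [allele_counts(*cases), allele_counts(*controls)]  # 2 x 2 table, 1 df

for name, table in (("genotypic", genotypic), ("allelic", allelic)):
    stat, df = chi_square(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {chi_square_pvalue(stat, df):.2g}")
```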
Logistic Regression
- Used for regression with a dichotomous outcome (affected/unaffected), where linear regression is not appropriate.
- The response (dependent variable) Y takes two possible values: 1 for affected and 0 for unaffected.
- The regression equation predicts the log odds of being affected; transforming the log odds with the logistic function gives a number between 0 and 1 that can be interpreted as the probability of being affected.
- The regression coefficients are the log odds ratios for each of the independent variables (adjusted for the other variables in the model).
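The linear logistic model has the form:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
where $p = P(Y = 1 \mid x_1, \ldots, x_k)$ is the probability of being affected.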
- For a dichotomous predictor, the beta parameter in a logistic regression is the log odds ratio for disease comparing those with the risk factor (x = 1) to those without (x = 0).
- For a continuous or ordinal predictor, the parameter is the log odds ratio for a one-unit increase in x.
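The sketch below illustrates that the coefficient of a binary predictor is a log odds ratio. It fits the logistic model by Newton-Raphson (equivalently, iteratively reweighted least squares) in pure Python on hypothetical data, then compares exp(β1) with the odds ratio from the corresponding 2 x 2 table. The function names are hypothetical. In practice a statistics package such as `statsmodels.api.Logit` would be used, and it would also report standard errors.

```python
import math

def solve(a, b):
    """Solve the small dense linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, iterations=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X holds one list of covariates per subject (an intercept is added here);
    y holds the 0/1 outcomes (1 = affected). Returns [beta0, beta1, ...].
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        score = [0.0] * k                     # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for xi, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, xi))  # linear predictor = log odds
            p = 1.0 / (1.0 + math.exp(-eta))            # logistic transform to a probability
            for a in range(k):
                score[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical data: 30 of 100 exposed subjects and 20 of 100 unexposed subjects are affected
X = [[1]] * 100 + [[0]] * 100
y = [1] * 30 + [0] * 70 + [1] * 20 + [0] * 80
beta0, beta1 = fit_logistic(X, y)
odds_ratio_2x2 = (30 * 80) / (70 * 20)
print(f"exp(beta1) = {math.exp(beta1):.4f}; 2x2 odds ratio = {odds_ratio_2x2:.4f}")
```

With a single binary predictor the fitted model reproduces the 2 x 2 table exactly. So exp(β1) matches (30·80)/(70·20) ≈ 1.714, and β0 is the log odds of being affected among the unexposed.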
Coding Genetic Variables
We are interested in evaluating whether the genotype at a specific marker is associated with being affected.
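Consider a SNP with alleles 1 and 2, giving three genotypes. Het indicates heterozygote status (genotype 12) and Hom indicates homozygote status for allele 2 (genotype 22), with genotype 11 as the reference:
- 11: Het = 0, Hom = 0
- 12: Het = 1, Hom = 0
- 22: Het = 0, Hom = 1
Model: $\ln(\text{odds of affected} \mid \text{geno}) = \beta_0 + \beta_1 \text{Het} + \beta_2 \text{Hom}$
Our test has 2 degrees of freedom, with $H_0: \beta_1 = \beta_2 = 0$.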
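So the odds ratio comparing genotype 12 to 11 is $e^{\beta_1}$, the exponentiated coefficient of the heterozygote variable. Likewise, the odds ratio comparing 22 to 11 is $e^{\beta_2}$, and the odds ratio comparing 22 to 12 is $e^{\beta_2 - \beta_1}$. Only under an additive model, where $\beta_2 = 2\beta_1$, is the 22-versus-12 odds ratio the same as the 12-versus-11 odds ratio.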
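The 2-df general model has one parameter per genotype, so its maximum-likelihood fit reproduces the observed odds of being affected in each genotype group. That means the coefficients and the likelihood ratio test of H0: β1 = β2 = 0 can be computed directly from a 2 x 3 table of counts. The sketch below does this for hypothetical counts, using genotype 11 as the reference as in the Het/Hom coding above. The function name is hypothetical.

```python
import math

GENOTYPES = ("11", "12", "22")


def general_model_from_counts(cases, controls):
    """2-df general model (Het/Hom coding, genotype 11 as reference) from genotype counts.

    With one parameter per genotype the model is saturated, so its maximum-likelihood
    fit reproduces the observed odds of being affected within each genotype group.
    """
    odds = {g: cases[g] / controls[g] for g in GENOTYPES}
    beta0 = math.log(odds["11"])
    beta1 = math.log(odds["12"] / odds["11"])  # Het coefficient: log OR, 12 vs 11
    beta2 = math.log(odds["22"] / odds["11"])  # Hom coefficient: log OR, 22 vs 11

    def log_likelihood(p_affected):
        return sum(cases[g] * math.log(p_affected[g]) + controls[g] * math.log(1 - p_affected[g])
                   for g in GENOTYPES)

    n_cases = sum(cases.values())
    n_total = n_cases + sum(controls.values())
    p_full = {g: cases[g] / (cases[g] + controls[g]) for g in GENOTYPES}  # general model
    p_null = {g: n_cases / n_total for g in GENOTYPES}                    # H0: beta1 = beta2 = 0
    lrt = 2 * (log_likelihood(p_full) - log_likelihood(p_null))
    p_value = math.exp(-lrt / 2)  # chi-square upper tail with 2 df
    return beta0, beta1, beta2, lrt, p_value


# Hypothetical genotype counts
cases = {"11": 30, "12": 90, "22": 80}
controls = {"11": 15, "12": 70, "22": 115}
b0, b1, b2, lrt, p = general_model_from_counts(cases, controls)
print(f"OR 12 vs 11 = {math.exp(b1):.3f}, OR 22 vs 11 = {math.exp(b2):.3f}, "
      f"OR 22 vs 12 = {math.exp(b2 - b1):.3f}")
print(f"2-df likelihood ratio test: LRT = {lrt:.2f}, p = {p:.2g}")
```

In general, exp(β2 - β1), the 22-versus-12 odds ratio, differs from exp(β1), the 12-versus-11 odds ratio.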
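We can also test whether genotypes 12 and 22 differ in risk ($H_0: \beta_1 = \beta_2$) by fitting dominant and recessive models. A dominant model for allele 2 assumes 12 and 22 carry the same risk ($\beta_1 = \beta_2$), and a recessive model assumes 11 and 12 carry the same risk ($\beta_1 = 0$).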
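Below we look at an additive model, where ADD denotes the number of "1" alleles (0, 1, or 2). Allele 1 is assumed to be rarer than allele 2, so genotype 22 (ADD = 0) is the reference group:
Model: $\ln(\text{odds of affected} \mid \text{geno}) = \beta_0 + \beta_1 \text{ADD}$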
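The codings for the general, additive, dominant, and recessive models can all be generated from the genotype itself. In this sketch `code_genotype` is a hypothetical helper. Following the text, the general model counts allele 2 (so 11 is the reference) and the additive ADD variable counts allele 1 (so 22 is the reference). The dominant and recessive indicators are defined with respect to whichever allele is counted.

```python
def code_genotype(genotype, counted_allele):
    """Code a two-character genotype (e.g. "12") for the genetic models in this section.

    counted_allele is the allele whose copies are counted (hypothetical helper,
    not a library function).
    """
    copies = genotype.count(counted_allele)  # 0, 1 or 2 copies
    return {
        "Het": int(copies == 1),   # general model: heterozygote indicator
        "Hom": int(copies == 2),   # general model: homozygote indicator
        "ADD": copies,             # additive model: number of copies
        "DOM": int(copies >= 1),   # dominant model: at least one copy
        "REC": int(copies == 2),   # recessive model: two copies
    }


# General model: Het/Hom count allele 2 (11 = reference); additive: ADD counts allele 1
for g in ("11", "12", "22"):
    print(g, "general:", {k: code_genotype(g, "2")[k] for k in ("Het", "Hom")},
          "additive ADD:", code_genotype(g, "1")["ADD"])
```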
Interpretation
General model:
"The odds of disease for the 12 genotype is OR12,22 times the odds of the disease for the 22 genotype, and the odds of disease for the 11 genotype is OR11,22 times the odds of disease for the 22 genotype."
Additive model:
"The odds of disease increase multiplicatively by exp(𝛽1) for each additional 1 allele"
"The odds of disease increase by a factor of OR12,22 for each additional 1 allele"
"The odds of disease for the 12 genotype is OR12,22 times the odds of disease for the 22 genotype, and the odds of disease for the 11 genotype is OR12,22 times the odds of disease for the 12 genotype (so OR11,22 = OR12,22²)."
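These interpretations follow from exponentiating the coefficients:
- Under the additive model, the per-allele factor exp(β1) is applied once for the 12 genotype and twice for the 11 genotype, so OR11,22 = OR12,22².
- Under the general model, the two genotype odds ratios are estimated separately.

The sketch below computes both sets, using hypothetical helper names and an illustrative coefficient value. It assumes the general model is coded with indicators for genotypes 12 and 11 and with 22 as the reference, matching the interpretation above.

```python
import math

def additive_genotype_ors(beta_add):
    """Genotype odds ratios implied by the additive model (ADD = copies of allele 1, 22 = reference)."""
    per_allele = math.exp(beta_add)
    return {
        "OR12,22": per_allele,       # one copy of allele 1 vs none
        "OR11,22": per_allele ** 2,  # two copies vs none: the per-allele factor applied twice
        "OR11,12": per_allele,       # two copies vs one
    }


def general_genotype_ors(beta_12, beta_11):
    """Genotype odds ratios from a general model with indicators for 12 and 11 (22 = reference)."""
    return {
        "OR12,22": math.exp(beta_12),
        "OR11,22": math.exp(beta_11),
        "OR11,12": math.exp(beta_11 - beta_12),
    }


# Hypothetical estimate: each additional 1 allele multiplies the odds by 1.5
print(additive_genotype_ors(math.log(1.5)))
```

Setting β11 = 2·β12 in the general model reproduces the additive odds ratios. That is exactly the constraint the additive model imposes.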
Determining the Type of Model Based on Beta Estimates
Compare the beta coefficients from the general model to determine the best genetic model (general, additive, dominant, recessive).
1 is dominant because the heterozygous and homozygous estimates are nearly the same
2 is additive because the heterozygous estimate is roughly half the homozygous estimate
3 is recessive because there is only an effect for the 11 genotype (the heterozygous estimate is near zero)
4 is general because it doesn't fit any of the above criteria
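The criteria above can be written as a rough rule of thumb on the ratio of the heterozygote estimate to the homozygote estimate. This sketch is a hypothetical helper, and its tolerance `tol` is an arbitrary assumption. It describes only the pattern of point estimates and does not account for their uncertainty.

```python
def suggest_model(beta_het, beta_hom, tol=0.2):
    """Rule-of-thumb reading of general-model estimates, following the criteria above.

    beta_het and beta_hom are the heterozygote and homozygote coefficients relative to the
    reference homozygote. tol is a hypothetical tolerance on the ratio beta_het / beta_hom;
    this is a descriptive heuristic, not a statistical test.
    """
    if abs(beta_hom) < 1e-12:
        return "general"  # ratio undefined: no homozygote effect to compare against
    ratio = beta_het / beta_hom
    if abs(ratio - 1.0) <= tol:
        return "dominant"   # heterozygote estimate close to homozygote estimate
    if abs(ratio - 0.5) <= tol:
        return "additive"   # heterozygote estimate about half the homozygote estimate
    if abs(ratio) <= tol:
        return "recessive"  # effect essentially only in the homozygote
    return "general"


for b_het, b_hom in ((0.69, 0.70), (0.36, 0.70), (0.03, 0.70), (-0.40, 0.70)):
    print(b_het, b_hom, "->", suggest_model(b_het, b_hom))
```

A more formal comparison would fit each constrained model and compare it with the general model, for example with a likelihood ratio test. This works because the additive (β2 = 2β1), dominant (β1 = β2), and recessive (β1 = 0) models are each nested in the 2-df general model.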
 
                