Interactions in Genetic Association Analysis

Statistical interaction occurs between two factors if their combined effect differs from what would be expected based on their individual, separate effects. In genetics, this means that differences in risk or mean phenotype between genotypes vary according to the level of the exposure (interacting) variable.

Simple example: Phenylketonuria (PKU) is a genetic defect that causes severe intellectual disability only in the presence of dietary phenylalanine. Every baby is tested for this trait to see if they need a special diet.

Modeling Statistical Interaction

We need a model to form an expectation for the joint effect, which we can use to define the interaction. The definition of interaction relies on some specification of non-interactive effects.

Let G_i be the genotype for person i, and E_i be an environmental factor for person i.

H₀: β_GE = 0 (no interaction): the measure of association β_G between the distribution of the phenotype Y and the genotype G does not depend on E.

In a regression model, interaction is always written as a departure from additive effects on the scale of the model.
For both logistic and linear regression, this means a departure from additivity of the regression effects (the β coefficients).
For logistic regression:
- Interaction == departure from additivity of log(ORs)
- This is the same as departure from multiplicative OR effects
- Interaction in a logistic model is multiplicative interaction on the scale of the odds ratio
  - OR_GE ≠ OR_G × OR_E (the joint OR differs from the product of the individual ORs)
  - OR_G|E=1 ≠ OR_G|E=0 (the genotype OR differs between levels of E)

Example logistic model with interaction (CP = cleft palate), with G and E each coded 0/1:

logit P(CP = 1 | G, E) = β_0 + β_G·G + β_E·E + β_GE·G·E

So there are 2 interpretations of β_GE:
1. The difference between the Environment logOR when G = 1 vs when G = 0
2. The difference between the Genotype logOR when E = 1 vs when E = 0
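
Both readings follow directly from the form of the model. The derivation below is a sketch that assumes binary G and E coded 0/1 and a logistic model with a product term. The symbols β_0 and β_E, and the notation OR_{G|E=e} for a stratum-specific odds ratio, are introduced here for illustration.

```latex
\begin{align*}
\operatorname{logit} P(Y = 1 \mid G, E) &= \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E \\
\ln \mathrm{OR}_{G \mid E=0} &= \beta_G, &
\ln \mathrm{OR}_{G \mid E=1} &= \beta_G + \beta_{GE} \\
\ln \mathrm{OR}_{E \mid G=0} &= \beta_E, &
\ln \mathrm{OR}_{E \mid G=1} &= \beta_E + \beta_{GE} \\
\beta_{GE} &= \ln \mathrm{OR}_{G \mid E=1} - \ln \mathrm{OR}_{G \mid E=0}
            = \ln \mathrm{OR}_{E \mid G=1} - \ln \mathrm{OR}_{E \mid G=0} \\
\ln \mathrm{OR}_{GE} &= \beta_G + \beta_E + \beta_{GE}
  \quad\Longrightarrow\quad
  \mathrm{OR}_{GE} = \mathrm{OR}_{G \mid E=0} \cdot \mathrm{OR}_{E \mid G=0} \cdot e^{\beta_{GE}}
\end{align*}
```

Here OR_GE compares the doubly exposed group (G = 1, E = 1) with the doubly unexposed group (G = 0, E = 0), so exp(β_GE) is the factor by which the joint OR departs from the product of the two individual ORs.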

Overall, we define β_GE as the difference between the observed ln(OR) and the ln(OR) expected when G and E act additively on the log-odds scale.
When β_GE = 0 (exp(β_GE) = 1):
- no departure from additivity
- the genotypic ORs in the two groups E = 1 and E = 0 are the same
- the environment odds ratios in the two groups G = 1 and G = 0 are the same
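
The equivalence of the two interpretations of β_GE can be checked numerically. The sketch below uses made-up counts (the `counts` table and the helper names are hypothetical, not from any study) for binary G and E with no other covariates. In that setting, the interaction coefficient from a logistic regression with a G×E term equals the log ratio of stratum-specific odds ratios computed here.

```python
import math

def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    """Odds ratio from the four cells of a 2x2 exposure-by-disease table."""
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)

# Hypothetical counts, keyed by (G, E): (number of cases, number of controls)
counts = {
    (0, 0): (100, 400),
    (1, 0): (60, 200),
    (0, 1): (80, 250),
    (1, 1): (90, 120),
}

def stratum_or(counts, exposed, unexposed):
    """OR comparing the `exposed` (G, E) cell with the `unexposed` cell."""
    (a, b), (c, d) = counts[exposed], counts[unexposed]
    return odds_ratio(a, b, c, d)

# Genotype log OR when E = 1 minus genotype log OR when E = 0
or_g_e1 = stratum_or(counts, (1, 1), (0, 1))
or_g_e0 = stratum_or(counts, (1, 0), (0, 0))
beta_ge_from_g = math.log(or_g_e1) - math.log(or_g_e0)

# Environment log OR when G = 1 minus environment log OR when G = 0
or_e_g1 = stratum_or(counts, (1, 1), (1, 0))
or_e_g0 = stratum_or(counts, (0, 1), (0, 0))
beta_ge_from_e = math.log(or_e_g1) - math.log(or_e_g0)

print("beta_GE via genotype ORs:   ", beta_ge_from_g)
print("beta_GE via environment ORs:", beta_ge_from_e)
print("interaction OR exp(beta_GE):", math.exp(beta_ge_from_g))
```

Both routes give the same β_GE because each is the same cross-ratio of the eight cell counts, just grouped differently. If the two stratum-specific ORs were equal, β_GE would be 0 and exp(β_GE) would be 1.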

If there is interaction, we should not use summary measures (ORs and genotypic means) for the marker alone or the environmental factor alone, because the effect of each depends on the level of the other. We may improve power to identify genes and SNPs by testing within subgroups of the interacting exposure, or by testing for interaction directly.

Detecting Interactions: Power

Even in the GWAS era, very few well-known examples of GxE interaction have been identified. This is partly because tests for interaction are not as powerful as tests for main-effect association.

We need much larger sample sizes to detect interactions than main effects: at least 4x the sample size is needed to detect an interaction of the same magnitude as a main effect. Because of this lack of power, published reports of GxE interactions may be more prone to publication bias (although interaction is usually not the primary hypothesis in a study). It is better to publish all results to avoid publication bias, but this also leads to many published interactions that have not been replicated.
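
One common heuristic for where a factor of 4 comes from is sketched below. It assumes a linear model, a binary E that splits the n subjects evenly, and the same residual variance and genotype distribution in both strata. These assumptions are not stated in the notes, and v is a generic constant introduced for illustration.

```latex
\begin{align*}
\operatorname{Var}\big(\hat\beta_G\big) &\approx \frac{v}{n}
  && \text{genotype effect estimated from all } n \text{ subjects} \\
\operatorname{Var}\big(\hat\beta_{G \mid E=e}\big) &\approx \frac{v}{n/2} = \frac{2v}{n}
  && \text{genotype effect within one stratum of } E \\
\operatorname{Var}\big(\hat\beta_{GE}\big)
  &= \operatorname{Var}\big(\hat\beta_{G \mid E=1} - \hat\beta_{G \mid E=0}\big)
   \approx \frac{2v}{n} + \frac{2v}{n} = \frac{4v}{n}
  && \text{independent strata}
\end{align*}
```

Under these assumptions the interaction estimate has 4 times the variance of the main-effect estimate, so matching its precision takes about 4n subjects. Unbalanced exposure groups, or an interaction smaller than the main effect, push the requirement higher, which is consistent with "at least" 4x.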

This power issue is particularly a problem when a large number of markers are tested; as always, multiple-testing adjustment is required to preserve the family-wise error rate. We'll focus on two strategies that have been proposed for maximizing power when conducting GxE GWAS: case-only designs and two-stage screening procedures.

Case-Only Design

If G and E are independent (in the population or among controls, depending on the design), cases from a case-control study can be used to estimate the multiplicative GxE interaction.

When G and E are independent in the controls, the OR for the G-E association among cases equals the multiplicative interaction between E and G. Improved precision: the case-only analysis acts like a case-control or cohort study with an infinite number of controls.

The interaction OR is the ratio of G-E OR in cases to G-E OR in controls.

If G and E are independent in the controls (D = 0), then the joint probability factorizes as P(G, E | D = 0) = P(G | D = 0)·P(E | D = 0), so the G-E OR in controls is 1 and the ratio of ORs is just the G-E OR among cases.

We can test for G-E interaction by looking at the G-E association in cases.
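
The relationship between the case-control and case-only estimates can be illustrated with a small calculation. The counts below are made up, and the helper `log_or_and_se` is a hypothetical name. It uses the standard large-sample (Woolf) standard error for a log odds ratio and a normal approximation for the test, which is one reasonable choice rather than a prescribed method.

```python
import math

def log_or_and_se(n11, n10, n01, n00):
    """Log OR and Woolf standard error for a 2x2 table of G by E.

    nGE is the number of subjects with genotype G and exposure E.
    """
    log_or = math.log((n11 * n00) / (n10 * n01))
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return log_or, se

# Hypothetical G-by-E counts, tabulated separately in cases and controls
cases = dict(n11=90, n10=60, n01=80, n00=100)
controls = dict(n11=120, n10=200, n01=250, n00=400)

case_log_or, case_se = log_or_and_se(**cases)
ctrl_log_or, ctrl_se = log_or_and_se(**controls)

# Case-control estimate: G-E OR in cases divided by G-E OR in controls
cc_log_or = case_log_or - ctrl_log_or
cc_se = math.sqrt(case_se**2 + ctrl_se**2)

# Case-only estimate: assumes the G-E OR in controls is exactly 1
co_z = case_log_or / case_se
co_p = math.erfc(abs(co_z) / math.sqrt(2))  # two-sided normal p-value

print("G-E OR in controls:     ", math.exp(ctrl_log_or))
print("case-control estimate:  ", math.exp(cc_log_or), "SE(log):", cc_se)
print("case-only estimate:     ", math.exp(case_log_or), "SE(log):", case_se)
print("case-only z and p-value:", co_z, co_p)
```

The case-only standard error is smaller because it drops the sampling variability contributed by the controls. The two point estimates agree only to the extent that the G-E OR in controls is close to 1, which is the independence assumption at work.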

Advantages

Can be inexpensive for exploratory analyses
Useful for case tissue specimens archived from earlier studies
When the independence assumption is met, the case-only interaction test is much more powerful than the case-control logistic regression test of interaction with the same number of cases

Limitations

Provides no information about main effects
The interaction estimate can be biased if the assumption of G-E independence is violated

Two-Stage Screening Procedure

A screening procedure gives us a way to select only a small subset of m SNPs out of the M tested (m << M) to carry forward to the test for G-E interaction.
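
The general shape of such a procedure can be sketched as follows. The notes do not specify the screening statistic, so this example assumes that per-SNP screening p-values and interaction p-values have already been computed, and that the screening test is chosen to be independent of the interaction test under the null. The function name `two_stage_gxe`, its thresholds, and the use of a Bonferroni correction over the m selected SNPs are all illustrative assumptions.

```python
def two_stage_gxe(screen_p, interaction_p, alpha_screen=0.05, alpha=0.05):
    """Two-stage G-E interaction scan (illustrative sketch).

    screen_p      -- stage 1 screening p-value for each of the M SNPs
    interaction_p -- stage 2 interaction-test p-value for each SNP
    Returns (indices selected at stage 1, indices significant at stage 2).
    """
    if len(screen_p) != len(interaction_p):
        raise ValueError("need one screening and one interaction p-value per SNP")

    # Stage 1: keep only SNPs that pass the screening threshold
    selected = [i for i, p in enumerate(screen_p) if p < alpha_screen]
    m = len(selected)
    if m == 0:
        return selected, []

    # Stage 2: Bonferroni correction over the m selected SNPs, not all M
    threshold = alpha / m
    significant = [i for i in selected if interaction_p[i] < threshold]
    return selected, significant


# Hypothetical p-values for M = 4 SNPs
screen_p = [0.001, 0.5, 0.02, 0.9]
interaction_p = [0.01, 0.0001, 0.04, 0.5]

selected, significant = two_stage_gxe(screen_p, interaction_p)
print("passed screening:", selected)
print("significant interactions:", significant)
```

The potential gain comes from dividing alpha by m rather than M at the second stage. The cost is that a SNP with a real interaction but a weak screening signal never reaches stage 2.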