Multiple Comparisons and Evaluating Significance

Prior to the GWAS era, genetic association studies were hypothesis driven: markers within or near a candidate gene or region were tested for association. "H0: Trait X is caused/influenced by Gene A." The hypothesized gene or genes came from prior evidence.

Chip-based Genome-wide Association Scans

- Candidate
- Genome-Wide

Whole Genome or Exome Sequencing

Statistical Significance

There are many things to test in genetic association studies, and the multiple tests are often correlated.

Type I error: the null hypothesis of "no association" is rejected when in fact the marker is NOT associated with the trait.
This implies researchers will spend a considerable amount of resources focusing on a gene or chromosomal region that is not truly important for the trait.

Type II error: Null hypothesis of "no association" is NOT rejected, when in fact the trait and marker are associated.
This implies the chromosomal region/gene is discarded; a piece of the genetic puzzle remains missing for now.

For a multiple testing problem with m tests:
|                  | Not rejected | Rejected | Total |
|------------------|--------------|----------|-------|
| Null true        | U            | V        | m0    |
| Alternative true | T            | S        | m1    |
| Total            | m − R        | R        | m     |

Family-wise error rate (FWER) is the probability of at least one type I error; FWER = P(V > 0)

False discovery rate (FDR) is the expected proportion of type I errors among the rejected hypotheses; FDR = E(V/R)
    Assume V/R = 0 when R = 0

Procedures to Control FWER

The general strategy is to adjust the p-value of each test for multiple testing, then compare the adjusted p-values to alpha, so that the FWER is controlled at alpha.

Equivalently, determine the nominal p-value required to achieve FWER alpha.

Sidák

The Sidák adjusted p-value is based on the binomial distribution (the chance that at least one of m independent tests yields a p-value this small):

p*_j = 1 − (1 − p_j)^m

Bonferroni

A simplification of Sidák, using (1 − p)^m ≈ 1 − m·p for small p.

Bonferroni adjusted p-value: p*_j = min(m · p_j, 1)

Below are the individual p-values needed to reject for family-wise significance level = .05:

| m         | Bonferroni (.05/m) | Sidák (1 − .95^(1/m)) |
|-----------|--------------------|-----------------------|
| 1         | .05                | .05                   |
| 10        | .005               | .00512                |
| 100       | .0005              | .000513               |
| 1,000,000 | 5×10⁻⁸             | 5.13×10⁻⁸             |
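The two single-step corrections above can be sketched in a few lines; this is a minimal illustration with made-up p-values:

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni adjusted p-values: min(m * p, 1)."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def sidak(pvals):
    """Sidak adjusted p-values: 1 - (1 - p)^m (exact for independent tests)."""
    p = np.asarray(pvals, dtype=float)
    return 1.0 - (1.0 - p) ** p.size

pvals = [0.001, 0.01, 0.04]
print(bonferroni(pvals))  # Bonferroni is slightly more conservative
print(sidak(pvals))       # Sidak adjusted values are never larger
```

Note that the Sidák values are always less than or equal to the Bonferroni values, reflecting that Bonferroni is the cruder (linear) approximation.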

minP

The probability that the minimum p-value from m tests is smaller than the observed p-value when ALL of the tests are null:

p*_j = P( min(p_1, …, p_m) ≤ p_j | all m null hypotheses true )

Equivalent to the Sidák adjustment if all tests are independent. But for dependent tests we don't know the joint distribution of the p-values under the null hypothesis, so we use permutation to determine it.

Adjusted p-value is the probability that the minimum p-value in a resampled data set is smaller than the observed p-value.

This is less conservative than the above two methods, but the results are equal to Sidak when tests are significant.

Permutation is done under the assumption that the phenotype is independent of the genotypes; phenotypes are permuted with respect to genotypes.

Original: each individual's phenotype is paired with their own genotypes.
Permuted: the phenotype column is shuffled across individuals.

Genotypes from an individual are kept together to preserve LD.

Permutation Procedure

Permutation is computationally expensive, and in some situations it is not possible at all (related individuals, meta-analysis results).
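A permutation adjustment along these lines can be sketched as follows. This uses simulated data and the maxT variant of the Westfall–Young procedure (comparing the observed statistic against the permutation maximum), which coincides with minP when the per-test statistics share a common null distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, B = 200, 50, 1000          # individuals, SNPs, permutations
geno = rng.integers(0, 3, size=(n, m)).astype(float)  # 0/1/2 allele counts
pheno = rng.normal(size=n)
pheno += 0.5 * geno[:, 0]        # SNP 0 is truly associated

def abs_corr(y, X):
    """|correlation| between phenotype y and each SNP column of X."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = yc @ Xc / (np.linalg.norm(yc) * np.linalg.norm(Xc, axis=0))
    return np.abs(r)

obs = abs_corr(pheno, geno)
# Permute the phenotype only: rows of geno stay together, preserving LD.
max_null = np.array([abs_corr(rng.permutation(pheno), geno).max()
                     for _ in range(B)])
# Adjusted p-value: how often the permutation maximum beats each observed stat.
adj_p = np.array([(max_null >= t).mean() for t in obs])
print(adj_p[0])   # the truly associated SNP survives adjustment
```

The key design point matches the text above: only the phenotype vector is shuffled, so the genotype correlation structure (LD) is untouched in every permuted data set.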

Alternative

Use the Bonferroni or Sidák correction with the "effective number of independent tests" (m_eff) instead of the total number of tests. This reduces the number of tests to account for dependence among the test statistics; we must approximate the equivalent number of independent tests.

For a single study you can compute the effective number of independent tests from the genotype data.

Once you have an estimate of m_eff, use it in the Bonferroni or Sidák correction.
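One concrete possibility (one of several m_eff estimators in the literature) is a PCA-based "simpleM"-style estimate in the spirit of Gao et al.: count how many top eigenvalues of the SNP correlation matrix are needed to explain ~99.5% of the total variance. The genotype data below are simulated, with duplicated columns creating perfect dependence:

```python
import numpy as np

def m_eff_simpleM(geno, frac=0.995):
    """Number of top eigenvalues of the SNP correlation matrix
    needed to explain `frac` of the total variance."""
    corr = np.corrcoef(geno, rowvar=False)
    lam = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending eigenvalues
    cum = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(cum, frac) + 1)

rng = np.random.default_rng(1)
base = rng.integers(0, 3, size=(500, 10)).astype(float)
geno = np.hstack([base, base])   # 20 columns, but only 10 distinct SNPs
meff = m_eff_simpleM(geno)
print(meff)                      # 10: duplicated SNPs add no independent tests
alpha_eff = 1 - (1 - 0.05) ** (1 / meff)   # Sidak with m_eff instead of m
```

Because each SNP appears twice, the correlation matrix has rank 10, and the estimator correctly reports 10 independent tests rather than the nominal 20.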

Another alternative: Extreme Tail Theory Approximation (not covered here)

FWER Summary

Bonferroni, Sidák, and minP are all single-step adjustments; i.e. all p-values are adjusted in the same manner regardless of their values. This makes them very easy to understand and compute, but it sacrifices power.

Controlling the FWER is very stringent (< 5% chance of even a single false positive)

False Discovery Rate (FDR)

Controlling P(V > 0) is too stringent when m is large and you can expect multiple true positives. For better power, control E(V/R) instead.

E(V/R) = the expected proportion of Type I errors among rejected null hypotheses.

  1. Rank p-values: p_(1) ≤ p_(2) ≤ … ≤ p_(m)
  2. Adjusted p-values:
    1. p*_(m) = p_(m)
    2. p*_(k) = min( p*_(k+1), (m/k) · p_(k) )
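The two steps above (rank, then take a running minimum from the largest p-value down) can be sketched as:

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg step-up FDR-adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # step 1: rank p-values
    adj = p[order] * m / np.arange(1, m + 1)       # (m/k) * p_(k)
    adj = np.minimum.accumulate(adj[::-1])[::-1]   # step 2: running min from the top
    adj = np.minimum(adj, 1.0)
    out = np.empty(m)
    out[order] = adj                               # restore original order
    return out

print(bh_adjust([0.01, 0.04, 0.03, 0.005]))  # -> [0.02 0.04 0.04 0.02]
```

Rejecting tests with adjusted p-value ≤ alpha then controls the FDR at level alpha.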

FDR: Q-Value

For an individual hypothesis test, the minimum FDR at which the test may be called significant is the q-value. It indicates the expected proportion of false-positive discoveries among associations with equivalent statistical evidence.

FDR for a specific p-value threshold t:

FDR(t) ≈ m0 · t / #{p_i ≤ t}, where m0 = number of true null hypotheses

We can estimate m0 by m · pi_hat_0, where pi_hat_0 is an estimate of the proportion of true null hypotheses (could be ~1).

FDR_hat(t) = pi_hat_0 · m · t / #{p_i ≤ t}

  1. Rank p-values: p_(1) ≤ p_(2) ≤ … ≤ p_(m)
  2. Estimate the proportion of true null hypotheses, pi_hat_0
    1. Using pi_hat_0 = 1 leads to conservative q-value estimates equal to the FDR-adjusted p-values
    2. See the suggested approach in Storey and Tibshirani (2003)
  3. Compute q-values:

    q(p_(k)) = min( q(p_(k+1)), pi_hat_0 · m · p_(k) / k )

  4. Reject the null hypothesis if q-value ≤ alpha
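The steps above can be sketched as follows, with the conservative default pi_hat_0 = 1 (so the q-values equal the FDR-adjusted p-values); the input p-values are made up:

```python
import numpy as np

def qvalues(pvals, pi0=1.0):
    """Q-values: q(p_(k)) = min(q(p_(k+1)), pi0 * m * p_(k) / k)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                        # step 1: rank p-values
    q = pi0 * m * p[order] / np.arange(1, m + 1) # step 3: pi0 * m * p_(k) / k
    q = np.minimum.accumulate(q[::-1])[::-1]     # enforce q_(k) <= q_(k+1)
    q = np.minimum(q, 1.0)
    out = np.empty(m)
    out[order] = q                               # restore original order
    return out

q = qvalues([0.001, 0.008, 0.039, 0.041, 0.6])
print(q <= 0.05)   # step 4: which null hypotheses are rejected at FDR 5%
```

A data-driven pi_hat_0 < 1 (e.g. from the Storey and Tibshirani approach) would scale every q-value down proportionally, gaining power.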

Which to use?

Guidelines for Adjusting for Multiple Comparisons

Overall: with a single study and a large number of SNPs, we are unlikely to have sufficient power to achieve experiment-wide or genome-wide significance.

Best choice is meta-analysis and/or replication using independent studies

When combining studies is not an option, report the most promising results based on p-value and other factors

Make available results from all SNP association analyses so that other investigators can attempt to confirm or replicate your findings.

Choose a threshold BEFORE looking at the data.

Genomic Control

Genomic control was proposed to measure and adjust for modest population structure within a sample in the context of GWAS.
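As a sketch with simulated test statistics: the inflation factor lambda_GC is the median observed chi-square statistic divided by the median of the 1-df chi-square distribution (~0.4549), and each statistic is deflated by lambda_GC before computing p-values:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 100_000
# simulated 1-df association test statistics, uniformly inflated by 10%
chi2 = rng.chisquare(df=1, size=m) * 1.1

CHI2_1DF_MEDIAN = 0.4549364            # median of the chi-square(1) distribution
lambda_gc = np.median(chi2) / CHI2_1DF_MEDIAN
chi2_adj = chi2 / max(lambda_gc, 1.0)  # deflate only when lambda_gc > 1
print(round(lambda_gc, 2))             # close to the simulated inflation of 1.1
```

The median is used rather than the mean so that a handful of true associations in the tail does not distort the estimate of structure-driven inflation.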

 


Revision #5
Created 17 October 2022 23:21:53 by Elkip
Updated 21 October 2022 22:18:41 by Elkip