
Multiple Comparisons and Evaluating Significance

  • In 1978 Restriction Fragment Length Polymorphisms (RFLPs) were used for linkage analysis.
  • In 1987 the first human genetic map was created.
  • In 1989 microsatellite markers made genome-wide linkage studies possible.
  • 1990-2003 the Human Genome Project sequenced the human genome.
  • 2002-2006 HapMap project collected sequences in populations to discover variation across the genome.
  • 2006 onward, Genome-Wide Association Studies (GWAS)
  • 2010 onward, large scale custom arrays
  • 2010 onward, sequencing technology becomes affordable
  • Even more WGS projects...
    • ADSP 2012
    • TOPMed 2014
    • CCDG 2014

Prior to the GWAS era, genetic association studies were hypothesis driven: they tested markers within or near a specific gene or region for association with the trait. Hypothesis: "Trait X is caused or influenced by Gene A."

The hypothesis (gene or genes) came from:

  • Experiments in other species
  • Known associations with a related trait in humans
  • Linkage analysis localizing trait to a specific chromosomal region

Chip-based Genome-wide Association Scans

  • Hypothesis generating
    • Assumes only that there are genetic effects large enough to find
    • Asks what genes/variants are associated with my trait
  • 500,000 to 5 million variants across the genome
    • Multiple genome-wide chips available
    • Varying strategies for SNP selection
    • Imputation allows testing of ungenotyped SNPs
    • Typically GWAS chips have focused on common SNPs with frequency > 1%
Candidate
  • Limits testing to locations with a perceived high prior probability
  • "If you look under the lamppost you can only see what it illuminates"
Genome-Wide
  • Extreme multiple testing: overcoming it requires large sample sizes and often meta-analysis of multiple studies
  • Gives an "unbiased" view of the genome
  • Allows unexpected discoveries
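
To make the "extreme multiple testing" point concrete, here is a minimal sketch of how the per-test significance threshold shrinks as the number of tested variants grows. It assumes a Bonferroni correction and a family-wise error rate of 0.05; neither is specified in the notes above, and the function names are hypothetical, chosen only for illustration.

```python
# Sketch only: assumes a Bonferroni correction and alpha = 0.05.
# The notes above do not name a specific correction method.
ALPHA = 0.05  # assumed family-wise error rate


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate <= alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def expected_false_positives(n_tests: int, alpha: float = ALPHA) -> float:
    """Expected number of tests with p < alpha if every null hypothesis is true
    and no correction is applied."""
    return n_tests * alpha


if __name__ == "__main__":
    # 1 test (single candidate marker) versus typical chip sizes from the notes.
    for n_tests in (1, 500_000, 1_000_000, 5_000_000):
        print(
            f"{n_tests:>9,} tests: "
            f"per-test threshold = {bonferroni_threshold(n_tests):.1e}, "
            f"uncorrected false positives ~ {expected_false_positives(n_tests):,.0f}"
        )
```

Under these assumptions, one million tests give a per-test threshold of 5e-8, which matches the conventional genome-wide significance level. Bonferroni is conservative when tests are correlated, as neighboring SNPs usually are, so treat these numbers as a rough guide rather than an exact requirement.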

Whole Genome or Exome Sequencing

  • Identifies known SNPs (that would be on a chip) but also previously undiscovered variants.
  • Attempts to assay all, or nearly all, variation in genome or exome
    • Whole exome:
      • ~1% of the genome
      • ~30 million bp
      • Number of variants observed depends on sample size and population
    • Whole genome: ~3 billion bp, > 30 million known variants in the 1000 Genomes Project (1000G)