Search Results
165 total results found
Surveillance Defined
Surveillance is the ongoing systematic collection, analysis and interpretation of outcome-specific data for use in the planning, implementation and evaluation of public health practice. Surveillance can have a negative connotation, but we can use it to: Ide...
Association Testing in Related Individuals
Family data are correlated, which could lead to inflation in test statistics if not accounted for. Many genetic studies contain related individuals. Family-based studies were developed to avoid bias due to population structure. Biological family members are genet...
Haplotypes and Imputation
When multiple markers/SNPs are genotyped in a gene or gene region, the SNPs may be in linkage disequilibrium (LD). Each individual test of association with a marker is correlated with all tests for other markers in LD with that marker. So, instead of testing ...
Principal Component Analysis
The goal of supervised learning methods (regression and classification) is to predict outcome/response variable Y using a set of p features (X1, X2... Xp) measured on n observations. We train the machine on 'labeled' data to predict outcomes for unforeseen dat...
Survival Analysis II
In survival data the dependent variable is always survival time (or time until an event with the potential to censor observations). The independent variable can be any type: continuous, ordinal or categorical. We assume the observations are independently and r...
Power and Sample Size Calculations for Association Studies
Review of errors and difference in means/proportions: Type II error is represented by beta and type I error as alpha. Here Z(1 - alpha/2) is the Z-value that creates the target value of alpha in the tail of the distribution. Power increases as alpha increases. ...
Sampling
Types of probability samples: Simple random sample - everyone in the population has an equal likelihood of being selected. The most effective, but often the hardest to execute. Stratified random sample - we create strata based on some factor and take a rando...
Model Fit & Concepts of Interaction
Review: Logistic and Proportional Hazards Regression Model Selection. You can look at changes in the deviance (-2 log likelihood change). Deviance - Residual sum of squares with normal data. Problem: Deviances alone do not penalize "model complexity" (henc...
Intro to Cluster Analysis
Clustering refers to a very broad set of techniques for finding subgroups, or clusters, in a data set. When we cluster the observations, we partition the profiles into distinct groups so that the profiles are similar within the groups but different from other ...
Classification
Classification is often used to describe modeling of a categorical outcome. In binary classification the outcome has two possible values/classes and the goal is to predict the correct class using covariates. Classification rule: A mathematical function to p...
Sequencing Data and Analysis of Rare Variants
Genotyping arrays can be obtained at pre-selected sites for each sample. Ex. Genotyping sites known to be polymorphic based on prior sequencing. Sequencing is obtaining "every" base in the exome or genome for each sample. Most of the sequence is identical acr...
Poisson Regression
We use Poisson regression to model a risk ratio when we are interested not in whether something occurs but in how many times it occurs: either repeated events or events in a population. Ex. number of hospitalizations, number of infections, etc. This assumes i...
Interactions in Genetic Association Analysis
Statistical interaction occurs between two factors if their combined effect is different than what would be expected based on their individual separate effects. In genetics, differences in risk or mean phenotype between genotypes vary according to the exposure...
Sampling Strategy
We could determine if an intervention is meeting the target through a census, sampling plan, or LQAS. LQAS: LQAS is a primary classification tool. Lots are classified as performing "acceptable, unacceptable, low or high". The goal is to shift resources...
Two-Way ANOVA
We've learned about One-Way Analysis of Variance (ANOVA) previously; it is a regression model for one continuous outcome and one categorical variable. It allows us to compare the means of the groups to detect significant differences. A one-way fixed-effects ...
Statistical Modeling
Statistical association analysis is not all about significance. There is much to consider when deciding on covariates and choosing a model to represent the relationship. Regression Modeling: If we have a small number of variables we can manually assess confoundi...
Missing Data
Missing data is common in epidemiology studies, and always observed in longitudinal studies. Inadequate handling of missing data may cause bias or lead to inefficient analyses. If an estimate is incomplete, we can remove it without introducing bias. If a variab...
Meta Analysis
GWAS is performed on millions of SNPs. Because of multiple testing, we use very stringent thresholds for statistical significance. This can greatly reduce power and may not be sufficient to detect associated SNPs. Combining information across studies will impr...
Introduction to Longitudinal and Clustered Data
Correlated data occurs in a variety of situations. The four basic types: repeated measurements data, clustered data designs, spatially correlated data, and multivariate data. Repeated Measurements: Longitudinal data is a response variable collected from the s...
Introduction to Clinical Trials
A clinical trial is defined as a prospective study comparing the effect or value of an intervention against a control in subjects. The core components of clinical trials are the population, intervention, control and outcome. The great majority of clinical tria...