
Two-Way ANOVA

We've learned about One-Way Analysis of Variance (ANOVA) previously: it is a regression model for one continuous outcome and one categorical variable, and it allows us to compare the means of the groups to detect significant differences.

A one-way fixed-effects ANOVA is an extension of the two-sample t-test. A factor is a categorical variable, and its levels are the individual groups of the factor, each representing a different population. A balanced design contains the same number of individuals in each level. One-way ANOVA is a special case of linear regression.

Assumptions of one-way ANOVA:

  • The data are random samples from k independent populations
  • Within each population the dependent variable is normally distributed
  • The observations are independent
  • The population variance of the dependent variable is equal in all groups (homoscedasticity)

H0: The mean of the outcome is independent of the factor (all group means are equal)

Ex. The mean testosterone level is independent of smoking history. A conclusion might be: "there is evidence that the testosterone level is significantly different in at least two of the smoking history groups."

We test the hypothesis by decomposing the overall variance into "explained" and "residual" variances and comparing them.

image-1669936903897.png
Where N is the total sample size and k is the number of groups
SSB represents the variability due to the treatment effect. SSW represents the residual variability.

image-1669937073680.png
We compare the F statistic to the critical value of F with [(k - 1), (N - k)] degrees of freedom to test the null hypothesis of equal means.
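As a sketch, the decomposition and F test described above can be computed by hand and checked against R's built-in `aov()`. The data here are hypothetical (the vectors `y` and `g` are invented for illustration):

```r
## One-way ANOVA decomposition by hand, checked against aov()
set.seed(1)
y <- c(rnorm(5, 10), rnorm(5, 12), rnorm(5, 11))   # 3 groups, 5 obs each
g <- factor(rep(c("A", "B", "C"), each = 5))

N <- length(y); k <- nlevels(g)
grand <- mean(y)
group.means <- tapply(y, g, mean)

SSB <- sum(tapply(y, g, length) * (group.means - grand)^2)  # between-group (explained) SS
SSW <- sum((y - group.means[g])^2)                          # within-group (residual) SS

F.stat <- (SSB / (k - 1)) / (SSW / (N - k))                 # MSB / MSW
p.val  <- pf(F.stat, k - 1, N - k, lower.tail = FALSE)

summary(aov(y ~ g))   # same F statistic and p-value
```

The hand computation and `aov()` agree because `aov()` performs exactly this decomposition.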

Multiple Comparisons Procedures

To find which means differ from each other, we would make all k(k-1)/2 pairwise comparisons. This introduces the multiple-testing problem: the probability of at least one type I error grows with the number of tests. Multiple comparison procedures take the total number of comparisons into account by lowering the significance level of each individual test so that the overall (family-wise) significance level is still fixed at some level alpha.

image-1669937488081.png

All of these adjustments are appropriate when comparing pairs of means. Tukey's procedure is the most powerful and provides exact p-values when the group sizes are equal. Scheffé's procedure is, in general, more powerful than Bonferroni's.
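As a sketch on hypothetical data (the vectors `y` and `g` are invented for illustration), two of the adjusted pairwise procedures are available directly in base R:

```r
## Adjusted pairwise comparisons after a one-way ANOVA
set.seed(2)
y <- c(rnorm(6, 5), rnorm(6, 7), rnorm(6, 5.5))
g <- factor(rep(c("g1", "g2", "g3"), each = 6))

## Bonferroni-adjusted pairwise t-tests
pairwise.t.test(y, g, p.adjust.method = "bonferroni")

## Tukey's HSD: exact family-wise coverage with equal group sizes
TukeyHSD(aov(y ~ g))
```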

We can also write ANOVA as a linear regression model in which the parameters alpha_i are the group effects (the difference in effect between group i and group 1).
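A minimal sketch of this equivalence, on invented data: with R's default treatment (dummy) coding, the intercept of `lm()` is the group-1 mean and each remaining coefficient is an alpha_i, the difference between group i and group 1.

```r
## One-way ANOVA written as a linear regression
set.seed(3)
y <- c(rnorm(4, 1), rnorm(4, 2), rnorm(4, 3))
g <- factor(rep(1:3, each = 4))

fit <- lm(y ~ g)
coef(fit)            # (Intercept) = mean of group 1; g2, g3 = group effects alpha_2, alpha_3
tapply(y, g, mean)   # intercept + coefficient reproduces each group mean
anova(fit)           # same F test as aov(y ~ g)
```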

Two-Way ANOVA

Two-way ANOVA is a regression model for one continuous outcome and two categorical variables. We analyze two-way ANOVA experiments using additive and interaction models.

Critical assumptions:

  • Observations at different factor levels are samples from normal distributions
  • The variances of the different populations are the same

An experiment may also contain replication; for example, consider the following tables:
image-1669938856486.png
image-1669939027651.png
Both tables show the same data, where the two factors are drug dosage and mutation, and there are 3 replications.

We can create summaries of the means:
image-1669938936434.png
image-1669938962004.png

Possible Models

Interaction model: The effect of each factor changes as the other factor levels vary
image-1669938523361.png

Additive model: The effect of each factor doesn't change as the other factor levels vary
image-1669938550315.png

We test different models using ANOVA to look at the global effect of the factors
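The two models correspond directly to R's formula syntax. As a sketch on invented factors `a` and `b`:

```r
## Interaction vs additive model in R's formula notation
set.seed(4)
a <- factor(rep(c("low", "high"), each = 6))
b <- factor(rep(c("wt", "mut"), times = 6))
y <- rnorm(12)

inter.mod <- lm(y ~ a * b)   # main effects + interaction: a + b + a:b
addit.mod <- lm(y ~ a + b)   # additive model: no interaction term

anova(addit.mod, inter.mod)  # F test comparing the two models (the interaction terms)
```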

Sum of Squares

The traditional one-way ANOVA uses a decomposition of the sum of squares for analysis.
image-1669939467503.png

Each sum of squares uses a number of degrees of freedom, given by number of different levels - 1
image-1669939431751.png

image-1669939548755.png

SS(y) = model SS = SS(a) + SS(b) + SS(g)

These sums of squares are computed sequentially, so the order of terms in the model matters! If the design is balanced, the order does not matter.

Steps

  1. Goodness of fit/Global null hypothesis
    H0: All parameters = 0
    Ha: At least one differs from 0
  2. Test interaction
    • F test is MSI/MSE ~ F(r-1)(c-1), rc(m-1)
    • The P-value is p(F(r-1)(c-1), rc(m-1) > F | H0)
    • If the p-value is smaller than our alpha then we reject the null
    • If we do not reject the null, there is no evidence that any interaction term differs from 0. Fit an additive model and repeat
  3. Test main effects
    • If we fail to reject the null hypothesis of no interaction, it makes sense to test the significance of the main effects; to do this we pool the interaction sum of squares into the residual sum of squares
    • SS(a) and SS(b) are Type III sums of squares
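The steps above can be sketched on hypothetical balanced data (factors `a` and `b` are invented for illustration): test the interaction first, and if it is not significant, refit the additive model and test the main effects.

```r
## Two-way ANOVA testing sequence on a balanced design
set.seed(6)
a <- factor(rep(c("a1", "a2"), each = 12))
b <- factor(rep(rep(c("b1", "b2", "b3"), each = 4), times = 2))
y <- rnorm(24) + ifelse(a == "a2", 1, 0)   # a true main effect of a, no interaction

inter.mod <- aov(y ~ a * b)
summary(inter.mod)            # the a:b row tests the interaction

addit.mod <- aov(y ~ a + b)   # refit without the interaction if it is not significant
summary(addit.mod)            # tests of the main effects
```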

Once a model is accepted, one can do mean comparisons to identify the factor levels with different effects (e.g., Tukey's procedure).

A balanced design is orthogonal, so there is no need to recompute the sums of squares or estimates for different models. If the design is not balanced, analyze the data as a traditional linear model.

Problems

When the design is orthogonal/balanced, the decomposition SS(a) + SS(b) + SS(g) is unique; when it is not, the decomposition depends on the order of the terms.
image-1669941015789.png
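A minimal sketch of this order dependence, on invented unbalanced data: the sequential (Type I) sum of squares for a factor changes when it is entered after the other factor.

```r
## Sequential SS depend on term order when the design is unbalanced
set.seed(5)
a <- factor(c(rep("a1", 4), rep("a2", 7)))   # unequal group sizes
b <- factor(c("b1", "b2", "b1", "b2", "b1", "b1",
              "b2", "b2", "b1", "b2", "b1"))
y <- rnorm(11)

anova(lm(y ~ a + b))   # SS for a when fitted first
anova(lm(y ~ b + a))   # SS for a when fitted after b -- generally different
```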

R Code

####### Two-way ANOVA
##### drug dataset
drug <- read.csv("anova.csv", header=T)
drug

hbf <- c(t(drug[,3:5]))
SNP <- c(rep("No",12), rep("Yes", 12))
Drug <- c(rep(0,3), rep(10,3), rep(20,3), rep(30,3),
               rep(0,3), rep(10,3), rep(20,3), rep(30,3))
data.drug <- data.frame(hbf,SNP, Drug)
head(data.drug)

### summaries
overall.mean <- mean(hbf); overall.mean
drug.means <- tapply(hbf, Drug, mean); drug.means
snps.means <- tapply(hbf, SNP, mean); snps.means
cell.means <- tapply(hbf, interaction(Drug,SNP), mean); cell.means

dim(cell.means) <- c(4,2); cell.means
cell.means <- data.frame(cell.means)
row.names(cell.means) <- levels(factor(Drug))
names(cell.means) <- levels(factor(SNP))
cell.means 

## Visualization
#install.packages("gplots")
library(gplots)
plotmeans(hbf~Drug, data=data.drug,xlab="Drug", 
		ylab="HbF",  main="Mean Plot\n with 95% CI")
plotmeans(hbf~SNP, data=data.drug,xlab="SNP", 
		ylab="HbF",  main="Mean Plot\n with 95% CI")
plotmeans(hbf~interaction(Drug,SNP), data=data.drug,
		xlab="SNP",  connect=list(1:4,5:8),ylab="HbF", 
          	main="Interaction Plot\nwith 95% CI")

interaction.plot(factor(Drug), factor(SNP), hbf, type="b",  
		xlab="Drug", ylab="hbf", main="Interaction Plot")


### two-way anova with balanced design
mod <- aov(hbf~as.factor(Drug)*SNP, data=data.drug)
summary(mod)
table(mod$fitted.values)

# TukeyHSD(mod)
mod <- lm(hbf~as.factor(Drug)*SNP, data=data.drug)
summary(mod)
anova(mod)
table(mod$fitted.values)

##### Exercise with balanced design

mod.a <- aov(hbf~as.factor(Drug)*SNP, data=data.drug)
summary(mod.a)		# anova(mod.a)
mod.lma <- lm(hbf~as.factor(Drug)*SNP, data=data.drug)
anova(mod.lma)

mod.b <- aov(hbf~SNP*as.factor(Drug), data=data.drug)
summary(mod.b)		# anova(mod.b)
mod.lmb <- lm(hbf~SNP*as.factor(Drug), data=data.drug)
anova(mod.lmb)


##### Exercise with unbalanced design
data.drug.1 <- read.csv("data.drug.1.csv", header=T)
mod.1a <- aov(hbf~as.factor(Drug)*SNP, data=data.drug.1)
summary(mod.1a)
mod.1lma <- lm(hbf~as.factor(Drug)*SNP, data=data.drug.1)
anova(mod.1lma)


mod.1b <- aov(hbf~SNP*as.factor(Drug), data=data.drug.1)
summary(mod.1b)
mod.1lmb <- lm(hbf~SNP*as.factor(Drug), data=data.drug.1)
anova(mod.1lmb)