Factorial ANOVA

The data for this example were presented in the associated PowerPoint slides. As usual, I would read the data from a .csv file, but am including the data here for instructional purposes.
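For comparison, reading the data from a file might look like the sketch below. The file name `driver.csv` and the object names `example` and `driver_csv` are assumptions for illustration; a temporary file is written first only so that the code runs on its own.

```r
# Hypothetical file location; a temporary directory stands in for a real data folder.
csv_path <- file.path(tempdir(), "driver.csv")

# Write a small example file so the sketch is self-contained.
example <- data.frame(
  errors = c(2, 4, 16, 18, 18, 20, 0, 2, 2, 4, 16, 18),
  alc    = rep(rep(1:3, each = 2), times = 2),  # 1 1 2 2 3 3 1 1 2 2 3 3
  rest   = rep(1:2, each = 6)                   # 1 x 6, then 2 x 6
)
write.csv(example, csv_path, row.names = FALSE)

# Read the file back, as one would with a real data file.
driver_csv <- read.csv(csv_path)
str(driver_csv)
```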

The dependent variable is the number of errors made while driving a Porsche 911 around a closed course while sober, after 2 beers, or legally intoxicated (factor A), and while either rested or fatigued (factor B). We thus have 2 factors: A (intoxication, 3 levels) and B (fatigue, 2 levels). There are therefore 6 cells (treatment combinations or conditions), and these hypothetical data contain two drivers per cell. The design is fully between-subjects, i.e., an independent groups design.

errors <- c(2, 4, 16, 18, 18, 20, 0, 2, 2, 4, 16, 18)
alc <- c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)
rest <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
driver <- data.frame(errors, alc, rest)
driver
##    errors alc rest
## 1       2   1    1
## 2       4   1    1
## 3      16   2    1
## 4      18   2    1
## 5      18   3    1
## 6      20   3    1
## 7       0   1    2
## 8       2   1    2
## 9       2   2    2
## 10      4   2    2
## 11     16   3    2
## 12     18   3    2

As you can see, there are 12 people (2 per cell) and each cell is uniquely indicated by a combination of ‘alc’ and ‘rest.’ Each person is represented by one row.
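One quick way to confirm that the design is balanced is to cross-tabulate the two factors. This is a minimal sketch that rebuilds the data frame so it runs on its own; `cell_n` is a name introduced here for illustration.

```r
# Rebuild the example data so the block is self-contained.
errors <- c(2, 4, 16, 18, 18, 20, 0, 2, 2, 4, 16, 18)
alc    <- c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)
rest   <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
driver <- data.frame(errors, alc, rest)

# Each entry of the table is the number of drivers in that alc-by-rest cell.
cell_n <- table(alc = driver$alc, rest = driver$rest)
cell_n
```

A balanced design shows the same count in every cell of the table.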

We use the ‘aov’ function for the analysis of variance.

res2 <- aov(driver$errors~factor(driver$alc)+factor(driver$rest)+
              factor(driver$alc):factor(driver$rest))
summary(res2)
##                                        Df Sum Sq Mean Sq F value   Pr(>F)
## factor(driver$alc)                      2    512     256     128  1.2e-05
## factor(driver$rest)                     1    108     108      54 0.000325
## factor(driver$alc):factor(driver$rest)  2     96      48      24 0.001372
## Residuals                               6     12       2                 
##                                           
## factor(driver$alc)                     ***
## factor(driver$rest)                    ***
## factor(driver$alc):factor(driver$rest) ** 
## Residuals                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We get the standard information about the analysis of variance, and it agrees with the PowerPoint slides. The analysis says that all 3 effects (the main effect for alcohol, the main effect for fatigue, and their interaction) are statistically significant at p < .05.
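The same model can be written more compactly by converting the grouping variables to factors once and using the `*` shorthand in the formula. The sketch below also recomputes each F as the effect mean square divided by the residual mean square, and each p value from the F distribution. The names `res2b`, `tab`, `f_by_hand`, and `p_by_hand` are introduced here for illustration.

```r
# Rebuild the data with alc and rest stored as factors.
driver <- data.frame(
  errors = c(2, 4, 16, 18, 18, 20, 0, 2, 2, 4, 16, 18),
  alc    = factor(c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)),
  rest   = factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
)

# alc * rest expands to alc + rest + alc:rest.
res2b <- aov(errors ~ alc * rest, data = driver)
tab <- summary(res2b)[[1]]   # the ANOVA table as a data frame
tab

# Rows 1 to 3 are the effects; row 4 is the residual (error) term.
ms        <- tab[["Mean Sq"]]
df        <- tab[["Df"]]
f_by_hand <- ms[1:3] / ms[4]
p_by_hand <- pf(f_by_hand, df1 = df[1:3], df2 = df[4], lower.tail = FALSE)
cbind(F = f_by_hand, p = p_by_hand)
```

Supplying `data = driver` also shortens the term labels in the output to `alc`, `rest`, and `alc:rest`.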

We can ask for a table of means, which will produce all of them: the grand mean, the means for the main effects, and the cell means.

model.tables(res2, "means")
## Tables of means
## Grand mean
##    
## 10 
## 
##  factor(driver$alc) 
## factor(driver$alc)
##  1  2  3 
##  2 10 18 
## 
##  factor(driver$rest) 
## factor(driver$rest)
##  1  2 
## 13  7 
## 
##  factor(driver$alc):factor(driver$rest) 
##                   factor(driver$rest)
## factor(driver$alc) 1  2 
##                  1  3  1
##                  2 17  3
##                  3 19 17

The output means agree with those in the PowerPoint slides.
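If you want the means as ordinary R objects, or just want to check them independently of the fitted model, `tapply` is one option. This is a sketch; `cell_means`, `alc_means`, `rest_means`, and `grand_mean` are names introduced here.

```r
# Rebuild the example data so the block is self-contained.
driver <- data.frame(
  errors = c(2, 4, 16, 18, 18, 20, 0, 2, 2, 4, 16, 18),
  alc    = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3),
  rest   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# Cell means: rows are levels of alc, columns are levels of rest.
cell_means <- tapply(driver$errors,
                     list(alc = driver$alc, rest = driver$rest),
                     mean)

# Marginal means for each factor, and the grand mean.
alc_means  <- tapply(driver$errors, driver$alc, mean)
rest_means <- tapply(driver$errors, driver$rest, mean)
grand_mean <- mean(driver$errors)

cell_means
alc_means
rest_means
grand_mean
```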

We can also test for post hoc differences of interest. We don’t need a follow-up test for fatigue because it has only two levels, so if the main effect is significant, the two levels must be different. For alcohol, which has three levels, we can test for differences between pairs of levels.

posthoc <- TukeyHSD(x = res2, 'factor(driver$alc)', conf.level = 0.95)
posthoc
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = driver$errors ~ factor(driver$alc) + factor(driver$rest) + factor(driver$alc):factor(driver$rest))
## 
## $`factor(driver$alc)`
##     diff       lwr      upr     p adj
## 2-1    8  4.931726 11.06827 0.0004986
## 3-1   16 12.931726 19.06827 0.0000089
## 3-2    8  4.931726 11.06827 0.0004986

All three pairwise differences among the levels of factor A (alcohol) are statistically significant.
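To see where the confidence limits come from, the interval for the 2-1 comparison can be rebuilt by hand from the studentized range distribution. This is a sketch that takes the residual mean square (2), the residual df (6), and the alcohol means (2 and 10) from the output above; each alcohol mean is based on 4 drivers. The variable names are introduced here for illustration.

```r
mse         <- 2   # residual mean square from the ANOVA table
df_resid    <- 6   # residual degrees of freedom
n_per_level <- 4   # drivers per alcohol level (2 per cell x 2 fatigue levels)
k           <- 3   # number of alcohol means being compared

# Critical value of the studentized range for a 95% family-wise level.
q_crit <- qtukey(0.95, nmeans = k, df = df_resid)

# Half-width of the interval for a difference between two means with equal n.
half_width <- q_crit * sqrt(mse / n_per_level)

# Interval for level 2 minus level 1 (means 10 and 2).
diff_21 <- 10 - 2
c(lwr = diff_21 - half_width, upr = diff_21 + half_width)
```

The limits should match the `lwr` and `upr` values in the 2-1 row. Because the design is balanced, the same half-width applies to all three alcohol comparisons.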

We can also compare specific cell means, depending on which differences are of interest. Calling ‘TukeyHSD’ without naming a term returns the comparisons for every term in the model.

posthoc2 <- TukeyHSD(res2)
posthoc2
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = driver$errors ~ factor(driver$alc) + factor(driver$rest) + factor(driver$alc):factor(driver$rest))
## 
## $`factor(driver$alc)`
##     diff       lwr      upr     p adj
## 2-1    8  4.931726 11.06827 0.0004986
## 3-1   16 12.931726 19.06827 0.0000089
## 3-2    8  4.931726 11.06827 0.0004986
## 
## $`factor(driver$rest)`
##     diff       lwr       upr     p adj
## 2-1   -6 -7.997895 -4.002105 0.0003249
## 
## $`factor(driver$alc):factor(driver$rest)`
##                  diff        lwr        upr     p adj
## 2:1-1:1  1.400000e+01   8.371647  19.628353 0.0004904
## 3:1-1:1  1.600000e+01  10.371647  21.628353 0.0002304
## 1:2-1:1 -2.000000e+00  -7.628353   3.628353 0.7209343
## 2:2-1:1 -8.881784e-15  -5.628353   5.628353 1.0000000
## 3:2-1:1  1.400000e+01   8.371647  19.628353 0.0004904
## 3:1-2:1  2.000000e+00  -3.628353   7.628353 0.7209343
## 1:2-2:1 -1.600000e+01 -21.628353 -10.371647 0.0002304
## 2:2-2:1 -1.400000e+01 -19.628353  -8.371647 0.0004904
## 3:2-2:1  0.000000e+00  -5.628353   5.628353 1.0000000
## 1:2-3:1 -1.800000e+01 -23.628353 -12.371647 0.0001178
## 2:2-3:1 -1.600000e+01 -21.628353 -10.371647 0.0002304
## 3:2-3:1 -2.000000e+00  -7.628353   3.628353 0.7209343
## 2:2-1:2  2.000000e+00  -3.628353   7.628353 0.7209343
## 3:2-1:2  1.600000e+01  10.371647  21.628353 0.0002304
## 3:2-2:2  1.400000e+01   8.371647  19.628353 0.0004904

Here we have all the possible comparisons. The final set is the cell mean comparisons. The labels correspond to the levels of each of the variables, alcohol first and fatigue second, so 2:1 is the cell with alc = 2 and rest = 1. The first comparison shows 2:1 - 1:1. If you look up at the means table, you will see that the two means are 17 and 3, and the difference is thus 14, which is what appears in the ‘diff’ column. As 14 is a large difference, it is not surprising that the result is significant, as shown in the ‘p adj’ column. (The ‘diff’ of -8.881784e-15 for 2:2 - 1:1 is zero up to floating-point rounding; both cell means are 3.)
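With 15 cell comparisons, it is often easier to pull out just the rows you care about. The sketch below refits the model with `alc` and `rest` stored as factors (so the term is labeled `alc:rest`), requests only the interaction comparisons, and then extracts one row by name and the rows with adjusted p below .05. The names `fit`, `tk`, `cells`, `one_row`, and `sig_rows` are introduced here.

```r
# Rebuild the data with alc and rest stored as factors.
driver <- data.frame(
  errors = c(2, 4, 16, 18, 18, 20, 0, 2, 2, 4, 16, 18),
  alc    = factor(c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3)),
  rest   = factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
)
fit <- aov(errors ~ alc * rest, data = driver)

# Ask only for the cell (interaction) comparisons.
tk    <- TukeyHSD(fit, which = "alc:rest", conf.level = 0.95)
cells <- tk[["alc:rest"]]   # matrix with columns diff, lwr, upr, p adj

# One comparison, selected by its row label.
one_row <- cells["2:1-1:1", ]
one_row

# All comparisons with an adjusted p value below .05.
sig_rows <- cells[cells[, "p adj"] < 0.05, , drop = FALSE]
rownames(sig_rows)
```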