One-way Analysis of Variance

This program computes a one-way ANOVA in which the independent variable is amount of caffeine consumed (3 levels) and the dependent variable is test score. These hypothetical data were presented in the associated PowerPoint slideshow.

#data.file <- file.choose()             # find the file you want to read
#setwd(dirname(data.file))              # change the working directory to match file
#initial.data <- read.csv(data.file)     # read the file 
#main.data <- initial.data               # rename
#main.data                               # print the data
#str(main.data)                          # see the structure of the data file
#main.data$Group <- as.factor(main.data$Group) # R will read the numbers as integers rather than a factor, so you need to change it for the progam that follows
#attach(main.data)
#str(main.data)

I commented out all these so that they would be visible but not a problem for the creation of the file that you are examining. If you read in the data, you will get the same numbers as the ones that I input below.

Testscore <- c(75, 77, 79, 81, 83, 80, 82, 84, 86, 88, 70, 72, 74, 76, 78)
Group <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3)
main.data <- data.frame(Testscore, Group)
main.data$Group <- as.factor(main.data$Group) # need to have this a factor for analysis
str(main.data)
## 'data.frame':    15 obs. of  2 variables:
##  $ Testscore: num  75 77 79 81 83 80 82 84 86 88 ...
##  $ Group    : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2 ...

I typically read data from Excel using the .csv file format (comma separated values). I’m going to list that code but also input inline so you can run the program whether or not you can assess the .csv data file (Caffeine.csv).

Now for the data analysis.

attach(main.data)
## The following objects are masked _by_ .GlobalEnv:
## 
##     Group, Testscore
caff.res <- lm(Testscore ~ factor(Group))

The command creates an object called caff.res, which stands for the results of the caffeine analysis. The lm stands for linear model, Testscore is the dependent variable, and Group is the independent variable. If I don’t say ‘factor’ in front of group, R will treat this as a continuous variable with 1 degree of freedom, so we need the ‘factor’ if we want to do ANOVA instead of regression. The program just creates the object caff.res. If we want to see it, we have to do something. We can ask for a summary, as follows:

summary(caff.res)
## 
## Call:
## lm(formula = Testscore ~ factor(Group))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##     -4     -2      0      2      4 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      79.000      1.414   55.86 7.14e-16 ***
## factor(Group)2    5.000      2.000    2.50   0.0279 *  
## factor(Group)3   -5.000      2.000   -2.50   0.0279 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.162 on 12 degrees of freedom
## Multiple R-squared:  0.6757, Adjusted R-squared:  0.6216 
## F-statistic:  12.5 on 2 and 12 DF,  p-value: 0.001164

This gives us quite a bit of information. Note how the first group is taken as a reference group in the analysis. The intercept has a value of 79 (see the column labeled ‘Estimate’), which is the mean of the first group. The second group is 5 larger than 79 (i.e., 84), and the third group is 5 less than 79 (i.e., 74).

We can also ask to see the typical ANOVA summary table:

anova(caff.res)
## Analysis of Variance Table
## 
## Response: Testscore
##               Df Sum Sq Mean Sq F value   Pr(>F)   
## factor(Group)  2    250     125    12.5 0.001164 **
## Residuals     12    120      10                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results in the output agree with what we computed in the PowerPoint slideshow, except that the computer finds the actual p value rather than showing us the critical value.

We can also ask R for the critical value, remembering that our degrees of freedom are 2 and 12.

qf(.95, df1=2, df2=12)
## [1] 3.885294
caff.res$fitted.values
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 
## 79 79 79 79 79 84 84 84 84 84 74 74 74 74 74

As in the slideshow, the critical value of F (at alpha=.05) is 3.89.

Note that fitted values are predicted values. In the case of ANOVA, each person in a cell (treatment) has the same fitted value, which is equal to the mean of the observations in that cell. When we get to regression, each person with a different value of the independent variable will have a different fitted or predicted value.