Multiple Categorical IVs

Objectives

How do you incorporate multiple categorical IVs in a regression equation?

What is an interaction? Give a concrete example (names of IVs & DV, context) where you would expect to see an interaction.

Why might you prefer regression to ANOVA for analyzing multiple IV experiments with unbalanced cell sizes?

 

Materials

 

If we can have one nominal (categorical) independent variable, surely we can have two or more. For the most part, the multiple categorical IV case is a straightforward extension of the single categorical IV case: each IV is coded with its own set of vectors. The exception is the interaction term(s). With multiple IVs, we can have a main effect for each IV plus two-way and higher-order interactions. An interaction indicates that the effect of one variable depends on the level of one or more other variables. (If you are unfamiliar with interactions, read all of the sections in chapter 12 of Pedhazur that concern interactions. Otherwise, read from the section on nonorthogonal designs, p. 481, to the end of the chapter.)
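One common way to enter two categorical IVs into a single regression equation is effect coding: an IV with g levels gets g - 1 vectors, and the interaction is carried by the products of the vectors for the two IVs. The sketch below assumes a hypothetical 2 x 3 design; the level names (`a1`, `b1`, and so on) and the helper names are invented for illustration.

```python
# Hypothetical 2 x 3 design: IV A has levels a1, a2; IV B has levels b1, b2, b3.
A_LEVELS = ["a1", "a2"]
B_LEVELS = ["b1", "b2", "b3"]


def effect_code(level, levels):
    """Return the g - 1 effect-coded values for one observation.

    The last level is the reference group and is coded -1 on every vector;
    any other level gets 1 on its own vector and 0 elsewhere.
    """
    if level == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if level == lev else 0 for lev in levels[:-1]]


def design_row(a, b):
    """Intercept, A vectors, B vectors, and A x B product vectors for one case."""
    a_vec = effect_code(a, A_LEVELS)
    b_vec = effect_code(b, B_LEVELS)
    ab_vec = [x * y for x in a_vec for y in b_vec]  # interaction = products
    return [1] + a_vec + b_vec + ab_vec


# One row per cell of the design.
for a in A_LEVELS:
    for b in B_LEVELS:
        print(a, b, design_row(a, b))
```

Here the interaction needs (2 - 1)(3 - 1) = 2 product vectors, which is why it has to be tested as a set rather than vector by vector. Effect coding is assumed because, with all terms in the model, its "last in" tests refer to unweighted means; with dummy coding, dropping A's vectors while keeping the interaction would instead test A at the reference level of B.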

 

Nonorthogonal Designs

One advantage of using regression to analyze ANOVA designs is that it copes with unequal cell frequencies (more precisely, disproportionate cell frequencies). If people are lost from a study at random, we can still analyze the data. If they are lost systematically, our estimates will be biased. Always look for the reasons people dropped out of the study before analyzing the results.
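Unequal cell sizes are only a problem when they are disproportionate. Cell frequencies are proportional when every cell count equals its row total times its column total divided by N, i.e. n_ij = n_i. * n_.j / n. A minimal check, assuming the counts are given as a list of rows (the function name and the example counts are hypothetical):

```python
def is_proportional(counts, tol=1e-9):
    """True if every cell n_ij equals n_i. * n_.j / n (proportional frequencies).

    counts is a list of rows: counts[i][j] is the number of cases in cell (i, j).
    """
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    return all(
        abs(counts[i][j] - row_totals[i] * col_totals[j] / n) <= tol
        for i in range(len(counts))
        for j in range(len(col_totals))
    )


# Hypothetical cell counts for a 2 x 2 design.
unequal_but_proportional = [[10, 20], [5, 10]]
disproportionate = [[10, 20], [10, 5]]
print(is_proportional(unequal_but_proportional))  # True
print(is_proportional(disproportionate))  # False
```

In the first table the cells are unequal but the A and B classifications are independent, so the main effects are not confounded with each other; in the second they are, and the choice of sums of squares starts to matter.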

With least-squares regression estimates, we can still obtain unbiased tests of the same hypotheses that ANOVA would test with equal (or proportional) cell frequencies. To be unbiased tests of hypotheses about the unweighted means in the population (e.g., $\mu_1 = \mu_2$), the tests must be based on Type III (regression, "last in") sums of squares, with all appropriate terms included in the model. Suppose you have two IVs.

Then your potential models are:

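1. $Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$

2. $Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$

3. $Y_{ijk} = \mu + \alpha_i + \varepsilon_{ijk}$

4. $Y_{ijk} = \mu + \beta_j + \varepsilon_{ijk}$

5. $Y_{ijk} = \mu + \varepsilon_{ijk}$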
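A Type III ("last in") sum of squares for an effect can be computed by comparing model 1 with a reduced model that drops only that effect's coded vectors and keeps everything else. The sketch below does this with effect coding and NumPy least squares on a small, made-up unbalanced 2 x 2 data set; the data and helper names are hypothetical, and PROC GLM or any other regression program should give the same sums of squares for this coding.

```python
import numpy as np


def effect_vectors(codes, levels):
    """Effect-code labels: g - 1 columns, last level coded -1 on every column."""
    ref = levels[-1]
    return np.array([[1.0 if c == lev else (-1.0 if c == ref else 0.0)
                      for lev in levels[:-1]] for c in codes])


def sse(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


# Hypothetical unbalanced 2 x 2 data: (level of A, level of B, score).
data = [("a1", "b1", 3), ("a1", "b1", 4), ("a1", "b1", 5), ("a1", "b2", 6),
        ("a2", "b1", 4), ("a2", "b2", 8), ("a2", "b2", 9), ("a2", "b2", 10),
        ("a2", "b2", 9)]
a = [d[0] for d in data]
b = [d[1] for d in data]
y = np.array([d[2] for d in data], dtype=float)

A = effect_vectors(a, ["a1", "a2"])
B = effect_vectors(b, ["b1", "b2"])
AB = A * B  # one product vector, since each IV has only two levels
ones = np.ones((len(y), 1))

blocks = {"A": A, "B": B, "AB": AB}
X_full = np.hstack([ones, A, B, AB])  # model 1
sse_full = sse(X_full, y)
df_error = len(y) - X_full.shape[1]

results = {}
for name, block in blocks.items():
    # Drop only this effect's vectors and keep all others (Type III, "last in").
    X_reduced = np.hstack([ones] + [v for k, v in blocks.items() if k != name])
    ss = sse(X_reduced, y) - sse_full
    df = block.shape[1]
    F = (ss / df) / (sse_full / df_error)
    results[name] = (ss, df, F)
    print(f"{name}: SS = {ss:.3f}, df = {df}, F = {F:.3f}")

# Weighted (raw marginal) versus unweighted (mean of cell means) means for A.
for lev in ["a1", "a2"]:
    weighted = y[[k for k in range(len(y)) if a[k] == lev]].mean()
    unweighted = np.mean([y[[k for k in range(len(y))
                             if a[k] == lev and b[k] == bl]].mean()
                          for bl in ["b1", "b2"]])
    print(f"{lev}: weighted mean = {weighted:.2f}, unweighted mean = {unweighted:.2f}")
```

With this made-up data, the Type III SS for A comes out as 108/31 (about 3.484). That is the test of the unweighted means 5.00 versus 6.50, not of the weighted means 4.50 versus 8.00, which differ much more because a2 has most of its cases in the high-scoring b2 cell.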

 

Suggested sequence of steps:

 

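  1. Test model 1, which includes all effects. (I use PROC GLM. You may use it, or any regression program with your own coding. Be aware that an interaction is usually represented by multiple coded vectors, which must be tested together as a set.) Interpret the significance of the interaction. If the interaction is significant, interpret all effects.
  2. If the interaction is not significant, drop it and estimate model 2. Interpret the main effects.
  3. Do not estimate models 3, 4, and 5. Models 3 and 4 each omit one IV; with disproportionate cell frequencies the IVs are correlated, so the effect of the remaining IV would be confounded with the effect of the omitted one.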
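The sequence above can be sketched as a small function. It is an illustration under stated assumptions, not the author's procedure: effect coding is assumed, all interaction vectors are tested together as one set, and because NumPy has no F distribution the decision uses a critical value `f_crit_ab` that you supply yourself (for example from an F table, or `scipy.stats.f.ppf(0.95, df1, df2)` if SciPy is available). The names `analyze`, `f_drop`, and the data are hypothetical.

```python
import numpy as np


def effect_vectors(codes, levels):
    """Effect-code labels: g - 1 columns, last level coded -1 on every column."""
    ref = levels[-1]
    return np.array([[1.0 if c == lev else (-1.0 if c == ref else 0.0)
                      for lev in levels[:-1]] for c in codes])


def sse(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def f_drop(X_full, X_reduced, y):
    """F for the columns in X_full that are absent from X_reduced, tested as one set."""
    sse_f, sse_r = sse(X_full, y), sse(X_reduced, y)
    df_num = X_full.shape[1] - X_reduced.shape[1]
    df_den = len(y) - X_full.shape[1]
    return ((sse_r - sse_f) / df_num) / (sse_f / df_den), df_num, df_den


def analyze(a, b, y, f_crit_ab):
    """Steps 1-2: test the interaction set in model 1, fall back to model 2."""
    A = effect_vectors(a, sorted(set(a)))
    B = effect_vectors(b, sorted(set(b)))
    # All pairwise products of A and B vectors carry the interaction.
    AB = np.hstack([A[:, [i]] * B[:, [j]]
                    for i in range(A.shape[1]) for j in range(B.shape[1])])
    one = np.ones((len(y), 1))
    model1 = np.hstack([one, A, B, AB])
    model2 = np.hstack([one, A, B])
    f_ab = f_drop(model1, model2, y)  # all interaction vectors at once
    if f_ab[0] >= f_crit_ab:
        # Interaction significant: keep model 1, each effect entered last (Type III).
        return {"model": 1, "AB": f_ab,
                "A": f_drop(model1, np.hstack([one, B, AB]), y),
                "B": f_drop(model1, np.hstack([one, A, AB]), y)}
    # Interaction not significant: use model 2, each main effect adjusted for the
    # other. Models 3 and 4 appear only as reduced comparison models here.
    return {"model": 2, "AB": f_ab,
            "A": f_drop(model2, np.hstack([one, B]), y),
            "B": f_drop(model2, np.hstack([one, A]), y)}


# Hypothetical unbalanced 2 x 3 data.
a = ["a1"] * 7 + ["a2"] * 7
b = ["b1", "b1", "b2", "b2", "b2", "b3", "b3",
     "b1", "b1", "b1", "b2", "b2", "b3", "b3"]
y = np.array([3, 5, 6, 7, 8, 9, 11, 4, 6, 5, 8, 10, 10, 12], dtype=float)

# 4.46 is approximately the .05 critical value of F with 2 and 8 df.
result = analyze(a, b, y, f_crit_ab=4.46)
print(result)
```

Each entry of the returned dictionary is a tuple (F, numerator df, denominator df). Note that models 3 and 4 are never interpreted on their own: they only serve as the reduced models against which model 2 is compared.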