## Translating SPSS to R: Mixed Repeated-Measures ANOVA

As usual, it’s been far too long since I’ve posted, but the fall semester is coming and I’ve been ramping back up on both SPSS and R lately and I’d like to get in a couple more posts to finish off this series. Thus, the return.

This post will cover a simple mixed repeated-measures ANOVA. That is, an ANOVA with both within-subjects and between-subjects factors. I’ll continue to use the Elashoff data set that I used in the last post; the data are in the file elashoff.xls. The data file is just as described in that last post, with 11 variables: subject number, the group number (the between-subjects variable), and then 9 measurements that make up the 3 x 3 within-subjects part of the design.

In the within-subjects part of the design, there were main effects of drug type and dosage, but no interactions. Adding the between-subjects effect actually generates an interaction, which looks like this:

## Basic Mixed ANOVA

### Basic ANOVA in SPSS

Again, SPSS makes the easy things easy, and the code to get SPSS to do this is straightforward:

```
GLM T1D1 to T3D3 BY Group
  /WSFACTOR Type 3 Dose 3
  /PRINT ETASQ.
```

This is the same as the entirely within-subjects code, but with the addition of “BY Group” which tells SPSS to include a between-subjects variable. Simple, no?

SPSS produces the same printout that shows the pairings between the individual variables and the levels in the repeated-measures design so I won’t repeat that here. In addition, since there is now a between-subjects effect, it provides a listing of the number of subjects in each between-subjects group, like this:

The next thing it produces is the sphericity output, which is now different because of the between-subjects factor. That looks like this:

And here’s our first really, really serious problem. If you have a between-subjects factor, SPSS’s computation of the Huynh-Feldt epsilon is WRONG. Yep, it’s just plain incorrect. SPSS has known about this bug for decades, but hasn’t fixed it yet. Hard to believe, but true. It’s not way off, but it’s a little bit off. R gets it right, below, so look there to see the correct number. How they get away with this year after year, version after version, is simply beyond me.
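For the curious, the corrected formula (due to Lecoutre, 1991) is easy to check by hand. Here’s a sketch in R, where `hf_eps` is just my own throwaway function name; plugging in the numbers for the Type effect below (16 subjects, 2 groups, 3 levels) reproduces R’s epsilon:

```r
## corrected Huynh-Feldt epsilon (Lecoutre, 1991):
## N = total subjects, g = number of groups, k = levels of the
## within-subjects effect, eGG = the Greenhouse-Geisser epsilon
hf_eps = function(N, g, k, eGG) {
  ((N - g + 1) * (k - 1) * eGG - 2) /
    ((k - 1) * (N - g - (k - 1) * eGG))
}
hf_eps(16, 2, 3, 0.5722318)   # 0.5899 -- matches R's output below
## SPSS uses N where the corrected numerator has N - g + 1
```

With no between-subjects factor (g = 1), the two versions agree, which is why this only bites in mixed designs.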

Next is the big ANOVA table with all the within-subjects factors. It’s big and ugly, but it’s all there:

Same basic effects for all the purely within-subjects parts, but now we pick up the interaction of “group” and drug type, which was evident in the graph.

Also notice that this time I requested effect size estimates. SPSS only gives partial eta squared, which is probably the worst of them all, but it’s (very slightly) better than nothing. Finally, SPSS provides the tests of between-subjects effects, which in this case isn’t very interesting, but it is necessary for completeness. Here’s that result:

And there you go. SPSS produces a lot of other output, including multivariate tests and all kinds of default contrasts, but that’s not new here.

### Basic ANOVA in R, Wide Format

This is, in fact, also a fairly simple modification of the strictly within-subjects code from last time as well. There are two main differences. First, of course, is that R has to be told explicitly that something is a factor. Second, R needs to have a default set so that it computes sums of squares properly for the mixed design. That’s the “options” statement in the code right after the “car” library is loaded.

The code looks like this (assuming as usual that the data are attached so variables are local):

```
## Create labelled factors for w-s factors
Type = factor(c("T1", "T1", "T1", "T2", "T2", "T2", "T3", "T3", "T3"))
Dose = factor(c("D1", "D2", "D3", "D1", "D2", "D3", "D1", "D2", "D3"))

## shorter, but equivalent, way to do this
Type = factor(c(rep("T1", 3), rep("T2", 3), rep("T3", 3)))
Dose = factor(rep(c("D1", "D2", "D3"), 3))

## load "car" library and set options
library(car)
options(contrasts = c("contr.sum", "contr.poly"))

## Add the between-subjects factor
GroupF = factor(Group, labels = c("No Sleep", "1hr Sleep"))

## run the ANOVA and print a summary
Anv3 = lm(cbind(T1D1, T1D2, T1D3, T2D1, T2D2, T2D3, T3D1, T3D2, T3D3) ~ GroupF)
Anv4 = Anova(Anv3, idata = data.frame(Type, Dose), idesign = ~Type*Dose, type = 3)
summary(Anv4, multivariate = F)
```

And again, even for this relatively simple three-way design, there’s a fair amount of output:

```
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                    SS num Df Error SS den Df         F    Pr(>F)    
(Intercept)      79900      1   630.11     14 1775.2523 3.774e-16 ***
GroupF             187      1   630.11     14    4.1499 0.0610036 .  
Type               426      2   599.64     28    9.9444 0.0005457 ***
GroupF:Type        410      2   599.64     28    9.5669 0.0006816 ***
Dose               878      2   549.89     28   22.3635 1.572e-06 ***
GroupF:Dose        102      2   549.89     28    2.6068 0.0915784 .  
Type:Dose           46      4   554.11     56    1.1556 0.3402490    
GroupF:Type:Dose    27      4   554.11     56    0.6776 0.6103418    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity

                 Test statistic  p-value
Type                    0.25246 0.000130
GroupF:Type             0.25246 0.000130
Dose                    0.75866 0.166077
GroupF:Dose             0.75866 0.166077
Type:Dose               0.17106 0.009729
GroupF:Type:Dose        0.17106 0.009729

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

                  GG eps Pr(>F[GG])    
Type             0.57223   0.004847 ** 
GroupF:Type      0.57223   0.005548 ** 
Dose             0.80558  1.232e-05 ***
GroupF:Dose      0.80558   0.105049    
Type:Dose        0.52518   0.330823    
GroupF:Type:Dose 0.52518   0.522531    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                    HF eps   Pr(>F[HF])
Type             0.5898996 4.424443e-03
GroupF:Type      0.5898996 5.082647e-03
Dose             0.8946538 4.788591e-06
GroupF:Dose      0.8946538 9.867570e-02
Type:Dose        0.6200127 3.347825e-01
GroupF:Type:Dose 0.6200127 5.448314e-01
```

And yes, these are the correct HF epsilons. What SPSS is doing here I don’t know. So, other than that, you get the same thing out of R that you get out of SPSS, if in a somewhat less appealing format and with more complex code.

### Basic ANOVA in R, Long Format

Again, this requires conversion of the wide-format data into long format. This is accomplished with the same code as in the entirely within-subjects example, and simply requires the addition of the options set and coding of “group” into a factor as well. It looks like this:

```
## melt into long form
library(reshape2)
stim.lng = melt(stim, id=c("Snum", "Group"), variable.name="Cond", value.name="Alrtns")

## code separate factors for Type and Dose
stim.lng$TypeF = factor(substr(stim.lng$Cond, 1, 2))
stim.lng$DoseF = factor(substr(stim.lng$Cond, 3, 4))

## Make Snum into a factor
stim.lng$Snum = factor(stim.lng$Snum)

## make Group into a factor
stim.lng$GroupF = factor(stim.lng$Group, labels=c("No Sleep", "1hr Sleep"))

## load library and set options
library(ez)
options(contrasts = c("contr.sum", "contr.poly"))

## run the ANOVA
ezANOVA(stim.lng, dv=Alrtns, wid=Snum, within=.(TypeF, DoseF), between=GroupF, detailed=T)
```

The ezANOVA function (from the ez package) is nice, to be sure, producing output that looks almost like SPSS’s. Here’s the output:

```
$ANOVA
              Effect DFn DFd         SSn      SSd            F            p p<.05        ges
1        (Intercept)   1  14 79900.44444 630.1111 1775.2523364 3.774108e-16     * 0.97162069
2             GroupF   1  14   186.77778 630.1111    4.1498854 6.100363e-02       0.07410265
3              TypeF   2  28   425.93056 599.6389    9.9443647 5.456901e-04     * 0.15434053
5              DoseF   2  28   878.38889 549.8889   22.3635078 1.572069e-06     * 0.27345919
4       GroupF:TypeF   2  28   409.76389 599.6389    9.5669153 6.816385e-04     * 0.14935732
6       GroupF:DoseF   2  28   102.38889 549.8889    2.6067893 9.157840e-02       0.04202917
7        TypeF:DoseF   4  56    45.73611 554.1111    1.1555544 3.402490e-01       0.01922100
8 GroupF:TypeF:DoseF   4  56    26.81944 554.1111    0.6776118 6.103418e-01       0.01136143

$`Mauchly's Test for Sphericity`
              Effect         W            p p<.05
3              TypeF 0.2524563 0.0001300799     *
4       GroupF:TypeF 0.2524563 0.0001300799     *
5              DoseF 0.7586605 0.1660766393      
6       GroupF:DoseF 0.7586605 0.1660766393      
7        TypeF:DoseF 0.1710604 0.0097290347     *
8 GroupF:TypeF:DoseF 0.1710604 0.0097290347     *

$`Sphericity Corrections`
              Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
3              TypeF 0.5722318 4.846844e-03         * 0.5898996 4.424443e-03         *
4       GroupF:TypeF 0.5722318 5.547608e-03         * 0.5898996 5.082647e-03         *
5              DoseF 0.8055814 1.231551e-05         * 0.8946538 4.788591e-06         *
6       GroupF:DoseF 0.8055814 1.050485e-01           0.8946538 9.867570e-02          
7        TypeF:DoseF 0.5251803 3.308227e-01           0.6200127 3.347825e-01          
8 GroupF:TypeF:DoseF 0.5251803 5.225310e-01           0.6200127 5.448314e-01          
```

I also like this better than the raw car::Anova output. The problem with long form is contrasts, as was true with strictly repeated-measures designs.

## Simple Main Effects

I once again have no desire to get into arguments about terminology for or appropriateness of such analyses. People do them and will continue to do them and I’m just going to leave it at that. The good news is that this time, there actually is a potentially meaningful interaction in the data, and it crosses a between-subjects factor with a within-subjects factor, so there is some possible work to be done in generating the code to handle it.

From visual inspection, what appears to be happening is no effect of drug type in the “no sleep” group with an effect apparent in the “1hr sleep” group. So we’ll do two repeated-measures ANOVAs, one for each group, and looking for the effect of drug type in both cases.

### Simple Main Effects in SPSS

This uses the same SORT/SPLIT idiom that we saw with between-subjects ANOVA. The code looks like this:

```
SORT CASES BY Group.
SPLIT FILE BY Group.

GLM T1D1 to T3D3
  /WSFACTOR Type 3 Dose 3.

SPLIT FILE OFF.
```

And here are the relevant parts of the output. First, sphericity stuff:

And then the actual ANOVA table. Yes, this is getting pretty long:

This shows what we’d expect here, an effect of drug type for the 1hr Sleep group and no effect for the No Sleep group. (Apply all the usual horrible caveats that apply to this type of inference logic.)

### Simple Main Effects in R, Wide Format

Just as we reused the between-subjects idiom in SPSS, we’ll reuse the idiom from between-subjects ANOVA in R, which is a for() loop. I know people like to try to vectorize everything in R, but if you ever have enough groups that this really matters for performance reasons, there are other issues. So, the R code looks like this:

```
## GroupF, Type, and Dose are the global factors created above
for (Grp in levels(GroupF)) {
  cat("\n------ Group:", Grp, "-----\n")
  SubFrame = stim[GroupF == Grp, ]
  Anv1 = lm(cbind(T1D1, T1D2, T1D3, T2D1, T2D2, T2D3, T3D1, T3D2, T3D3) ~ 1,
            data = SubFrame)
  Anv2 = Anova(Anv1, idata = data.frame(Type, Dose), idesign = ~Type*Dose, type = 3)
  print(summary(Anv2, multivariate = F))
}
```

That is, iterate through the groups, select the subset of data for that group, then run the repeated-measures ANOVA for each group. It’s not super-clean and there’s probably some better way to do this with some kind of by() statement, but this works well enough for me. Here’s the output:

```
------ Group: No Sleep -----

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

               SS num Df Error SS den Df         F    Pr(>F)    
(Intercept) 36180      1   127.06      7 1993.3288 7.388e-10 ***
Type            0      2   331.53     14    0.0053 0.9947373    
Dose          784      2   345.03     14   15.9077 0.0002488 ***
Type:Dose      19      4   364.89     28    0.3581 0.8361991    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity

          Test statistic  p-value
Type             0.22647 0.011616
Dose             0.51826 0.139202
Type:Dose        0.01185 0.006061

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

           GG eps Pr(>F[GG])   
Type      0.56385    0.95905   
Dose      0.67488    0.00179 **
Type:Dose 0.37653    0.64898   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

             HF eps   Pr(>F[HF])
Type      0.5978545 0.9652534118
Dose      0.7785622 0.0009500507
Type:Dose 0.4572805 0.6875468299

------ Group: 1hr Sleep -----

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

               SS num Df Error SS den Df        F    Pr(>F)    
(Intercept) 43907      1   503.06      7 610.9605 4.522e-08 ***
Type          835      2   268.11     14  21.8123 4.996e-05 ***
Dose          197      2   204.86     14   6.7209  0.008995 ** 
Type:Dose      54      4   189.22     28   1.9935  0.122878    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity

          Test statistic p-value
Type             0.14145 0.00283
Dose             0.86525 0.64777
Type:Dose        0.38241 0.82632

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

           GG eps Pr(>F[GG])   
Type      0.53805   0.001702 **
Dose      0.88125   0.012432 * 
Type:Dose 0.77327   0.143502   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

             HF eps  Pr(>F[HF])
Type      0.5578137 0.001460198
Dose      1.1551317 0.008994918
Type:Dose 1.4554011 0.122878219
```

And again, it all agrees with the SPSS, which is what we’re looking for.
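For what it’s worth, the loop can also be written without explicit iteration, using split() and lapply(). This is just a sketch (untested against the Elashoff file), and it assumes the same `stim`, `GroupF`, `Type`, and `Dose` objects as above:

```r
## split the data frame on the group factor and lapply() the
## repeated-measures ANOVA over the pieces
ByGrp = lapply(split(stim, GroupF), function(SubFrame) {
  Anv = lm(cbind(T1D1, T1D2, T1D3, T2D1, T2D2, T2D3,
                 T3D1, T3D2, T3D3) ~ 1, data = SubFrame)
  summary(Anova(Anv, idata = data.frame(Type, Dose),
                idesign = ~Type*Dose, type = 3), multivariate = FALSE)
})
ByGrp   # the list prints with each group's name as its label
```

Whether that’s clearer than the for() loop is a matter of taste; it does save you the cat() labels, since the list names come along for free.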

### Simple Main Effects in R, Long Format

Again, same idea as between-subjects, using a for() loop, assuming the data now in long format in the “stim.lng” data frame. Code:

```
for (Grp in levels(stim.lng$GroupF)) {
  cat("\n------ Group:", Grp, "-----\n")
  print(ezANOVA(stim.lng[stim.lng$GroupF == Grp, ], dv = Alrtns, wid = Snum,
                within = .(TypeF, DoseF), detailed = T))
}
```

And output:

```
------ Group: No Sleep -----
Warning: You have removed one or more Ss from the analysis. Refactoring "Snum" for ANOVA.
$ANOVA
       Effect DFn DFd         SSn      SSd            F            p p<.05          ges
1 (Intercept)   1   7 36180.50000 127.0556 1.993329e+03 7.387751e-10     * 0.9687140218
2       TypeF   2  14     0.25000 331.5278 5.278592e-03 9.947373e-01       0.0002139037
3       DoseF   2  14   784.08333 345.0278 1.590766e+01 2.487835e-04     * 0.4015620332
4 TypeF:DoseF   4  28    18.66667 364.8889 3.580999e-01 8.361991e-01       0.0157237119

$`Mauchly's Test for Sphericity`
       Effect         W           p p<.05
2       TypeF 0.2264721 0.011615672     *
3       DoseF 0.5182605 0.139201627      
4 TypeF:DoseF 0.0118548 0.006060616     *

$`Sphericity Corrections`
       Effect       GGe       p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
2       TypeF 0.5638479 0.959053605           0.5978545 0.9652534118          
3       DoseF 0.6748825 0.001789689         * 0.7785622 0.0009500507         *
4 TypeF:DoseF 0.3765304 0.648979117           0.4572805 0.6875468299          

------ Group: 1hr Sleep -----
Warning: You have removed one or more Ss from the analysis. Refactoring "Snum" for ANOVA.
$ANOVA
       Effect DFn DFd         SSn      SSd          F            p p<.05        ges
1 (Intercept)   1   7 43906.72222 503.0556 610.960464 4.522047e-08     * 0.97414690
2       TypeF   2  14   835.44444 268.1111  21.812267 4.996251e-05     * 0.41757723
3       DoseF   2  14   196.69444 204.8611   6.720949 8.994918e-03     * 0.14442178
4 TypeF:DoseF   4  28    53.88889 189.2222   1.993541 1.228782e-01       0.04420242

$`Mauchly's Test for Sphericity`
       Effect         W          p p<.05
2       TypeF 0.1414489 0.00283008     *
3       DoseF 0.8652485 0.64777264      
4 TypeF:DoseF 0.3824079 0.82631913      

$`Sphericity Corrections`
       Effect       GGe       p[GG] p[GG]<.05       HFe       p[HF] p[HF]<.05
2       TypeF 0.5380535 0.001701661         * 0.5578137 0.001460198         *
3       DoseF 0.8812502 0.012431675         * 1.1551317 0.008994918         *
4 TypeF:DoseF 0.7732682 0.143501506           1.4554011 0.122878219          
```

Yes, there are warning messages, but they’re not important here. Again, agrees with the previous two results, so that’s good. (Note that it is possible in R to change the number of decimals output, but then it messes up other procedures that make assumptions. This is just pure stupid. R people seem to not understand basic UI principles like “the default settings are the UI 90% of the time.”) Anyway, this works as well, with somewhat more compact if over-decimaled output.

## Interaction Contrasts

As noted in previous posts, I think this is a dramatically better way of handling the interpretation of interactions. In this particular case, the relevant interaction contrast is pretty clear, since there are only two levels of Group, meaning there’s only one contrast there, (1 -1). The contrast on drug type is (1 -2 1), since it appears that the second type is the one that’s different in the 1hr Sleep condition. So we’ll build that in both packages.

### Interaction Contrasts in SPSS

As far as I can tell, the best way to handle interactions that involve both within- and between-subjects factors is to compute new variables on the between-subjects contrasts, and then run those through the appropriate between-subjects ANOVA with the appropriate contrasts there. That’s trivial in this case because there are only two levels of the between-subjects variable. The code is thus pretty easy:

```
COMPUTE T1 = MEAN(t1d1, t1d2, t1d3).
COMPUTE T2 = MEAN(t2d1, t2d2, t2d3).
COMPUTE T3 = MEAN(t3d1, t3d2, t3d3).
COMPUTE T2vs13 = T1 - 2*T2 + T3.
EXECUTE.

ONEWAY T2vs13 BY Group.
```

And the output is straightforward:

We reject the null here and conclude the contrast characterizes the interaction successfully.

### Interaction Contrasts in R, Wide Format

It turns out that in wide format, the way to do this in R looks a lot like SPSS. The main difference is that there are ways to shortcut this in R with matrix operations. I’ll show the most SPSS-like way of doing it and the matrix version to better satisfy the R purists (and myself, as this is much cleaner). Here’s the SPSS-like code:

```
T1 = (T1D1 + T1D2 + T1D3)/3
T2 = (T2D1 + T2D2 + T2D3)/3
T3 = (T3D1 + T3D2 + T3D3)/3
T2vs13 = T1 - 2*T2 + T3
summary(aov(T2vs13 ~ GroupF))
```

And here’s the vector-multiplication way:

```
## divide by 3 so the contrast scores match the mean-based version
## (the F and p are identical either way; only the SS scale changes)
LVec = rep(c(1, -2, 1), 3)/3
T2vs13 = cbind(T1D1, T2D1, T3D1, T1D2, T2D2, T3D2, T1D3, T2D3, T3D3) %*% LVec
summary(aov(T2vs13 ~ GroupF))
```

They both produce the same output:

```
            Df Sum Sq Mean Sq F value Pr(>F)   
GroupF       1   7353    7353   10.35 0.0062 **
Residuals   14   9943     710                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

And, hooray, it matches SPSS. Excellent.

### Interaction Contrasts in R, Long Format

This simply does not work in R, as far as I can tell. This is exactly the same problem that I run into with the pure repeated-measures version, and I still have no idea what the problem is or why it occurs. The most frustrating part is that R doesn’t produce any error messages or anything, it just fails to put things in the output. It’s bizarre, annoying, and makes me mistrust R, which is not where you want to be with your statistics package.
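That said, if you’re stuck with long-format data, one workaround is to sidestep the ANOVA machinery entirely: collapse the long data to one contrast score per subject and test those scores between groups. This is just a sketch, assuming the `stim.lng` frame and factors from above, but it’s the same arithmetic as the wide-format version, so it should reproduce those numbers:

```r
## build the (1 -2 1) Type contrast by hand from the long data
Wts = c(1, -2, 1)                                   # T2 vs. the others
TypeMeans = tapply(stim.lng$Alrtns,
                   list(stim.lng$Snum, stim.lng$TypeF), mean)
T2vs13 = as.vector(TypeMeans %*% Wts)               # one score per subject
## recover each subject's group from the long frame
Grp = stim.lng$GroupF[match(rownames(TypeMeans), stim.lng$Snum)]
summary(aov(T2vs13 ~ Grp))
```

Not elegant, but at least it’s explicit about what’s being tested.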

## Translating SPSS to R: Factorial Repeated Measures

As I noted in the last blog post, R is not terribly well-suited to repeated measures. This becomes increasingly true as the designs become more complicated. This post will cover a simple two-way repeated-measures ANOVA, so it won’t be all that bad, but even here there will start to be a few complications in places.

The data, which can be found in wide format in elashoff.xls, are a “famous” data set that has appeared in numerous textbooks, from Elashoff (1981), “Data for the panel session in software for repeated measures analysis of variance.” It is actually a mixed design (there’s a between-subjects variable), but I’m going to ignore the between-subjects for now in the name of pedagogy. The within-subjects part of the study is a 3 x 3 design with three different types of drugs given at three different dosages. My vague recollection is that the context is that these are stimulants and the dependent measure is some measure of alertness.

The data file has 11 variables in it: the subject number, the group number (for the between-subjects group, which we’re going to ignore for now), and then the 9 observations on each subject. These are noted “TxDx” for “type x” and “dose x” respectively, so “T2D3” is drug type 2 at dosage level 3.

The results look like this:

(No, I didn’t use either SPSS or R to make the graph. I hate both of them for graphing for totally different reasons. That graph was made with DeltaGraph.)

**Basic Factorial ANOVA**

**Basic ANOVA in SPSS**

SPSS has the virtue of making easy things easy. (This is accompanied by sometimes making things of moderate difficulty hard to impossible and hard things unthinkable, but I digress.) The SPSS code to do a simple two-way repeated-measures ANOVA on these data is simply this:

```
GLM T1D1 TO T3D3
  /WSFACTOR Type 3, Dose 3.
```

There are two features of import here. First, SPSS offers the convenience of not listing out all the variables used, allowing the use of “TO” to indicate a list from one variable to another. Second, the WSFACTOR subcommand that creates the within-subjects independent variables. These names are order-dependent, and listing them in the wrong order will cause SPSS to analyze the data incorrectly, but without issuing any kind of error message. The good news is SPSS provides some output to give the user a fighting chance to catch if this was done wrong:

This table shows how each level of each independent variable is mapped to each variable in the data file. Very handy. Next in the output is a bunch of multivariate tests, which I always ignore, and then the sphericity diagnostics, as per the one-way output, but of course now there are more of them:

The data are pretty non-spherical, particularly things on the “Type” factor. Looks like GG adjustment is appropriate on Type and the interaction, and HF on the Dose effect. Those effects are all shown in the next (fairly large) table:

Looks like both main effects are reliable and the interaction is not. Pretty unsurprising given the graph. As noted last time, SPSS also produces output for contrasts, defaulting to assuming that all factors are amenable to trend analysis. For pedagogical reasons, I rather wish SPSS wouldn’t do this, as students frequently look at this and try to report something from it even when it’s completely irrelevant. (In this particular case it might be relevant for “Dose” but is clearly meaningless for “Type” since that’s obviously categorical.) Anyway, here’s what that looks like:

Whatever. Not my favorite, but not too damaging. Now, let’s see this in R, in both wide and long format.

**Basic ANOVA in R, Wide Format**

Not surprisingly, this is more complicated, though it’s not really all *that* bad for a two-way design. The basic issue in generating the code is that SPSS generates the factorial design for you, and with R you kind of have to build that yourself (though I suspect there’s an easier way to do this, I haven’t found it yet). Here’s the code:

```
## Create a model of the within-subjects factors
Anv1 = lm(cbind(T1D1, T1D2, T1D3, T2D1, T2D2, T2D3, T3D1, T3D2, T3D3) ~ 1)

## Create labelled factors for w-s factors
Type = factor(c("T1", "T1", "T1", "T2", "T2", "T2", "T3", "T3", "T3"))
Dose = factor(c("D1", "D2", "D3", "D1", "D2", "D3", "D1", "D2", "D3"))

## shorter, but equivalent, way to do this
Type = factor(c(rep("T1", 3), rep("T2", 3), rep("T3", 3)))
Dose = factor(rep(c("D1", "D2", "D3"), 3))

## do the ANOVA. "idata" and "idesign" tell Anova() how to
## structure within-subjects factors
library(car)
Anv2 = Anova(Anv1, idata = data.frame(Type, Dose), idesign = ~Type*Dose, type = 3)
summary(Anv2, multivariate = F)
```

This produces output that isn’t quite as clean as SPSS, but does match it in all the important ways:

```
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

               SS num Df Error SS den Df         F    Pr(>F)    
(Intercept) 79900      1   816.89     15 1467.1600 2.226e-16 ***
Type          426      2  1009.40     30    6.3294   0.00509 ** 
Dose          878      2   652.28     30   20.1997 2.775e-06 ***
Type:Dose      46      4   580.93     60    1.1809   0.32836    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity

          Test statistic  p-value
Type             0.17795 0.000006
Dose             0.69506 0.078368
Type:Dose        0.19903 0.010585

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

           GG eps Pr(>F[GG])    
Type      0.54883    0.02041 *  
Dose      0.76632  2.888e-05 ***
Type:Dose 0.54621    0.32290    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

             HF eps   Pr(>F[HF])
Type      0.5597129 1.973144e-02
Dose      0.8361743 1.430901e-05
Type:Dose 0.6429432 3.258431e-01
```

The thing I really miss is the diagnostic about levels—if you mess this up, you get the wrong output and no way of knowing this happened. However, not a surprise that SPSS is more user-friendly than R on pretty much any score (except price, of course, and speed).
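A cheap substitute for SPSS’s mapping table is to print the pairing yourself and eyeball it before trusting the ANOVA. A sketch, assuming the `Type` and `Dose` factors as defined above:

```r
## spell out which data column maps to which factor levels;
## the rows should pair T1D1 with (T1, D1), T1D2 with (T1, D2), etc.
data.frame(Column = c("T1D1", "T1D2", "T1D3", "T2D1", "T2D2", "T2D3",
                      "T3D1", "T3D2", "T3D3"),
           Type, Dose)
```

If a row looks wrong there, the factors were built in the wrong order, and the ANOVA output would have been silently wrong.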

**Basic ANOVA in R, Long Format**

Step 1 is to turn the wide-format data into long-format data. This is not terribly hard in general, but there are some complications. In particular, for multi-factor one-way designs, the variable that holds the condition holds the full condition name, and that has to be broken down into separate variables. I usually do this by just extracting substrings, but this only works if you’re disciplined in your naming conventions. I am, but not everyone is, so when I get data from other people, it’s almost always extra work, but it’s not generally too bad. Anyway, here’s the R code:

```
## melt into long form
library(reshape2)
stim.lng = melt(stim, id=c("Snum", "Group"), variable.name="Cond", value.name="Alrtns")

## code separate factors for Type and Dose
stim.lng$TypeF = factor(substr(stim.lng$Cond, 1, 2))
stim.lng$DoseF = factor(substr(stim.lng$Cond, 3, 4))

## Make Snum into a factor
stim.lng$Snum = factor(stim.lng$Snum)

## run ANOVA
library(ez)
ezANOVA(stim.lng, dv = Alrtns, wid = Snum, within = .(TypeF, DoseF), detailed=T)
```

And here’s the output:

```
$ANOVA
       Effect DFn DFd         SSn       SSd           F            p p<.05        ges
1 (Intercept)   1  15 79900.44444  816.8889 1467.159956 2.226286e-16     * 0.96312076
2       TypeF   2  30   425.93056 1009.4028    6.329444 5.089509e-03     * 0.12220314
3       DoseF   2  30   878.38889  652.2778   20.199727 2.775105e-06     * 0.22306086
4 TypeF:DoseF   4  60    45.73611  580.9306    1.180936 3.283603e-01       0.01472871

$`Mauchly's Test for Sphericity`
       Effect         W            p p<.05
2       TypeF 0.1779497 5.650424e-06     *
3       DoseF 0.6950563 7.836821e-02      
4 TypeF:DoseF 0.1990287 1.058467e-02     *

$`Sphericity Corrections`
       Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
2       TypeF 0.5488323 2.040935e-02         * 0.5597129 1.973144e-02         *
3       DoseF 0.7663166 2.887647e-05         * 0.8361743 1.430901e-05         *
4 TypeF:DoseF 0.5462132 3.229028e-01           0.6429432 3.258431e-01          
```

Because in long form there are no order issues, the problem of mis-ordering the independent variables doesn’t come up, so no worries on that front. I still wish it did the d.f. adjustment for sphericity, but I’ll live.
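For what it’s worth, applying the adjustment by hand is trivial: scale both degrees of freedom by epsilon and recompute the p-value. A quick sketch, with the F and GG epsilon copied from the TypeF row of the output above:

```r
## Greenhouse-Geisser adjustment by hand: multiply both df by epsilon
Fval = 6.329444
eps  = 0.5488323
pf(Fval, df1 = 2 * eps, df2 = 30 * eps, lower.tail = FALSE)  # ~0.0204, the p[GG] above
```

The HF-adjusted p works the same way with the HF epsilon.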

**Simple Main Effects**

First, yes, I understand that’s not the terminology of choice for everyone. I discussed that in my post on factorial between-subjects ANOVA. I’m sticking with it. I also get that this isn’t always a great way of decomposing an interaction (In fact, it’s often seriously flawed), but I’m still going to illustrate it for completeness. I am not, however, going to do cell posthocs because they’re awful.

Second, I further understand that this is not really the optimal data set for illustrating this, because there isn’t really any evidence of an interaction. An oversight on my part, perhaps, but it’s a pretty well-known data set—let’s just go with these are the procedures that would be used were there a meaningful interaction, and all agree that they aren’t really appropriate for these data and give that a pass for pedagogical purposes. OK?

**Simple Main Effects in SPSS**

Unfortunately, the SPLIT… SORT construct that makes this easy to do in between-subjects ANOVA doesn’t work for repeated-measures data in SPSS. It’s not that bad, but it’s sort of tedious. Let’s say we wanted to look for the effects of Dose at the different levels of Type of drug. That would look like this:

```
GLM T1D1 T1D2 T1D3
  /WSFACTOR Dose 3.

GLM T2D1 T2D2 T2D3
  /WSFACTOR Dose 3.

GLM T3D1 T3D2 T3D3
  /WSFACTOR Dose 3.
```

In each individual ANOVA, only a subset of the nine measures is used, and at each one, we test for an effect of Dose. Since there’s no interaction and a substantial main effect of Dose, we expect to reject the null in all three cases, and that’s what we get (in three pieces). Here’s the output for drug Type 1:

And for drug Type 2:

And for drug Type 3:

Obviously, if we were decomposing an interaction we’d be looking for some of these ANOVAs telling us to reject and others not.

**Simple Main Effects in R, Wide Format**

The basics for doing this in R are exactly the same as they are for doing this in SPSS—we do three separate ANOVAs by including only some of the variables in each one.

```
Dose = factor(c("D1", "D2", "D3"))

T1Anv = lm(cbind(T1D1, T1D2, T1D3) ~ 1)
summary(Anova(T1Anv, idata=data.frame(Dose), idesign=~Dose, type=3), multivariate=F)

T2Anv = lm(cbind(T2D1, T2D2, T2D3) ~ 1)
summary(Anova(T2Anv, idata=data.frame(Dose), idesign=~Dose, type=3), multivariate=F)

T3Anv = lm(cbind(T3D1, T3D2, T3D3) ~ 1)
summary(Anova(T3Anv, idata=data.frame(Dose), idesign=~Dose, type=3), multivariate=F)
```

Same idea as SPSS, but wordier. Here’s the output in one big chunk:

```
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                 SS num Df Error SS den Df       F    Pr(>F)    
(Intercept) 23585.3      1   144.67     15 2445.48 < 2.2e-16 ***
Dose          386.8      2   277.21     30   20.93  2.04e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity
     Test statistic    p-value
Dose        0.28979 0.00017161

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity
      GG eps Pr(>F[GG])    
Dose 0.58472  0.0001503 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
        HF eps   Pr(>F[HF])
Dose 0.6041376 0.0001227022

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

               SS num Df Error SS den Df      F    Pr(>F)    
(Intercept) 32396      1  1360.31     15 357.23 7.147e-12 ***
Dose          384      2   318.62     30  18.08 7.046e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity
     Test statistic p-value
Dose        0.91258 0.52712

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity
      GG eps Pr(>F[GG])    
Dose 0.91961  1.466e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
       HF eps  Pr(>F[HF])
Dose 1.042018 7.04624e-06

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                 SS num Df Error SS den Df         F   Pr(>F)    
(Intercept) 24345.0      1   321.31     15 1136.5114 1.48e-15 ***
Dose          153.3      2   637.38     30    3.6076  0.03945 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity
     Test statistic p-value
Dose        0.75338 0.13775

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity
      GG eps Pr(>F[GG])  
Dose 0.80217    0.05146 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
        HF eps Pr(>F[HF])
Dose 0.8834696  0.0461275
```

Same answers as SPSS, of course, which is good. Now, for long-format, which is actually both harder and easier, depending on your background.

**Simple Main Effects in R, Long Format**

So, assuming the data are in a frame called “stim.lng” as in the example above, the way I think of doing this is in a for() loop. Essentially, this is exactly the same as the between-subjects version of this, which is great if you’re already comfortable with the idea of a for() loop, which not all psychology students are. Be that as it may, here’s a very simple loop which not only gets the job done but makes nice labels:

```
for (Type in levels(stim.lng$TypeF)) {
  cat("\n------ Drug Type:", Type, "-----\n")
  print(ezANOVA(stim.lng[stim.lng$TypeF == Type, ], dv=Alrtns, wid=Snum,
                within=.(DoseF), detailed=T))
}
```

And here’s the output:

```
------ Drug Type: T1 -----
$ANOVA
       Effect DFn DFd        SSn      SSd          F            p p<.05       ges
1 (Intercept)   1  15 23585.3333 144.6667 2445.48387 4.964361e-18     * 0.9824272
2       DoseF   2  30   386.7917 277.2083   20.92966 2.039732e-06     * 0.4783079

$`Mauchly's Test for Sphericity`
  Effect         W            p p<.05
2  DoseF 0.2897867 0.0001716125     *

$`Sphericity Corrections`
  Effect       GGe        p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
2  DoseF 0.5847224 0.0001502653         * 0.6041376 0.0001227022         *

------ Drug Type: T2 -----
$ANOVA
       Effect DFn DFd        SSn      SSd         F            p p<.05       ges
1 (Intercept)   1  15 32396.0208 1360.312 357.22697 7.147465e-12     * 0.9507281
2       DoseF   2  30   384.0417  318.625  18.07964 7.046240e-06     * 0.1861588

$`Mauchly's Test for Sphericity`
  Effect         W         p p<.05
2  DoseF 0.9125831 0.5271171      

$`Sphericity Corrections`
  Effect       GGe        p[GG] p[GG]<.05      HFe       p[HF] p[HF]<.05
2  DoseF 0.9196105 1.465652e-05         * 1.042018 7.04624e-06         *

------ Drug Type: T3 -----
$ANOVA
       Effect DFn DFd        SSn      SSd          F            p p<.05       ges
1 (Intercept)   1  15 24345.0208 321.3125 1136.51138 1.480283e-15     * 0.9621128
2       DoseF   2  30   153.2917 637.3750    3.60757 3.944790e-02     * 0.1378548

$`Mauchly's Test for Sphericity`
  Effect        W         p p<.05
2  DoseF 0.753376 0.1377471      

$`Sphericity Corrections`
  Effect       GGe      p[GG] p[GG]<.05       HFe     p[HF] p[HF]<.05
2  DoseF 0.8021665 0.05145518           0.8834696 0.0461275         *
```

Note the identical *F*-values to the other two ways of doing this. However, this seems much easier to me than writing out three separate ones, and it scales better to larger designs. Of course, simple main effects aren’t my preferred way of dealing with this problem in the first place, and that’s where things start to get dicey.

**Interaction Contrasts**

Again, no, there’s not an interaction here. However, for pedagogical reasons, I’m going to go through interaction contrasts anyway, because this is where getting SPSS and R to produce the same answer is a little more complicated. Actually, with interaction contrasts, getting either package to produce an answer is not as straightforward as it might be, but both support it.

So what interaction contrast do we want? Since there isn’t really an interaction, it doesn’t matter much. Let’s say the experiment had come out differently, and not only was drug Type 2 better, but that interacted with Dose—let’s say the dose-response function was much steeper for drug Type 2. We’d want a linear contrast on dosage, which is (1 0 -1) on Dose, and we want to compare drug Type 2 with the average of the other two types, which is (1 -2 1) on Type. We want the outer product of those across the nine means—an interaction contrast. So, how do we make that happen in the two packages?

**Interaction Contrasts in SPSS**

In between-subjects ANOVA, SPSS’s support for interaction contrasts is utterly awful, as I mentioned in the blog post where I covered it. The good news is that it’s better for within-subjects ANOVA. It’s still not wonderful, but at least SPSS will do the cross-multiplication for you here (which it won’t for between-subjects ANOVA—don’t get me started). For a within-subjects ANOVA, we have to tell SPSS what contrasts we want, and if we only want one contrast on each, the easiest way to keep everything straight is to just repeat those contrasts in each of the statements. For the contrast we want, the syntax looks like this:

```
GLM T1D1 TO T3D3
  /WSFACTOR Type 3 SPECIAL(1  1  1
                           1 -2  1
                           1 -2  1)
            Dose 3 SPECIAL(1  1  1
                           1  0 -1
                           1  0 -1).
```

Kind of ugly. Note that repeating the contrasts twice isn’t strictly necessary, in the sense that if you wanted to put in other contrasts besides just the one I specified above, you could. You cannot, however, just put in the one you want and be done—SPSS requires that the matrix you specify on each factor be *k* by *k*, where *k* is the number of levels of the variable, and the first row of the matrix must be all 1s. Why this is required is unclear. Anyway, here’s the relevant part of the output:

And, as expected, the test fails with a *p*-value of .455. Not a surprise since there isn’t really an interaction—but this is how you might break one down with contrasts.

Note that just as with one-way repeated-measures ANOVA, this assumes that each contrast gets its own error term, and thus we need not assume sphericity here, but we do lose some degrees of freedom in the error term. If this is a problem, then it’s possible to use the MSE from the original ANOVA and adjust the sum of squares for the contrast by dividing it by the squared vector length to produce an appropriate *F*. This is annoying to do, and it would be nice if SPSS gave the option to use the pooled error term for those who want it.

**Interaction Contrasts in R, Wide Format**

As noted in my previous post on one-way repeated measures ANOVA, R has good facilities for vector (and matrix) multiplication, and that’s probably the easiest way to do contrasts in wide-format data: multiply the data by the contrast you want, and do a *t*-test on the resulting variable. The same strategy applies here, only now you need to take the outer product of the vectors on the two different independent variables and multiply your data matrix by the result of that. It sounds cumbersome but it’s really not that bad.

Remember that we have (1 0 -1) on Dose and (1 -2 1) on Type. We’ll take the outer product of those and just multiply by an array created by putting together all the relevant variables:

```
## Build the contrast vector. The contrast on Dose goes first, because
## Dose changes fastest (opposite order from SPSS!)
LVec = c(c(1, 0, -1) %o% c(1, -2, 1))

## Multiply
L = cbind(T1D1, T1D2, T1D3, T2D1, T2D2, T2D3, T3D1, T3D2, T3D3) %*% LVec

## Do the test
t.test(L)
```

And this is what you get:

```
	One Sample t-test

data:  L
t = 0.7663, df = 15, p-value = 0.4554
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -4.11939  8.74439
sample estimates:
mean of x 
   2.3125 
```

Remember that SPSS produced F(1, 15) = 0.59, p = .455, so this does indeed produce the same result as SPSS, and rests on the same set of assumptions (no sphericity required, but no pooling of error d.f.).
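As a quick check of that equivalence—nothing here but arithmetic on the two outputs above—a single-df contrast’s *F* is just the square of its *t*:

```r
## SPSS reported F(1, 15) = 0.59; R's t-test reported t = 0.7663.
## For a single-df contrast, F = t^2:
t_val <- 0.7663
t_val^2   # about 0.587, which rounds to SPSS's 0.59
```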

**Interaction Contrasts in R, Long Format**

As far as I can tell, if you do it this way, you cannot get the same results out that you get from SPSS. There may well be some trick here that I don’t know, but so far I have been unable to get R to do this without using the pooled error term, and therefore requiring the assumption of sphericity. Whether that’s a good or a bad thing is not something I’m going to debate here; I just want to be up-front about what’s going on.

Furthermore, for reasons that I have not yet been able to determine, this method sometimes (frequently) completely fails, and R provides no information about why. I’ll admit that I’m a little flummoxed here and if there are any R folks out there who can help, I would most definitely appreciate the assistance.

The basic idea is to set up contrasts on the factors, run the ANOVA, and then ask for the summary to split the output according to those contrasts. The ezANOVA() procedure will kindly return the aov object in order to facilitate this, and then you can call summary() on that. It looks like this:

```
## Set up the factors with the contrasts we want, and label them
contrasts(stim.lng$TypeF) = cbind(c(1, -2, 1))
contrasts(stim.lng$DoseF) = contr.poly(3)
TypeLb = list("2vs1&3"=1)
DoseLb = list("Linear"=1, "Quad"=2)

## Run the ANOVA with ez, but save out the result
ezANOVA(stim.lng, dv = Alrtns, wid = Snum, within = .(TypeF, DoseF), detailed=T)
AnvE = ezANOVA(stim.lng, dv = Alrtns, wid = Snum, within = .(TypeF, DoseF),
               return_aov=T)$aov
summary(AnvE, split=list(TypeF=TypeLb, DoseF=DoseLb))
```

This gives us results that actually match SPSS in terms of sums of squares for the main effect contrasts, but then, for no reason given anywhere in the output, the interaction contrasts simply do not appear at all:

```
Error: Snum
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 15  816.9   54.46               

Error: Snum:TypeF
                Df Sum Sq Mean Sq F value  Pr(>F)   
TypeF            2  425.9   213.0   6.329 0.00509 **
  TypeF: 2vs1&3  1  422.9   422.9  12.569 0.00131 **
Residuals       30 1009.4    33.6                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Snum:DoseF
                Df Sum Sq Mean Sq F value   Pr(>F)    
DoseF            2  878.4   439.2  20.200 2.78e-06 ***
  DoseF: Linear  1  876.0   876.0  40.291 5.30e-07 ***
  DoseF: Quad    1    2.3     2.3   0.108    0.745    
Residuals       30  652.3    21.7                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Error: Snum:TypeF:DoseF
                            Df Sum Sq Mean Sq F value Pr(>F)
TypeF:DoseF                  4   45.7  11.434   1.181  0.328
  TypeF:DoseF: 2vs1&3.Linear 1                              
  TypeF:DoseF: 2vs1&3.Quad   1                              
Residuals                   60  580.9   9.682               
```

Again, I’m not sure why, but this is another symptom of poor support for repeated measures in R. I’m positive that there is some way to get R to produce the relevant numbers—probably by computing all the relevant sums of squares manually—but I highly doubt your average first-year psychology graduate student is likely to be willing to go that route if the SPSS alternative is available.
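If you do want to fill that gap by hand, here’s a sketch of one way, using only numbers that already appear in the outputs above: compute the contrast sum of squares from the per-subject contrast scores (the L variable from the wide-format section) and test it against the pooled Snum:TypeF:DoseF residual MS. To be clear, this is my own arithmetic, not output from either package, and it buys the pooled error term at the cost of the sphericity assumption:

```r
## Pooled-error version of the interaction contrast (assumes sphericity).
## mean(L) = 2.3125 over n = 16 subjects, from the wide-format t-test above.
n    <- 16
Lbar <- 2.3125
len2 <- sum((c(1, 0, -1) %o% c(1, -2, 1))^2)  # squared contrast length = 12

SScon <- n * Lbar^2 / len2   # contrast SS, about 7.13
SScon / 9.682                # F on (1, 60) df, using the pooled residual MS
```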

## Translating SPSS to R: One-way Repeated Measures

There’s just no two ways about this: R is just not all that well-configured for repeated measures. Most of the ways of handling repeated measures in R are cumbersome, and some of them just flat-out fail under certain conditions. It’s not a great situation.

Generally speaking, people in the R community do not appear much concerned about this. The argument is usually that mixed linear models are a better way to go because they don’t require the same set of restrictive assumptions that repeated measures ANOVA requires. This is fundamentally an argument about sphericity (or, more technically, compound symmetry). If there were no downsides whatsoever to linear mixed models, I’d be OK with this, but it turns out there are some drawbacks that I’ll talk about in some later blog post. Furthermore, repeated measures ANOVA is still far more common in experimental psychology circles, and that’s where I run, so I not only need to be able to do them, I need to be able to teach new graduate students how to do them and how to understand them. R seems designed to make this particularly difficult. In general, R is more difficult to teach with than SPSS, but here’s where it gets particularly ugly.

As usual, I’d like to start with SPSS. However, before I can get into that, I need to spend some time talking about data file formatting. Yes, nuts and bolts stuff gets in there right away. The data come from an experiment (probably fictional—I don’t actually remember where these data came from originally) in which kids were shown the same cartoon on three different days. For those of you who don’t (or didn’t) have small children, kids’ capacity to watch the same thing over and over again and still find it funny is one of the truly amazing feats of childhood. So, the study was done to see if kids would rate the same cartoon less funny after subsequent viewings. Eight kids saw the same cartoon for three consecutive days, and rated how funny it was each time.

So, first we run into the question of “how should the data file be structured?” For between-subjects data, pretty much everybody does it the same way: each line of the data file contains the data for one subject. This makes sense, since each subject only contributes one observation. But for within-subject data, this isn’t so clear. Should it be one observation per line, or one subject per line? It turns out these two ways to do it even have names: one observation per line is called “long” format and one subject per line is called “wide” format.

Wide format looks like this (and is in the file cartoon.xls):

| Subj | Day1 | Day2 | Day3 |
|------|------|------|------|
| 1    | 6    | 5    | 2    |
| 2    | 5    | 5    | 4    |
| 3    | 5    | 6    | 3    |
| 4    | 6    | 5    | 4    |
| 5    | 7    | 3    | 3    |
| 6    | 4    | 2    | 1    |
| 7    | 4    | 4    | 1    |
| 8    | 5    | 7    | 2    |

So, each line of data has the subject number, and then three observations per subject. Each variable (e.g., “Day2”) represents one measurement on each subject. Here’s what the “long” version of this looks like:

| Subj | Trial | Rating |
|------|-------|--------|
| 1    | Day1  | 6      |
| 2    | Day1  | 5      |
| 3    | Day1  | 5      |
| 4    | Day1  | 6      |
| 5    | Day1  | 7      |
| 6    | Day1  | 4      |
| 7    | Day1  | 4      |
| 8    | Day1  | 5      |
| 1    | Day2  | 5      |
| 2    | Day2  | 5      |
| 3    | Day2  | 6      |
| 4    | Day2  | 5      |
| 5    | Day2  | 3      |
| 6    | Day2  | 2      |
| 7    | Day2  | 4      |
| 8    | Day2  | 7      |
| 1    | Day3  | 2      |
| 2    | Day3  | 4      |
| 3    | Day3  | 3      |
| 4    | Day3  | 4      |
| 5    | Day3  | 3      |
| 6    | Day3  | 1      |
| 7    | Day3  | 1      |
| 8    | Day3  | 2      |

Here, there are 24 lines of data with a variable for subject number (which now has recurring values), a variable representing which day the measurement is, and the rating given.

So, what’s the big deal? Well, it turns out that SPSS (and numerous other stat packages) assume that repeated-measures data is represented in wide format. R can handle both, but there are some advantages to wide-format data even in R, and fortunately it’s pretty easy to go from wide to long in R.

So, the rest of this is going to be done assuming the data are in wide format, as in cartoon.xls.

**Basic ANOVA in SPSS**

SPSS makes this quite easy. The basic command for this is the GLM command (stands for “general linear model”) and it’s fairly straightforward:

```
GLM Day1 Day2 Day3
  /WSFACTOR Trial 3.
```

The first line is the command name followed by the list of variables to be included, and the second line defines the repeated measures (or “WS” for “within subjects”) factor, gives that factor a name, and tells SPSS how many levels it has.

It produces rather a lot of output. I’m going to skip the “multivariate tests” output (because if you’re going there, then I’m fully with the “use mixed linear models” argument). The first relevant part of the output is the sphericity diagnostics:

I’m not a big fan of Mauchly’s test because it has low power when sample size is small, but SPSS also produces epsilon estimates so it’s OK. Next is the actual ANOVA table:

It would be nice if it had total SS on it so that it’s easier to compute effect sizes (you can ask SPSS to provide partial eta squared, but that’s not really my preferred measure). Nonetheless, it has pretty much all the other stuff you’d want. There is another piece of output that is provided by default by SPSS, which is contrast output. SPSS assumes, rightly or wrongly, that if you don’t specify what contrasts you want, that you want polynomial contrasts on all your independent variables. Thus, you get this whether you want it or not:

Notice that the error term has only 7 degrees of freedom, and that each contrast has its own error term. This is a particular choice on the part of the SPSS developers. What this means is that the error terms are unique and thus do not require the sphericity assumption. It also means that these tests exactly match what you get if you do your repeated measures contrasts by *t*-test (more on that in a bit).

Some people prefer pooling the error term, that is, using the error term from the original RM ANOVA (the one with 14 df), but that requires the sphericity assumption, and the test doesn’t match the contrast t-test. This is the kind of thing reasonable people can disagree about (again, I’m not in the business of telling people how to adjudicate arguments like that on this blog—for now). It would be nice if SPSS gave the option to include the tests computed that way. Note that technically you can compute it yourself by hand if you want to, but you have to make sure you normalize the contrast vectors to make this work.

**Basic ANOVA in R, Wide Format**

R is not in the business of making this easy. Essentially, the way you do this in R is to set up a linear model with the “lm()” function, and then tell R that it’s really a repeated measures ANOVA using the “Anova()” command out of the “car” package. It’s a little more laborious, but for a one-way it’s not really that bad. Again, assuming the data has been read in and attached, the code looks something like this (with annotations in the code this time):

```
## Load "car" to get the Anova() function
library(car)

## Create a model of the within-subjects factors
Anv1 = lm(cbind(Day1, Day2, Day3) ~ 1)

## Create a labeled factor for the w-s factor
Trial = factor(c("Day1", "Day2", "Day3"))

## Do the ANOVA. "idata" and "idesign" tell Anova() how to
## structure within-subjects factors
Anv2 = Anova(Anv1, idata=data.frame(Trial), idesign=~Trial)

## Get out the ANOVA tables and adjustments
summary(Anv2, multivariate=F)
```

Somewhat more than SPSS indeed. Here’s the output:

```
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                SS num Df Error SS den Df       F    Pr(>F)    
(Intercept) 408.37      1   18.625      7 153.483 5.132e-06 ***
Trial        33.25      2   16.750     14  13.896 0.0004735 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity

      Test statistic p-value
Trial        0.66533 0.29452

Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

       GG eps Pr(>F[GG])   
Trial 0.74925   0.001856 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

         HF eps   Pr(>F[HF])
Trial 0.9077505 0.0007808673
```

Unlike SPSS, R provides no contrasts by default, but all the relevant output matches. Also unlike SPSS, R doesn’t compute what the adjusted degrees of freedom are, so you have to compute those by hand. Not hard, but slightly annoying—isn’t computing stuff kind of the point of doing it in software in the first place? But I digress.
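The hand computation is at least short: multiply both the numerator and denominator df by the relevant epsilon. For the Greenhouse-Geisser value on the printout above:

```r
## Adjusted df = epsilon * uncorrected df
GGe <- 0.7492489     # GG eps from the summary() output
c(2, 14) * GGe       # adjusted df, roughly (1.50, 10.49)
```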

So, the basic ANOVA is not too bad to do in R with wide-format data. Let’s take a look at contrasts in both packages.

**Custom Contrasts in SPSS**

There are two relatively straightforward ways to do custom contrasts in SPSS: as *t*-tests and as part of the GLM. First, the contrasts. By default it provides polynomials, so let’s do something else. Let’s try two: (-2 1 1) to compare the first viewing with the mean of the later viewings (that is, is it only funny once?), and then (0 1 -1) to compare the second and third viewings. To do those as *t*-tests in SPSS, we just compute the relevant variables and test whether or not those equal zero:

```
COMPUTE Day1vs23 = -2*Day1 + Day2 + Day3.
COMPUTE Day2vs3 = Day2 - Day3.
EXECUTE.

T-TEST
  /VAR Day1vs23 Day2vs3 /TESTVAL 0.
```

Why on earth SPSS requires the EXECUTE statement is beyond me, but I’ve complained about that before so I’ll leave it alone for now. Here’s the output:

Apparently, even for the kids, the cartoon is getting less funny over repeated viewings; even the second and third viewing are different. Anyway, that gets the job done. It is also possible, however, to do the contrasts right within the GLM itself. To do that, you need to tell SPSS to run contrasts in the WSFACTOR subcommand. It looks like this:

```
GLM Day1 TO Day3
  /WSFACTOR Trial 3 SPECIAL( 1 1  1
                            -2 1  1
                             0 1 -1).
```

You see the “-2 1 1” vector and the “0 1 -1” vectors there in the command. Why the “1 1 1” vector is required is a mystery to me, since you’re not allowed to put anything else in there. Bizarre. Anyway, the first part of the output is the same as the previous GLM command, but the contrast output is different:

Note the *p*-values are the same as the previous output, and those *F*s are the squares of the *t*s from the previous output. SPSS will also supply the partial eta squared values for the contrasts if you ask it to with a “/PRINT ETASQ” subcommand. Again, not my preferred measure of effect size.

**Custom Contrasts in R, Wide Format**

The “Anova()” procedure in R doesn’t support contrasts at all, so you more or less have to do it by *t*-test. The good news is that R understands matrix multiplication, so you can at least make the code short. First, the SPSS way, writing out the contrast coefficients implicitly:

```
Day1vs23 = -2*Day1 + Day2 + Day3
t.test(Day1vs23)

Day2vs3 = Day2 - Day3
t.test(Day2vs3)
```

There is a slightly quicker way to do this that relies on R’s ability to do matrix multiplication. I prefer this because it lets you write out the contrast vectors explicitly:

```
Day1vs23 = cbind(Day1, Day2, Day3) %*% c(-2, 1, 1)
t.test(Day1vs23)

Day2vs3 = cbind(Day1, Day2, Day3) %*% c(0, 1, -1)
t.test(Day2vs3)
```

And here are the results (they’re identical for both code snippets above):

```
	One Sample t-test

data:  Day1vs23
t = -3.8129, df = 7, p-value = 0.006603
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -5.468036 -1.281964
sample estimates:
mean of x 
   -3.375 

	One Sample t-test

data:  Day2vs3
t = 3.6602, df = 7, p-value = 0.008068
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.7521863 3.4978137
sample estimates:
mean of x 
    2.125 
```

So, as it should, R agrees with SPSS on these tests. All is well in the universe, right? Well, maybe, unless (a) you have long-format data, or (b) you’d like to use pooled error terms in your contrast tests. In the case of (b), those tests will not necessarily agree with SPSS, though of course SPSS will let you compute those tests if you really want them, as all the appropriate information is on the printout. So, let’s take a look at R with long-format data.

**Conversion to Long Format in R**

The good news is that if you want long-format data in R and you have wide-format data, it’s pretty easy to transform the data. (This is also possible in SPSS, but it’s moderately cumbersome to do it and I have to look it up every time.) Let’s say the data is in a data frame called “cartoon.” I’m going to put it into long format with a command called “melt()” from the “reshape2” library. The commands look like this:

```
detach(cartoon)
library(reshape2)
cartoon.lng = melt(cartoon, id.vars="Subj", variable.name="Trial",
                   value.name="Rating")
```

Note that the “detach()” isn’t technically necessary. This produces a data frame called “cartoon.lng” that looks exactly like the long-format table earlier in the post.

**Basic ANOVA in R, Long Format**

R offers multiple ways to handle the ANOVA in long format. The simplest way is to use the “ezANOVA()” function out of the “ez” package. Essentially this is really just a wrapper for the “car::Anova()” function we used last time, but it makes things slightly easier, particularly for more complex designs. For this one-way design, the simplicity gain isn’t that large. The code looks like this:

```
library(ez)
cartoon.lng$SubjF = factor(cartoon.lng$Subj)
ezANOVA(cartoon.lng, dv = Rating, wid = SubjF, within = Trial, detailed=T)
```

So, that code loads the “ez” library, creates a new variable in the “cartoon.lng” data frame that is the “Subj” variable represented as a factor, then does the ANOVA with “Rating” as the dependent variable (hence “dv”), notes that the variable that identifies subject (the “wid”) is the variable “SubjF” and the within-subjects variable (“within”) is “Trial”. Here’s the output:

```
$ANOVA
       Effect DFn DFd     SSn    SSd         F            p p<.05       ges
1 (Intercept)   1   7 408.375 18.625 153.48322 5.131880e-06     * 0.9202817
2       Trial   2  14  33.250 16.750  13.89552 4.734931e-04     * 0.4845173

$`Mauchly's Test for Sphericity`
  Effect         W         p p<.05
2  Trial 0.6653301 0.2945177

$`Sphericity Corrections`
  Effect       GGe      p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
2  Trial 0.7492489 0.00185552         * 0.9077505 0.0007808673         *
```

And of course the output matches the wide-format use of “car::Anova()” since it’s actually using that procedure to do the actual ANOVA. Note that this automatically provides an estimate of effect size; that’s the “ges” on the printout. This is the “generalized eta squared” measure from Bakeman (2005), which is not really a standard in Psychology but is a defensible choice and is certainly not unheard-of.
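If you want to see where that number comes from, generalized eta squared in a purely within-subjects design is the effect SS over the effect SS plus all the error SS (including the between-subjects residual). Reconstructing the Trial value from the printout:

```r
## ges = SSn / (SSn + all error SS), per Bakeman (2005)
SSn   <- 33.250            # Trial effect SS
SSerr <- 16.750 + 18.625   # Trial error SS + subject (intercept) error SS
SSn / (SSn + SSerr)        # 0.4845, matching the ges column
```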

The minor downside to ezANOVA is that it doesn’t let you do contrasts. Well, actually, you can, but to do so you need to have ezANOVA return the aov object, which it will do, but to understand how to use that you need to do what’s in the next section anyway, so I’ll go through that.

**Custom Contrasts in R, Long Format**

This is where I dislike doing things in the long format in R, because the only straightforward way to do the contrasts gives you pooled error terms—the thing you have to compute by hand in SPSS.

First, it is actually possible to do the ANOVA with long-format data without using “ezANOVA().” However, if you do it this way, you do not get sphericity diagnostics, which seems like a really bad idea to me. (Look, I get that the logic is that mixed linear models don’t make this assumption, but how do you know how important that assumption is if you don’t get information about it? Seems backward to me.) Anyway, you do this in R by specifying the within-subjects error term in the specification of the model. It looks like this:

```
Anv = aov(Rating ~ Trial + Error(SubjF/Trial), data=cartoon.lng)
summary(Anv)
```

That whole “Error(SubjF/Trial)” is the specification of the within-subjects error term. Here’s the output:

```
Error: SubjF
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  7  18.62   2.661               

Error: SubjF:Trial
          Df Sum Sq Mean Sq F value   Pr(>F)    
Trial      2  33.25  16.625    13.9 0.000473 ***
Residuals 14  16.75   1.196                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

How can you not love that? The *p*-value to six decimal places, but the *F*-value to only two. Really? (Question for anyone who happens to read this from the R community: are there journals out there that request six decimal places for the *p*-value? If so, how many decimal places do they want for the *F*? Because in psychology the convention is for two decimal places in the *F*-value and no more than 3 for the *p*-value.) And while the relevant stuff matches the other SPSS and R outputs, as I noted, there’s no Mauchly test or sphericity epsilons there.

So that’s not great, but what about contrasts? Well, you do it pretty much how you do contrasts in between-subjects ANOVA in R: you set up the contrast, run the ANOVA, and you ask R to split up the output in the summary. It looks like this:

```
contrasts(cartoon.lng$Trial) = cbind(c(-2, 1, 1), c(0, 1, -1))
Anv = aov(Rating ~ Trial + Error(SubjF/Trial), data=cartoon.lng)
summary(Anv, split=list(Trial=list("Day1vs23"=1, "Day2vs3"=2)))
```

I know, pretty, isn’t it? Here’s the output:

```
Error: SubjF
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  7  18.62   2.661               

Error: SubjF:Trial
                  Df Sum Sq Mean Sq F value   Pr(>F)    
Trial              2  33.25  16.625   13.90 0.000473 ***
  Trial: Day1vs23  1  15.19  15.188   12.69 0.003120 ** 
  Trial: Day2vs3   1  18.06  18.062   15.10 0.001648 ** 
Residuals         14  16.75   1.196                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

So, there, contrasts in long-format repeated-measures ANOVA. You’ll note that these results do **not** agree with the results generated by SPSS or by the t-tests we did on the wide-format data. As I mentioned before, these use the pooled error term, which means you have to be especially careful with sphericity here.

Note that you can get the same results from SPSS with a little hand computation. If you go back to the “L1” line in the SPSS output when we did this contrast, note that the SS for the contrast was 91.125. The squared length of the (-2 1 1) vector is six, so if we take that SS and divide by six, we get 15.188, which matches the contrast SS in the long-format R output. Dividing that by the (pooled) MSE of 1.196 yields the same 12.69 *F*-value. So this method is available to SPSS users, though it’s kind of a hassle since you have to do a lot of the computations by hand.
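That arithmetic is simple enough to script; this is just the same hand computation expressed in R, using the numbers from the two printouts:

```r
## Rebuild the pooled-error contrast F from the SPSS contrast SS
SSspss <- 91.125              # SS for the L1 contrast in the SPSS output
len2   <- sum(c(-2, 1, 1)^2)  # squared length of the contrast vector = 6
MSE    <- 16.750 / 14         # pooled error MS from the RM ANOVA, 1.196

SSspss / len2          # 15.1875, the contrast SS in the long-format output
(SSspss / len2) / MSE  # 12.69, the pooled-error F on (1, 14) df
```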

Frankly, I’m not a fan of doing contrasts this way, which means if I’m using R, I pretty much have to stick to wide-format data. That’s basically OK in this context, but it can get a little cumbersome for ANOVAs with many within-subjects factors. That’ll be the subject for the next post…

## Translating SPSS to R: Non-orthogonal (or unbalanced) Factorial Between-Subjects ANOVA

First, hello, yes, the blog series is back. I got a little sidetracked actually teaching the course and then didn’t pick it up last summer, and then the cycle repeated itself. Part of the reason is also that I hit something of a snag along the way and was unable to get SPSS and R to produce the same answers for certain analyses, and it took me a while to figure out what was going on. Now that I believe I understand the problem (and the solution to it), it’s time to plunge back in.

So, now, the situation: non-orthogonal (or unbalanced) factorial between-subjects ANOVA. That alone is quite a mouthful. What that means is a two (or more) way ANOVA with one observation per subject where the number of subjects in each cell is not equal.

For most people who might read this blog (all three of you), the implications of this are generally pretty well-known: it means the different ANOVA effects are not statistically independent of each other. This means that something generally has to be done to correct. This is a bit of a controversial point in the statistics universe. Many statistical purists (and, by my reading of it, most of the R community) think the right approach to this problem is to use Type II sums of squares. There is another contingent, however—and most psychologists, as far as I can tell—that argues that Type III sums of squares are the way to go. This is the kind of thing that generates name-calling on statistical Web sites (particularly from the R side).

I am not interested in having this argument here. I just care about getting the statistics software to produce whichever analysis is preferred by the analyst. It seems to me this ought to be the focus of discussions on the software side of things, but that doesn’t seem to be how it usually plays out. Regardless, I’m going to present both without discussion of the pros and cons of the different approaches, just the “how to” part.

And the good news is that both packages can do this, although some of the default behaviors by the two packages are somewhat odd. I’ll get to the details soon.

The data for these examples is in nonorth.xls, which has three variables: “Dep,” the dependent variable, and “A” and “B,” the two independent variables. Nothing fancy, no cover story, just the data.

**Basic ANOVA and Sums of Squares**

First, in SPSS. The default behavior in SPSS is to produce Type III sums of squares. This is entirely reasonable given that this is still the majority viewpoint in the social sciences—or, at least, it’s what I was taught in grad school in Psychology in the early 1990s, anyway. So, the simplest command:

`UNIANOVA Dep BY A B.`

produces this ANOVA table:

And it even nicely notes that it’s using Type III. Great so far. If you’d prefer Type II sums of squares, you just tell SPSS to override the default:

```
UNIANOVA Dep BY A B
  /METHOD = SSTYPE(2).
```

Note that it is possible to request Type I sums of squares, but nobody uses that. Anyway, the output:

Notice that the sums of squares for both A and B have increased, as you’d expect with Type II sums of squares. Very straightforward.

Now, R. Despite the fact that nobody I can find anywhere actually recommends the use of Type I sums of squares, that’s the default in R. Yes, it defaults to a behavior that nobody wants—in what universe that makes sense, I’m sure I don’t know. So, assuming the data frame is already attached, first create factors and then run the ANOVA with this code:

```
AF = factor(A)
BF = factor(B)
summary(aov(Dep ~ AF * BF))
```

The following output is produced:

```
            Df Sum Sq Mean Sq F value   Pr(>F)    
AF           1  11438   11438  42.846 1.20e-09 ***
BF           3   9791    3264  12.226 4.14e-07 ***
AF:BF        3    496     165   0.619    0.604    
Residuals  132  35237     267                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Nowhere in the output does it identify that these are Type I sums of squares, but if you reverse the order of the factors in the “aov()” command, you do indeed get different output. That is, this code:

`summary(aov(Dep ~ BF * AF))`

yields this ANOVA table:

```
            Df Sum Sq Mean Sq F value   Pr(>F)    
BF           3  17521    5840  21.879 1.45e-11 ***
AF           1   3707    3707  13.888 0.000287 ***
BF:AF        3    496     165   0.619 0.603611    
Residuals  132  35237     267                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

The sums of squares for A and B change dramatically, as expected. All of this is well and good, but not very helpful, since (almost) nobody actually uses Type I sums of squares. In order to get Type II or III sums of squares, the “Anova()” command—note the capital “A”—from the “car” package is required. In addition, certain options in R must be set in order for the internal contrasts used by the system to generate the correct output, so again, the defaults produce things other than what is desired. (This one is a little more defensible, since internally R defaults to dummy coding of categorical variables, which is not a wholly unreasonable choice, but it does mess up SS computations in unbalanced ANOVAs when done that way.)

So, first, load the “car” package, set the relevant options, create an aov object, and then run that through “Anova()” with the options set:

```
library(car)
options(contrasts = c("contr.sum","contr.poly"))
Anv = aov(Dep ~ AF * BF)
Anova(Anv, type=2)
```

This yields output that matches SPSS (whew):

```
Anova Table (Type II tests)

Response: Dep
          Sum Sq  Df F value    Pr(>F)    
AF          3707   1 13.8883 0.0002866 ***
BF          9791   3 12.2259 4.137e-07 ***
AF:BF        496   3  0.6194 0.6036113    
Residuals  35237 132                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

It does also note that it’s using Type II sums of squares. To get Type III, the “type” parameter in “Anova()” is just changed appropriately:

`Anova(Anv, type=3)`

And that produces output that again matches SPSS:

Anova Table (Type III tests)

Response: Dep
             Sum Sq  Df  F value  Pr(>F)    
(Intercept)   31112   1 116.5482 < 2e-16 ***
AF             1192   1   4.4656 0.03647 *  
BF             2892   3   3.6113 0.01509 *  
AF:BF           496   3   0.6194 0.60361    
Residuals     35237 132                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

So while it is mildly irritating that R has a useless default, it’s at least not actually difficult to get it to produce something reasonable.
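For the curious, the reason the contrasts option matters is easy to see directly from base R's contrast-generating functions: sum (deviation) coding produces columns that sum to zero across the levels, which is what makes the Type III hypotheses the ones you expect, while the default treatment (dummy) coding does not have that property. This is just an illustration, not part of the recipe above:

```r
## Compare the default dummy (treatment) coding with the sum (deviation)
## coding that the Type III computation needs, for a four-level factor.
contr.treatment(4)   # each column flags one level against a baseline level
contr.sum(4)         # each column contrasts one level against the last

## Sum-coded columns sum to zero across levels; dummy-coded columns do not.
colSums(contr.sum(4))
colSums(contr.treatment(4))
```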

**Contrasts**

This is where things get a little more interesting. Let’s say that independent variable B is an interval variable so that polynomial contrast analysis is a reasonable approach. SPSS can generate this pretty easily:

`UNIANOVA Dep BY A B`

/CONTRAST(B) = POLYNOMIAL.

What’s unfortunate is that the output is so dumb. Here’s what the relevant part of the output looks like:

Hey, SPSS, how about the actual *t*-values? It doesn’t print the relevant *F*-values, either, just the *p*-values. Grr. The corresponding t-values are -1.27 for the linear contrast, -4.55 for the quadratic, and 3.01 for the cubic. (Fs would be the squares of those: 1.61, 20.71, and 9.06.)

What’s much more interesting about this is that SPSS doesn’t care what you set for the sum of squares method. If you set the sum of squares method to Type I, the *contrast output is not affected*. So it’s not sensitive to what’s being done in terms of variable entry.

Now, let’s take a look at what R does with the same basic idea. This R code first makes sure that the B factor uses polynomial contrasts, then sets up labels for those contrasts, runs the ANOVA, and prints out a summary table with the B factor split and labelled:

`contrasts(BF) = contr.poly(4)`

CondLab = list("Linear"=1, "Quad"=2, "Cubic"=3)

AnvL1 = aov(Dep ~ BF*AF)

summary(AnvL1, split=list(BF=CondLab))

Here’s the output:

                Df Sum Sq Mean Sq F value   Pr(>F)    
BF               3  17521    5840  21.879 1.45e-11 ***
  BF: Linear     1   4668    4668  17.486 5.24e-05 ***
  BF: Quad       1   7797    7797  29.206 2.94e-07 ***
  BF: Cubic      1   5057    5057  18.943 2.68e-05 ***
AF               1   3707    3707  13.888 0.000287 ***
BF:AF            3    496     165   0.619 0.603611    
  BF:AF: Linear  1     68      68   0.254 0.615356    
  BF:AF: Quad    1     75      75   0.282 0.596155    
  BF:AF: Cubic   1    353     353   1.322 0.252225    
Residuals      132  35237     267                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Hmm, none of those match SPSS at all. If you look carefully you’ll note that the sums of squares for the three B contrasts do in fact add up to the sum of squares for B, but this is the Type I sum of squares for B, with B added first. These numbers are inflated! This is probably pretty dangerous.

Note that, because of this Type I computation, R is sensitive to the order in which variables are listed in the model specification. So, this code:

`AnvL2 = aov(Dep ~ AF*BF)`

summary(AnvL2, split=list(BF=CondLab))

produces very different results:

                Df Sum Sq Mean Sq F value   Pr(>F)    
AF               1  11438   11438  42.846 1.20e-09 ***
BF               3   9791    3264  12.226 4.14e-07 ***
  BF: Linear     1      1       1   0.004 0.950627    
  BF: Quad       1   6330    6330  23.711 3.15e-06 ***
  BF: Cubic      1   3461    3461  12.963 0.000448 ***
AF:BF            3    496     165   0.619 0.603611    
  AF:BF: Linear  1     68      68   0.254 0.615356    
  AF:BF: Quad    1     75      75   0.282 0.596155    
  AF:BF: Cubic   1    353     353   1.322 0.252225    
Residuals      132  35237     267                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These aren’t wrong in the sense that R is making some kind of computational error, but again, these are probably not the results you want to report in a journal article.

So, how do you get R to produce something that makes more sense? And what exactly is SPSS doing internally to get those numbers? It took me a while to figure this out. I recoded the whole thing as a straightforward multiple regression in SPSS and tried a whole bunch of different orders of variable entry—I assumed it was doing some kind of “change in F when the term is added last” computation—and then hit on the answer completely by accident when I just ran the regression with all the variables entered at once. It’s nice and clean in R as a simple regression:

`AnvL3 = lm(Dep ~ AF * BF)`

summary(AnvL3)

Note the use of “lm()” there instead of an ANOVA command. Since all our independent variables now have a single degree of freedom, it’s just a regression, right? And that’s when I realized what SPSS is doing internally, as here’s the result:

Call:
lm(formula = Dep ~ AF * BF)

Residuals:
    Min      1Q  Median      3Q     Max 
-39.120 -11.779   2.146  11.190  45.396 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   94.779      1.872  50.643  < 2e-16 ***
AF1           -6.397      1.872  -3.418 0.000838 ***
BF.L          -4.918      3.874  -1.270 0.206471    
BF.Q         -17.023      3.743  -4.548 1.21e-05 ***
BF.C          10.867      3.608   3.012 0.003108 ** 
AF1:BF.L       2.309      3.874   0.596 0.552109    
AF1:BF.Q       1.242      3.743   0.332 0.740653    
AF1:BF.C       4.149      3.608   1.150 0.252225    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.34 on 132 degrees of freedom
Multiple R-squared:  0.3814, Adjusted R-squared:  0.3486 
F-statistic: 11.63 on 7 and 132 DF,  p-value: 1.857e-11

If you’re not sure what this means, look again at the t-values for B, noting that “.L” denotes linear, “.Q” quadratic, and so on. These match what SPSS produces (though of course SPSS doesn’t print out the actual t-values). So, what SPSS is doing internally is simultaneous entry of all terms into a regression, and tests of the coefficients for the contrast variables. This is actually a pretty reasonable thing to do, but is completely opaque to the SPSS user and is not explained anywhere that I could find in the SPSS documentation.
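If you want to convince yourself of this equivalence, a quick check on simulated data (hypothetical, standing in for the Elashoff variables) shows that in a *balanced* design with centered codings, the squared coefficient *t* from the simultaneous regression equals the *F* for the corresponding single-df contrast—which is exactly why the two approaches disagree on the unbalanced data above:

```r
## Sketch: with a balanced design, sum coding on AF, and polynomial coding
## on BF, the design matrix columns are orthogonal, so the squared t for a
## polynomial coefficient equals the F for that single-df contrast.
set.seed(3)
AF = factor(rep(c("a1", "a2"), each = 20))
BF = factor(rep(rep(1:4, each = 5), times = 2))
contrasts(AF) = contr.sum(2)
contrasts(BF) = contr.poly(4)
Dep = rnorm(40) + as.numeric(as.character(BF))  # fake response with a trend

fit    = lm(Dep ~ AF * BF)
t.quad = coef(summary(fit))["BF.Q", "t value"]

sp     = summary(aov(Dep ~ BF * AF), split = list(BF = list(Quad = 2)))[[1]]
F.quad = sp[grep("Quad", rownames(sp))[1], "F value"]

all.equal(t.quad^2, F.quad)
```

With balance, the order of entry stops mattering and the Type I split, the coefficient test, and the Type II/III tests all coincide; the unbalanced Elashoff design is what pulls them apart.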

Note that in an unbalanced ANOVA, this is almost certainly the right thing to do as well, since the Type I approach is potentially more than a little permissive depending on the order of entry of the variables. So, if you’re going to use R (perhaps migrating from SPSS like me), this is probably how contrasts should be handled if the ANOVA is unbalanced.

Note that neither package produces effect sizes (e.g., eta squared, Cohen’s ƒ) for contrasts. Bummer.
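If you want an effect size for a contrast anyway, it’s easy enough to assemble one by hand from the ANOVA table. Here’s the arithmetic for a partial eta squared, using the linear-contrast numbers from the first split table above (contrast SS = 4668, residual SS = 35237):

```r
## Partial eta squared for a single contrast:
## SS.contrast / (SS.contrast + SS.error)
ss.contrast = 4668    # BF: Linear, from the split ANOVA table
ss.error    = 35237   # Residuals
ss.contrast / (ss.contrast + ss.error)   # ~0.117
```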

Next time: one-way repeated measures. There are many ways to do this in R, but some are better than others…

## Translating SPSS to R: Factorial Between-Subjects ANOVA

Sorry for the lag between posts, but the semester’s been busy and I’ve been sick. Today’s topic is factorial between-subjects ANOVA, but with a particular constraint: the design discussed here will be balanced (or orthogonal) in that the number of subjects in each cell will be the same. The world gets much more complicated when things are not balanced and SPSS and R do not agree in how they handle that, so that situation will get its own post.

I start my students with almost, but not quite, the simplest two-way ANOVA possible, a 2 x 3. (A 2 x 2 doesn’t give much opportunity to do contrasts, which is why I don’t start there.) The data set we start with is this one: detection.xls. There is one dependent variable, “Targets” (the number of targets detected by each subject), and two independent variables: “Group,” with two levels, air traffic controllers (ATCs) and regular college sophomores, and “Clutter,” indexing the amount of clutter on the display, with three levels: high, medium, and low. I’m not going to go into things like descriptives and box plots because the data set is clean and I’ve already covered that stuff in previous posts.

**Basic ANOVA**

The SPSS code for generating the basic two-way ANOVA and generating interaction plots is pretty simple. First, though, we’ll label the levels of the variables, then run the ANOVA, like this:

`ADD VALUE LABELS Group 0 "ATCs" 1 "Students".`

ADD VALUE LABELS Clutter 0 "High" 1 "Medium" 2 "Low".

`UNIANOVA Targets BY Group Clutter`

/PLOT = PROFILE(Clutter * Group)

/PLOT = PROFILE(Group * Clutter)

/PRINT ETASQ.

We get a nice table of factors, a very complete ANOVA table and two interaction plots that look like these:

For those who dislike partial eta squared as a measure of effect size (I’m not a huge fan myself), SPSS nicely reports all the relevant quantities so that calculating eta squared or omega squared is pretty straightforward. The interaction plots are sort of stupid in that SPSS always zooms in the maximum amount, which exaggerates differences, and the color choices aren’t great, but it’s not awful.

So, now R. As usual, there are a whole lot of different ways to get a simple ANOVA out of R. The most straightforward is just to use the “aov()” function for generating an ANOVA. It doesn’t produce interaction plots, but those are pretty easy to get. First we’ll create and label factors (assuming the data set is attached). The code looks like this:

`## Make factors`

ClutterF = factor(Clutter, labels=c("High", "Medium", "Low"))

GroupF = factor(Group, labels=c("ATCs", "Students"))

`## basic ANOVA`

Anv = aov(Targets ~ GroupF*ClutterF)

summary(Anv)

`## interaction plots`

interaction.plot(ClutterF, GroupF, Targets, type="b")

interaction.plot(GroupF, ClutterF, Targets, type="b")

And the output looks like this:

                Df Sum Sq Mean Sq F value   Pr(>F)    
GroupF           1     31    30.8   0.371   0.5448    
ClutterF         2   4909  2454.5  29.586 2.11e-09 ***
GroupF:ClutterF  2    538   269.0   3.243   0.0468 *  
Residuals       54   4480    83.0                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Again, there’s annoying whitespace in the tops of the R plots, but otherwise they’re no better or worse than the SPSS plots. R doesn’t automatically give us the partial eta squared values, but the ANOVA table has everything needed on it to compute them. It sure would be nice if the ANOVA table included the total sum of squares, though, as that would save some steps in the computation. On the other hand, because the table produced is a first-class object in R, it is possible to write a function that would compute relevant effect size metrics—but why should I have to do this? Why is this not even an option in R, when most journals (at least in psychology) now encourage or flat-out require reporting of effect size?
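As a proof of concept, here’s the sort of helper I mean—a sketch, not a vetted package function (the name is made up), and it assumes a single-stratum between-subjects aov fit whose last table row is the residual term:

```r
## Sketch: partial eta squared for every term in a between-subjects aov fit.
## (Hypothetical helper, not part of base R.)
partial.eta.sq = function(fit) {
  tab = summary(fit)[[1]]     # the ANOVA table, as a data frame
  ss  = tab[["Sum Sq"]]
  n   = length(ss)            # last row is assumed to be Residuals
  peta = ss[-n] / (ss[-n] + ss[n])
  names(peta) = trimws(rownames(tab))[-n]
  peta
}

## Toy usage on a tiny made-up data set:
y = c(1, 2, 3, 4, 5, 6)
g = factor(c("a", "a", "a", "b", "b", "b"))
partial.eta.sq(aov(y ~ g))
```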

So, a slight edge to SPSS on effect size. So, what to do with these results? Well, I always instruct my students that they should deal with the interaction first, before even looking at the main effects. I teach them three ways to deal with interactions, though I don’t actually recommend they use one of them. The three ways are doing posthocs on cells, doing simple main effects, and doing interaction contrasts. I note that the last of those is probably the best approach but they should be aware of the others because they’ll see them in print and they should know how to do them. So let’s look at how to handle each of those in the two packages.

**Cell Posthocs**

Frankly, I hate these and wish people would stop doing them. If the interaction is really keyed by some pair of cells being different, there should be a decent contrast for it. Anyway, here’s how I teach my students to handle it: create a new variable representing the cells, then do a oneway ANOVA with that as the new independent variable, and look for pairwise differences there. Here’s the SPSS version of this:

`COMPUTE Cell = Clutter + 10*Group.`

EXECUTE.

ONEWAY Targets BY Cell

/POSTHOC QREGW.

I use “10” as the multiplier in creating the “Cell” variable because that way the cell numbers are indexed by row, column in the variable values. Here’s the relevant piece of the SPSS output:

In terms of interpreting the interaction, this is pretty close to useless, which is actually what I want them to learn. I like that SPSS produces this in a nice, easy to read table. The equivalent R code requires loading a library to get Ryan’s Q (as mentioned last time) and looks like this:

`library(mutoss)`

clutter$CellF = factor(Clutter + 10*Group)

regwq(Targets ~ CellF, data=clutter, alpha=.05)

And produces this output (truncated for the sake of brevity, if you can believe that):

#----REGWQ - Ryan / Einot and Gabriel / Welsch test procedure

Number of hyp.:  15
Number of rej.:  10
   rejected pValues adjPValues
1         1       0          0
2         3       0          0
3         5  0.0016     0.0016
4         4  0.0017     0.0017
5         6  0.0019     0.0019
6         8  0.0061     0.0061
7         7  0.0081     0.0081
8        10  0.0087     0.0087
9        12  0.0163     0.0163
10        2   3e-04      3e-04

$adjPValues
 [1] 0      3e-04  0      0.0017 0.0016 0.0019 0.0081 0.0061 0.3484 0.0087
[11] 0.526  0.0163 0.4646 0.5103 0.0184
Levels: 0 0.0016 0.0017 0.0019 0.0061 0.0081 0.0087 0.0163 0.0184 0.3484 0.4646 0.5103 0.526 3e-04

$rejected
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
[13] FALSE FALSE FALSE

$statistic
 [1] 9.8254 6.3882 8.9227 5.4508 5.4855 5.4161 4.4093 4.5481 1.9790 4.3745
[11] 0.9027 3.5066 1.0416 0.9374 3.4371

$confIntervals
      [,1] [,2] [,3]
0-2   28.3   NA   NA
10-2  18.4   NA   NA
0-12  25.7   NA   NA
11-2  15.7   NA   NA
10-12 15.8   NA   NA
0-1   15.6   NA   NA
1-2   12.7   NA   NA
11-12 13.1   NA   NA
10-1   5.7   NA   NA
0-11  12.6   NA   NA
12-2   2.6   NA   NA
1-12  10.1   NA   NA
11-1   3.0   NA   NA
10-11  2.7   NA   NA
0-10   9.9   NA   NA

And, as previously mentioned, this is a terrifically awful output format. Yes, I get that posthocs are frowned upon in the statistics purist community and I agree that they are often abused and should be discouraged, but as a usability person I’m still stupefied by how needlessly hostile this presentation is.

Again, though, the good news is that perhaps students learn not to try to interpret interactions this way.

**Simple Main Effects**

Many people just call this “simple effects,” and that’s fine; my old stats professor in grad school called it “simple main effects,” so that’s how it reads back in my head. This procedure is common but is also inherently flawed, since it relies on accepting the null of “no effect” in some conditions. As a result, I don’t really like it that much either, but psychologists aren’t going to stop doing it anytime soon, and so students should learn what it is. In this case I’ll just show the version where we split on the “Group” variable and look at whether or not there’s an effect of “Clutter.” SPSS supports this pretty well, actually:

`SORT CASES BY Group.`

SPLIT FILE BY Group.

UNIANOVA Targets BY Clutter.

SPLIT FILE OFF.

The relevant part of the output looks like this:

I do note that they shouldn’t actually directly use these results and should use the error term from the original ANOVA. Some of them even remember to do that. Now, R. As usual, I’m sure there are about a zillion ways to do this in R, probably involving “by()” or some “ply” function, but hey, the first language I learned was BASIC and the second was FORTRAN, so I’m using a for loop, dammit:

`for (Grp in levels(GroupF)) {`

cat("\n------ Group:", Grp, "-----\n")

print(summary(aov(Targets[GroupF == Grp] ~ ClutterF[GroupF == Grp])))

}

I put in the little header there to make it easier to read. Here’s the output:

------ Group: ATCs -----
                        Df Sum Sq Mean Sq F value  Pr(>F)    
ClutterF[GroupF == Grp]  2   4018  2009.2   28.12 2.5e-07 ***
Residuals               27   1929    71.4                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

------ Group: Students -----
                        Df Sum Sq Mean Sq F value  Pr(>F)   
ClutterF[GroupF == Grp]  2   1428   714.2    7.56 0.00247 **
Residuals               27   2551    94.5                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Great, very comparable, though I could do without all the “significance codes” garbage. I think the syntax to generate this is a little easier to understand in SPSS for the novice, but the R isn’t *that* bad, and the mere fact that it has a for loop available is a big win in my book.
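As noted above, the proper move is to re-test each simple effect against the error term from the full two-way model rather than the per-group residuals, and that recomputation is short. The numbers here come from the tables above (pooled MSE = 83.0 on 54 df; ATC Clutter SS = 4018 on 2 df):

```r
## Simple main effect of Clutter for ATCs, re-tested against the pooled
## error term from the full factorial ANOVA.
ms.effect = 4018 / 2             # Clutter MS within the ATC group
ms.pooled = 83.0                 # MSE from the full model, 54 df
F.pooled  = ms.effect / ms.pooled
p.pooled  = pf(F.pooled, 2, 54, lower.tail = FALSE)
c(F = F.pooled, p = p.pooled)
```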

**Interaction Contrasts**

OK, gentle reader, this is the part of the show that makes me want to find the SPSS programmers and do unspeakable things to them. SPSS’s handling of interaction contrasts is, to put it bluntly, horrific. In a straightforward world, I’d generate a contrast on one factor, generate the contrast on the other factor, and the software would do the cross-multiplication, and everything would be great. But no, SPSS will **not** do the cross multiplication! Here’s the best part: in within-subjects ANOVA with SPSS’s GLM procedure, it **does** do the cross-multiplication! So why, for the love of all that is good in the world, doesn’t it do this for between-subjects factors? Why? WHY?

Oh, sorry, back to the story. So, I teach my students that there are two ways to cajole SPSS into producing an interaction contrast: code the design into individual cells (like with posthocs on cells) and apply the hand-multiplied vector to that, or use the /LMATRIX subcommand supported by SPSS, which unfortunately does not work at all if the design is anything other than a two-way (seriously, I am not making that up). So, let’s say the contrast we want on “Clutter” is (-2 1 1), which is “high” vs. the mean of the other two, and of course the only contrast on “Group” is (1 -1) since it only has two levels. Coding it by cells, the SPSS code looks like this:

`UNIANOVA Targets BY Cell`

/CONTRAST(Cell) = SPECIAL(-2 1 1 2 -1 -1).

Note that this is a total minefield, because the user has to remember exactly how the “Cell” variable was coded, and if the order of the coefficients is screwed up, SPSS will happily spit out a contrast—it’ll just be garbage. Always reassuring for new students. Anyway, the output:

If they want to use /LMATRIX, then it looks like this:

`UNIANOVA Targets BY Group Clutter`

/LMATRIX "Reverse at high" Group*Clutter

-2 1 1 2 -1 -1.

At least there they get to put in a label. The best part there, though, is that the coefficient order is dictated by the order of the IVs in the “BY” part of the syntax, not the order in the LMATRIX command itself. Awesome. Anyway, the output is basically the same:

Plus an ANOVA table that is identical to the one above. On the plus side, it does at least spit out the contrast value, so that’s something, but it’s pretty weak sauce. This whole process is so mind-numbingly unnecessary I find it hard to believe. I just know that internally SPSS is doing the ANOVA as a multiple regression anyway—why not just replace the internal vectors with the user-supplied ones? Seriously.
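For what it’s worth, the hand cross-multiplication that SPSS forces on you is a one-liner: the weights handed to SPECIAL above are just the Kronecker (outer) product of the two main-effect contrast vectors.

```r
## Interaction-contrast weights = Kronecker product of the
## main-effect contrast vectors.
group.w   = c(1, -1)       # Group: ATCs vs. Students
clutter.w = c(-2, 1, 1)    # Clutter: high vs. mean of the other two
as.vector(kronecker(group.w, clutter.w))   # -2  1  1  2 -1 -1
```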

Of course, this is exactly what R does. Thank you, R, for a breath of sanity. R knows that contrasts are a property of the independent variables and allows you to set them—hey, how about that! So, you just set the contrast, give it a label to use at the output step, and away you go:

`contrasts(ClutterF) = cbind(c(-2, 1, 1))`

ClutLab = list("Hi v Oth"=1)

AnvL1 = aov(Targets ~ GroupF*ClutterF)

summary(AnvL1, split=list(ClutterF=ClutLab))

And you get this output:

                            Df Sum Sq Mean Sq F value   Pr(>F)    
GroupF                       1     31      31   0.371   0.5448    
ClutterF                     2   4909    2454  29.586 2.11e-09 ***
  ClutterF: Hi v Oth         1   3245    3245  39.112 6.66e-08 ***
GroupF:ClutterF              2    538     269   3.243   0.0468 *  
  GroupF:ClutterF: Hi v Oth  1    538     538   6.481   0.0138 *  
Residuals                   54   4480      83                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Hey, look, the same *F*-value without having to do the matrix multiply on your own. How about that? Was that really so hard, SPSS? Really?

**Main Effect Contrasts**

If you’re doing an interaction contrast in R, you get the main effect contrast for free—see above. If you want the equivalent main effect contrast in SPSS, that at least is relatively easy:

`UNIANOVA Targets BY Group Clutter`

/CONTRAST(Clutter) = SPECIAL(-2 1 1).

You get the same kind of output as in previous contrasts:

Thankfully, that matches the R output from the above, with that added bonus that you get the actual contrast estimate and standard error, too, if you need them. Note that I’m not advocating this particular analysis in this case, as the interaction is what’s really relevant—this is just how the code works in case the interaction is not significant (in both senses of the term) and a main effect contrast is called for.

**Main Effect Posthocs**

As previously noted, posthocs are not exactly a strength in R. SPSS, on the other hand, makes them easy. To do a Ryan’s Q posthoc on the main effect of “Clutter,” this is the appropriate SPSS code:

`UNIANOVA Targets BY Group Clutter`

/POSTHOC = Clutter(QREGW).

And here’s the output:

Usefully noting that all three marginal means are distinguishable. As noted, R sucks at posthocs. Getting Ryan’s Q for a single factor within a factorial design is less than wonderful, as you have to manually supply the error term d.f. and MSE, because the procedure doesn’t natively understand factorial designs (yes, really):

`regwq(Targets~ClutterF, data=clutter, alpha=.05, df=54, MSE=82.961)`

And here’s the relevant part of the output:

#----REGWQ - Ryan / Einot and Gabriel / Welsch test procedure

Number of hyp.:  3
Number of rej.:  3
  rejected pValues adjPValues
1        1       0          0
2        2       0          0
3        3  0.0025     0.0025

$adjPValues
[1] 0      0      0.0025
Levels: 0 0.0025

$rejected
[1] TRUE TRUE TRUE

$statistic
[1] 10.8265  6.3338  4.4926

$confIntervals
             [,1] [,2] [,3]
High-Low    22.05   NA   NA
Medium-Low  12.90   NA   NA
High-Medium  9.15   NA   NA

I’ll be the first to agree with the R purists that this is not the best kind of analysis to be doing, but nonetheless, there are some times when it’s all you’ve got, so doing it should be less convoluted.

**Bottom Line**

Most of this is easier and/or more straightforward in SPSS, but with one fatal weakness. The particular beef I have with R is that it doesn’t do a good job of supporting effect size computation because it doesn’t output total sum of squares. (I realize this is not hard to compute, but my point is that I shouldn’t have to.) On the other hand, SPSS’s handling of interaction contrasts is unforgivably awful. Since I firmly believe that’s often the best way to help interpret an interaction, this is a pretty serious failing. I have to give the edge to R here.

## Translating SPSS to R: Simple ANOVA

This is where things start to get more interesting. The ANOVA is one of the older tools in the experimental psychology toolbox, and as a result there is quite a lot of depth here. I’m going to cover ANOVA in multiple pieces. Today’s entry is the simplest case: between-subjects one-way ANOVA. I will discuss the basic procedures, unequal variance adjustment, contrasts, and posthocs.

Later entries will cover balanced factorial between-subjects ANOVA, unbalanced factorial between-subjects ANOVA, simple repeated-measures ANOVA, factorial repeated-measures ANOVA, and ultimately factorial mixed (that is, between- and within-subjects) ANOVA. One step at a time, though.

The simplest complication from a standard two-groups between-subjects *t*-test is a oneway ANOVA with three groups, so this is where I take my class. The example data set (synchro.xls) has two treatment conditions (“fast” and “slow”) and a control condition; the dependent variable is the number of words recalled in some kind of memory test (the details aren’t that important).

**Descriptives**

Rule #1 in my class is “always plot your data.” I covered this in the first post in the series, so I won’t go into too much detail here. Mostly what I want out are boxplots. EXAMINE handles this in SPSS, but first we’ll label the levels of the “Cond” variable:

`ADD VALUE LABELS Cond 0 "Fast" 1 "Control" 2 "Slow".`

`EXAMINE VARIABLES= Recall BY Cond`

/PLOT BOXPLOT

/COMPARE GROUP

/STATISTICS DESCRIPTIVES.

Which produces this output:

Generating the boxplot in R is pretty straightforward. Again, first we’ll define the factor with labels and generate a box plot:

`CondF = factor(Cond, labels = c("Fast", "Control", "Slow"))`

boxplot(Recall ~ CondF, col="lightgray")

And here’s the output:

*Peeve:* Again, I don’t know what R’s deal is with dead whitespace at the top and bottom of the graph.

Regardless, the data look fine and well-behaved, so on with the analysis.

**Basic ANOVA**

Running the default omnibus ANOVA in SPSS is very easy. There are two choices, ONEWAY and UNIANOVA. As the name suggests, ONEWAY will only handle one-way ANOVA. Its output is a little cleaner than UNIANOVA’s, but I usually wind up using UNIANOVA because it’s more general. I’ll show both.

`ONEWAY Recall BY Cond`

/STATISTICS DESCRIPTIVES.

You can omit the “/STATISTICS DESCRIPTIVES” if you don’t want to see the first table. UNIANOVA gives you one extra option which is nice, the option to print an effect size measure, partial eta squared:

`UNIANOVA Recall BY Cond`

/PRINT ETASQ.

Produces this output:

We could have asked for descriptives there as well.

As is true of a great many things, R provides many ways to generate a simple one-way ANOVA. Any one of R’s many linear model procedures will work; the most conventional is to just use the “aov()” function and then ask for a table of means and the ANOVA table:

`Anv = aov(Recall ~ CondF)`

model.tables(Anv, "means")

summary(Anv)

If you don’t want the table of means, just skip the model.tables() statement. The listed code produces this output:

Tables of means
Grand mean
23.46667 

 CondF 
CondF
   Slow Control    Fast 
   21.2    27.4    21.8 

            Df Sum Sq Mean Sq F value  Pr(>F)   
CondF        2  233.9  116.93   5.895 0.00751 **
Residuals   27  535.6   19.84                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

*Peeve:* Let me just get this out of the way now: I think the asterisk notation is stupid. It should, at best, be an option, but should not be on by default. People in the R community frequently act as if they are enlightened purists (more on this later), so how can they endorse asterisks? I don’t get it.
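(In fairness, the stars can be turned off, though you have to know the option exists:)

```r
## Suppress the significance stars in summary() printouts.
options(show.signif.stars = FALSE)
```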

*Peeve:* Is it really necessary to report the *p*-value to five decimal places? I guess it doesn’t hurt anything, but still.

Regardless, the default R procedure does not report any effect size measures. Computing partial eta squared by hand isn’t difficult, of course, but it’s odd that it isn’t even an option when it is becoming more and more common that journals at least request, and occasionally require, a metric of effect size. (Incidentally, partial eta squared probably isn’t the best one to use, but I’m surprised R reports nothing.) It is possible to get effect size estimates from R using one of the alternative ANOVA procedures; I’ll cover that later.
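In the meantime, “by hand” really is just one line of arithmetic with the numbers from the table above (effect SS = 233.9, residual SS = 535.6; in a one-way design, eta squared and partial eta squared are the same quantity):

```r
## Eta squared from the one-way ANOVA table: SS.effect / SS.total.
ss.cond  = 233.9
ss.resid = 535.6
ss.cond / (ss.cond + ss.resid)   # ~0.304
```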

**Welch Adjustment**

For these particular data the issue of unequal variance isn’t all that huge, but this does come up. As I noted with t-tests, the O’Brien’s R and Levene tests are slightly easier to do in R; SPSS doesn’t even include them as an option. However, sometimes you have unequal variances and it’s nice when you can actually handle it. SPSS provides the Welch adjustment as part of the ONEWAY procedure but not UNIANOVA—why is that? The code looks like this:

`ONEWAY Recall BY Cond`

/STATISTICS WELCH.

And here’s the output:

Note that the regular ANOVA table is printed before this.

In R, the base package does include the adjustment in the “oneway.test()” function:

`oneway.test(Recall ~ CondF)`

Which yields the following:

`One-way analysis of means (not assuming equal variances)`

data:  Recall and CondF
F = 5.0716, num df = 2.000, denom df = 15.914, p-value = 0.01977

Not much different there.
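As an aside, the variance check itself is also short in base R. This is a Brown-Forsythe-style Levene test done by hand—a one-way ANOVA on absolute deviations from the group medians—shown on simulated data standing in for synchro.xls (the variable names match the real analysis, the values don’t):

```r
## Brown-Forsythe / Levene-type test by hand: ANOVA on absolute deviations
## from each group's median. (Simulated data, not the synchro.xls values.)
set.seed(1)
Recall = c(rnorm(10, 21, 4), rnorm(10, 27, 4), rnorm(10, 22, 6))
CondF  = factor(rep(c("Fast", "Control", "Slow"), each = 10))

dev = abs(Recall - ave(Recall, CondF, FUN = median))
summary(aov(dev ~ CondF))
```

A significant result here would be a reason to reach for the Welch-adjusted test above.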

**Contrasts**

The omnibus ANOVA null hypothesis is pretty vague; essentially, rejection of the null means “at least one cell mean is different than the grand mean.” That’s not very useful in most cases, but far too many people look only at that. I encourage my students to form more specific hypotheses and test those with contrasts. For this particular data set, testing the control vs. the average of the two treatments is sensible—that’s (1 -2 1) and testing the two treatments against each other is sensible—that’s (1 0 -1). In SPSS both ONEWAY and UNIANOVA support contrasts in a way that’s sensible for main effects, though they report slightly different things. Here’s ONEWAY:

`ONEWAY Recall BY Cond`

/CONTRAST= 1 -2 1

/CONTRAST=1 0 -1.

After the usual ANOVA table, it then produces contrast-relevant output:

It’s nice that it includes the unequal variance adjustment, but I wish it reported sums of squares. *Peeve:* UNIANOVA reports sums of squares, but of course it doesn’t report the unequal variance adjustment. Why, SPSS, why? Anyway, here’s the UNIANOVA version:

`UNIANOVA Recall BY Cond`

/CONTRAST(cond) = SPECIAL(1 -2 1)

/CONTRAST(cond) = SPECIAL(1 0 -1).

The amount of output produced is copious. After the standard table for the omnibus ANOVA, we get these:

and then:

It’s great that it produces the contrast estimates and the confidence intervals, and of course the sum of squares.

*Peeve:* It never notes in the output what the contrast weights actually were, and there’s no way to supply a label. Again, why not include this?

Now, R. There are, of course, many ways to do contrasts in R. The one that takes the least amount of code is the “fit.contrast()” function in the “gmodels” package. Assuming we still have the aov object (named “Anv” above) floating around and the “gmodels” package loaded up, then the two contrasts can be done like this:

`fit.contrast(Anv, CondF, c(1, -2, 1), df=T)`

fit.contrast(Anv, CondF, c(1, 0, -1), df=T)

And that produces the following output:

                   Estimate Std. Error  t value    Pr(>|t|) DF
CondF c=( 1 -2 1 )    -11.8    3.44996 -3.42033 0.002003601 27

                   Estimate Std. Error    t value Pr(>|t|) DF
CondF c=( 1 0 -1 )     -0.6   1.991835 -0.3012297 0.765547 27

Nice that it has the contrast weight and the contrast estimate in there, but too bad no sums of squares. In general, however, I don’t really like the “fit.contrast()” function very much because it doesn’t work the way I’d want it to for factorial designs. Probably the more conventional way to handle contrasts in R is to directly set the “contrasts” attribute of the factor variable, and then request that the summary table be split by the individual contrasts. It looks like this:

`contrasts(CondF) = cbind(c(1, -2, 1), c(1, 0, -1))`

CondLab = list("C v T"=1, "Fast v Slow"=2)

AnvL = aov(Recall ~ CondF)

summary(AnvL, split=list(CondF=CondLab))

This produces the following output:

                    Df Sum Sq Mean Sq F value  Pr(>F)   
CondF                2  233.9  116.93   5.895 0.00751 **
  CondF: C v T       1  232.1  232.07  11.699 0.00200 **
  CondF: Fast v Slow 1    1.8    1.80   0.091 0.76555   
Residuals           27  535.6   19.84                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Very nice, though I’d still like to see the contrast estimates and confidence intervals. Also, this split-summary approach in R only partitions the sums of squares cleanly if the contrasts you supply are orthogonal. That is, in principle, sensible, but it does mean more work if you want to test a couple of contrasts that happen not to be orthogonal.
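Checking orthogonality is at least trivial: with equal cell sizes, two contrasts are orthogonal when the dot product of their weight vectors is zero.

```r
## Orthogonality check for the two contrasts used above.
c1 = c(1, -2, 1)    # control vs. mean of the two treatments
c2 = c(1, 0, -1)    # fast vs. slow
sum(c1 * c2)        # 0, so the contrasts are orthogonal
```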

**Posthocs**

SPSS supports a huge array of posthocs, many of which are essentially obsolete given more modern techniques, but I don’t think one has ever been taken out of SPSS, so many of them are in there. And, thankfully, they’re built right in to the ANOVA procedures. That is, both ONEWAY and UNIANOVA include an option for running posthoc procedures. As noted, there are many procedures available such as SNK and Tukey, but personally I’m a fan of the Ryan-Einot-Gabriel-Welsch procedure (also called the “Ryan’s Q” procedure):

`ONEWAY Recall BY Cond`

/POSTHOC = QREGW.

or

`UNIANOVA Recall BY Cond`

/POSTHOC = Cond(QREGW).

The two produce almost identical output after the standard ANOVA table, which is a table showing homogenous subsets:

My sense from the R forums and places like StackOverflow is that the R community frowns on posthocs in general. I get that—it’s certainly inferior to having good hypotheses in advance and using contrasts—but just because you don’t personally like something doesn’t mean there aren’t still occasions that it can be useful for someone else. I find the paternalistic attitude taken by people in the R community on subjects like this to be incredibly off-putting. Sometimes there are reasons for doing things in ways that are not preferred. We do not live in an idealized statistical world—sometimes the circumstances of real research make other demands.

Anyway, it is that kind of attitude in the R community that I believe is responsible for R being so bad at posthocs. The only posthoc built in to the base package is Tukey’s HSD. Wow, hey, the 1960s called and they’d like their posthoc back. If you’re willing to put up with Tukey, well, here’s example code and output:

`TukeyHSD(Anv, "CondF")`

That uses the aov object created by “aov()” earlier. The output is very nice, though I can’t imagine why you need the p-value to a whopping *seven* decimal places. Seriously, has anyone ever made a decision that was contingent on that small a probability, ever?

```
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Recall ~ CondF)

$CondF
             diff        lwr        upr     p adj
Control-Fast  6.2   1.261409 11.1385914 0.0117228
Slow-Fast     0.6  -4.338591  5.5385914 0.9513012
Slow-Control -5.6 -10.538591 -0.6614086 0.0238550
```

It’s at least easy to read and produces clear confidence intervals. Too bad it’s not as powerful (in the statistical sense of power) as other posthocs.
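One small consolation is that the object TukeyHSD returns is easy to post-process: it’s just a named list holding one matrix per factor. Something like this (with hypothetical stand-in data, not the real synchro numbers) pulls the comparisons out as a plain data frame for reporting:

```r
## hypothetical stand-in data: three conditions, n = 5 each
set.seed(2)
synchro <- data.frame(
  Recall = c(rnorm(5, 20), rnorm(5, 14), rnorm(5, 15)),
  CondF  = factor(rep(c("Control", "Fast", "Slow"), each = 5))
)
Anv <- aov(Recall ~ CondF, data = synchro)
tk  <- TukeyHSD(Anv, "CondF")

## the $CondF component is a matrix: diff, lwr, upr, p adj
round(as.data.frame(tk$CondF), 3)
```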

Now, the Ryan’s Q procedure is available in R in the “mutoss” package, but the output is simply awful. Actually, general use of it isn’t very smooth, either. The code looks like this:

`regwq(Recall ~ CondF, data=synchro, alpha=.05)`

The “data” and “alpha” arguments are mandatory, not optional. It won’t look for variable names in the current workspace, only in the provided data frame, and it doesn’t default to an alpha of .05. Seriously? But wait, it gets better! Here’s the output:

```
#----REGWQ - Ryan / Einot and Gabriel / Welsch test procedure

Number of hyp.:  3
Number of rej.:  2
  rejected pValues adjPValues
1        3  0.0091     0.0091
2        1  0.0117     0.0117

$adjPValues
[1] 0.0117 0.7655 0.0091
Levels: 0.0091 0.0117 0.7655

$rejected
[1]  TRUE FALSE  TRUE

$statistic
[1] 4.402 0.426 3.976

$confIntervals
             [,1] [,2] [,3]
Control-Slow  6.2   NA   NA
Fast-Slow     0.6   NA   NA
Control-Fast  5.6   NA   NA

$errorControl
An object of class "ErrorControl"
Slot "type":
[1] "FWER"

Slot "alpha":
[1] 0.05

Slot "k":
numeric(0)

Slot "q":
numeric(0)
```

*Warning: Rant ahead.* There are so many annoying pieces of this that it’s hard to know where to start. First, there’s a ton of extraneous crap there. Second, where are the confidence intervals? Third, could there be any more indirection in reading the printout? Look, are “Control” and “Slow” different? Turns out they are. The *p*-values listed under “$adjPValues” match up with the list of pairs under “$confIntervals.” The first *p*-value is less than .05, so we can reject, then we can look at the second pair, etc. There’s even a “$rejected” line that just gives the binary hypothesis test result, but again you have to match those up with the pairs yourself. OK, fine, the information you need is actually present, but would it have killed anyone in the R world to put the relevant info proximal to the list of pairs? This is just ludicrous. It’s not really that big a deal with only three pairs, but when you have ten pairs and you care about one of the ones in the middle of the pack, getting it right takes WAY more effort than it should.

This is reflective of a more general problem with R: there are no standards for output quality in R. Like many other open source projects, as long as the person in charge of writing the code for that specific piece can use it, well, that’s good enough. In fact, we have to thank the relevant programmer, because if s/he hadn’t done it, we’d have nothing. This is one of those cases where the adage “you get what you pay for” is right on the mark. I’m sure some R person will get all offended that I’m criticizing and will say (or at least think) “well, if you’re so smart and you can do it so much better, why don’t you go ahead and write it?” That kind of attitude, which I’ve seen on R message boards before, completely misses the point. I’m trying to get new graduate students to use this thing. Output like this sends a clear message to them, and that message is “We’re going to make simple things unnecessarily hard. Go use SPSS, we don’t want you.”

Now, for you R fans out there, don’t lose heart. There’s a fairly major bit of stupidity in SPSS coming up in the next episode (the problem is interaction contrasts). R handles this about a thousand times better than SPSS, so payback is coming.

## Translating SPSS to R: t-tests

The first tests my students learn are actually using the binomial, then a permutation test, then testing means using the normal. We don’t use a statistics package for any of those, nor in fact do we use a statistics package for the first few *t*-tests that they do. But this is where things start to pick up, so this is where I try to ease them into doing tests on the computer.

Fortunately, SPSS and R aren’t all that much different when it comes to simple *t*-tests, at least not until you get into more subtle problems. The data set is this one: ttestex.xls.

**Independent Samples**

The basic SPSS code for a between-subjects *t*-test is this:

```
TTEST
  /VAR Height
  /GROUPS Gender(0,1).
```

and it produces output that looks like this:

The best thing about SPSS is that it automatically reports both the equal-variance test and the Welch-adjusted test. On the down side, the system is too dumb to realize that there are only two values for Gender, so you still have to tell it which two values (0 and 1) you want.

Here’s the equivalent in R, requesting both equal and unequal variance versions:

```
t.test(Height~Gender, var.equal=TRUE)
t.test(Height~Gender, var.equal=FALSE)
```

and here’s the output:

```
	Two Sample t-test

data:  Height by Gender
t = -2.5338, df = 19, p-value = 0.02024
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.4345615 -0.7082957
sample estimates:
mean in group 0 mean in group 1
       64.00000        68.07143

	Welch Two Sample t-test

data:  Height by Gender
t = -2.5186, df = 11.915, p-value = 0.0271
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.5963387 -0.5465185
sample estimates:
mean in group 0 mean in group 1
       64.00000        68.07143
```

The issue of unequal variances is something I’ll get back to later.

**Paired Samples**

Within-subjects (or “paired” or “matched” or “dependent” depending on your preference for language) is also pretty simple in both packages. For SPSS we have this:

```
TTEST
  /PAIRS Initeff Finaleff.
```

That little blurb of syntax produces this output:

All pretty reasonable stuff. However, if someone at SPSS could explain to me why the columns here are in a different order than they are for the independent samples output, I’d really love to know. Is there any logic at all behind the change?

Anyway, the equivalent R:

`t.test(Initeff, Finaleff, paired=TRUE)`

And the output:

```
	Paired t-test

data:  Initeff and Finaleff
t = -13.8507, df = 20, p-value = 1.037e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.876083 -6.552488
sample estimates:
mean of the differences
              -7.714286
```
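That correlation SPSS prints is one call in R. Here’s a sketch with hypothetical paired scores standing in for Initeff and Finaleff:

```r
## hypothetical paired scores standing in for Initeff/Finaleff
Initeff  <- c(10, 12, 9, 14, 11)
Finaleff <- c(18, 21, 16, 22, 18)

cor(Initeff, Finaleff)       # the correlation SPSS reports by default
cor.test(Initeff, Finaleff)  # same, with a test and confidence interval
```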

I do like that SPSS defaults to reporting the correlation, though it’s easy enough to compute in R anyway, so it’s not a big deal. Note that in both packages, you could also compute a difference score variable and test whether the mean of that is zero. Here’s what that looks like in SPSS:

```
COMPUTE Diff = Finaleff - Initeff.
EXECUTE.

TTEST
  /VAR Diff
  /TESTVAL 0.
```

I would like to take this moment to ask why the hell in SPSS it’s necessary to put “EXECUTE” after every simple COMPUTE operation. For the love of all that is good in the world, why? Anyway, here’s what you get for output:

All well and good. The equivalent R is of course more compact:

```
Diff = Finaleff - Initeff
t.test(Diff, mu = 0)
```

Here’s the output produced:

```
	One Sample t-test

data:  Diff
t = 13.8507, df = 20, p-value = 1.037e-11
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 6.552488 8.876083
sample estimates:
mean of x
 7.714286
```

Fantastic, everything matches up. Hey, with only two groups, it can’t be that complicated, can it?

Well, actually, both of these packages could do better. For one thing, neither package reports any measure of effect size; in this context, **d** is what’s usually reported. It’s not hard to compute, but why not report it? The other thing is that neither package does a good job with heteroscedasticity.
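Since neither package will do it for you, here’s a sketch of computing d yourself for the independent-samples case, using the usual pooled-standard-deviation definition (the helper name is mine, not a library function):

```r
## Cohen's d for two independent groups, pooled-SD version
## (the function name cohens_d is made up for this sketch)
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

## e.g., with hypothetical height scores:
cohens_d(c(62, 64, 66), c(66, 68, 70))   # -> -2
```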

**Heteroscedasticity**

For the independent samples test, there is the assumption of equal variances to be considered. SPSS provides a Levene test of the equality of the variances in the two groups; in the example above, SPSS reported an F of .004 and a p-value of .953 for the test. To get the same test in R, you need to load the “lawstat” package so you have access to the “levene.test” function. That function actually defaults to a robust Brown-Forsythe Levene-type test based on medians, which is not what SPSS uses. To get the same results as SPSS, you need to tell it to use the mean instead of the median. The R code looks like this:

```
library(lawstat)
levene.test(Height, Gender, location="mean")
```

and the output is this:

```
	classical Levene's test based on the absolute deviations from the mean
	( none not applied because the location is not set to median )

data:  Height
Test Statistic = 0.0035, p-value = 0.9533
```

And that matches the SPSS output. Normally here’s where the R folks could say something snarky like “hah, we support a better test because we can test from the median, which is the default,” but I think those folks are wrong anyway. Even the “improved” version of the Levene test is inferior to the O’Brien test for equality of variances (O’Brien, 1981), so that’s what I teach my students. The test involves transforming each data point and then testing the equality of the group means of the transformed scores; this generally gives both better Type I and Type II error rates than the Levene test (though see Moder, 2007 for a counterexample).

The formula for O’Brien’s r is this:
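For score $x_{ij}$ (score $i$ in group $j$), with group size $n_j$, group mean $\bar{x}_j$, and group variance $s_j^2$:

```latex
r_{ij} = \frac{(n_j - 1.5)\, n_j \,(x_{ij} - \bar{x}_j)^2 \;-\; 0.5\, s_j^2 \,(n_j - 1)}{(n_j - 1)(n_j - 2)}
```

A handy property of this transformation is that the mean of the transformed scores in each group equals that group’s variance, which makes for a good sanity check on the computations.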

This, frankly, is a pain in the ass to do in SPSS, because SPSS doesn’t like to compute scalars like the mean or the variance of a variable. For this one I’ll use a different data set, raven.xls. What you have to do in SPSS is use descriptives to get those values out, then set up variables, then conditionally compute things… it’s really stupid. For this data set the dependent variable is called “Raven” and the IV is “Smoke” (smokers vs. non-smokers, I guess). So, first the descriptives:

`MEANS TABLES Raven BY Smoke /CELL MEAN COUNT VARIANCE.`

which produces this:

and those values can then be plugged in:

```
COMPUTE m0 = 4.79.
COMPUTE n0 = 14.
COMPUTE var0 = 10.335.
COMPUTE m1 = 4.86.
COMPUTE n1 = 7.
COMPUTE var1 = 2.476.
COMPUTE absdif = ABS(m0 - raven).
IF (smoke EQ 1) absdif = ABS(m1 - raven).
COMPUTE sqrdif = absdif*absdif.
COMPUTE obr = ((n0 - 1.5)*n0*sqrdif - 0.5*var0*(n0 - 1)) /
   ((n0 - 1) * (n0 - 2)).
IF (smoke EQ 1)
   obr = ((n1 - 1.5)*n1*sqrdif - 0.5*var1*(n1 - 1)) /
   ((n1 - 1) * (n1 - 2)).
EXECUTE.

T-TEST
  GROUPS=smoke(0 1)
  /VARIABLES = obr.
```

Not exactly pretty, is it? Ultimately you get output that looks like this:

This is much easier in R. In fact, in R, you could write a function to do this. I didn’t do it that way in an attempt to make this easier for students to understand—I’m not sure that actually helped, though, because I used some group indexing via the [] selector. Here’s the R code I provided:

```
## compute vectors for means, ns, and variances
Means = c(mean(Raven[Smoke == 0]), mean(Raven[Smoke == 1]))
Ns = c(length(Raven[Smoke == 0]), length(Raven[Smoke == 1]))
Vars = c(var(Raven[Smoke == 0]), var(Raven[Smoke == 1]))

## compute squared difference scores
SqrDif = (Raven - Means[Smoke+1])^2

## compute O'Brien's R
ObR = (((Ns[Smoke+1]-1.5)*Ns[Smoke+1]*SqrDif) -
       (0.5*Vars[Smoke+1]*(Ns[Smoke+1]-1))) /
      ((Ns[Smoke+1]-1)*(Ns[Smoke+1]-2))

## test
t.test(ObR~Smoke)
```

And that produces this output:

```
	Welch Two Sample t-test

data:  ObR by Smoke
t = 3.2414, df = 16.021, p-value = 0.005106
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  2.719638 12.998310
sample estimates:
mean in group 0 mean in group 1
       10.33516         2.47619
```

It doesn’t perfectly agree with the SPSS output because of rounding in the SPSS computations—carrying three decimal places in the means would have made them match exactly. However, the basic upshot is the same; both lead to the conclusion that the two variances are not, in fact, equal.
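And for the curious, the function version I alluded to is short. This is my sketch, not a packaged routine—the name obrien is made up—and it assumes exactly two groups because it finishes with t.test:

```r
## a sketch of an O'Brien test as a reusable function (the name obrien is mine);
## assumes exactly two groups because it ends with t.test
obrien <- function(y, g) {
  g <- factor(g)
  n <- tapply(y, g, length)[g]   # per-observation group size
  m <- tapply(y, g, mean)[g]     # per-observation group mean
  v <- tapply(y, g, var)[g]      # per-observation group variance
  ## O'Brien's transformation: each group's mean of r equals its variance
  r <- ((n - 1.5) * n * (y - m)^2 - 0.5 * v * (n - 1)) /
       ((n - 1) * (n - 2))
  t.test(r ~ g)                  # Welch test on the transformed scores
}

## with the data above: obrien(Raven, Smoke)
```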