## Translating SPSS to R: Simple ANOVA

This is where things start to get more interesting. The ANOVA is one of the older tools in the experimental psychology toolbox, and as a result there is quite a lot of depth here. I’m going to cover ANOVA in multiple pieces. Today’s entry is the simplest case: between-subjects one-way ANOVA. I will discuss the basic procedures, unequal variance adjustment, contrasts, and posthocs.

Later entries will cover balanced factorial between-subjects ANOVA, unbalanced factorial between-subjects ANOVA, simple repeated-measures ANOVA, factorial repeated-measures ANOVA, and ultimately factorial mixed (that is, between- and within-subjects) ANOVA. One step at a time, though.

The simplest step up from a standard two-group between-subjects *t*-test is a one-way ANOVA with three groups, so that’s where I take my class. The example data set (synchro.xls) has two treatment conditions (“fast” and “slow”) and a control condition; the dependent variable is the number of words recalled in some kind of memory test (the details aren’t that important).

**Descriptives**

Rule #1 in my class is “always plot your data.” I covered this in the first post in the series, so I won’t go into too much detail here. Mostly what I want out are boxplots. EXAMINE handles this in SPSS, but first we’ll label the levels of the “Cond” variable:

```
ADD VALUE LABELS Cond 0 "Fast" 1 "Control" 2 "Slow".

EXAMINE VARIABLES=Recall BY Cond
  /PLOT BOXPLOT
  /COMPARE GROUP
  /STATISTICS DESCRIPTIVES.
```

Which produces this output:

Generating the boxplot in R is pretty straightforward. Again, first we’ll define the factor with labels and generate a box plot:

```r
CondF = factor(Cond, labels = c("Fast", "Control", "Slow"))
boxplot(Recall ~ CondF, col = "lightgray")
```

And here’s the output:

*Peeve:* Again, I don’t know what R’s deal is with dead whitespace at the top and bottom of the graph.

Regardless, the data look fine and well-behaved, so on with the analysis.

**Basic ANOVA**

Running the default omnibus ANOVA in SPSS is very easy. There are two choices, ONEWAY and UNIANOVA. As the name suggests, ONEWAY only handles one-way ANOVA. Its output is a little cleaner than UNIANOVA’s, but I usually wind up using UNIANOVA because it’s more general. I’ll show both.

```
ONEWAY Recall BY Cond
  /STATISTICS DESCRIPTIVES.
```

You can omit the “/STATISTICS DESCRIPTIVES” if you don’t want to see the first table. UNIANOVA adds one nice extra: the option to print an effect size measure, partial eta squared:

```
UNIANOVA Recall BY Cond
  /PRINT ETASQ.
```

That produces this output:

We could have asked for descriptives there as well.

As is true of a great many things, R provides many ways to generate a simple one-way ANOVA. Any of R’s linear model procedures will work; the most conventional is to use the “aov()” function and then ask for a table of means and the ANOVA table:

```r
Anv = aov(Recall ~ CondF)
model.tables(Anv, "means")
summary(Anv)
```

If you don’t want the table of means, just skip the model.tables() statement. The listed code produces this output:

```
Tables of means
Grand mean
         
23.46667 

 CondF 
CondF
   Fast Control    Slow 
   21.2    27.4    21.8 
```

```
            Df Sum Sq Mean Sq F value  Pr(>F)   
CondF        2  233.9  116.93   5.895 0.00751 **
Residuals   27  535.6   19.84                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

*Peeve:* Let me just get this out of the way now: I think the asterisk notation is stupid. It should, at best, be an option, and should not be on by default. People in the R community frequently act as if they were enlightened purists (more on this later), so how can they endorse asterisks? I don’t get it.

*Peeve:* Is it really necessary to report the *p*-value to five decimal places? I guess it doesn’t hurt anything, but still.

Regardless, the default R procedure does not report any effect size measures. Computing partial eta squared by hand isn’t difficult, of course, but it’s odd that it isn’t even an option when it is becoming more and more common that journals at least request, and occasionally require, a metric of effect size. (Incidentally, partial eta squared probably isn’t the best one to use, but I’m surprised R reports nothing.) It is possible to get effect size estimates from R using one of the alternative ANOVA procedures; I’ll cover that later.
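Since I brought it up: computing it by hand really is easy. Here’s a sketch using the sums of squares from the ANOVA table above; the commented lines assume the “Anv” object from earlier is still around.

```r
# Partial eta squared by hand, using the sums of squares from the
# ANOVA table above. (For a one-way design, partial eta squared and
# plain eta squared are the same number.)
ss_effect   <- 233.9   # CondF sum of squares
ss_residual <- 535.6   # residual sum of squares
eta_sq <- ss_effect / (ss_effect + ss_residual)
round(eta_sq, 3)   # 0.304

# Or pull the sums of squares straight out of the fitted aov object:
# ss <- summary(Anv)[[1]][["Sum Sq"]]
# ss[1] / sum(ss)
```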

**Welch Adjustment**

For these particular data the issue of unequal variance isn’t all that huge, but it does come up. As I noted with *t*-tests, the O’Brien and Levene tests are slightly easier to do in R; SPSS doesn’t even offer O’Brien’s as an option. Sometimes you really do have unequal variances, and it’s nice to be able to handle that. SPSS provides the Welch adjustment as part of the ONEWAY procedure but not UNIANOVA—why is that? The code looks like this:

```
ONEWAY Recall BY Cond
  /STATISTICS WELCH.
```

And here’s the output:

Note that the regular ANOVA table is printed before this.

In R, the base package does include the adjustment in the “oneway.test()” function:

`oneway.test(Recall ~ CondF)`

Which yields the following:

```
	One-way analysis of means (not assuming equal variances)

data:  Recall and CondF
F = 5.0716, num df = 2.000, denom df = 15.914, p-value = 0.01977
```

Not much different there.
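Incidentally, if you want to test the equal-variance assumption before reaching for Welch: base R ships Bartlett’s test (not one of the tests I mentioned above, but it’s the one that’s built in), and Levene’s test lives in the car package. A sketch with made-up stand-in data, since I can’t inline synchro.xls here:

```r
# Hypothetical stand-in data; with the real data, just pass Recall and CondF.
set.seed(42)
d <- data.frame(
  Recall = c(rnorm(10, 21), rnorm(10, 27), rnorm(10, 22)),
  CondF  = factor(rep(c("Fast", "Control", "Slow"), each = 10))
)
bartlett.test(Recall ~ CondF, data = d)       # built into base R
# car::leveneTest(Recall ~ CondF, data = d)   # Levene, if car is installed
```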

**Contrasts**

The omnibus ANOVA null hypothesis is pretty vague; rejecting it just means “at least one cell mean is different than the grand mean.” That’s not very useful in most cases, but far too many people stop there. I encourage my students to form more specific hypotheses and test those with contrasts. For this particular data set, two contrasts make sense: the control versus the average of the two treatments, with weights (1, -2, 1), and the two treatments against each other, with weights (1, 0, -1). In SPSS, both ONEWAY and UNIANOVA support contrasts in a way that’s sensible for main effects, though they report slightly different things. Here’s ONEWAY:

```
ONEWAY Recall BY Cond
  /CONTRAST=1 -2 1
  /CONTRAST=1 0 -1.
```

After the usual ANOVA table, it then produces contrast-relevant output:

It’s nice that it includes the unequal variance adjustment, but I wish it reported sums of squares. *Peeve:* UNIANOVA reports sums of squares, but of course it doesn’t report the unequal variance adjustment. Why, SPSS, why? Anyway, here’s the UNIANOVA version:

```
UNIANOVA Recall BY Cond
  /CONTRAST(Cond)=SPECIAL(1 -2 1)
  /CONTRAST(Cond)=SPECIAL(1 0 -1).
```

The amount of output produced is copious. After the standard table for the omnibus ANOVA, we get these:

and then:

It’s great that it produces the contrast estimates and the confidence intervals, and of course the sum of squares.

*Peeve:* It never notes in the output what the contrast weights actually were, and there’s no way to supply a label. Again, why not include this?

Now, R. There are, of course, many ways to do contrasts in R. The one that takes the least amount of code is the “fit.contrast()” function in the “gmodels” package. Assuming we still have the aov object (named “Anv” above) floating around and the “gmodels” package loaded up, then the two contrasts can be done like this:

```r
fit.contrast(Anv, CondF, c(1, -2, 1), df = TRUE)
fit.contrast(Anv, CondF, c(1, 0, -1), df = TRUE)
```

And that produces the following output:

```
                     Estimate Std. Error  t value    Pr(>|t|) DF
CondF c=( 1 -2 1 )      -11.8    3.44996 -3.42033 0.002003601 27
                     Estimate Std. Error    t value Pr(>|t|) DF
CondF c=( 1 0 -1 )       -0.6   1.991835 -0.3012297 0.765547 27
```

Nice that it has the contrast weight and the contrast estimate in there, but too bad no sums of squares. In general, however, I don’t really like the “fit.contrast()” function very much because it doesn’t work the way I’d want it to for factorial designs. Probably the more conventional way to handle contrasts in R is to directly set the “contrasts” attribute of the factor variable, and then request that the summary table be split by the individual contrasts. It looks like this:

```r
contrasts(CondF) = cbind(c(1, -2, 1), c(1, 0, -1))
CondLab = list("C v T" = 1, "Fast v Slow" = 2)
AnvL = aov(Recall ~ CondF)
summary(AnvL, split = list(CondF = CondLab))
```

This produces the following output:

```
                     Df Sum Sq Mean Sq F value  Pr(>F)   
CondF                 2  233.9  116.93   5.895 0.00751 **
  CondF: C v T        1  232.1  232.07  11.699 0.00200 **
  CondF: Fast v Slow  1    1.8    1.80   0.091 0.76555   
Residuals            27  535.6   19.84                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```

Very nice, though I’d still like to see the contrast estimates and confidence intervals. Also, this approach only works if you specify orthogonal contrasts. That is, in principle, sensible, but it does mean more work if you want to test a couple of contrasts that happen not to be orthogonal.
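At least the orthogonality requirement is cheap to check: for a balanced design, two contrasts are orthogonal when the sum of the products of their weights is zero. For the pair used above:

```r
c1 <- c(1, -2, 1)   # control vs. the average of the two treatments
c2 <- c(1, 0, -1)   # fast vs. slow
sum(c1 * c2)        # 0, so this pair is safe to feed to contrasts()
```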

**Posthocs**

SPSS supports a huge array of posthocs, many of which are essentially obsolete given more modern techniques, but I don’t think a posthoc has ever been removed from SPSS, so most of them are still in there. And, thankfully, they’re built right in to the ANOVA procedures: both ONEWAY and UNIANOVA include an option for running posthocs. There are many procedures available, such as SNK and Tukey, but personally I’m a fan of the Ryan-Einot-Gabriel-Welsch procedure (also called “Ryan’s Q”):

```
ONEWAY Recall BY Cond
  /POSTHOC=QREGW.
```

or

```
UNIANOVA Recall BY Cond
  /POSTHOC=Cond(QREGW).
```

The two produce almost identical output after the standard ANOVA table: a table showing homogeneous subsets:

My sense from the R forums and places like StackOverflow is that the R community frowns on posthocs in general. I get that—it’s certainly inferior to having good hypotheses in advance and using contrasts—but just because you don’t personally like something doesn’t mean there aren’t still occasions that it can be useful for someone else. I find the paternalistic attitude taken by people in the R community on subjects like this to be incredibly off-putting. Sometimes there are reasons for doing things in ways that are not preferred. We do not live in an idealized statistical world—sometimes the circumstances of real research make other demands.

Anyway, it is that kind of attitude in the R community that I believe is responsible for R being so bad at posthocs. The only posthoc built in to the base package is Tukey’s HSD. Wow, hey, the 1960s called and they’d like their posthoc back. If you’re willing to put up with Tukey, well, here’s example code and output:

`TukeyHSD(Anv, "CondF")`

That uses the aov object created earlier. The output is very nice, though I can’t imagine why you need the *p*-value to a whopping *seven* decimal places. Seriously, has anyone ever made a decision that was contingent on that small a probability, ever?

```
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Recall ~ CondF)

$CondF
             diff        lwr        upr     p adj
Control-Fast  6.2   1.261409 11.1385914 0.0117228
Slow-Fast     0.6  -4.338591  5.5385914 0.9513012
Slow-Control -5.6 -10.538591 -0.6614086 0.0238550
```

It’s at least easy to read and produces clear confidence intervals. Too bad it’s not as powerful (in the statistical sense of power) as other posthocs.
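One small consolation: the TukeyHSD result is a plain list of matrices, so it’s easy to get at programmatically. A self-contained sketch, with made-up data standing in for synchro:

```r
# Hypothetical stand-in data; with the real data, use the existing Anv object.
set.seed(1)
d <- data.frame(
  Recall = c(rnorm(10, 21), rnorm(10, 27), rnorm(10, 22)),
  CondF  = factor(rep(c("Fast", "Control", "Slow"), each = 10))
)
tk <- TukeyHSD(aov(Recall ~ CondF, data = d))
as.data.frame(tk$CondF)   # columns: diff, lwr, upr, p adj
```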

Now, the Ryan’s Q procedure is available in R in the “mutoss” package, but the output is simply awful. Actually, general use of it isn’t very smooth, either. The code looks like this:

`regwq(Recall ~ CondF, data=synchro, alpha=.05)`

The “data” and “alpha” arguments are mandatory, not optional: it won’t look up variable names in the calling environment, only in the supplied data frame, and it doesn’t default to an alpha of .05. Seriously? But wait, it gets better! Here’s the output:

```
#----REGWQ - Ryan / Einot and Gabriel / Welsch test procedure

Number of hyp.:  3
Number of rej.:  2
  rejected pValues adjPValues
1        3  0.0091     0.0091
2        1  0.0117     0.0117
$adjPValues
[1] 0.0117 0.7655 0.0091
Levels: 0.0091 0.0117 0.7655

$rejected
[1]  TRUE FALSE  TRUE

$statistic
[1] 4.402 0.426 3.976

$confIntervals
             [,1] [,2] [,3]
Control-Slow  6.2   NA   NA
Fast-Slow     0.6   NA   NA
Control-Fast  5.6   NA   NA

$errorControl
An object of class "ErrorControl"
Slot "type":
[1] "FWER"

Slot "alpha":
[1] 0.05

Slot "k":
numeric(0)

Slot "q":
numeric(0)
```

*Warning: Rant ahead.* There are so many annoying pieces of this that it’s hard to know where to start. First, there’s a ton of extraneous crap there. Second, where are the confidence intervals? Third, could reading the printout involve any more indirection? Look, are “Control” and “Slow” different? Turns out they are. The *p*-values listed under “$adjPValues” match up with the list of pairs under “$confIntervals”: the first *p*-value is less than .05, so we can reject, then we look at the second pair, and so on. There’s even a “$rejected” line that gives the binary hypothesis-test result, but again you have to match those up with the pairs yourself. OK, fine, the information you need is actually present, but would it have killed anyone in the R world to put the relevant info right next to the list of pairs? This is just ludicrous. It’s not really that big a deal with only three pairs, but when you have ten pairs and you care about one of the ones in the middle of the pack, getting it right takes WAY more effort than it should.
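If you’re stuck with regwq(), you can at least do the matching once in code instead of by eye. Here’s a sketch; the “res” list below is a hand-built mock of the printed output above, standing in for a real regwq() return value (note that $adjPValues really does come back as a factor, hence the double conversion):

```r
# Mock of the regwq() result printed above; a real call supplies this object.
res <- list(
  confIntervals = matrix(c(6.2, 0.6, 5.6, rep(NA, 6)), nrow = 3,
                         dimnames = list(c("Control-Slow", "Fast-Slow",
                                           "Control-Fast"), NULL)),
  adjPValues = factor(c(0.0117, 0.7655, 0.0091)),
  rejected   = c(TRUE, FALSE, TRUE)
)

# One readable table: pair, mean difference, adjusted p, decision.
data.frame(
  diff      = res$confIntervals[, 1],
  adj.p     = as.numeric(as.character(res$adjPValues)),
  rejected  = res$rejected,
  row.names = rownames(res$confIntervals)
)
```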

This is reflective of a more general problem with R: there are no standards for output quality in R. Like many other open source projects, as long as the person in charge of writing the code for that specific piece can use it, well, that’s good enough. In fact, we have to thank the relevant programmer, because if s/he hadn’t done it, we’d have nothing. This is one of those cases where the adage “you get what you pay for” is right on the mark. I’m sure some R person will get all offended that I’m criticizing and will say (or at least think) “well, if you’re so smart and you can do it so much better, why don’t you go ahead and write it?” That kind of attitude, which I’ve seen on R message boards before, completely misses the point. I’m trying to get new graduate students to use this thing. Output like this sends a clear message to them, and that message is “We’re going to make simple things unnecessarily hard. Go use SPSS, we don’t want you.”

Now, for you R fans out there, don’t lose heart. There’s a fairly major bit of stupidity in SPSS coming up in the next episode (the problem is interaction contrasts). R handles this about a thousand times better than SPSS, so payback is coming.