Last week, we were introduced to single factor analysis of variance (ANOVA). ANOVA can also be applied when there is more than a single treatment, provided that the design is fully crossed. In general, n-factor ANOVA (where n represents the number of treatments) can be relatively complicated in terms of the calculations, and so we are going to restrict ourselves to 2-factor ANOVA with equal sample sizes. Learning the basics of how the analysis works will prepare you to interpret more complicated analyses when you conduct them using an appropriate software package. In some cases, you may see n-factor ANOVA referred to as "n-way ANOVA".

Let's revisit the model that we employed for single-factor ANOVA (model I):
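As a reminder, that model can be sketched in standard notation (the subscript choice here is mine: *i* indexes the treatment levels and *j* the replicates within a level):

```latex
X_{ij} = \mu + \alpha_i + \epsilon_{ij}
```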

Not surprisingly, the model for 2-factor ANOVA adds variation due to a second treatment effect (β), but it also adds a third variance component, αβ, which is the variation due to interactions between the first and second treatment variables:
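In the same spirit, the 2-factor model can be sketched as (standard notation, assumed here: *i* and *j* index the levels of the first and second treatments, and *l* the replicates within a cell):

```latex
X_{ijl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijl}
```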

Interactions between and among variables are the exciting part of biology! Main effects just aren't that interesting, but emergent properties, where the whole is more than the sum of the parts, are where the cool stuff happens! Consider the blueberry bushes in my back yard. If you have never been to my back yard, just take it for granted that they are there. If you have been in my back yard, let me know so that we can work out the details of the restraining order...

Sorry...where were we? Right. The blueberry bushes in my back yard are completely surrounded by a subclover intercrop. This has increased their yield, because there are nitrogen-fixing bacteria associated with the clover, and the regular turnover of the clover puts nitrogen into the soil. You likely would not be surprised if I were to tell you that simply adding fertilizer to the soil, without planting any clover, would also increase the yield. Following the notion of "more is better" (which I think is located on the Y-chromosome), one might assume that if I were to add fertilizer to my existing clover intercrop, my blueberry yield would go up even more. In other words, the benefits of the nitrogen additions would be additive (the effects sum together). But...it wouldn't. In fact, it would decrease by quite a bit. The effects of the fertilizer and the clover are not additive. There is an interesting interaction between the two variables, such that the overall effect cannot be predicted from knowledge of the separate effects. It turns out that the nitrogen-fixing bacteria won't fix nitrogen in the presence of nitrates or ammonia, and so the clover competes with the blueberries for the nitrogen fertilizer, and does a pretty good job of it because its roots are closer to the surface where the fertilizer was applied. Clearly a much more interesting dynamic than nitrogen + nitrogen = larger yield.

It is easiest to interpret interactions among variables graphically, and it is the only use I have ever found for the "line graph" option in Excel. Let's consider a seed predation experiment testing the effects of ant presence and mouse presence on the growth of shrubs. In a fully crossed design, 40 plots are selected with the same density of shrubs. Ten plots are fenced with hardware cloth to exclude mice, 10 plots are baited with poison to kill ants, 10 plots have both hardware cloth and bait to exclude both mice and ants, and 10 plots are untreated, allowing access by both mice and ants.

Let's look first at the effect of ants, by comparing the fenced plots that were unbaited and baited:

We can see that the presence of ants reduced the shrub density. If we look at the effects of mice, we see a different picture:

Shrub density increased slightly with mice present. If the effects of mice and ants were additive, then we should see a change that is the sum of these two effects, which would look like this:

**Question 1: Explain why with additive effects, such as those shown on the above graph,
the slopes of the lines are the same, and only the intercepts differ.**

The actual data, shown below, demonstrate that the effects of ants and mice are not additive. The effect of one changes depending on the presence or absence of the other effect. This reveals that something quite interesting and unexpected is going on:

This figure ("Chart1") and the associated data ("miceant") are included in this week's Excel workbook (which you can download HERE) so that you can see how to set up a line graph like the one displayed above in order to examine significant interactions.
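To make the idea concrete, here is a minimal numerical sketch; the four cell means below are invented for illustration, and are not the "miceant" data:

```python
# Hypothetical cell means (shrub density per plot); invented values,
# NOT the "miceant" data from the workbook.
means = {
    ("mice absent", "ants absent"): 35.0,
    ("mice absent", "ants present"): 20.0,
    ("mice present", "ants absent"): 38.0,
    ("mice present", "ants present"): 31.0,
}

# If the effects were additive, the effect of ants would be the same
# whether or not mice were present (parallel lines on the graph).
ant_effect_without_mice = (means[("mice absent", "ants present")]
                           - means[("mice absent", "ants absent")])
ant_effect_with_mice = (means[("mice present", "ants present")]
                        - means[("mice present", "ants absent")])

print(ant_effect_without_mice)  # -15.0
print(ant_effect_with_mice)     # -7.0
# The two "ant effects" differ, so the lines are not parallel:
# an interaction is present.
```

On the line graph, additivity shows up as parallel lines; the difference between the two "ant effects" above is exactly the non-parallelism that a significant interaction term detects.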

With 2-factor ANOVA, we are testing several null hypotheses. We test the null hypotheses that there are no significant treatment (or random) effects, and we also test the null hypothesis that none of the effects interact. The tests that we apply to each of these vary depending upon whether the effects are all fixed-effects (model I ANOVA), all random-effects (model II ANOVA), or one fixed-effect and one random-effect (mixed-model ANOVA).

We distinguished between fixed and random
effects earlier this semester (and intentionally avoided them last week), using some fairly rigid criteria. In practice,
very few people employ model II or mixed-model ANOVA, because less stringent criteria typically are used to distinguish
fixed effects from random effects. In section 10.1(f) of your textbook (page 199 of the 5th edition) the
distinction is simplified. If a treatment was selected in a non-random fashion, it is a fixed-effect. The example used in
your textbook concerns animal feeds, and contends that unless the feeds were selected at random from a catalog, then
they should be considered as fixed-effects. I would argue that it is more important to consider how they *were*
chosen than how they were not chosen. If they were chosen based on a certain ingredient, or distribution
of ingredients, then I would agree that this is a fixed-effect. If, however, they were chosen because they were the only feeds
that Tractor Supply carried, or because they were the least expensive, then I would argue that the criteria for selection
were unrelated to the question being addressed, and could be considered random effects.

For field studies, this becomes quite important, because sampling sites are often selected for biological as well as practical reasons. If you can find your study organism close to a road, why walk further afield? The reasons for selecting a site (or any other treatment) must be carefully weighed in order to determine whether one is dealing with a fixed-effect or a random-effect.

Another way of considering the designation of a particular effect as random is based upon the parameter of interest for a particular effect. If you are interested in the magnitude of a particular effect, then it should be treated as a fixed effect. If, on the other hand, you are interested in the amount of variation contributed by an effect, then it should be treated as a random effect. Let's reconsider a question posed earlier this semester about whether a river impoundment should be considered as a fixed or a random effect. If the question was about upstream versus downstream differences, and some of the impoundments in your survey were top release and some were bottom release dams, you would want to include "type of impoundment" as a random effect in your analysis. On the other hand, if you were interested in how impoundment type influences upstream versus downstream differences, and had intentionally selected an equivalent number of each type of dam to include in your dam investigation, then you would want to treat dam type as a fixed effect in your dam analysis.

We will not belabor this point further. The means by which we approach model II and mixed-model analyses will be covered, but we will assume that our interest is in the magnitude of the effects considered, and treat all of the effects in the exercises as fixed-effects, i.e., we will be applying model I ANOVA.

As mentioned previously, in conducting a 2-factor ANOVA, we are testing several null hypotheses. Using our ant and mouse experiment as an example, we are testing the null hypothesis that the sample means from the plots with fences estimate the same population mean as plots without fences. Note that these could be separate population means for the baited and unbaited plots if there is a significant ant effect. Put another way, we are testing the null hypothesis that there is no effect of mice (α = 0). We also are testing the null hypothesis that sample means from baited plots and unbaited plots estimate the same population mean. Again, there could be 2 separate population means if there is a significant effect of mice. Similar to the null hypothesis for the mouse effect, we could express this as "no ant effect" (β = 0). The third null hypothesis being tested is that there is no significant interaction between mice and ants (αβ = 0), i.e., whatever effects are present are additive.

The figure shown above suggests that the latter null hypothesis likely will be rejected and the analysis
bears this out. This brings up an important point: **when the null hypothesis for the interaction is rejected, the
main effects cannot be interpreted.** In other words, a significant or nonsignificant F-value (n-factor ANOVA uses the
same theoretical distribution as single-factor ANOVA) for any of the main effects has no meaning if there is a significant
interaction. The reason for this is that the interaction masks the main effects (sorry, but this is not the answer to question
3, which asks you to explain *why* the main effects are masked). Thus, when computing a 2-factor ANOVA
by hand, it would be prudent to first test the null hypothesis of no interaction. If that null hypothesis is rejected, then there
is little value in proceeding with testing the main effects, because those results cannot be properly interpreted.
Unfortunately, the calculations that we will employ require calculation of the main effects sums of squares in order to
determine the appropriate sum of squares for testing the interaction.

Because the calculations for 2-factor ANOVA are cumbersome, we will work through Example 12.1 (page 251 of the 5th edition). The data for this example, and the calculations can be found in the Excel worksheet titled...wait for it..."Example 12.1" in this week's Excel workbook. If you neglected to download it earlier, you can still find it HERE.

Your textbook refers to each of the separate treatment groups as "cells" and, for once, we will stick with their convention, although I would prefer to refer to them as "subgroups". On the Excel worksheet you can see that I have calculated the sample sizes for each cell in row 9, the cell totals in row 10, and the cell means (using the =AVERAGE() function) in row 10. In cell H10, the grand mean is calculated. Look at the formula to make certain that you understand what was done.

The next step will be to calculate the total sum of squares (*SS _{total}*), which is simply:
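A sketch of that calculation in standard notation (assumed here: the sums run over the levels of both factors and the replicates within each cell, and the unsubscripted bar denotes the grand mean):

```latex
SS_{total} = \sum_{i}\sum_{j}\sum_{l} \left( X_{ijl} - \bar{X} \right)^{2}
```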

The next step is to calculate the sum of squares for the cells (*SS _{cells}*):
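In standard notation (a sketch, assuming equal sample sizes: *n* is the common cell sample size, the doubly subscripted bar a cell mean, and the unsubscripted bar the grand mean):

```latex
SS_{cells} = n \sum_{i}\sum_{j} \left( \bar{X}_{ij} - \bar{X} \right)^{2}
```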

This simply subtracts the grand mean from each of the cell means, squares the difference, and sums all the squared
deviates together. This sum is then multiplied by the sample size of the groups. Obviously, this only works with equal
sample sizes. The degrees of freedom for *SS _{cells}* are determined by subtracting 1 from the
product of the number of levels for factor A (2) and the number of levels for factor B (2). Thus:
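As a numerical sketch of that calculation (the cell means and sample size below are invented, not those of Example 12.1):

```python
# Invented cell means for a 2 x 2 design with n = 5 observations per cell
cell_means = [32.6, 41.4, 28.2, 36.8]
n = 5
grand_mean = sum(cell_means) / len(cell_means)

# Square each cell mean's deviation from the grand mean, sum the squared
# deviates, and multiply by the (equal) cell sample size
ss_cells = n * sum((m - grand_mean) ** 2 for m in cell_means)

a, b = 2, 2             # levels of factor A and factor B
df_cells = a * b - 1    # (2)(2) - 1 = 3

print(round(ss_cells, 2))   # 479.75
print(df_cells)             # 3
```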

The squared deviates for *SS _{cells}* are in cells B22 to E22, and the *SS _{cells}* calculation is completed from them on the worksheet.

The calculation for within-subgroups (I know...within-cells) sum of squares (*SS _{within}*) is really no
different from how it was calculated for single factor ANOVA. The squared deviates are derived from the difference
between the observation and its cell mean:
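In standard notation (a sketch, assuming *a* and *b* levels of the two factors, a common cell sample size *n*, and the doubly subscripted bar denoting a cell mean):

```latex
SS_{within} = \sum_{i}\sum_{j}\sum_{l} \left( X_{ijl} - \bar{X}_{ij} \right)^{2},
\qquad
DF_{within} = ab(n - 1)
```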

The cell sums of squares are in cells B25 to E25, and *SS _{within}* is calculated in cell H25.
The degrees of freedom are calculated as:

**Question 2: Why was the within-group sum of squares for females with no hormone treatment calculated in the Excel
example as (=4*VAR.S(B3:B7))?**

If you cannot answer Question 2 correctly, you will have to calculate *SS _{within}* the long way, or
get dinged twice for applying methods that you do not understand, because you will get it wrong on some of this week's data sets.
I'll give you a hint. The four is not the number of groups...

Note that adding *SS _{within}* and *SS _{cells}* together yields *SS _{total}*. At this point we could test for any overall difference among the cells with the variance ratio *F _{s}* = *MS _{cells}*/*MS _{within}*, where the mean squares (*MS*) are derived by dividing the *SS* by their respective degrees of freedom as
we did for single-factor ANOVA. This would estimate individual variation (ε) in the denominator, and individual
variation plus all treatment variance and variance due to interactions (ε + α + β + αβ)
in the numerator. Clearly, in order to isolate
the separate treatment effects and interaction effects, we need to partition the *SS _{cells}*.

To look at the main effects, we treat each effect **as though it were the only effect**. In other words, we calculate the
mean of all of the observations (10) for the no hormone groups, and all the observations (10) for the hormone groups,
regardless of whether the observation belonged to a male or female. These means are used to calculate among-group sum of
squares in the same way they were calculated for a single factor ANOVA:
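In standard notation (a sketch; *b* is the number of levels of factor B, *n* the cell sample size, the subscripted bar the mean of all observations at one level of factor A pooled across factor B, and the unsubscripted bar the grand mean):

```latex
SS_{A} = bn \sum_{i} \left( \bar{X}_{i} - \bar{X} \right)^{2}
```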

If the effect of the
second factor (in this case sex) is additive, it should influence both the no hormone and hormone
treatments equally, and the estimate of the effect of hormone treatment should be unbiased, because we are estimating a variance.
If, however, sex and hormone
treatment interact in their effects, this approach will produce a worthless estimate of the effect of hormone
treatment. **Note that the number of groups for treatment B is part of the calculation for the sum of squares for treatment A!**
The opposite is true for calculating *SS _{B}*. This will be important to remember when answering question 4.
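A sketch of that pooling, with invented observations (not the Example 12.1 data):

```python
# Invented observations for a 2 x 2 design, n = 5 per cell;
# NOT the data from Example 12.1.
no_hormone = {"female": [16, 18, 17, 19, 15], "male": [14, 15, 13, 16, 14]}
hormone = {"female": [32, 34, 33, 31, 35], "male": [29, 28, 30, 27, 31]}

def mean(values):
    return sum(values) / len(values)

# Pool across sex (factor B), treating hormone treatment (factor A)
# as though it were the only effect
pooled = [no_hormone["female"] + no_hormone["male"],
          hormone["female"] + hormone["male"]]
grand_mean = mean(pooled[0] + pooled[1])

# Weight by the number of observations behind each pooled mean:
# (levels of factor B) x (cell sample size) = 2 x 5 = 10
b, n = 2, 5
ss_a = b * n * sum((mean(group) - grand_mean) ** 2 for group in pooled)
df_a = 2 - 1    # levels of factor A minus 1

print(round(ss_a, 2))   # 1170.45
print(df_a)             # 1
```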

**Question 3: Explain why the F-values for main effects must be ignored if the F-value for the interaction is
significant. (Hint: it relates to treating each effect as though it were the only effect.)**

For Example 12.1, a sample mean is calculated for the no hormone and hormone treatments (factor A) by pooling the
male and female data, the grand mean is subtracted from those sample means, the difference squared, and the squared
deviates summed together. This sum, like that for *SS _{among}* in single-factor ANOVA, must be weighted
by the number of observations, and so it is multiplied by the number of levels of factor B times the sample size of each cell.

**Question 4: Explain how the equation in cell J4 would differ if there were 3 treatment levels for factor A.**

Here's a hint: it has to do with using the VAR.S() function to get sum of squares...which you already should have figured out from question 2...

The sum of squares for factor B (sex) follows the same principles as that for factor A, calculating means for male and female samples by pooling the hormone treatments:
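In standard notation (a sketch; *a* is the number of levels of factor A, *n* the cell sample size, the subscripted bar the mean for one sex pooled across hormone treatments, and the unsubscripted bar the grand mean):

```latex
SS_{B} = an \sum_{j} \left( \bar{X}_{j} - \bar{X} \right)^{2}
```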

The pooled means are calculated in cells N2 and O2 (look closely at the formula to see how the separated columns are included
in the average...see the comma?)
and *SS _{B}* is calculated in N4. The degrees of
freedom for *SS _{B}* are the number of levels of factor B minus 1.

If the effects of the 2 factors, in this case, hormone treatment and sex, are additive (i.e., αβ = 0), then
*SS _{A}* and *SS _{B}* together should account for all of *SS _{cells}*. Whatever remains represents the variation attributable to the interaction: *SS _{AxB}* = *SS _{cells}* − *SS _{A}* − *SS _{B}*.

The interaction sum of squares is calculated in cell L4, and the degrees of freedom are the product of the degrees of freedom for factor A and the degrees of freedom for factor B:
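A sketch of that subtraction (the sums of squares below are invented for illustration, not taken from Example 12.1):

```python
# Invented sums of squares for a 2 x 2 design; NOT Example 12.1 values.
ss_cells = 1386.95
ss_a = 1170.45      # factor A (e.g., hormone treatment)
ss_b = 168.20       # factor B (e.g., sex)

# The among-cell variation that the main effects cannot account for
# is attributed to the interaction
ss_axb = ss_cells - ss_a - ss_b

a, b = 2, 2
df_axb = (a - 1) * (b - 1)    # 1 x 1 = 1

print(round(ss_axb, 2))   # 48.3
print(df_axb)             # 1
```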

With all of the sums of squares calculated, the mean squares can be determined by dividing the appropriate sum of squares by its respective degrees of freedom, and the ANOVA table can be generated:

As you can see (or will see if you run the numbers), the variance ratios (*F _{s}*) for all 3 null hypotheses are calculated with the mean squares for the null hypothesis being tested (*MS _{A}*, *MS _{B}*, or *MS _{AxB}*) in the numerator, and *MS _{within}* in the denominator.

To find the probabilities associated with the generated values of *F _{s}* for each test (we are using the tables, so we only are interested in whether the probability is greater than or less than 0.05), each *F _{s}* is compared to the critical value of *F* with the appropriate numerator and denominator degrees of freedom.

When the interaction term is not significant (i.e., αβ = 0), as is the case with our example, then
*MS _{AxB}*, which estimates αβ + ε, should (in theory) be estimating the same thing as *MS _{within}*, which estimates ε alone.

For the remaining data sets in your worksheet, you will conduct a 2-factor ANOVA in the same fashion that was
just outlined, and present your results as ANOVA tables. The assumptions of 2-factor ANOVA are the same as for
single-factor ANOVA, and you should proceed as though all of the assumptions are met, and as though model I
ANOVA was the appropriate choice. **Remember that, if there is a significant interaction term, you will have to
present the graph showing the interaction!** Below, for your
convenience, is a table summarizing the calculations for each component of the table for a model I ANOVA:
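The whole partition can also be cross-checked in code. The sketch below assumes a balanced layout (equal cell sample sizes) and a model I analysis, exactly as in the exercises; the demonstration data are invented, not from any of this week's worksheets:

```python
from statistics import mean

def two_factor_anova(cells):
    """Model I 2-factor ANOVA for a balanced design.

    `cells` maps (level of A, level of B) -> list of observations,
    with the same number of observations in every cell.
    Returns {source: (SS, df, MS, F)}, with MS/F of None where undefined.
    """
    a_levels = sorted({key[0] for key in cells})
    b_levels = sorted({key[1] for key in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))
    observations = [x for obs in cells.values() for x in obs]
    grand = mean(observations)

    ss_total = sum((x - grand) ** 2 for x in observations)
    ss_cells = n * sum((mean(obs) - grand) ** 2 for obs in cells.values())
    # Deviations of each observation from its own cell mean
    ss_within = sum((x - mean(obs)) ** 2
                    for obs in cells.values() for x in obs)

    # Main effects: pool across the other factor, weighting by the
    # number of observations behind each pooled mean
    def main_effect_ss(index, other_level_count):
        levels = a_levels if index == 0 else b_levels
        total = 0.0
        for level in levels:
            pooled = [x for key, obs in cells.items()
                      if key[index] == level for x in obs]
            total += (mean(pooled) - grand) ** 2
        return other_level_count * n * total

    ss_a = main_effect_ss(0, b)
    ss_b = main_effect_ss(1, a)
    ss_axb = ss_cells - ss_a - ss_b          # interaction by subtraction

    df_a, df_b = a - 1, b - 1
    df_axb = df_a * df_b
    df_within = a * b * (n - 1)
    ms_within = ss_within / df_within

    def row(ss, df):
        ms = ss / df
        return (ss, df, ms, ms / ms_within)

    return {
        "A": row(ss_a, df_a),
        "B": row(ss_b, df_b),
        "AxB": row(ss_axb, df_axb),
        "within": (ss_within, df_within, ms_within, None),
        "total": (ss_total, a * b * n - 1, None, None),
    }

# Invented demonstration data, n = 3 per cell
table = two_factor_anova({
    ("control", "female"): [16.0, 18.0, 17.0],
    ("control", "male"): [14.0, 15.0, 13.0],
    ("hormone", "female"): [32.0, 34.0, 33.0],
    ("hormone", "male"): [29.0, 28.0, 30.0],
})
for source in ("A", "B", "AxB", "within", "total"):
    print(source, table[source])
```

Because the design is balanced, *SS _{A}* + *SS _{B}* + *SS _{AxB}* + *SS _{within}* reproduces *SS _{total}* exactly, which is a handy check on your hand calculations.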

The worksheet titled "brown" contains data from an experiment on the foraging
behavior of brown tree snakes (*Boiga irregularis*) during the wet and dry seasons in Guam.
In each season (wet and dry)
traps were baited with live or dead prey to determine whether the season influenced the importance of
olfactory and visual cues. The observations are number of captures per trap night.

**Question 5: Present the ANOVA table for the analysis of the brown snake data, and include a graph of
the interactions if there is a significant interaction term.**

The response of mesocosm zooplankton communities to phosphorus addition was studied in cattle tanks. Half of the tanks contained fish, and the other half did not contain fish. The data for rotifer abundances are presented in the worksheet titled "rotifer".

**Question 6: Present the ANOVA table for the analysis of the rotifer data, and include a graph of
the interactions if there is a significant interaction term.**

As part of the same mesocosm experiment described above, the abundance of cladocerans also was examined. The data are contained in the worksheet titled "cladoceran".

**Question 7: Present the ANOVA table for the analysis of the cladoceran data, and include a graph of
the interactions if there is a significant interaction term.**

**Question 8: Choose one of the last 3 datasets, and provide an interpretation of the analysis, including a
description of the significant effects, and a biological explanation for the observed results.**

Although sound experimental design requires equal sample sizes, it often is the case that we have to deal with analyzing data sets where the sample sizes are not equal. The concepts are the same, but the math is a little more difficult. Because I do not feel that there is anything to be gained conceptually by having you go through the calculations, I will simply point out that the formulas can be found in your book in section 12.2, and that they are "machine formulas". These formulas are designed to save on computing power, and so while they are convenient, the underlying concepts are not evident in their formulation.

Multiple comparisons can be applied in the event of a significant result in the same manner as we performed them last week. I will assume that you got the general concept last week, and point out only that if there is a significant interaction, the only valid comparisons that should be done are among the cell (subgroup) means.

The last thing we need to address this week is what to do if the data do not meet the assumptions of ANOVA. Your textbook describes one option in section 12.7 that remains the least problematic alternative. Last week's trick of performing the parametric test on ranked data, although widely employed, seems to be a poor alternative for 2-factor ANOVA.

As always, save your Excel workbook and Word document as *yourlastname*ex9 and submit them to me via Blackboard.

__Disclaimer__: Not surprisingly, no mice, ants, snakes, rotifers or cladocerans were harmed, inconvenienced,
or even observed in the production of these data sets. While the means and estimates of the error are based on real
scientific pursuits, the data were generated by models (the R programs can be viewed HERE) in
order to protect the innocent.

Send comments, suggestions, and corrections to: **Derek Zelmer**