Two Factor Analysis of Variance

(Chapter 12 in Zar, 2010)

Last week, we were introduced to single factor analysis of variance (ANOVA). ANOVA can also be applied when there is more than a single treatment, provided that the design is fully crossed. In general, n-factor ANOVA (where n represents the number of treatments) can be relatively complicated in terms of the calculations, and so we are going to restrict ourselves to 2-factor ANOVA with equal sample sizes. Learning the basics of how the analysis works will prepare you to interpret more complicated analyses when you conduct them using an appropriate software package. In some cases, you may see n-factor ANOVA referred to as "n-way ANOVA".

Let's revisit the model that we employed for single-factor ANOVA (model I):

Not surprisingly, the model for 2-factor ANOVA adds variation due to a second treatment effect (β), but it also adds a third variance component, αβ, which is the variation due to interactions between the first and second treatment variables:
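The model equations themselves do not appear in this copy of the page. Assuming the usual textbook notation (the subscripts are an assumption, with i indexing levels of the first factor, j indexing levels of the second factor, and l indexing observations within a cell), they would read roughly as follows:

```latex
% Single-factor ANOVA (model I): observation j in group i
\[ X_{ij} = \mu + \alpha_i + \varepsilon_{ij} \]

% Two-factor ANOVA: observation l in the cell defined by level i of factor A and level j of factor B
\[ X_{ijl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijl} \]
```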

Interactions between and among variables are the exciting part of biology! Main effects just aren't that interesting, but emergent properties, where the whole is more than the sum of the parts, are where the cool stuff happens!

Consider the blueberry bushes in my back yard. If you have never been to my back yard, just take it for granted that they are there. If you have been in my back yard, let me know so that we can work out the details of the restraining order...

Sorry...where were we? Right. The blueberry bushes in my back yard are completely surrounded by a subclover intercrop. This has increased their yield, because there are nitrogen-fixing bacteria associated with the clover, and the regular turnover of the clover puts nitrogen into the soil. You likely would not be surprised if I were to tell you that simply adding fertilizer to the soil, without planting any clover, would also increase the yield. Following the notion of "more is better" (which I think is located on the Y-chromosome), one might assume that if I were to add fertilizer to my existing clover intercrop, my blueberry yield would go up even more. In other words, the benefits of the nitrogen additions would be additive (the effects sum together). But...it wouldn't. In fact, it would decrease by quite a bit. The effects of the fertilizer and the clover are not additive. There is an interesting interaction between the two variables, such that the overall effect cannot be predicted from knowledge of the separate effects. It turns out that the nitrogen-fixing bacteria won't fix nitrogen in the presence of nitrates or ammonia, and so the clover competes with the blueberries for the nitrogen fertilizer, and does a pretty good job of it because its roots are closer to the surface where the fertilizer was applied. Clearly a much more interesting dynamic than nitrogen + nitrogen = larger yield.

It is easiest to interpret interactions among variables graphically, and it is the only use I have ever found for the "line graph" option in Excel. Let's consider a seed predation experiment testing the effects of ant presence and mouse presence on the growth of shrubs. In a fully crossed design, 40 plots are selected with the same density of shrubs. Ten plots are fenced with hardware cloth to exclude mice, 10 plots are baited with poison to kill ants, 10 plots have both hardware cloth and bait to exclude both mice and ants, and 10 plots are untreated, allowing access by both mice and ants.

Let's look first at the effect of ants, by comparing the fenced plots that were unbaited and baited:

We can see that the presence of ants reduced the shrub density. If we look at the effects of mice, we see a different picture:

Shrub density increased slightly with mice present. If the effects of mice and ants were additive, then we should see a change that is the sum of these two effects, which would look like this:

Question 1: Explain why with additive effects, such as those shown on the above graph, the slopes of the lines are the same, and only the intercepts differ.

The actual data, shown below, demonstrate that the effects of ants and mice are not additive. The effect of one changes depending on the presence or absence of the other effect. This reveals that something quite interesting and unexpected is going on:

This figure ("Chart1") and the associated data ("miceant") are included in this week's Excel workbook (which you can download HERE) so that you can see how to set up a line graph like the one displayed above in order to examine significant interactions.
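The comparison between an additive expectation and the observed cell means can also be sketched numerically. The short Python example below uses made-up mean shrub densities (they are NOT the values in the "miceant" worksheet, and the variable names are hypothetical) to show how the additive prediction for the "both present" plots is built from the two separate effects:

```python
# Hypothetical mean shrub densities per plot (illustrative only, NOT the "miceant" data)
neither_present = 20.0   # fenced and baited: no mice, no ants
ants_only = 12.0         # fenced, unbaited: ants present, mice excluded
mice_only = 22.0         # unfenced, baited: mice present, ants excluded
both_observed = 27.0     # untreated plots: mice and ants present

# Each effect is measured against the plots where both seed predators are excluded
ant_effect = ants_only - neither_present      # change attributable to ants alone
mouse_effect = mice_only - neither_present    # change attributable to mice alone

# If the effects were additive, the "both present" mean would be the baseline plus both effects
both_additive = neither_present + ant_effect + mouse_effect

# Any departure from the additive expectation is what the interaction term picks up
departure = both_observed - both_additive

print(f"Ant effect: {ant_effect:+.1f}, mouse effect: {mouse_effect:+.1f}")
print(f"Additive expectation: {both_additive:.1f}, observed: {both_observed:.1f}")
print(f"Departure from additivity: {departure:+.1f}")
```

Whether a departure of a given size is statistically significant is exactly what the interaction test in the ANOVA addresses; this sketch only shows where the expectation comes from.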

With 2-factor ANOVA, we are testing several null hypotheses. We test the null hypotheses that there are no significant treatment (or random) effects, and we also test the null hypothesis that none of the effects interact. The tests that we apply to each of these varies depending upon whether the effects are all fixed-effects (model I ANOVA), all random-effects (model II ANOVA), or one fixed-effect and one random-effect (mixed-model ANOVA).

We distinguished between fixed and random effects earlier this semester (and intentionally avoided them last week), using some fairly rigid criteria. In practice, very few people employ model II or mixed-model ANOVA, because they use less stringent criteria for determining the difference between fixed effects and random effects. In section 10.1(f) of your textbook (page 199 of the 5th edition) the distinction is simplified. If a treatment was selected in a non-random fashion, it is a fixed-effect. The example used in your textbook concerns animal feeds, and contends that unless the feeds were selected at random from a catalog, then they should be considered as fixed-effects. I would argue that it is more important to consider how they were chosen than how they were not chosen. If they were chosen based on a certain ingredient, or distribution of ingredients, then I would agree that this is a fixed-effect. If, however, they were chosen because they were the only feeds that Tractor Supply carried, or because they were the least expensive, then I would argue that the criteria for selection were unrelated to the question being addressed, and that the feeds could be considered random effects.

For field studies, this becomes quite important, because sampling sites are often selected for biological as well as practical reasons. If you can find your study organism close to a road, why walk further afield? The reasons for selecting a site (or any other treatment) must be carefully weighed in order to determine whether one is dealing with a fixed-effect or a random-effect.

Another way of considering the designation of a particular effect as random is based upon the parameter of interest for a particular effect. If you are interested in the magnitude of a particular effect, then it should be treated as a fixed effect. If, on the other hand, you are interested in the amount of variation contributed by an effect, then it should be treated as a random effect. Let's reconsider a question posed earlier this semester about whether a river impoundment should be considered as a fixed or a random effect. If the question was about upstream versus downstream differences, and some of the impoundments in your survey were top release and some were bottom release dams, you would want to include "type of impoundment" as a random effect in your analysis. On the other hand, if you were interested in how impoundment type influences upstream versus downstream differences, and had intentionally selected an equivalent number of each type of dam to include in your dam investigation, then you would want to treat dam type as a fixed effect in your dam analysis.

We will not belabor this point further. The means by which we approach model II and mixed-model analyses will be covered, but we will assume that our interest is in the magnitude of the effects considered, and treat all of the effects in the exercises as fixed-effects, i.e., we will be applying model I ANOVA.

As mentioned previously, in conducting a 2-factor ANOVA, we are testing several null hypotheses. Using our ant and mouse experiment as an example, we are testing the null hypothesis that the sample means from the plots with fences estimate the same population mean as plots without fences. Note that these could be separate population means for the baited and unbaited plots if there is a significant ant effect. Put another way, we are testing the null hypothesis that there is no effect of mice (α = 0). We also are testing the null hypothesis that sample means from baited plots and unbaited plots estimate the same population mean. Again, there could be 2 separate population means if there is a significant effect of mice. Similar to the null hypothesis for the mouse effect, we could express this as "no ant effect" (β = 0). The third null hypothesis being tested is that there is no significant interaction between mice and ants (αβ = 0), i.e., whatever effects are present are additive.

The figure shown above suggests that the latter null hypothesis likely will be rejected and the analysis bears this out. This brings up an important point: when the null hypothesis for the interaction is rejected, the main effects cannot be interpreted. In other words, a significant, or nonsignificant F-value (n-factor ANOVA uses the same theoretical distribution as single-factor ANOVA) for any of the main effects has no meaning if there is a significant interaction. The reason for this is that the interaction masks the main effects (sorry, but this is not the answer to question 3, which asks you to explain why the main effects are masked). Thus, when computing a 2-factor ANOVA by hand, it would be prudent to first test the null hypothesis of no interaction. If that null hypothesis is rejected, then there is little value in proceeding with testing the main effects, because those results cannot be properly interpreted. Unfortunately, the calculations that we will employ require calculation of the main effects sums of squares in order to determine the appropriate sum of squares for testing the interaction.

Because the calculations for 2-factor ANOVA are cumbersome, we will work through Example 12.1 (page 251 of the 5th edition). The data for this example, and the calculations can be found in the Excel worksheet titled...wait for it..."Example 12.1" in this week's Excel workbook. If you neglected to download it earlier, you can still find it HERE.

Your textbook refers to each of the separate treatment groups as "cells" and, for once, we will stick with their convention, although I would prefer to refer to them as "subgroups". On the Excel worksheet you can see that I have calculated the sample sizes for each cell in row 9, the cell totals in row 10, and the cell means (using the =AVERAGE() function) in row 10. In cell H10, the grand mean is calculated. Look at the formula to make certain that you understand what was done.

The next step will be to calculate the total sum of squares (SStotal), which is simply:
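The equation is missing from this copy of the page. Based on the verbal description that follows, and assuming X_ijl denotes observation l in the cell for level i of factor A and level j of factor B, with X-bar as the grand mean (this notation is an assumption), it would be:

```latex
\[ SS_{total} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{l=1}^{n} \left( X_{ijl} - \bar{X} \right)^2 \]
```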

Where a is the number of levels of factor A (hormone treatment), b is the number of levels of the second factor (sex), and n is the sample size for each group, which will be equal in all of our examples (as it should be for good designs). Basically, the grand mean is subtracted from every observation in the data set, the differences are squared, and then summed across all observations. These squared deviations are calculated in cells B13 to E17 (note the anchors), and SStotal is calculated in cell H19 (an alternative calculation that saves some time is in cell I19). Look at the formulas in order to understand the calculations...cutting and pasting for the other data sets that you have to analyze will just get you into trouble!!! The degrees of freedom are the total number of observations (4 groups with n = 5 gives 20 observations), which we will denote as N, minus 1. Thus: dftotal = N - 1 = 19.

The next step is to calculate the sum of squares for the cells (SScells):
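The equation is missing here as well. From the description that follows, and assuming X-bar_ij denotes the mean of the cell for level i of factor A and level j of factor B, and X-bar the grand mean (assumed notation), it can be written as:

```latex
\[ SS_{cells} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \left( \bar{X}_{ij} - \bar{X} \right)^2 \]
```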

This simply subtracts the grand mean from each of the cell means, squares the difference, and sums all the squared deviates together. This sum is then multiplied by the sample size of the groups. Obviously, this only works with equal sample sizes. The degrees of freedom for SScells is determined by subtracting 1 from the product of the number of levels for factor A (2) and the number of levels for factor B (2). Thus: dfcells = ab - 1.

The squared deviates for SScells are in cells B22 to E22, and SScells is calculated in cell H22.

The calculation for within-subgroups (I know...within-cells) sum of squares (SSwithin) is really no different from how it was calculated for single factor ANOVA. The squared deviates are derived from the difference between the observation and its cell mean:
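A reconstruction of the missing equation, assuming X_ijl is observation l in the cell for level i of factor A and level j of factor B, and X-bar_ij is that cell's mean (assumed notation):

```latex
\[ SS_{within} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{l=1}^{n} \left( X_{ijl} - \bar{X}_{ij} \right)^2 \]
```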

The cell sums of squares are in cells B25 to E25, and SSwithin is calculated in cell H25. The degrees of freedom are calculated as: df within = ab(n - 1) = 2 x 2 x (5 - 1) = 16.

Question 2: Why was the within-group sum of squares for females with no hormone treatment calculated in the Excel example as (=4*VAR.S(B3:B7))?

If you cannot answer Question 2 correctly, you will have to calculate SSwithin the long way, or get dinged twice for applying methods that you do not understand, because you will get it wrong on some of this week's data sets. I'll give you a hint. The four is not the number of groups...

Note that adding SSwithin and SScells gives you SStotal, and that the same is true for their respective degrees of freedom. Thus, the total variation can be partitioned into a sum of squares based on the differences between group means and the grand mean, and a sum of squares based on the differences between observations and their group means, as it was for single-factor ANOVA. We could, in single-factor ANOVA fashion, test for differences among cells (subgroups) by calculating Fs as:
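The variance ratio itself is missing from this copy. Given the mean squares described in the next paragraph, it would presumably be the cells mean square over the within-cells mean square:

```latex
\[ F_s = \frac{MS_{cells}}{MS_{within}} = \frac{SS_{cells} / (ab - 1)}{SS_{within} / \left[ ab(n - 1) \right]} \]
```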

Where the mean squares (MS) are derived by dividing the SS by their respective degrees of freedom as we did for single-factor ANOVA. This would estimate individual variation (ε) in the denominator, and individual variation plus all treatment variance and variance due to interactions (ε + α + β + αβ) in the numerator. Clearly, in order to isolate the separate treatment effects and interaction effects, we need to partition the SScells.

To look at the main effects, we treat each effect as though it were the only effect. In other words, we calculate the mean of all of the observations (10) for the no hormone groups, and all the observations (10) for the hormone groups, regardless of whether the observation belonged to a male or female. These means are used to calculate among-group sum of squares in the same way they were calculated for a single factor ANOVA:
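The missing equation can be reconstructed from the description of the calculation for Example 12.1 (pooled means, squared deviations from the grand mean, weighted by bn). Assuming X-bar_i. denotes the mean of all observations at level i of factor A, pooled over factor B, and X-bar the grand mean (assumed notation):

```latex
\[ SS_A = bn \sum_{i=1}^{a} \left( \bar{X}_{i\cdot} - \bar{X} \right)^2 \]
```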

If the effect of the second factor (in this case sex) is additive, it should influence both the no hormone and hormone treatments equally, and the estimate of the effect of hormone treatment should be unbiased, because we are estimating a variance. If, however, sex and hormone treatment interact in their effects, this approach will produce a worthless estimate of the effect of hormone treatment. Note that the number of groups for treatment B is part of the calculation for the sum of squares for treatment A! The opposite is true for calculating SSB. This will be important to remember when answering question 4.

Question 3: Explain why the F-values for main effects must be ignored if the F-value for the interaction is significant? (Hint: it relates to treating each effect as though it were the only effect)

For Example 12.1, a sample mean is calculated for the no hormone and hormone treatments (factor A) by pooling the male and female data, the grand mean is subtracted from those sample means, the difference squared, and the squared deviates summed together. This sum, like that for SSamong in single-factor ANOVA, must be weighted by the number of observations, and so it is multiplied by bn. The factor A means are calculated in cells J2 and K2, and SSA is calculated in cell J4. As with SSamong for single-factor ANOVA, dfA = a - 1.

Question 4: Explain how the equation in cell J4 would differ if there were 3 treatment levels for factor A.

Here's a hint: it has to do with using the VAR.S() function to get sum of squares...which you already should have figured out from question 2...

The sum of squares for factor B (sex) follows the same principles as that for factor A, calculating means for male and female samples by pooling the hormone treatments:
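By symmetry with factor A, and assuming X-bar_.j denotes the mean of all observations at level j of factor B, pooled over factor A, and X-bar the grand mean (assumed notation), the missing equation would be:

```latex
\[ SS_B = an \sum_{j=1}^{b} \left( \bar{X}_{\cdot j} - \bar{X} \right)^2 \]
```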

The pooled means are calculated in cells N2 and O2 (look closely at the formula to see how the separated columns are included in the average...see the comma?) and SSB is calculated in N4. The degrees of freedom for SSB, dfB, are calculated as b - 1. Notice that the number of groups for factor A is a component of the calculation for the sum of squares for factor B.

If the effects of the 2 factors, in this case, hormone treatment and sex, are additive (i.e., αβ = 0), then SSA and SSB would sum to SScells for a statistical population, and close to that value for samples. Thus, we can determine the sum of squares for the interaction (SSAxB) as:
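The missing equation follows from the reasoning in this paragraph: whatever part of SScells is not accounted for by the two main effects is attributed to the interaction. In assumed notation:

```latex
\[ SS_{A \times B} = SS_{cells} - SS_A - SS_B \]
```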

The interaction sum of squares is calculated in cell L4, and the degrees of freedom are the product of the degrees of freedom for factor A and the degrees of freedom for factor B:
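Written out (the equation is missing from this copy), with the values for a design that has 2 levels of each factor:

```latex
\[ df_{A \times B} = df_A \times df_B = (a - 1)(b - 1) = (2 - 1)(2 - 1) = 1 \]
```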

With all of the sums of squares calculated, the mean squares can be determined by dividing the appropriate sum of squares by its respective degrees of freedom, and the ANOVA table can be generated:

As you can see (or will see if you run the numbers), the variance ratio (Fs) values for all 3 null hypotheses are calculated with the mean squares for the null hypothesis (MSA, MSB, or MSAxB) as the numerator, and MSwithin as the denominator. For model I (fixed effects) ANOVA (most experiments can be justified as model I), the denominator variance is always MSwithin (error MS in your textbook). For model II (random effects) ANOVA, MSwithin is the denominator only for the variance ratio (Fs) testing for a significant interaction. The main effects variance ratios are generated using MSAxB as the denominator variance. For mixed-models, the denominator for testing the main effects is MSwithin for fixed effects and MSAxB for random effects. MSwithin is always the denominator when testing for a significant interaction. We will consider all of the analyses in this assignment to be model I (fixed effects) analyses, and so the denominator will always be MSwithin.
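For readers who want to check their spreadsheet logic against an independent calculation, the whole model I procedure can be sketched in a few lines of Python. This is a minimal sketch, not the Excel workbook's method: the data are made-up values for a hypothetical 2 x 2 design with n = 3 (NOT Example 12.1), the names (cells, pooled_mean, and so on) are hypothetical, and equal sample sizes are assumed throughout.

```python
# Hypothetical 2 x 2 design, n = 3 per cell (made-up values, NOT Example 12.1).
# Keys are (level of factor A, level of factor B).
cells = {
    ("A1", "B1"): [10, 12, 14],
    ("A1", "B2"): [8, 10, 12],
    ("A2", "B1"): [20, 22, 24],
    ("A2", "B2"): [16, 18, 20],
}

def mean(values):
    return sum(values) / len(values)

a_levels = sorted({a_lev for a_lev, _ in cells})
b_levels = sorted({b_lev for _, b_lev in cells})
a, b = len(a_levels), len(b_levels)
n = len(next(iter(cells.values())))   # equal cell sizes are assumed

all_obs = [x for obs in cells.values() for x in obs]
grand_mean = mean(all_obs)

# Total, cells, and within-cells sums of squares
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
ss_cells = n * sum((mean(obs) - grand_mean) ** 2 for obs in cells.values())
ss_within = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)

def pooled_mean(factor_index, level):
    """Mean of every observation at one level of a factor, ignoring the other factor."""
    return mean([x for key, obs in cells.items() if key[factor_index] == level for x in obs])

# Main effects are weighted by the number of observations behind each pooled mean
ss_a = b * n * sum((pooled_mean(0, lev) - grand_mean) ** 2 for lev in a_levels)
ss_b = a * n * sum((pooled_mean(1, lev) - grand_mean) ** 2 for lev in b_levels)
ss_axb = ss_cells - ss_a - ss_b       # interaction is what remains of SScells

df_a, df_b = a - 1, b - 1
df_axb = df_a * df_b
df_within = a * b * (n - 1)

# Model I: MSwithin is the denominator for all three variance ratios
ms_within = ss_within / df_within
f_a = (ss_a / df_a) / ms_within
f_b = (ss_b / df_b) / ms_within
f_axb = (ss_axb / df_axb) / ms_within

print(f"{'Source':<8}{'SS':>8}{'df':>4}{'MS':>8}{'Fs':>8}")
for name, ss, df, f in [("A", ss_a, df_a, f_a), ("B", ss_b, df_b, f_b), ("A x B", ss_axb, df_axb, f_axb)]:
    print(f"{name:<8}{ss:>8.2f}{df:>4}{ss / df:>8.2f}{f:>8.2f}")
print(f"{'Within':<8}{ss_within:>8.2f}{df_within:>4}{ms_within:>8.2f}")
print(f"{'Total':<8}{ss_total:>8.2f}{a * b * n - 1:>4}")
```

With these made-up numbers the sums of squares come out to SSA = 243, SSB = 27, SSAxB = 3, and SSwithin = 32, which sum to SStotal = 305. The Fs values would still need to be compared against the critical values in the F tables.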

To find the probabilities associated with the generated values of Fs for each test (we are using the tables, so we are interested only in whether p is less than or greater than 0.05), the numerator degrees of freedom are those used to generate the mean square that you used for the numerator. The denominator degrees of freedom are those of whichever mean square served as the denominator: dfwithin for interactions and for fixed effects, and dfAxB for random effects.

When the interaction term is not significant (i.e., αβ = 0), as is the case with our example, then MSAxB, which estimates αβ + ε, should (in theory) be estimating the same thing as MSwithin, which estimates ε. If that is the case, we can (again, in theory) get a better estimate of ε by pooling the sum of squares (SSwithin + SSAxB), and the associated degrees of freedom (dfwithin + dfAxB), and generating a pooled value for MSwithin using those values. In our example, SSpooled would equal 301.39 + 4.90, and dfpooled would equal 16 + 1, making MSwithin 18.02. This smaller denominator will result in higher values for Fs in model I comparisons. Such pooling has been considered questionable by a number of authorities, because while it does improve the type II error rate, the effect on the type I error rate is not well-defined. Your textbook recommends against pooling, which is a safe recommendation. If, however, you have a legitimate, biological, a priori (i.e., before the data are examined) reason to be more concerned with type II error than with type I error, you may consider pooling of the sum of squares as an option.
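The pooling arithmetic can be checked with a short Python sketch. The sums of squares and degrees of freedom are the values quoted in the paragraph above; the variable names are hypothetical.

```python
# Values quoted in the text for Example 12.1
ss_within, df_within = 301.39, 16
ss_axb, df_axb = 4.90, 1

# Unpooled error mean square
ms_within = ss_within / df_within

# Pool the interaction into the error term (only defensible when the interaction is not significant)
ss_pooled = ss_within + ss_axb
df_pooled = df_within + df_axb
ms_pooled = ss_pooled / df_pooled

print(f"MSwithin (unpooled) = {ms_within:.2f}")   # 18.84
print(f"MSwithin (pooled)   = {ms_pooled:.2f}")   # 18.02
```

Note that pooling also changes the denominator degrees of freedom (17 rather than 16) used to look up the critical value of F.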

For the remaining data sets in your worksheet, you will conduct a 2-factor ANOVA in the same fashion that was just outlined, and present your results as ANOVA tables. The assumptions of 2-factor ANOVA are the same as for single-factor ANOVA, and you should proceed as though all of the assumptions are met, and as though model I ANOVA was the appropriate choice. Remember that, if there is a significant interaction term, you will have to present the graph showing the interaction! Below, for your convenience, is a table summarizing the calculations for each component of the table for a model I ANOVA:
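The summary table does not appear in this copy of the page. A reconstruction based on the calculations described above, for a model I ANOVA with equal sample sizes, might look like the following (the layout and row labels are an assumption; N = abn):

```latex
\begin{tabular}{lllll}
Source               & SS                & df             & MS                              & $F_s$ \\
\hline
Total                & $SS_{total}$      & $N - 1$        &                                 & \\
Cells                & $SS_{cells}$      & $ab - 1$       &                                 & \\
Factor A             & $SS_A$            & $a - 1$        & $SS_A / df_A$                   & $MS_A / MS_{within}$ \\
Factor B             & $SS_B$            & $b - 1$        & $SS_B / df_B$                   & $MS_B / MS_{within}$ \\
A $\times$ B         & $SS_{A \times B}$ & $(a - 1)(b - 1)$ & $SS_{A \times B} / df_{A \times B}$ & $MS_{A \times B} / MS_{within}$ \\
Within cells (error) & $SS_{within}$     & $ab(n - 1)$    & $SS_{within} / df_{within}$     & \\
\end{tabular}
```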

The worksheet titled "brown" contains data from an experiment on the foraging behavior of brown snakes (Boiga irregularis) during the wet and dry seasons in Guam. In each season (wet and dry) traps were baited with live or dead prey to determine whether the season influenced the importance of olfactory and visual cues. The observations are number of captures per trap night.

Question 5: Present the ANOVA table for the analysis of the brown snake data, and include a graph of the interactions if there is a significant interaction term.

The response of mesocosm zooplankton communities to phosphorus addition was studied in cattle tanks. Half of the tanks contained fish, and the other half did not. The data for rotifer abundances are presented in the worksheet titled "rotifer".

Question 6: Present the ANOVA table for the analysis of the rotifer data, and include a graph of the interactions if there is a significant interaction term.

As part of the same mesocosm experiment described above, the abundance of cladocerans also was examined. The data are contained in the worksheet titled "cladoceran".

Question 7: Present the ANOVA table for the analysis of the cladoceran data, and include a graph of the interactions if there is a significant interaction term.

Question 8: Choose one of the last 3 datasets, and provide an interpretation of the analysis, including a description of the significant effects, and a biological explanation for the observed results.

Although sound experimental design requires equal sample sizes, it often is the case that we have to deal with analysing data sets where the sample sizes are not equal. The concepts are the same, but the math is a little more difficult. Because I do not feel that there is anything to be gained conceptually by having you go through the calculations, I will simply point out that the formulas can be found in your book in section 12.2, and make use of "machine formulas". These formulas are designed to save on computing power, and so while they are convenient, the underlying concepts are not evident in their formulation.

Multiple comparisons can be applied in the event of a significant result in the same manner as we performed them last week. I will assume that you got the general concept last week, and point out only that if there is a significant interaction, the only valid comparisons that should be done are among the cell (subgroup) means.

The last thing we need to address this week is what to do if the data do not meet the assumptions of ANOVA. Your textbook describes one option in section 12.7 that remains the least problematic alternative. Last week's trick of performing the parametric test on ranked data, although widely employed, seems to be a poor alternative for 2-factor ANOVA.

As always, save your Excel workbook and Word document as yourlastnameex9 and submit them to me via Blackboard.

Disclaimer: Not surprisingly, no mice, ants, snakes, rotifers or cladocerans were harmed, inconvenienced, or even observed in the production of these data sets. While the means and estimates of the error are based on real scientific pursuits, the data were generated by models (the R programs can be viewed HERE) in order to protect the innocent.





Send comments, suggestions, and corrections to: Derek Zelmer