Variance Ratios and Linear Regression

The F-ratio for testing H₀: β = 0 is:

In terms of the necessary sums of squares for the variance ratios (remember that mean squares, MS, is SS/df) we already have done the necessary calculations, although one of them was a shortcut. The unexplained sum of squares (SS_unexplained) is the sum of the squared residuals:

This value was already calculated in cell I16. MS_unexplained is calculated by dividing SS_unexplained by the appropriate degrees of freedom, which happen to be n - 2. This value is calculated in cell J22, but by looking at cell G22, you can see that this is exactly equivalent to the residual mean square we calculated for the single-sample t-test.

Similarly, we already have calculated the total sum of squares, SS_total, as:

Explained sum of squares, SS_explained, is calculated as:

These deviations and their sum can be viewed in column J. Note that the sum of column J is the equivalent of the difference between the sum of column F and the sum of column I. This was the denominator for our r² value. In other words, SS_explained also can be calculated as:

The degrees of freedom for SS_explained (sometimes referred to as SS_regression) are always 1, such that MS_explained = SS_explained. The table below shows the basic structure of the ANOVA table for regression analysis, and includes the necessary formulas:

As with any ANOVA, this is always a 1-tailed test, with the numerator degrees of freedom equal to 1, and the denominator degrees of freedom = n - 2, corresponding to the df of the appropriate MS.

MS_explained is calculated in cell J21, MS_unexplained is calculated in cell J22, and F_s is calculated in cell J23. Note that the square root of the F_s value is equal to the value calculated for t_s. This is not just a happy coincidence, it is the actual relationship between t and F. If you were to compare the results of a 2-sample t-test to those of a single-factor ANOVA on the same data set, you would see the same relationship.

Now, I'm sure that you are anxious to do some analyses of your own...

The Excel worksheet labeled "zooplankton" in this week's Excel workbook contains the mass of individual zooplankton (the independent variable), and the mass of the algae that they consumed over a 24 hour period.

Question 2: Test to see if there is significant positive covariation between these two variables, using the single-sample t-test to test H₀: β = 0. Plot the observations and the regression model as was done for Chart 1, using the estimates of Y as the model. If you let Excel do the work for you by clicking "trendline", then I will give Excel the credit for the question, and you will get a zero. Be sure to explain whether your statistical conclusion was based on a 1-tailed or a 2-tailed probability, and why you chose that probability.

The Excel worksheet labeled "parasitoid" contains data from an experiment on density-dependence in parasitoid wasp larvae. The wasps lay their eggs inside of caterpillars, and the larvae develop within the caterpillar, ultimately bursting out and killing the caterpillar. The clutch size was based on counting the individual larvae that erupted from the caterpillars, and the head width of each larva was measured as a surrogate measure of body size.

Question 3: Test to see if there is significant covariation between clutch size and head width for these parasitoid larvae using the variance ratio approach. Include the ANOVA table on your answer sheet. Plot the data and the regression line in your workbook. Do not worry about formatting it or labelling it, but you need to plot your estimates of Y to make sure that you did the regression properly.

The last 2 worksheets ("marine" and "C3C4") contain nitrogen and carbon amounts from bones of vertebrates in 2 different biomes; a marine biome, and a terrestrial biome with both C3 and C4 plants (a dry biome). The C:N ratio tends to be lower in marine systems than in terrestrial systems because of differing limitations. The main thrust of the investigation was determining if consistent C:N ratios could be found by examining bones. A constant ratio would be evidenced by a linear relationship between the 2 variables...

Question 4: Examine both sets of data to determine whether there is significant covariation between the mass of nitrogen and the mass of carbon in the bones examined. Don't panic when you see that the values for C are negative. These values are actually exponents, i.e., this is a transformed data set. Use either method to test the significance of the relationship, just remember to justify your choice of a 1-tailed or 2-tailed probability if you opt for the t-test, and to report the results as an ANOVA table if you choose to use ANOVA. Plot the data and the regression lines for both sets of data in your workbook. Again, do not worry about formatting it or labelling them, it is just a way to check to make sure that your regression lines are correct.

It probably has occurred to you that there will be occasions where it would be useful to compare the slopes of two relationships. This can be done using a t-test analagous to a 2-sample t-test. The calculation for t_s is:

Where b₁ and b₂ are the 2 slopes, and the standard error of the difference is calculated as:

The value in both of the numerators is the pooled SS_unexplained (pooled SS_residual), which is calculated as:

The critical value of t for comparison is associated with n₁ + n₂ - 4 degrees of freedom.

Question 5: Compare the slopes for the marine and C3 + C4 biomes to determine whether they differ significantly. Use a 2-tailed probability for t_crit to make your conclusion.

It should be noted that the calculations that we have done should only be applied to data where there is a single value of Y for each value of X. This is not true for some of our data sets. The estimate of b and a are not influenced by this, but the type I error rate may be. Your book covers the appropriate calculations for circumstances when you have multiple values of Y for each X, but the calculations are simplified, which means that the concepts are masked. Hopefully the procedures that you have applied, even though they might have influenced our conclusions about the fake data sets, have taught you enough about the analysis itself that you will be able to easily apply and interpret similar analyses using statistical software packages.

As always, save your Excel workbook and Word document as yourlastnameex10 and submit them to me via Blackboard.

Disclaimer: Once again, no birds, zooplankton, parasitoids, caterpillars, or vertebrates were harmed, inconvenienced, or even observed in the production of these data sets. While the means and estimates of the error are based on real scientific pursuits, the data were generated by models (the R programs can be viewed HERE) in order to protect the innocent.

Send comments, suggestions, and corrections to: Derek Zelmer