Symmetry and Kurtosis

As mentioned previously, distributions that have an equal number of observations spread similarly on either side of the mode are said to be symmetrical. For such distributions, the mean and median will have nearly the same value. One symmetrical distribution we have already seen is the distribution of differences between sample means and the population mean.

For this distribution, the mean is 0.00368 and the median is 0.00342. The four distributions that we worked with in our Excel workbook are also symmetrical, as you can see by comparing the sample means to the sample medians. The degree to which a distribution departs from symmetry is called skewness. We saw an example of a skewed distribution in the vetch data.

In this example, the data show a positive skew (or right skew), with the tail stretching to the right. A common rule of thumb holds that the position of the mean relative to the median gives the direction of the skew. That appears to hold here, since the mean is to the right of the median, but the rule is not consistent enough to rely on, especially for multimodal distributions. The comparison between the mean and the median is best used simply to judge whether or not a distribution is symmetrical.
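As a quick illustration of using the mean/median comparison to check for symmetry, here is a minimal Python sketch. The function name `mean_median_gap` and both small data sets are hypothetical, made up for illustration only. Dividing the gap by the sample standard deviation is an assumption made here so that the value does not depend on the units of measurement. A value near zero suggests a roughly symmetrical distribution; as noted above, the sign of the value should not be trusted to give the direction of the skew.

```python
import statistics


def mean_median_gap(values):
    """Return (mean - median) divided by the sample standard deviation.

    Hypothetical helper for illustration. A result near 0 suggests rough
    symmetry; the sign is NOT a reliable guide to the direction of skew.
    """
    gap = statistics.mean(values) - statistics.median(values)
    return gap / statistics.stdev(values)


# Hypothetical toy data, not taken from the course workbook or the vetch data.
symmetric = [2, 3, 4, 5, 6, 7, 8]
right_skewed = [1, 1, 2, 2, 3, 4, 12]

print(f"symmetric:    {mean_median_gap(symmetric):.3f}")
print(f"right-skewed: {mean_median_gap(right_skewed):.3f}")
```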

There is another measure of shape that we can apply to frequency distributions, called kurtosis. It is usually described as a measure of how peaked a distribution is, although it more directly reflects how heavy the tails of the distribution are. Both kurtosis and skewness can be quantified, but we won't explore those methods, because these characteristics are rarely reported when data are summarized.

Standard Error

Technically speaking, the terms "standard deviation" and "standard error" are interchangeable, since a standard error is simply the standard deviation of an estimate such as the sample mean. In practice, however, people use "standard deviation" to indicate the spread of observations around the sample mean (identical to our usage of sample standard deviation thus far), and "standard error" to indicate the standard deviation of sample means around the population mean.

If you think about this distinction, you should conclude that deriving the standard error directly would require taking a large number of samples from a population and using the sample means as the observations in the calculation of a standard deviation. Put another way, the sample standard deviation is based on the differences between observations and the sample mean, while the standard error is based on the differences between sample means and the population mean.
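The direct (and impractical) way of deriving the standard error can be sketched in Python: draw many samples, take each sample's mean, and compute the standard deviation of those means around the population mean. This is a minimal sketch assuming a normally distributed population with hypothetical values μ = 10 and σ = 2; the function name `sd_of_sample_means` is also hypothetical. Because μ is known in a simulation, the squared deviations are divided by the number of samples rather than by that number minus one.

```python
import math
import random


def sd_of_sample_means(mu, sigma, n, k, seed=1):
    """Take k samples of size n from a normal population (mean mu, SD sigma)
    and return the standard deviation of the k sample means around mu.

    The normal population and its parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    squared_devs = []
    for _ in range(k):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Deviation of this sample mean from the (known) population mean
        squared_devs.append((sample_mean - mu) ** 2)
    # mu is known here, so divide by k rather than k - 1
    return math.sqrt(sum(squared_devs) / k)


se_direct = sd_of_sample_means(mu=10.0, sigma=2.0, n=50, k=1000)
print(f"SD of 1000 sample means around mu: {se_direct:.3f}")
```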

Because the standard error (we will stick with the common usage) describes how widely sample means are spread around the parameter they estimate (μ), it is an excellent indication of how accurate our estimate of μ is likely to be. For this reason, it is almost always more informative to report the standard error alongside the mean than to report the standard deviation. The standard deviation is a good measure when you want to convey the spread of the observations in a single sample, such as morphological measurements for a species description. But if the purpose of presenting the data is comparison, i.e., if you are presenting more than one mean, you should always report the standard error.

Let's pause for a moment to establish two rules for presenting data:

Rule #1: Never report an estimate without some measure of its error!!!

That's right. Three exclamation marks. This is serious stuff. I should point out that in descriptive and inferential statistics, "error" is synonymous with variation, not with "mistake". The estimate that we are interested in is the population mean, μ. Standard deviation and standard error both provide information about the error present in our estimate of the mean, but standard error provides the most relevant and interpretable measure. This leads us to our second rule:

Rule #2: When in doubt, report the standard error.

Just a period this time. It's a good rule, but not one that will result in your being shunned by the entire scientific community if you violate it.

At this point in the discussion, that feeling of anxiety should have returned. You have just been told that, more often than not, the error term you will want to report with your sample mean is the standard error. You have also been told that the standard error represents the standard deviation of sample means around the population mean, which means that it cannot be derived from a single sample. Remember the large, friendly letters on the cover of "The Hitchhiker's Guide to the Galaxy": DON'T PANIC. We may not be able to derive the standard error (hereafter abbreviated as "SE") from a single sample, but we can certainly estimate it with the following formula, where s is the sample standard deviation and n is the number of observations in the sample:

$$SE = \frac{s}{\sqrt{n}}$$

Not surprisingly, the amount of variation in our sample, relative to the size of the sample, tells us how close our estimate of the mean is likely to be to the population mean. The only remaining question is whether this estimate is biased. To examine this, I used an R program to calculate the SE for each of 1000 samples of 50 observations and plotted the distribution of those standard errors.

The mean of this distribution is 0.282, and the median is 0.281. At the same time, the program calculated sample means for 1000 samples of 50 from the same statistical population. The standard deviation of those sample means was 0.279.
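The experiment described above can be approximated with the Python sketch below. It is not the R program used to produce the numbers above; it assumes a normally distributed population with hypothetical values μ = 10 and σ = 2, so its output will differ somewhat from 0.282, 0.281, and 0.279. For each of 1000 samples of 50, it records the SE estimated from that sample alone (s/√n) along with the sample mean, then summarizes the SEs and computes the standard deviation of the sample means around μ. The function name `se_bias_check` is hypothetical.

```python
import math
import random
import statistics


def se_bias_check(n=50, k=1000, mu=10.0, sigma=2.0, seed=2):
    """Hypothetical re-creation of the SE experiment with a normal population.

    Returns (mean of the k SEs, median of the k SEs,
             SD of the k sample means around mu).
    """
    rng = random.Random(seed)
    ses, means = [], []
    for _ in range(k):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        # SE estimated from this single sample: s / sqrt(n)
        ses.append(statistics.stdev(sample) / math.sqrt(n))
    # Standard deviation of the sample means around the known population mean
    sd_means = math.sqrt(sum((m - mu) ** 2 for m in means) / k)
    return statistics.mean(ses), statistics.median(ses), sd_means


mean_se, median_se, sd_means = se_bias_check()
print(f"mean SE = {mean_se:.3f}, median SE = {median_se:.3f}")
print(f"SD of sample means around mu = {sd_means:.3f}")
```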

Question 4: Based on this example, does it appear that SE is an unbiased estimate of the standard deviation of sample means around a population mean? Justify your answer.

If you report a mean within the text of a report, you can do so simply as mean ± SE (or mean +/- SE if you don't want to hunt for special characters. My mom says that I am a "special character"...). For the bluegill values, for example, one could write 0.0369 ± 0.0011. More commonly, standard errors are depicted as error bars on a graph of the means.

Calculate the standard errors for the means on both spreadsheets. You may put the "COUNT" function inside the "SQRT" function to calculate the denominator, e.g., =SQRT(COUNT(A3:A11)). Then put the means and standard errors together in a small table, with one column for the fish species, one for the means, and one for the standard errors.

With the species in column L and the means in column M, for example, you can select column L for the X-axis series and column M for the Y-axis series. A table like this also makes it easy to select the appropriate standard errors when you get to that step of the process. Unfortunately, unless you use the "Paste Special" option, copying the means and SEs into the table will copy the formulas, and their cell references will no longer point to the right data. After pressing "Ctrl+C" to copy and selecting the destination cell, click on "Paste" beneath the clipboard icon (on the left side of the "HOME" toolbar) to open its menu, and select "Paste Special". This brings up the "Paste Special" dialog. From the options in the upper left, select "Values", and then click "OK". This pastes the result of the formula instead of the formula itself. If you prefer a keyboard to a mouse, you can accomplish the same thing by pressing, in sequence (don't hold any keys down), "Alt", e, s, v. Then all that remains is to click "OK" or, more in keeping with the principle of mouse-avoidance, press "Enter".
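The mean ± SE calculation and the SQRT(COUNT(...)) denominator described above can be mirrored in Python as a cross-check on a spreadsheet. This sketch assumes that the numerator of the SE is the sample standard deviation with n - 1 in its denominator (as with Excel's STDEV.S or STDEV); Python's `statistics.stdev` uses the same convention. The function names and the nine measurements (the same count as the range A3:A11) are hypothetical, for illustration only.

```python
import math
import statistics


def mean_and_se(values):
    """Mirror AVERAGE(range) and STDEV(range) / SQRT(COUNT(range)).

    statistics.stdev uses the n - 1 denominator, like Excel's STDEV.S.
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se


def format_mean_se(mean, se, digits=4):
    # Report as "mean ± SE" using the plus-minus sign (U+00B1)
    return f"{mean:.{digits}f} \u00b1 {se:.{digits}f}"


# Hypothetical measurements, not the bluegill data from the spreadsheets.
sample = [12.1, 13.4, 11.8, 12.9, 13.0, 12.4, 12.7, 13.3, 12.2]
m, se = mean_and_se(sample)
print(format_mean_se(m, se))
```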

Start by making bar graphs of the means for the two sets of data. As described above, make the fish species the X-axis series and the means the Y-axis series (you will need only one series for each graph). Format the graphs as we have described previously, with one exception: make the bars white (or no fill) instead of black. Be sure to include the appropriate units in your axis titles. Now...you will be ready to:

Add error bars in Excel 2016

Or...

Add error bars in Excel 2013

Or...

Add error bars in Excel 2010

Send comments, suggestions, and corrections to: Derek Zelmer