What do the Data Points Mean?

In Statistics 101 we were told that each data point in a scatter plot should have a precise meaning. Hopefully all ecologists agree with this, and if so, I would like to ask two questions about the ecology literature:

  1. What fraction of scatter plots in ecology papers define what the dots on the plot mean? Are they individual measurements or means of several measurements? Are they predictions from a mathematical model?
  2. Given that we know what the dots are, are we shown confidence limits for the points, or do we assume they are absolutely precise with no possible error?
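
When a dot is the mean of several measurements, its precision can be computed from the same replicates. Here is a minimal Python sketch, assuming the replicates are independent measurements of the same unit. The function `summarize_point`, the five quadrat counts, and the choice of a t-based 95% interval are illustrative assumptions, not a prescribed method. The caller supplies the critical value. Here that value is 2.776, the 0.975 quantile of Student's t with 4 degrees of freedom.

```python
import math
import statistics


def summarize_point(measurements, crit):
    """Collapse replicate measurements into one plotted point.

    Returns (mean, standard error, (lower, upper)), where the interval is
    mean +/- crit * S.E. The caller supplies `crit`, e.g. the two-sided 95%
    Student's t quantile for n - 1 degrees of freedom.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("at least two measurements are needed to estimate error")
    mean = statistics.mean(measurements)
    se = statistics.stdev(measurements) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, se, (mean - crit * se, mean + crit * se)


# Hypothetical replicate counts from five quadrats at one site.
counts = [12, 15, 9, 14, 10]
# 2.776 is the 0.975 quantile of Student's t with 4 degrees of freedom.
mean, se, (lower, upper) = summarize_point(counts, crit=2.776)
print(f"mean = {mean:.2f}, S.E. = {se:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Either the S.E. or the interval can be drawn as error bars on the dot, provided the figure legend says which one is shown.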

With these two simple questions in mind, I did a short, non-random search of recent ecology journals. If a graduate ecology class is reading this blog, perhaps they could do a much wider search, so that we might even be able to tell some of the editors of our journals how they score on Statistics 101 Quiz #1. I went through 3 issues of Ecology (2015, issues 4, 5, and 6), 3 issues of the Journal of Animal Ecology (2015, issues 4 to 6), and 3 issues of Ecology Letters (2016, issues 1, 2, and 3).

I scored each figure in each paper. The first question above is harder to score, so I divided the answers into three groups: clearly defined in the figure legend, not defined in the figure legend but clear in the paper itself, and not clearly defined anywhere. I scored the second question more simply, as yes or no: were there confidence limits or standard errors (S.E.) on the dots in the scatter diagram? I treated histogram bars as ‘data points’ equivalent to those in scatter plots and scored them on the same two questions. A figure containing multiple plots counted as a single data source. I ignored maps, simulation data, and papers containing only models. I got these results:

                               Number of   Data points clearly        Confidence limits or S.E.
    Journal                    papers      defined in figure legend   Yes          No
    Ecology                    80          179 (95%)                  98 (50%)     96 (50%)
    Journal of Animal Ecology  84          195 (98%)                  119 (60%)    81 (40%)
    Ecology Letters            33          64 (94%)                   29 (43%)     39 (57%)

Apart from the number of papers, each entry is a number of figures, with percentages in parentheses.
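
The Yes/No percentages can be rechecked from the raw counts. The short Python sketch that follows recomputes, for each journal, the share of figures without confidence limits or S.E., and then pools the three journals. The names `error_bar_counts` and `fraction_without_error` are illustrative and do not come from any package. The sketch uses only the Yes/No columns. Because the sample of issues was non-random, the pooled value describes these nine issues rather than the ecology literature as a whole.

```python
# Numbers of figures with (Yes) and without (No) confidence limits or S.E.,
# copied from the survey table.
error_bar_counts = {
    "Ecology": (98, 96),
    "Journal of Animal Ecology": (119, 81),
    "Ecology Letters": (29, 39),
}


def fraction_without_error(yes, no):
    """Share of scored figures that showed no confidence limits or S.E."""
    return no / (yes + no)


for journal, (yes, no) in error_bar_counts.items():
    print(f"{journal}: {fraction_without_error(yes, no):.1%} of {yes + no} figures")

# Pool the three journals by summing the raw counts (not averaging percentages).
total_yes = sum(yes for yes, _ in error_bar_counts.values())
total_no = sum(no for _, no in error_bar_counts.values())
pooled = fraction_without_error(total_yes, total_no)
print(f"All three journals: {pooled:.1%} of {total_yes + total_no} figures")
```

Rounding explains small differences from the table. For example, Ecology's 96 of 194 figures is 49.5%, which the table shows as 50%. Pooled over the three journals, 216 of 462 figures (about 47%) had no confidence limits or S.E.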

The good news is that virtually all the data points in figures containing empirical data were clearly defined, so the first question turned out not to be a problem. The potentially bad news is that around half of the data figures (40 to 57%, depending on the journal) did not show any measure of statistical precision for the data points.

There are many possible reasons why confidence limits might not be shown for the data points in a published graph. In some cases they would clutter the plot too much. In others the data points are exact and have no error, although this is probably unusual for ecological data. Whatever the reason, it should be stated in the text or the figure legend.

There were many limitations to this brief survey. It is clear that some subdisciplines of ecology adhere to Statistics 101 recommendations more carefully than others, but I did not tally results by subdiscipline; one could make a thesis out of that sort of tally. I also often could not tell whether a data point represented an experimental unit or a sampling unit, but I have not analyzed that problem here.
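
The distinction between experimental and sampling units matters for the error bars themselves. Here is a minimal Python sketch that uses made-up numbers for three hypothetical plots, which serve as the experimental units, each with four subsamples, which serve as the sampling units. In this example, treating all twelve subsamples as independent gives a much smaller S.E. than using the three plot means does. A plot-level error bar computed the first way can therefore make the dots look more precise than the experiment justifies.

```python
import math
import statistics

# Hypothetical data: 3 experimental plots, each with 4 subsamples (sampling units).
plots = {
    "plot A": [5.1, 4.8, 5.3, 5.0],
    "plot B": [6.2, 6.0, 6.5, 6.1],
    "plot C": [4.0, 4.4, 4.1, 4.3],
}


def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Treats every subsample as independent (n = 12). When the plots are the
# experimental units, this typically understates the error.
all_subsamples = [x for xs in plots.values() for x in xs]
se_naive = standard_error(all_subsamples)

# Averages within each plot first, so n equals the number of plots (n = 3).
plot_means = [statistics.mean(xs) for xs in plots.values()]
se_units = standard_error(plot_means)

print(f"S.E. from 12 subsamples: {se_naive:.3f}")
print(f"S.E. from 3 plot means:  {se_units:.3f}")
```

Averaging within plots first is the simplest way to keep n equal to the number of experimental units. Mixed models are an alternative for nested designs but are not shown here.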

So what do we conclude from this non-random survey? The take-home message for authors is to make sure that the data points or histogram bars in their published figures are clearly defined in the figure legend and, where possible, include some measure of probable error. The message for reviewers and journal editors is to check that the data points in submitted papers are properly identified and labeled with some measure of precision.
