The Meaninglessness of Random Sampling

Statisticians tell us that random sampling is necessary for making inferences from the particular to the general. If field ecologists accept this dictum, we can only conclude that it is very difficult, if not nearly impossible, to reach generality. We can reach conclusions about specific local areas, and that is valuable, but much of our current ecological wisdom on populations and communities rests on the statistically faulty foundation of non-random sampling. We rarely try to define the statistical ‘population’ that we are studying and about which we are attempting to make inferences with our data. Some examples may help to illustrate this problem.

Marine ecologists are mostly agreed that rising sea surface temperatures are destroying coral reef ecosystems. This is certainly true, but it camouflages the fact that very few square kilometres of coral reefs such as the Great Barrier Reef have been comprehensively studied with a proper sampling design (e.g. Green 1979, Lewis 2004). When we analyse the details of coral reef declines, we find that many species are affected by rising sea temperatures, but some are not, and it is possible that some species will adapt by natural selection to the higher temperatures. So we quite rightly raise the alarm about the future of coral reefs. But in doing so we often neglect to specify the statistical ‘population’ to which our conclusions apply.

Most people would agree that worrying about this approach to generalizing ecological findings is tantamount to asking how many angels can dance on the head of a pin, and that in practice we can ignore the problem and generalize from the studied reefs to all reefs. Scientists would also point out that physics and chemistry seek generality and ignore this problem because one can do chemistry in Zurich or in Toronto and use the same laws, which do not change with time or place. But the ecosystems of today are not going to be the ecosystems of tomorrow, so generality in time cannot be guaranteed, as paleoecologists pointed out long ago.

It is the spatial problem of field studies that collides most strongly with the statistical rule to sample at random. Consider a hypothetical example of a large national park that has recently been burned by this year’s fires in the Northern Hemisphere. If we wish to measure the recovery of the vegetation, we need to set out plots to resample. We have two choices: (1) lay out as many permanent plots as possible and sample them for several years to follow recovery, or (2) lay out plots at random each year, never repeating exactly the same areas, to satisfy the statisticians’ specification to “random sample” the recovery in the park. We would typically choose (1) for two reasons. Setting up new plots each year as per (2) would greatly increase the initial field work of locating and marking the random plots, and it would probably mean that travel time between plots would be greatly increased. Using approach (1) we would probably set out plots with relatively easy access from roads or trails to minimize the costs of sampling. We ignore the advice of statisticians because of our real-world constraints of time and money, and we hope to answer the original questions about recovery with this simpler design.
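To make the contrast concrete, here is a minimal sketch in Python of the two designs for a purely hypothetical park laid out as a grid of one-hectare cells. The grid size, plot numbers, and function names are my own illustration, not taken from any real study: design (1) picks a set of permanent plots once and revisits them every year, while design (2) draws a fresh random sample of plots each year.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical park: a 100 x 100 grid of 1-ha cells, of which we can afford to sample 20 per year.
PARK_CELLS = [(x, y) for x in range(100) for y in range(100)]
N_PLOTS = 20
N_YEARS = 5

def design_permanent_plots():
    """Design (1): choose plots once (here at random for the sketch; in practice
    often near roads or trails) and revisit the same plots every year."""
    plots = random.sample(PARK_CELLS, N_PLOTS)
    return {year: plots for year in range(1, N_YEARS + 1)}

def design_new_random_plots():
    """Design (2): draw a fresh random sample of plots every year, as a strict
    reading of the random-sampling rule would require."""
    return {year: random.sample(PARK_CELLS, N_PLOTS) for year in range(1, N_YEARS + 1)}

if __name__ == "__main__":
    permanent = design_permanent_plots()
    fresh = design_new_random_plots()
    # Design (1) resamples the same cells, so change within plots can be followed directly;
    # design (2) is statistically cleaner for park-wide inference but means new set-up
    # work and new travel routes every year.
    print("Design 1, years 1 and 2 share", len(set(permanent[1]) & set(permanent[2])), "plots")
    print("Design 2, years 1 and 2 share", len(set(fresh[1]) & set(fresh[2])), "plots")
```

Running the sketch shows the trade-off in miniature: design (1) reuses all of its plots across years, which makes within-plot recovery easy to follow, while design (2) almost never revisits a cell, which is closer to the statistical ideal but multiplies the set-up and travel costs described above.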

I could find few papers in the ecological literature that discuss this general problem of inference from the particular to the general (Ives 2018, Hauss 2018), and only one that deals with a real-world situation (Ducatez 2019). I would be glad if readers could send me more references on this problem.

The bottom line is that if your supervisor or research coordinator criticizes your field work because your study areas were not randomly placed or your replicate sites were not chosen at random, tell him or her politely that virtually no ecological field research is done by truly random sampling. Does this make our research less useful for achieving ecological understanding? Probably not. And we might note that medical science works in exactly the same way field ecologists do: do what you can with the money and time you have. The rule that scientific knowledge requires random sampling is, in my opinion, often a pseudo-problem.

Ducatez, S. (2019) Which sharks attract research? Analyses of the distribution of research effort in sharks reveal significant non-random knowledge biases. Reviews in Fish Biology and Fisheries, 29, 355-367. doi: 10.1007/s11160-019-09556-0

Green, R.H. (1979) Sampling Design and Statistical Methods for Environmental Biologists. Wiley, New York. 257 pp.

Hauss, K. (2018) Statistical Inference from Non-Random Samples. Problems in Application and Possible Solutions in Evaluation Research. Zeitschrift für Evaluation, 17, 219-240.

Ives, A.R. (2018) Informative Irreproducibility and the Use of Experiments in Ecology. BioScience, 68, 746-747. doi: 10.1093/biosci/biy090

Lewis, J. (2004) Has random sampling been neglected in coral reef faunal surveys? Coral Reefs, 23, 192-194. doi: 10.1007/s00338-004-0377-y
