On Defining a Statistical Population

The more I do “field ecology” the more I wonder about our standard statistical advice to young ecologists to “random sample your statistical population”. Go to the literature and look for papers on “random environmental fluctuations”, or “non-random processes”, or “random mating” and you will be overwhelmed with references and biology’s preoccupation with randomness. Perhaps we should start with the opposite paradigm, that nothing in the biological world is random in space or time, and then the corollary that if your data show a random pattern or random mating or whatever random, it means you have not done enough research and your inferences are weak.

Since virtually all modern statistical inference rests on a foundation of random sampling, every statistician will be outraged by any concerns that random sampling is possible only in situations that are scientifically uninteresting. It is nearly impossible to find an ecological paper about anything in the real world that even mentions what their statistical “population” is, what they are trying to draw inferences about. And there is a very good reason for this – it is quite impossible to define any statistical population except for those of trivial interest. Suppose we wish to measure the heights of the male 12-year-olds that go to school in Minneapolis in 2017. You can certainly do this, and select a random sample, as all statisticians would recommend. And if you continued to do this for 50 years, you would have a lot of data but no understanding of any growth changes in 12-year-old male humans because the children of 2067 in Minneapolis would be different in many ways from those of today. And so, it is like the daily report of the stock market, lots of numbers with no understanding of processes.

Despite all these ‘philosophical’ issues, ecologists carry on and try to get around this by sampling a small area that is considered homogeneous (to the human eye at least) and then arm waving that their conclusions will apply across the world for similar small areas of some ill-defined habitat (Krebs 2010). Climate change may of course disrupt our conclusions, but perhaps this is all we can do.

Alternatively, we can retreat to the minimalist position and argue that we are drawing no general conclusions but only describing the state of this small piece of real estate in 2017. But alas this is not what science is supposed to be about. We are supposed to reach general conclusions and even general laws with some predictive power. Should biologists just give up pretending they are scientists? That would not be good for our image, but on the other hand to say that the laws of ecology have changed because the climate is changing is not comforting to our political masters. Imagine the outcry if the laws of physics changed over time, so that for example in 25years it might be that CO2 is not a greenhouse gas. Impossible.

These considerations should make ecologists and other biologists very humble, but in fact this cannot be because the media would not approve and money for research would never flow into biology. Humility is a lost virtue in many western cultures, and particularly in ecology we leap from bandwagon to bandwagon to avoid the judgement that our research is limited in application to undefined statistical populations.

One solution to the dilemma of the impossibility of random sampling is just to ignore this requirement, and this approach seems to be the most common solution implicit in ecology papers. Rabe et al. (2002) surveyed the methods used by management agencies to survey population of large mammals and found that even when it was possible to use randomized counts on survey areas, most states used non-random sampling which leads to possible bias in estimates even in aerial surveys. They pointed out that ground surveys of big game were even more likely to provide data based on non-random sampling simply because most of the survey area is very difficult to access on foot. The general problem is that inference is limited in all these wildlife surveys and we do not know the ‘population’ to which the numbers derived are applicable.

In an interesting paper that could apply directly to ecology papers, Williamson (2003) analyzed research papers in a nursing journal to ask if random sampling was utilized in contrast to convenience sampling. He found that only 32% of the 89 studies he reviewed used random sampling. I suspect that this kind of result would apply to much of medical research now, and it might be useful to repeat his kind of analysis with a current ecology journal. He did not consider the even more difficult issue of exactly what statistical population is specified in particular medical studies.

I would recommend that you should put a red flag up when you read “random” in an ecology paper and try to determine how exactly the term is used. But carry on with your research because:

Errors using inadequate data are much less than those using no data at all.

Charles Babbage (1792–1871

Krebs CJ (2010). Case studies and ecological understanding. Chapter 13 in: Billick I, Price MV, eds. The Ecology of Place: Contributions of Place-Based Research to Ecological Understanding. University of Chicago Press, Chicago, pp. 283-302. ISBN: 9780226050430

Rabe, M. J., Rosenstock, S. S. & deVos, J. C. (2002) Review of big-game survey methods used by wildlife agencies of the western United States. Wildlife Society Bulletin, 30, 46-52.

Williamson, G. R. (2003) Misrepresenting random sampling? A systematic review of research papers in the Journal of Advanced Nursing. Journal of Advanced Nursing, 44, 278-288. doi: 10.1046/j.1365-2648.2003.02803.x

 

2 thoughts on “On Defining a Statistical Population

  1. sleather2012

    Whenever I am refereeing a paper and I come across the phrase “..randomly sampled” i query if it was actually truly random sampling, haphazard sampling or professorially random.

    Reply

Leave a Reply to sleather2012 Cancel reply