Tag Archives: hypothesis testing

Blaming Climate Change for Ecological Changes

The buzzword in ecological funding applications and in many submitted papers is climate change. Since the rate of climate change is not something ecologists can control, there are only two reasons to cite climate change as a justification for current ecological research. First, since change is continuous in communities and ecosystems, it would be desirable to determine how many of the observed changes might be caused by climate change. Second, it might be desirable to measure the rate of change in ecosystems, correlate these changes with some climate variable, and then use these data as a political and social tool to push politicians to do something about greenhouse gas emissions. The second approach is the one taken by climatologists who attribute hurricanes and tornadoes to global warming. There is no experimental way to trace any particular hurricane to a particular amount of global warming, so it is easy for critics to dismiss such events as ordinary weather variation, of the kind we have measured over the last 150 years and that paleoecologists have traced over tens of thousands of years using proxies from tree rings and sediment cores. If we are to use the statistical approach, we need a large enough sample to argue that extreme events are becoming more frequent, and that might take 50 years, by which time the argument would come too late to prompt proper action.

The second approach to prediction in ecology is fraught with problems, as outlined in Berteaux et al. (2006) and Dietze (2017). The first approach has many statistical problems as well, chiefly in selecting a biologically coherent model that can be tested in a standard scientific manner. Since there is a very large number of climate variables, the possibility of spurious correlations is high, and the only way to avoid such results is to be predictive and to specify a biological causal chain that is testable. Myers (1998) reviewed all the fishery models of juvenile recruitment that used environmental variables as predictors and for which data were subsequently collected and tested against the published model. The vast majority of these aquatic models failed when retested, though a few were very successful. The general problem is that model failures or successes might not be published, so even this approach can be biased if only a literature survey is undertaken. The take-home message from Myers (1998) was that almost none of the recruitment-environment correlations were being used in actual fishery management.
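The multiple-comparisons danger here is easy to demonstrate. The sketch below uses hypothetical numbers (not data from Myers 1998): it screens one 30-year recruitment series against 200 unrelated "climate" predictors and counts how many clear the conventional p < 0.05 bar purely by chance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: 30 years of a recruitment index screened against
# 200 independent "climate" variables that have no real effect on it.
n_years, n_predictors = 30, 200
recruitment = rng.normal(size=n_years)
climate = rng.normal(size=(n_predictors, n_years))

# Critical Pearson r for p < 0.05 (two-tailed) with n = 30:
# t_crit(df = 28) ~ 2.048, so r_crit = t / sqrt(t^2 + df) ~ 0.361
r_crit = 0.361

r_values = np.array([np.corrcoef(recruitment, x)[0, 1] for x in climate])
n_significant = int(np.sum(np.abs(r_values) > r_crit))

# With 200 null predictors we expect roughly 10 (5%) spurious
# "significant" correlations, any of which could be published as
# a climate effect if screened without a causal hypothesis.
print(n_significant)
```

The point of the sketch is only that screening many candidate predictors guarantees some "significant" relationships; a predictive test on new data, as Myers advocated, is what separates these from real effects.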

How much would this conclusion about the failure of environmental models in fishery management apply to other areas in ecology? Mouquet et al. (2015) pointed out that predictions could be classified as ‘explanatory’ or ‘anticipatory’ and that “While explanatory predictions are necessarily testable, anticipatory predictions need not be… In summary, anticipatory predictions differ from explanatory predictions in that they do not aim at testing models and theory. They rely on the assumption that underlying hypotheses are valid while explanatory predictions are based on hypotheses to be tested. Anticipatory predictions are also not necessarily supposed to be true.” (page 1296). If we accept these distinctions, we have (I think) a major problem: many of the predictive models put forward in the ecological literature are anticipatory, so they would be of little use to a natural resource manager who requires an explanatory model.

If we ignore this problem with anticipatory predictions, we can concentrate on explanatory predictions that are useful to managers. One major set of explanatory predictions in ecology is that associated with range changes in relation to climate change. Cahill et al. (2014) examined the conventional hypothesis that warm-edge range limits are set by biotic interactions rather than abiotic factors. Contrary to expectations, they found in 125 studies that abiotic factors were more frequently supported as the cause of warm-edge range limits. Clearly a major paradigm about warm-edge range limits is of limited utility.

Explanatory predictions are not always explicit. Mauck et al. (2018), for example, developed a climate model to predict reproductive success in Leach’s storm petrel on an island off New Brunswick in eastern Canada. From 56 years of hatching success they concluded that annual global mean temperature during the spring breeding season was the single most important predictor of breeding success. They considered only a few measures of temperature as predictor variables and found that a quadratic function of annual global mean temperature best described the changes in breeding success. The paper speculates about how global or regional mean temperature could possibly be an ecological predictor of breeding success, but no mechanisms are specified. The actual data on breeding success are not provided in the paper, even as a temporal plot. Since global temperatures rose steadily from 1955 to 2010, any rising temporal trend in any population parameter would correlate with the temperature record. The critical quadratic relationship in their analysis suggests that a tipping point was reached in 1988, when hatching success began to decline. Whether or not this is a biologically correct explanatory model can be determined by additional data gathered in future years. But it would be more useful to find out what the exact ecological mechanisms are.
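The shared-trend problem is worth seeing in numbers. In this hypothetical sketch (not the petrel data, which are not published), two series that have nothing to do with each other, a steadily rising "temperature" and a steadily declining "hatching success", correlate strongly simply because both trend over the same 56 years.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent hypothetical series over 56 years (1955-2010).
years = np.arange(56)
temperature = 0.2 * years + rng.normal(scale=1.0, size=56)     # rising trend
hatch_success = -0.2 * years + rng.normal(scale=1.0, size=56)  # falling trend

# The noise terms are independent, so there is no mechanism linking the
# series, yet the shared linear trends produce a strong correlation.
r = np.corrcoef(temperature, hatch_success)[0, 1]
print(round(r, 2))
```

Detrending both series before correlating them, or specifying a causal mechanism in advance, is the standard defence against this artifact.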

If the ecological world is going to hell in a handbasket, and temperatures, however measured, are going up, we can certainly construct a plethora of models to describe the collapse of many species and the rise of others. But this is hardly progress, and such models would appear to be anticipatory predictions of little use in advancing ecological science, as Guthery et al. (2005) pointed out long ago. Someone ought to review and evaluate the utility of AIC methods as they are currently being used in ecological and conservation science for prediction.

Berteaux, D., Humphries, M.M., Krebs, C.J., Lima, M., McAdam, A.G., Pettorelli, N., Reale, D., Saitoh, T., Tkadlec, E., Weladji, R.B., and Stenseth, N.C. (2006). Constraints to projecting the effects of climate change on mammals. Climate Research 32, 151-158. doi: 10.3354/cr032151.

Cahill, A.E., Aiello-Lammens, M.E., Fisher-Reid, M.C., Hua, X., and Karanewsky, C.J. (2014). Causes of warm-edge range limits: systematic review, proximate factors and implications for climate change. Journal of Biogeography 41, 429-442. doi: 10.1111/jbi.12231.

Dietze, M.C. (2017). Prediction in ecology: a first-principles framework. Ecological Applications 27, 2048-2060. doi: 10.1002/eap.1589.

Guthery, F.S., Brennan, L.A., Peterson, M.J., and Lusk, J.J. (2005). Information theory in wildlife science: Critique and viewpoint. Journal of Wildlife Management 69, 457-465. doi: 10.1890/04-0645.

Mauck, R.A., Dearborn, D.C., and Huntington, C.E. (2018). Annual global mean temperature explains reproductive success in a marine vertebrate from 1955 to 2010. Global Change Biology 24, 1599-1613. doi: 10.1111/gcb.13982.

Mouquet, N., Lagadeuc, Y., Devictor, V., Doyen, L., and Duputie, A. (2015). Predictive ecology in a changing world. Journal of Applied Ecology 52, 1293-1310. doi: 10.1111/1365-2664.12482.

Myers, R.A. (1998). When do environment-recruitment correlations work? Reviews in Fish Biology and Fisheries 8, 285-305. doi: 10.1023/A:1008828730759.


On Questionable Research Practices

Ecologists and evolutionary biologists are tarred and feathered along with many other scientists accused of questionable research practices. So says this article in “The Conversation” on the web:

Read this article if you have time but here is the essence of what they state:

“Cherry picking or hiding results, excluding data to meet statistical thresholds and presenting unexpected findings as though they were predicted all along – these are just some of the “questionable research practices” implicated in the replication crisis psychology and medicine have faced over the last half a decade or so.

“We recently surveyed more than 800 ecologists and evolutionary biologists and found high rates of many of these practices. We believe this to be first documentation of these behaviours in these fields of science.

“Our pre-print results have certain shock value, and their release attracted a lot of attention on social media.

  • 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking)
  • 42% had collected more data after inspecting whether results were statistically significant (a form of “p hacking”)
  • 51% reported an unexpected finding as though it had been hypothesised from the start (known as “HARKing”, or Hypothesising After Results are Known).”

It is worth looking at these claims a bit more analytically. First, the fact that more than 800 ecologists and evolutionary biologists were surveyed tells you nothing about the reliability of these results unless you can be convinced that this was a random sample of such scientists. Most surveys are non-random, yet are reported as though they were random, representative samples.

Failing to report results is common in science for a variety of reasons that have nothing to do with questionable research practices. Many graduate theses contain results that are never published. Does this mean their data are being hidden? Many results are not reported because they did not show an expected effect. This sounds awful until you realize that journals often turn down papers for not being exciting enough, even though the results are completely reliable. Other results are not reported because the investigator realized, once the study was complete, that it had not been carried on long enough, and the money had run out to do more research. One would have to have considerable detail about each study to know whether or not these 64% of researchers were “cherry picking”.

Alas the next problem is more serious. The 42% who are accused of “p-hacking” were possibly just using sequential sampling, or using a pilot study to estimate the statistical parameters needed for a power analysis. Any study which uses replication in time, a highly desirable attribute of an ecological study, would be vilified by this rule. This complaint echoes the statistical advice not to use p-values at all (Ioannidis 2005, Bruns and Ioannidis 2016) and refers back to complaints about inappropriate uses of statistical inference (Amrhein et al. 2017, Forstmeier et al. 2017). The appropriate solution to this problem is to have a defined experimental design with specified hypotheses and predictions rather than an open-ended observational study.
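The genuinely problematic version of this practice, unplanned optional stopping, is easy to distinguish from legitimate sequential designs because its effect on error rates can be simulated. In this hypothetical sketch, every "study" samples from a null distribution, peeks at a two-tailed t-test at n = 20, and collects 10 more observations only if the first test misses significance; the false-positive rate climbs well above the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-tailed critical t values at alpha = 0.05 (standard t tables),
# keyed by sample size (df = n - 1).
T_CRIT = {20: 2.093, 30: 2.045}

def t_significant(x):
    """One-sample t-test of mean = 0 at alpha = 0.05, two-tailed."""
    n = len(x)
    t = x.mean() / (x.std(ddof=1) / np.sqrt(n))
    return abs(t) > T_CRIT[n]

n_sims = 5000
false_positives = 0
for _ in range(n_sims):
    x = rng.normal(size=20)          # null data: no real effect exists
    if t_significant(x):
        false_positives += 1
    else:
        # "p-hack": not significant yet, so collect 10 more and retest.
        x = np.concatenate([x, rng.normal(size=10)])
        if t_significant(x):
            false_positives += 1

rate = false_positives / n_sims      # noticeably above the nominal 0.05
print(round(rate, 3))
```

A pre-specified sequential design avoids this inflation by adjusting the significance thresholds for the planned interim looks, which is exactly why design-first studies escape the "p-hacking" charge.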

The third problem, about unexpected findings, hits at an important aspect of science: the uncovering of interesting and important new results. It is an important point and was warned about long ago by Medawar (1963) and emphasized recently by Forstmeier et al. (2017). The general solution should be that novel results in science must be considered tentative until they can be replicated, so that science becomes a self-correcting process. But the temptation to emphasize a new result is hard to restrain in an era of difficult job searches and media attention to novelty. Perhaps the message is that you should read any “unexpected findings” in Science and Nature with a degree of skepticism.

The cited article in “The Conversation” goes on to discuss some possible interpretations of what these survey results mean. The authors bend over backwards to indicate that these survey results do not mean we should not trust the conclusions of science, which unfortunately is exactly what some parts of the public media have emphasized. Distrust of science can be a justification for rejecting climate change data and for rejecting the value of immunization against disease. In an era of declining trust in science, these kinds of trivial surveys have shock value but are of little use to scientists trying to sort out the details of how ecological and evolutionary systems operate.

A significant source of these concerns flows from the literature on medical fads and ‘breakthroughs’ announced every day by media searching for ‘news’ (e.g. “eat butter”, “do not eat butter”). The result is an almost comical model of how good scientists really operate. An essential assumption of science is that scientific results are not written in stone but are always subject to additional testing and to modification or rejection. One consequence is that we get a parody of science that says “you can’t trust anything you read” (e.g. Ashcroft 2017). Perhaps we just need to remind ourselves to be critical, that good science is evidence-based, and then remember George Bernard Shaw’s comment:

Success does not consist in never making mistakes but in never making the same one a second time.

Amrhein, V., Korner-Nievergelt, F., and Roth, T. 2017. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ  5: e3544. doi: 10.7717/peerj.3544.

Ashcroft, A. 2017. The politics of research-Or why you can’t trust anything you read, including this article! Psychotherapy and Politics International 15(3): e1425. doi: 10.1002/ppi.1425.

Bruns, S.B., and Ioannidis, J.P.A. 2016. p-Curve and p-Hacking in observational research. PLoS ONE 11(2): e0149144. doi: 10.1371/journal.pone.0149144.

Forstmeier, W., Wagenmakers, E.-J., and Parker, T.H. 2017. Detecting and avoiding likely false-positive findings – a practical guide. Biological Reviews 92(4): 1941-1968. doi: 10.1111/brv.12315.

Ioannidis, J.P.A. 2005. Why most published research findings are false. PLOS Medicine 2(8): e124. doi: 10.1371/journal.pmed.0020124.

Medawar, P.B. 1963. Is the scientific paper a fraud? Pp. 228-233 in The Threat and the Glory. Edited by P.B. Medawar. Harper Collins, New York. ISBN 978-0-06-039112-6

On Caribou and Hypothesis Testing

Mountain caribou populations in western Canada have been declining for the past 10-20 years and concern has mounted to the point where extinction of many populations could be imminent, and the Canadian federal government is asking why this has occurred. This conservation issue has supported a host of field studies to determine what the threatening processes are and what we can do about them. A recent excellent summary of experimental studies in British Columbia (Serrouya et al. 2017) has stimulated me to examine this caribou crisis as an illustration of the art of hypothesis testing in field ecology. We teach all our students to specify hypotheses and alternative hypotheses as the first step to solving problems in population ecology, so here is a good example to start with.

From the abstract of this paper, here is a statement of the problem and the major hypothesis:

“The expansion of moose into southern British Columbia caused the decline and extirpation of woodland caribou due to their shared predators, a process commonly referred to as apparent competition. Using an adaptive management experiment, we tested the hypothesis that reducing moose to historic levels would reduce apparent competition and therefore recover caribou populations. “

So the first observation we might make is that much is left out of this approach to the problem. Populations can decline because of habitat loss, food shortage, excessive hunting, predation, parasitism, disease, severe weather, or inbreeding depression. In this case much background research has narrowed the field to focus on predation as a major limitation, so we can begin our search by focusing on the predation factor (review in Boutin and Merrill 2016). In particular Serrouya et al. (2017) focused their studies on the nexus of moose, wolves, and caribou and the supposition that wolves feed preferentially on moose and only secondarily on caribou, so that if moose numbers are lower, wolf numbers will be lower and incidental kills of caribou will be reduced. So they proposed two very specific hypotheses – that wolves are limited by moose abundance, and that caribou are limited by wolf predation. The experiment proposed and carried out was relatively simple in concept: kill moose by allowing more hunting in certain areas and measure the changes in wolf numbers and caribou numbers.

The experimental area contained 3 small herds of caribou (50 to 150) and the unmanipulated area contained 2 herds (20 and 120 animals) when the study began in 2003. The extended hunting worked well, and moose in the experimental area were reduced from about 1600 animals down to about 500 over the period from 2003 to 2014. Wolf numbers in the experimental area declined by about half over the experimental period because of dispersal out of the area and some starvation within the area. So the two necessary conditions of the experiment were satisfied – moose numbers declined by about two-thirds from additional hunting and wolf numbers declined by about half on the experimental area. But the caribou population on the experimental area showed mixed results with one population showing a slight increase in numbers but the other two showing a slight loss. On the unmanipulated area both caribou populations showed a continuing slow decline. On the positive side the survival rate of adult caribou was higher on the experimental area, suggesting that the treatment hypothesis was correct.

From the viewpoint of caribou conservation, the experiment failed to change the caribou population from continuous slow declines to the rapid increase needed to recover these populations to their former greater abundance. At best it could be argued that this particular experiment slowed the rate of caribou decline. Why might this be? We can make a list of possibilities:

  1. Moose numbers on the experimental area were not reduced enough (to 300, say, instead of the 500 achieved). Lower moose numbers would have meant much lower wolf numbers.
  2. Small caribou populations are nearly impossible to recover because of chance events that affect small numbers. A few wolves, bears or cougars could be making all the difference to populations numbering 10-20 individuals.
  3. The experimental area and the unmanipulated area were not assigned treatments at random. To a pure statistician this means you cannot make statistical comparisons between the two areas.
  4. The general hypothesis being tested is wrong, and predation by wolves is not the major limiting factor for mountain caribou populations. Many factors are involved in caribou declines, and we cannot determine what they are because they change from area to area and year to year.
  5. It is impossible to do these landscape experiments because for large landscapes it is impossible to find 2 or more areas that can be considered replicates.
  6. The experimental manipulation was not carried out long enough. Ten years of manipulation is not long for caribou, which have a generation time of 15-25 years.

Let us evaluate these 6 points.

#1 is fair enough; it would be hard to reduce the moose population this low, but it could be attempted in a second experiment.

#2 is a worry because it is difficult to deal experimentally with small populations, but we have to take the populations as a given at the time we do a manipulation.

#3 is true if you are a purist but is silly in the real world where treatments can never be assigned at random in landscape experiments.

#4 is a concern, and it would be nice to include bears and other predators in the studies, but there is a limit to people and money. Almost all previous studies of mountain caribou declines have pointed the finger at wolves, so it is only reasonable to start with this idea. The multiple-factor idea is hopeless to investigate, or indeed even to study, without infinite time and resources.

#5 is like #3 and it is an impossible constraint on field studies. It is a common statistical fallacy to assume that replicates must be identical in every conceivable way. If this were true, no one could do any science, lab or field.

#6 is correct but was impossible in this case because the management agencies forced this study to end in 2014 so that they could conduct a different experiment. There is always a problem deciding how long a study is sufficient, and the universal problem is that the scientists or (more likely) the money and the landscape managers run out of energy once a study stretches to 10 years or more. The result is that one must qualify the conclusions to state that this is what happened in the 10 years available for study.

This study involved a heroic amount of field work over 10 years, and is a landmark in showing what needs to be done and the scale involved. It is a far cry from sitting at a computer designing the perfect field experiment on a theoretical landscape to actually carrying out the field work to get the data summarized in this paper. The next step is to continue to monitor some of these small caribou populations, the wolves and moose to determine how this food chain continues to adjust to changes in prey levels. The next experiment needed is not yet clear, and the eternal problem is to find the high levels of funding needed to study both predators and prey in any ecosystem in the detail needed to understand why prey numbers change. Perhaps a study of all the major predators – wolves, bears, cougars – in this system should be next. We now have the radio telemetry advances that allow satellite locations, activity levels, timing of mortality, proximity sensors when predators are near their prey, and even video and sound recording so that more details of predation events can be recorded. But all this costs money that is not yet here because governments and people have other priorities and value the natural world rather less than we ecologists would prefer. There is not yet a Nobel Prize for ecological field research, and yet here is a study on an iconic Canadian species that would be high up in the running.

What would I add to this paper? My curiosity would be satisfied by the number of person-years and the budget needed to collect and analyze these results. These statistics should be on every scientific paper. And perhaps a discussion of what to do next. In much of ecology these kinds of discussions are done informally over coffee and students who want to know how science works would benefit from listening to how these informal discussions evolve. Ecology is far from simple. Physics and chemistry are simple, genetics is simple, and ecology is really a difficult science.

Boutin, S. and Merrill, E. 2016. A review of population-based management of Southern Mountain caribou in BC. Unpublished review, available at: http://cmiae.org/wp-content/uploads/Mountain-Caribou-review-final.pdf

Serrouya, R., McLellan, B.N., van Oort, H., Mowat, G., and Boutin, S. 2017. Experimental moose reduction lowers wolf density and stops decline of endangered caribou. PeerJ  5: e3736. doi: 10.7717/peerj.3736.


On Defining a Statistical Population

The more I do “field ecology”, the more I wonder about our standard statistical advice to young ecologists to “random sample your statistical population”. Go to the literature and look for papers on “random environmental fluctuations”, “non-random processes”, or “random mating” and you will be overwhelmed with references and with biology’s preoccupation with randomness. Perhaps we should start with the opposite paradigm, that nothing in the biological world is random in space or time, and with the corollary that if your data show a random pattern, random mating, or randomness of any kind, it means you have not done enough research and your inferences are weak.

Since virtually all modern statistical inference rests on a foundation of random sampling, every statistician will be outraged by the suggestion that random sampling is possible only in situations that are scientifically uninteresting. Yet it is nearly impossible to find an ecological paper about anything in the real world that even mentions what its statistical “population” is, that is, what the authors are trying to draw inferences about. And there is a very good reason for this: it is quite impossible to define any statistical population except those of trivial interest. Suppose we wish to measure the heights of the male 12-year-olds who go to school in Minneapolis in 2017. You can certainly do this, and select a random sample, as all statisticians would recommend. And if you continued to do this for 50 years, you would have a lot of data but no understanding of any growth changes in 12-year-old male humans, because the children of 2067 in Minneapolis will differ in many ways from those of today. And so it is like the daily report of the stock market: lots of numbers with no understanding of processes.

Despite all these ‘philosophical’ issues, ecologists carry on and try to get around the problem by sampling a small area that is considered homogeneous (to the human eye at least) and then arm-waving that their conclusions will apply across the world to similar small areas of some ill-defined habitat (Krebs 2010). Climate change may of course disrupt our conclusions, but perhaps this is all we can do.

Alternatively, we can retreat to the minimalist position and argue that we are drawing no general conclusions but only describing the state of this small piece of real estate in 2017. But alas this is not what science is supposed to be about. We are supposed to reach general conclusions and even general laws with some predictive power. Should biologists just give up pretending they are scientists? That would not be good for our image, but on the other hand to say that the laws of ecology have changed because the climate is changing is not comforting to our political masters. Imagine the outcry if the laws of physics changed over time, so that, for example, in 25 years CO2 might not be a greenhouse gas. Impossible.

These considerations should make ecologists and other biologists very humble, but in fact this cannot be because the media would not approve and money for research would never flow into biology. Humility is a lost virtue in many western cultures, and particularly in ecology we leap from bandwagon to bandwagon to avoid the judgement that our research is limited in application to undefined statistical populations.

One solution to the dilemma of the impossibility of random sampling is just to ignore the requirement, and this seems to be the most common solution implicit in ecology papers. Rabe et al. (2002) surveyed the methods used by management agencies to count populations of large mammals and found that, even when it was possible to use randomized counts on survey areas, most states used non-random sampling, which leads to possible bias in estimates even in aerial surveys. They pointed out that ground surveys of big game were even more likely to rest on non-random sampling, simply because most of the survey area is very difficult to access on foot. The general problem is that inference is limited in all these wildlife surveys, and we do not know the ‘population’ to which the resulting numbers apply.
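The bias that worries Rabe et al. can be sketched with hypothetical numbers. Suppose animal density increases with a survey unit's remoteness; a convenience sample of the most accessible units then underestimates mean density badly, while a random sample of the same size does not.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical landscape of 1000 survey units where animal density
# rises with remoteness (remote units hold more animals).
n_units = 1000
remoteness = rng.uniform(0.0, 1.0, size=n_units)
density = 5.0 + 10.0 * remoteness + rng.normal(scale=1.0, size=n_units)

true_mean = density.mean()

# Convenience sample: the 100 most accessible (least remote) units,
# as in a ground survey restricted to areas reachable on foot.
accessible = np.argsort(remoteness)[:100]
convenience_mean = density[accessible].mean()

# Random sample of 100 units, as the statisticians recommend.
random_units = rng.choice(n_units, size=100, replace=False)
random_mean = density[random_units].mean()

print(round(true_mean, 1), round(convenience_mean, 1), round(random_mean, 1))
```

The convenience estimate is biased low by construction here, but the direction of bias in a real survey is unknowable without knowing how density covaries with accessibility, which is precisely the inferential problem.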

In an interesting paper that could apply directly to ecology, Williamson (2003) analyzed research papers in a nursing journal to ask whether random sampling or convenience sampling was used. He found that only 32% of the 89 studies he reviewed used random sampling. I suspect this kind of result would apply to much of medical research now, and it might be useful to repeat his analysis with a current ecology journal. He did not consider the even more difficult issue of exactly what statistical population is specified in particular medical studies.

I would recommend that you should put a red flag up when you read “random” in an ecology paper and try to determine how exactly the term is used. But carry on with your research because:

Errors using inadequate data are much less than those using no data at all.

Charles Babbage (1792–1871)

Krebs CJ (2010). Case studies and ecological understanding. Chapter 13 in: Billick I, Price MV, eds. The Ecology of Place: Contributions of Place-Based Research to Ecological Understanding. University of Chicago Press, Chicago, pp. 283-302. ISBN: 9780226050430

Rabe, M. J., Rosenstock, S. S. & deVos, J. C. (2002) Review of big-game survey methods used by wildlife agencies of the western United States. Wildlife Society Bulletin, 30, 46-52.

Williamson, G. R. (2003) Misrepresenting random sampling? A systematic review of research papers in the Journal of Advanced Nursing. Journal of Advanced Nursing, 44, 278-288. doi: 10.1046/j.1365-2648.2003.02803.x


On Ecological Predictions

The gold standard of ecological studies is the understanding of a particular ecological issue or system and the ability to predict the operation of that system in the future. A simple example is the masting of trees (Pearse et al. 2016). Mast seeding is synchronous and highly variable seed production among years by a population of perennial plants. One ecological question is what environmental drivers cause these masting years and what factors can be used to predict mast years. Weather cues and plant resource states presumably interact to determine mast years. The question I wish to raise here, given this widely observed natural history event, is how good our predictive models can be on a spatial and temporal scale.

On a spatial scale, masting events can be widespread or localized, and this provides some clues to the weather variables that might be important. Assuming we can derive weather models for prediction, we face two often unknown constraints: space and time. If we can derive a weather model for trees in New Zealand, will it also apply to trees in Australia or California? Or, on a more constrained geographical view, if it applies on the South Island of New Zealand, will it also apply on the North Island? At the other extreme, must we derive models for every population of a particular plant in different areas, so that predictability is spatially limited? We hope not, and we work on the assumption of more spatial generality than we can measure on our particular small study areas.

The temporal stability of our explanations is now particularly worrisome because of climate change. If we have a good model of masting for a particular tree species in 2017, will it still be working in 2030, 2050 or 2100? A physicist would never ask such a question, since a “scientific law” is independent of time. But biology in general, and ecology in particular, is not time independent, both because of evolution and now especially because of changing climate. We have not faced up to whether we must check our “ecological laws” over and over again as the environment changes, and if we must, what the time scale of rechecking should be. Perhaps this question can be answered by determining the speed of potential evolutionary change in species groups. If virus diseases can evolve quickly, in terms of months or years, we must be eternally vigilant in considering whether the flu virus of 2017 is the same as that of 2016. We should not stop virus research and declare that we have sorted out some universal model equivalent to the laws of physics.

The consequences of these simple observations are not simple. One implication is that monitoring is an essential ecological activity. But most ecological funding agencies consider monitoring unscientific: it does not lead to progress, it is mere stamp collecting. So we have to establish that, just as every country supports a weather bureau, we need an equivalent ecological monitoring bureau. We do have such bureaus for some ecosystems that make money, like marine fisheries, but most other ecosystems are left in limbo with little or no funding, on the generalized assumption that “mother or father nature will take care of itself”, or, as expressed more elegantly by a cabinet minister who must remain nameless, “there is no need for more forestry research, as we know everything we need to know already”. The politicians' urge to cut research funding falls far too heavily on environmental research.

But ecologists are not just ‘stamp collectors’, as some might think. We need to develop generality, but at a time scale and a spatial scale that are reliable and useful for resolving the problem that gave rise to the research. Typically for ecological issues this time scale would be 10-25 years, and a useful rule of thumb is 10 generations of the organisms being studied. For many of our questions an annual scale might be most useful, but for long-lived plants and animals we must be thinking of decades or even centuries. Some practical examples from Pacifici et al. (2013): if you study field voles (Microtus spp.), you can typically complete a 10-generation study in about 3.5 years. If you study red squirrels (Tamiasciurus hudsonicus), the same 10 generations will cost you 39 years, and for red foxes (Vulpes vulpes), 58 years. For wildebeest (Connochaetes taurinus) in the Serengeti, 10 generations will take you 80 years, and if you prefer red kangaroos (Macropus rufus) it will take about 90 years. All these estimates are very approximate, but they give you an idea of what the time scale of a long-term study might be. Except for the rodent example, all these study durations are nearly impossible to achieve, and the question for ecologists is this: should we be concerned about these time scales, or should we scale everything to the human research time scale?
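The arithmetic behind these durations is simply generation length times number of generations. As a quick sketch, the generation lengths below are back-calculated from the approximate 10-generation durations quoted above; they are rough illustrative values, not figures taken directly from the Pacifici et al. (2013) database:

```python
# Approximate generation lengths (years), back-calculated from the
# 10-generation study durations quoted in the text above.
generation_length = {
    "field vole (Microtus spp.)": 0.35,
    "red squirrel (Tamiasciurus hudsonicus)": 3.9,
    "red fox (Vulpes vulpes)": 5.8,
    "wildebeest (Connochaetes taurinus)": 8.0,
    "red kangaroo (Macropus rufus)": 9.0,
}

def study_duration(gen_length_years, n_generations=10):
    """Years of fieldwork needed to observe n_generations."""
    return gen_length_years * n_generations

durations = {sp: study_duration(g) for sp, g in generation_length.items()}
for sp, yrs in durations.items():
    print(f"{sp}: ~{yrs:.0f} years for 10 generations")
```

The same one-line calculation makes it easy to see why only the rodent study fits inside a grant cycle or a career.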

The spatial scale has expanded greatly for ecologists with the advent of radio transmitters and the possibility of satellite tracking. These technological advances allow many conservation questions regarding bird migration to be investigated (e.g. Oppel et al. 2015). But no matter what the spatial scale of interest in a research or management program, variation among individuals and sites must be analyzed by means of the replication of measurements or manipulations at several sites. The spatial scale is dictated by the question under investigation, and the issue of fragmentation has focused attention on the importance of spatial movements both for ecological and evolutionary questions (Betts et al. 2014).

And the major question remains: can we construct an adequate theory of ecology from a series of short-term, small area or small container studies?

Betts, M.G., Fahrig, L., Hadley, A.S., Halstead, K.E., Bowman, J., Robinson, W.D., Wiens, J.A. & Lindenmayer, D.B. (2014) A species-centered approach for uncovering generalities in organism responses to habitat loss and fragmentation. Ecography, 37, 517-527. doi: 10.1111/ecog.00740

Oppel, S., Dobrev, V., Arkumarev, V., Saravia, V., Bounas, A., Kret, E., Velevski, M., Stoychev, S. & Nikolov, S.C. (2015) High juvenile mortality during migration in a declining population of a long-distance migratory raptor. Ibis, 157, 545-557. doi: 10.1111/ibi.12258

Pacifici, M., Santini, L., Di Marco, M., Baisero, D., Francucci, L., Grottolo Marasini, G., Visconti, P. & Rondinini, C. (2013) Database on generation length of mammals. Nature Conservation, 5, 87-94. doi: 10.3897/natureconservation.5.5734

Pearse, I.S., Koenig, W.D. & Kelly, D. (2016) Mechanisms of mast seeding: resources, weather, cues, and selection. New Phytologist, 212 (3), 546-562. doi: 10.1111/nph.14114

Climate Change and Ecological Science

One dominant paradigm of the ecological literature at the present time is what I would like to call the Climate Change Paradigm. Stated in its clearest form, it holds that all temporal ecological changes now observed are explicable by climate change. The test of this hypothesis is typically a correlation between some event, such as a population decline, the invasion of a new species into a community, or the outbreak of a pest species, and some measure of climate. Given clever statistics and sufficient searching of many climatic measurements, with and without time lags, these correlations are often sanctified by p < 0.05. Should we consider this progress in ecological understanding?

An early confusion in relating climate fluctuations to population changes began with labelling climate as a density-independent factor within the density-dependent model of population dynamics. Fortunately, this massive confusion was sorted out by Enright (1976), but alas I still see this error repeated in recent papers about population changes. I think that much of the early confusion about climatic impacts on populations was due to classifying all climatic impacts as density-independent factors.

One’s first response might be that many of the changes we see in populations and communities are indeed related to climate change. But the key is to validate this conclusion, and to do that we need to identify the mechanisms by which climate change is acting on our particular species or species group. The search for these mechanisms is much more difficult than the demonstration of a correlation. To be more convincing, one might predict that the observed correlation will continue for the next 5 (10, 20?) years and then gather the data to validate the correlation. Many of these published correlations are so weak as to preclude any possibility of validation within the lifetime of a research scientist. So the gold standard must be the deciphering of the mechanisms involved.

And a major concern is that many of the validations of the climate change paradigm on short time scales are likely to be spurious correlations. Those who need a good laugh over the issue of spurious correlation should look at Vigen (2015), a book which illustrates all too well the fun of looking for silly correlations. Climate is a very complex variable and a nearly infinite number of measurements can be concocted with temperature (mean, minimum, maximum), rainfall, snowfall, or wind, analyzed over any number of time periods throughout the year. We are always warned about data dredging, but it is often difficult to know exactly what authors of any particular paper have done. The most extreme examples are possible to spot, and my favorite is this quotation from a paper a few years ago:

“A total of 864 correlations in 72 calendar weather periods were examined; 71 (eight percent) were significant at the p < 0.05 level. … There were 12 negative correlations, p < 0.05, between the number of days with (precipitation) and (a demographic measure). A total of 45 positive correlations, p < 0.05, between temperatures and (the same demographic measure) were disclosed…”
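The expected background rate of such “discoveries” is easy to check by simulation: correlating purely random series 864 times yields roughly 5% significant results by chance alone, close to the eight percent reported in the quotation above. A minimal sketch, assuming numpy and scipy are available; apart from the 864 tests, all numbers (series length, seed) are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_tests = 864      # number of correlations, as in the quoted study
n_years = 30       # length of each hypothetical time series
false_positives = 0

for _ in range(n_tests):
    weather = rng.normal(size=n_years)      # random "weather" variable
    demography = rng.normal(size=n_years)   # random, unrelated "demographic" variable
    r, p = stats.pearsonr(weather, demography)
    if p < 0.05:
        false_positives += 1

fraction = false_positives / n_tests
print(f"{false_positives} of {n_tests} correlations 'significant' ({fraction:.1%})")
```

With no real relationship anywhere in the data, dozens of “significant” climate correlations still appear, which is exactly why data dredging across many weather variables is so dangerous.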

The climate change paradigm is well established in biogeography, and the major shifts in vegetation that have occurred in geological time are well correlated with climatic changes. But it is a large leap of faith to scale this well-established framework down to a local spatial scale and a short-term time scale. There is no question that local short-term climate changes can explain many changes in populations and communities, but any analysis of these kinds of effects must consider alternative hypotheses and mechanisms of change. Berteaux et al. (2006) pointed out the differences between forecasting and prediction in climate models. We want predictive models if we are to improve ecological understanding, and Berteaux et al. (2006) suggested that predictive models are successful if they follow three rules:

(1) Initial conditions of the system are well described (inherent noise is small);

(2) No important variable is excluded from the model (boundary conditions are defined adequately);

(3) Variables used to build the model are related to each other in the proper way (aggregation/representation is adequate).

Like most rules for models, whether these conditions are met is rarely known when the model is published, and we need subsequent data from the real world to see if the predictions are correct.

I am much less convinced that forecasting models are useful in climate research. Forecasting models describe an ecological situation based on correlations among the available measurements, with no clear mechanistic model of the ecological interactions involved. My concern was highlighted by Myers (1998), who investigated the success of published correlations between juvenile recruitment and environmental factors (typically temperature) in fish populations and found that very few forecasting models were reliable when tested against additional data obtained after publication. It would be useful for someone to carry out a similar analysis for bird and mammal population models.

Small mammals show some promise for predictive models in some ecosystems. The analysis by Kausrud et al. (2008) illustrates a good approach to incorporating climate into predictive explanations of population change in Norwegian lemmings, explanations that involve interactions between climate and predation. The best test of these kinds of explanations, once they are formulated as models, is to determine how the model performs against additional data obtained in the years following publication.

The bottom line is to avoid spurious climatic correlations by describing and evaluating mechanistic models that are based on observable biological factors, and then to make predictions that can be tested in a realistic time frame. If we cannot do this, we risk publishing fairy tales rather than science.

Berteaux, D., et al. (2006) Constraints to projecting the effects of climate change on mammals. Climate Research, 32, 151-158. doi: 10.3354/cr032151

Enright, J. T. (1976) Climate and population regulation: the biogeographer’s dilemma. Oecologia, 24, 295-310.

Kausrud, K. L., et al. (2008) Linking climate change to lemming cycles. Nature, 456, 93-97. doi: 10.1038/nature07442

Myers, R. A. (1998) When do environment-recruitment correlations work? Reviews in Fish Biology and Fisheries, 8, 285-305. doi: 10.1023/A:1008828730759

Vigen, T. (2015) Spurious Correlations, Hyperion, New York City. ISBN: 978-031-633-9438

On Statistical Progress in Ecology

There is a general belief that science progresses over time, and given that the number of scientists is increasing, this is a reasonable first approximation. The use of statistics in ecology has seen ever-increasing improvement in methods of analysis, accompanied by bandwagons. It is one of these bandwagons that I want to discuss here by raising a general question:

Has the introduction of new methods of analysis in biological statistics led to advances in ecological understanding?

This is a very general question and could be discussed at many levels, but I want to concentrate on the top levels of statistical inference by means of old-style frequentist statistics, Bayesian methods, and information-theoretic methods. I am prompted to ask this question by my reviewing of many papers submitted to ecological journals in which the data are so buried by the statistical analysis that the reader is left confused about whether any progress has been made. Being amazed by the methodology is not the same as being impressed by the advance in ecological understanding.

Old-style frequentist statistics (read the Sokal and Rohlf textbook) has been criticized for concentrating on null hypothesis testing when everyone knows the null hypothesis is not correct. This has led to refinements in methods of inference that rely on effect size and predictive power, which are now standard in new statistical texts. Information-theoretic methods came in to fill the gap by making the data primary (rather than the null hypothesis) and asking which of several hypotheses best fits the data (Anderson et al. 2000). The key here was to recognize that one should have prior expectations, or several alternative hypotheses, in any investigation, as recommended in 1897 by Chamberlin. Bayesian analysis furthered the discussion not only by allowing several alternative hypotheses but also by the ability to use prior information in the analysis (McCarthy and Masters 2005). Implicit in both information-theoretic and Bayesian analysis is the recognition that all of the alternative hypotheses might be incorrect, and that the hypothesis selected as ‘best’ might have very low predictive power.
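As a hedged sketch of the information-theoretic approach described above: several candidate models (the alternative hypotheses) are fitted to the same data and ranked by AIC. The data, the polynomial candidates, and the least-squares AIC formula below are illustrative assumptions, not taken from any paper discussed here, and they carry the very caveat the text stresses: the ‘best’ of a bad set may still predict poorly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: the true relationship is quadratic.
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=1.0, size=x.size)

def aic_least_squares(y, y_hat, k):
    """AIC for a least-squares fit with k estimated parameters."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k

candidates = {}
for degree in (1, 2, 3):  # three alternative hypotheses
    coef = np.polyfit(x, y, degree)
    y_hat = np.polyval(coef, x)
    candidates[f"degree {degree}"] = aic_least_squares(y, y_hat, k=degree + 1)

best = min(candidates, key=candidates.get)
print(candidates, "-> best:", best)
```

Note that AIC only ranks the models offered; it says nothing about whether any of them would survive a test against independent new data, which is the problem taken up next.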

Two problems have arisen as a result of this change of focus in model selection. The first is testability. There is an implicit disregard for the old idea that models or conclusions from an analysis should be tested with further data, preferably data obtained independently of those used to find the ‘best’ model. The assumption might be made that if we get further data, we should add it to the prior data and update the model so that it somehow begins to approach the ‘perfect’ model. This was the original definition of passive adaptive management (Walters 1986), which is now regarded as a poor model for natural resource management. The second problem is that the model selected as ‘best’ may be of little use for natural resource management because it has little predictive power. In management issues for the conservation or exploitation of wildlife there may be many variables that affect population changes, and it may not be possible to conduct active adaptive management for all of them.

The take-home message is that the conclusions of our papers need a measure of progress in ecological insight, whatever statistical methods we use. The significance of our research will not be measured by the number of p-values, AIC values, BIC values, or complicated tables. The key question must be: what new ecological insights have been achieved by these methods?

Anderson, D.R., Burnham, K.P., and Thompson, W.L. 2000. Null hypothesis testing: problems, prevalence, and an alternative. Journal of Wildlife Management 64(4): 912-923.

Chamberlin, T.C. 1897. The method of multiple working hypotheses. Journal of Geology 5: 837-848 (reprinted in Science 148: 754-759 in 1965). doi:10.1126/science.148.3671.754.

McCarthy, M.A., and Masters, P.I.P. 2005. Profiting from prior information in Bayesian analyses of ecological data. Journal of Applied Ecology 42(6): 1012-1019. doi:10.1111/j.1365-2664.2005.01101.x.

Walters, C. 1986. Adaptive Management of Renewable Resources. Macmillan, New York.


Hypothesis testing using field data and experiments is definitely NOT a waste of time

At the ESA meeting in 2014 Greg Dwyer (University of Chicago) gave a talk titled “Trying to understand ecological data without mechanistic models is a waste of time.” This theme has recently been reiterated on Dynamic Ecology, the blog of Jeremy Fox, Brian McGill and Meghan Duffy (25 January 2016, https://dynamicecology.wordpress.com/2016/01/25/trying-to-understand-ecological-data-without-mechanistic-models-is-a-waste-of-time/). Some immediate responses to this post raised questions such as “What is a mechanistic model?”, “What about the use of inappropriate statistics to fit mechanistic models?”, and “prediction vs. description from mechanistic models”. All of these are relevant and interesting issues in interpreting the value of mechanistic models.

The biggest fallacy in this blog post, or at least in its title, is the implication that field ecological data are collected in a vacuum. Hypotheses are models, conceptual models, and it is only in the absence of hypotheses that trying to understand ecological data is a “waste of time”. Research proposals that fund field work demand testable hypotheses, and testing hypotheses advances science. Research using mechanistic models should also develop testable hypotheses, but mechanistic models are certainly not the only route to hypothesis creation or testing.

Unfortunately, mechanistic models rarely identify how the robustness and generality of the model output could be tested against ecological data, and they often fail to describe properly the many assumptions made in constructing the model. In fact, they are often presented as complete descriptions of the ecological relationships in question, and methods for model validation are not discussed. Sometimes modelling papers include blatantly unrealistic functions to simplify ecological processes, without exploring the sensitivity of the results to those functions.

I can refer to my own area of research expertise, population cycles, for an example here. It is not enough, for instance, to have a pattern of ups and downs with a 10-year periodicity to claim that a model is an acceptable representation of the cyclic population dynamics of a forest lepidopteran or of snowshoe hares. There are many ways to get cyclic dynamics in modeled systems. Scientific progress and understanding can only be made if the outcome of conceptual, mechanistic or statistical models defines the hypotheses that could be tested and the experiments that could be conducted to support the acceptance, rejection or modification of the model, and thus to inform understanding of natural systems.
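The point that cyclic output alone proves little can be illustrated with a sketch: even a one-line Ricker model with delayed density dependence, containing no pathogens, predators, or plant defenses at all, generates sustained multi-year cycles. The parameter values below are arbitrary choices for illustration, not estimates for any real population:

```python
import math

def delayed_ricker(r=1.2, K=100.0, n0=50.0, steps=100):
    """N[t+1] = N[t] * exp(r * (1 - N[t-1]/K)): delayed density dependence."""
    n = [n0, n0]
    for t in range(1, steps):
        n.append(n[t] * math.exp(r * (1.0 - n[t - 1] / K)))
    return n

series = delayed_ricker()
# Count local peaks after a 20-step transient.
peaks = [t for t in range(21, len(series) - 1)
         if series[t] > series[t - 1] and series[t] > series[t + 1]]
print(f"{len(peaks)} peaks in {len(series)} steps; "
      f"range {min(series):.1f}-{max(series):.1f}")
```

A toy recursion like this cycles convincingly, which is exactly why matching a cyclic pattern cannot by itself identify the mechanism behind hare or lepidopteran cycles.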

How helpful are mechanistic models – the gypsy moth story

Given the implication of Dwyer’s blog post (or at least its title) that mechanistic models are the only way to ecological understanding, it is useful to look at models of gypsy moth dynamics, one of Greg’s areas of modeling expertise, with a view toward evaluating whether model assumptions are compatible with real-world data (Dwyer et al. 2004, http://www.nature.com/nature/journal/v430/n6997/abs/nature02569.html).

Although there has been considerable excellent work on gypsy moth over the years, long-term population data are lacking. Population dynamics are therefore estimated from annual estimates of defoliation carried out by the US Forest Service in New England starting in 1924. These data show periods of non-cyclicity, two ten-year cycles (peaks in 1981 and 1991, which Dwyer uses for comparison to the modeled dynamics of a number of his mechanistic models), and harmonic 4-5 year cycles between 1943 and 1979 and since the 1991 outbreak. Based on these data, 10-year cycles are the exception, not the rule, for introduced populations of gypsy moth. Point 1. Many of the Dwyer mechanistic models were tested using the two outbreak periods and ignored over 20 years of subsequent defoliation data lacking 10-year cycles. Thus his results are limited in their generality.

As a further example, a recent paper, Elderd et al. (2013) (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3773759/), explored the relationship between alternating long and short cycles of gypsy moth in oak-dominated forests by speculating that inducible tannins in oaks modify the interactions between gypsy moth larvae and viral infection. Although previous field experiments (D’Amico et al. 1998, http://onlinelibrary.wiley.com/doi/10.1890/0012-9658(1998)079%5b1104:FDDNAW%5d2.0.CO%3b2/abstract) concluded that gypsy moth defoliation does not affect tannin levels sufficiently to influence viral infection, Elderd et al. (2013) proposed that induced tannins in red oak foliage reduce variation in viral infection levels and promote shorter cycles. In this study, an experiment was conducted using jasmonic acid sprays to induce oak foliage. Point 2. This mechanistic model is based on experiments using artificially induced tannins as a mimic of insect damage inducing plant defenses. However, earlier fieldwork showed that foliage damage does not influence virus transmission, and thus does not support the relevance of this mechanism.

In this model Elderd et al. (2013) use a linear relationship for viral transmission (transmission of infection as a function of baculovirus density) based on two data points and a 0 intercept. In past mechanistic models, and in a number of other systems, the relationship between viral transmission and host density is nonlinear (D’Amico et al. 2005, http://onlinelibrary.wiley.com/doi/10.1111/j.0307-6946.2005.00697.x/abstract; Fenton et al. 2002, http://onlinelibrary.wiley.com/doi/10.1046/j.1365-2656.2002.00656.x/full). Point 3. Data are insufficient to accurately describe the viral transmission relationship used in the model.

Finally, the Elderd et al. (2013) model considers two types of gypsy moth habitat: one composed of 43% oaks, which are inducible, and the other of 15% oaks, with the remainder of the forest composed of adjacent blocks of non-inducible pines. Data show that gypsy moth outbreaks are limited to areas with high frequencies of oaks. In mixed forests, pines are fed on only by later instars of moth larvae, and only when oaks are defoliated. The pines would be interspersed amongst the oaks, not in separate blocks as in the modeled population. Point 4. Patterns of forest composition in the models that are crucial to the result are unrealistic, and this makes interpretation of the results impossible.

Point 5 and conclusion. Because it can be very difficult to critically review someone else’s mechanistic model, as model assumptions are often hidden in supplementary material and hard to interpret, and because relationships used in models are often arbitrarily chosen and not based on available data, it would be easy to conclude that “mechanistic models are misleading and a waste of time”. But of course that wouldn’t be productive. So my final point is that closer collaboration between modelers and data collectors would be the best way to ensure that models are reasonable and accurate representations of the data. In this way understanding and realistic predictions would be advanced. Unfortunately, the great push to publish high-profile papers works against this collaboration, and manuscripts of mechanistic models rarely go to data-savvy referees.

D’Amico, V., J. S. Elkinton, G. Dwyer, R. B. Willis, and M. E. Montgomery. 1998. Foliage damage does not affect within-season transmission of an insect virus. Ecology 79:1104-1110.

D’Amico, V., J. S. Elkinton, J. D. Podgwaite, J. P. Buonaccorsi, and G. Dwyer. 2005. Pathogen clumping: an explanation for non-linear transmission of an insect virus. Ecological Entomology 30:383-390.

Dwyer, G., F. Dushoff, and S. H. Yee. 2004. The combined effects of pathogens and predators on insect outbreaks. Nature 430:341-345.

Elderd, B. D., B. J. Rehill, K. J. Haynes, and G. Dwyer. 2013. Induced plant defenses, host–pathogen interactions, and forest insect outbreaks. Proceedings of the National Academy of Sciences 110:14978-14983.

Fenton, A., J. P. Fairbairn, R. Norman, and P. J. Hudson. 2002. Parasite transmission: reconciling theory and reality. Journal of Animal Ecology 71:893-905.

On Log-Log Regressions

Log-log regressions are commonly used in ecological papers, and my attention to their limitations was twigged by a recent paper by Hatton et al. (2015) in Science. I want to look at just one example of a log-log regression from this paper as an illustration of what I think might be some pitfalls of this approach. The regression under discussion is Figure 1 in the Hatton paper, a plot of predator biomass (Y) on prey biomass (X) for a variety of African large mammal ecosystems. I emphasize that this is a critique of log-log regression problems, not a detailed critique of this paper.

Figure 1 shows the raw data reported in the Hatton et al. (2015) paper but plotted in arithmetic space. It is clear that the variance increases with the mean and the data are highly variable, as well as slightly curvilinear, so a transformation is clearly desirable for statistical analysis. Unfortunately we are given no error bars on each of the point estimates, so it is not possible to plot confidence limits for each estimate.

Figure 1A

We log both axes and get Figure 2, which is identical to the plot in Figure 1 of Hatton et al. (2015). Clearly the regression fit is better than that of Figure 1, and yet there is still considerable variation around the line of best fit.

Figure 2A

The variation around this log-log line is the main issue I wish to discuss here. Much depends on the reason for the regression line. Mac Nally (2000) made the point that regressions are often used for predictive purposes but sometimes only as explanations. I assume here that one wishes this to be a predictive regression.

So the next question is: if the Figure 2 regression is predictive, how wide are the confidence limits? In this case we adopt the usual 95% prediction interval for a single new data point. The result is shown in Figure 3, which did not appear in the Science article. The red lines define the 95% prediction belt.

Figure 3A

Now comes the main point of my concerns with log-log regressions. What do these error limits really mean when they are translated back to the original measurements that define the graph?

The table below gave the prediction intervals for a hypothetical set of 8 prey abundances scattered along the span of prey densities reported. Its columns were: prey abundance (kg/km2), estimated predator abundance (kg/km2), predicted lower 95% confidence limit, predicted upper 95% confidence limit, width of the lower confidence interval (%), and width of the upper confidence interval (%). [Table values not preserved in this copy.]
The overall average confidence limits for this log-log regression are -43% to +75%, given that the SE of the predictions varies little across the range of values used in the regression. These are very broad confidence limits for any prediction from a regression line.
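The asymmetry of those limits follows directly from back-transformation: a symmetric ±1.96 SE prediction interval in log10 space becomes a multiplicative interval in arithmetic space. A short sketch, where the SE value is an assumption chosen only to reproduce the limits quoted above:

```python
# A symmetric 95% prediction interval in log10 space, +/- 1.96 * SE,
# back-transforms to asymmetric percentage limits in arithmetic space.
se_log10 = 0.124          # assumed prediction SE in log10 units
half_width = 1.96 * se_log10

upper_pct = (10 ** half_width - 1.0) * 100.0    # multiplicative upper limit
lower_pct = (10 ** -half_width - 1.0) * 100.0   # multiplicative lower limit

print(f"lower {lower_pct:.0f}%, upper {upper_pct:+.0f}%")
```

Because the back-transform is multiplicative, the upper limit (here about +75%) is always further from the prediction than the lower limit (about -43%), even though the interval looked perfectly symmetric on the log-log plot.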

The bottom line is that log-log regressions can camouflage a great deal of variation, which may or may not be acceptable depending on the use of the regression. These plots always look much better visually than they really are. You probably already knew this, but I worry that it is a point easily overlooked.

Lastly, a minor quibble with this regression. Some authors (e.g. Ricker 1984, Smith 2009) have discussed using the reduced major axis (or geometric mean regression) rather than the standard regression method when the X variable is measured with error. One could argue that for this particular data set the X variable is measured with error, so I have used a reduced major axis regression in this discussion. The overall conclusions are not changed if standard regression methods are used.

Hatton, I.A., McCann, K.S., Fryxell, J.M., Davies, T.J., Smerlak, M., Sinclair, A.R.E. & Loreau, M. (2015) The predator-prey power law: Biomass scaling across terrestrial and aquatic biomes. Science 349 (6252). doi: 10.1126/science.aac6284

Mac Nally, R. (2000) Regression and model-building in conservation biology, biogeography and ecology: The distinction between – and reconciliation of – ‘predictive’ and ‘explanatory’ models. Biodiversity & Conservation, 9, 655-671. doi: 10.1023/A:1008985925162

Ricker, W.E. (1984) Computation and uses of central trend lines. Canadian Journal of Zoology, 62 (10), 1897-1905. doi: 10.1139/z84-279

Smith, R.J. (2009) Use and misuse of the reduced major axis for line-fitting. American Journal of Physical Anthropology, 140, 476-486. doi: 10.1002/ajpa.21090

The Volkswagen Syndrome and Ecological Science

We have all heard the reports that Volkswagen rigged its diesel cars with an engineering trick to show low pollution levels in the laboratory, while the actual pollution produced on the road is 10-100 times higher than the laboratory-predicted levels. I wonder if this is analogous to the situation we have in ecology when we compare laboratory studies and conclusions to real-world situations.

The push in ecology has always been to simplify the system, first by creating models full of assumptions, and then by laboratory experiments that are greatly oversimplified compared with the real world. There are very good reasons to do this, since the real world is rather complicated, but I wonder if we should call a partial moratorium on such research and conduct a review of how far we have been led astray by both simple models and simple laboratory studies of populations, communities and ecosystems in microcosms and mesocosms. I can almost hear the screams coming up that of course this is not possible, since graduate students must complete a degree in 2 or 3 years, and postdocs must do something in 2 years. If this is our main justification for models and microcosms, that is fair enough, but we ought to be explicit about stating it and then evaluate how much we have been misled by such oversimplification.

Let me try to be clear about this problem. It is an empirical question of whether or not studies in laboratory or field microcosms can give us reliable generalizations for much more extensive communities and ecosystems that are not in some sense space limited or time limited. I have a personal view on this question, heavily influenced by studies of small mammal populations in microcosms. But my experience may be atypical of the rest of natural systems, and this is an empirical question, not one on which we can simply state our opinions.

If the world is much more complex than our current understanding of it, we must conclude that an extensive list of climate change papers should be moved to the fiction section of our libraries. If we assume equilibrial dynamics in our communities and ecosystems, we fly in the face of almost all long-term studies of populations, communities, and ecosystems. The problem lies in the space and time vision of our science. Our studies are too short to show even a good representation of dynamics over a 100-year time scale, and the problems of landscape ecology highlight that what we see in patch A may be greatly influenced by whether patches B and C are close by or not. We see this darkly in a few small studies but are compelled to believe that such landscape effects are unusual or atypical. This may in fact be the case, but we need much more work to see whether it is rare or common. And the broader issue is: what use do we as ecologists have for ecological predictions that cannot be tested without data for the next 100 years?

Are all our grand generalizations of ecology falling by the wayside without our noticing? Prins and Gordon (2014) in their overview seem to feel that the real world is poorly reflected in many of our beloved theories. I think this is a reflection of the Volkswagen Syndrome: the failure to appreciate that the laboratory in its simplicity is so far removed from real-world community and ecosystem dynamics that we ought to start over and build an ecological edifice of generalizations or rules with a strong appreciation of the limited validity of most generalizations until much more research has been done. The complications of the real world can be ignored in the search for simplicity, but one has to do this with the realization that predictions flowing from faulty generalizations can harm our science. We ecologists have a great deal of research yet to do to establish secure generalizations that lead to reliable predictions.

Prins, H.H.T. & Gordon, I.J. (2014) Invasion Biology and Ecological Theory: Insights from a Continent in Transformation. Cambridge University Press, Cambridge. 540 pp. ISBN 9781107035812.