
On Questionable Research Practices

Ecologists and evolutionary biologists are being tarred and feathered along with the many other scientists who are guilty of questionable research practices. So says a recent article in “The Conversation”.

Read the article if you have time, but here is the essence of what the authors state:

“Cherry picking or hiding results, excluding data to meet statistical thresholds and presenting unexpected findings as though they were predicted all along – these are just some of the “questionable research practices” implicated in the replication crisis psychology and medicine have faced over the last half a decade or so.

“We recently surveyed more than 800 ecologists and evolutionary biologists and found high rates of many of these practices. We believe this to be first documentation of these behaviours in these fields of science.

“Our pre-print results have certain shock value, and their release attracted a lot of attention on social media.

  • 64% of surveyed researchers reported they had at least once failed to report results because they were not statistically significant (cherry picking)
  • 42% had collected more data after inspecting whether results were statistically significant (a form of “p hacking”)
  • 51% reported an unexpected finding as though it had been hypothesised from the start (known as “HARKing”, or Hypothesising After Results are Known).”

It is worth looking at these claims a bit more analytically. First, the fact that more than 800 ecologists and evolutionary biologists were surveyed tells you nothing about the precision of these results unless you can be convinced that they are a random sample. Most surveys are non-random, and yet they are reported as though they were random, reliable samples.
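A small Python sketch makes the point concrete. It computes the usual 95% margin of error for the 64% figure, taking 800 as the sample size, which is only meaningful under simple random sampling. It then shows, with entirely hypothetical numbers (a true rate of 40%, and researchers who had engaged in the practice being twice as likely to answer the survey), how self-selection can shift the estimate far more than that margin. The direction and size of any real bias are unknown; the numbers only illustrate that a margin of error says nothing about it.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """95% Wald interval for a proportion; valid only for a simple random sample."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def respondent_prevalence(true_prev, resp_if_yes, resp_if_no):
    """Expected prevalence among respondents when the chance of responding
    depends on whether the respondent has the trait (hypothetical self-selection)."""
    yes = true_prev * resp_if_yes
    no = (1 - true_prev) * resp_if_no
    return yes / (yes + no)

lo, hi = wald_ci(0.64, 800)
print(f"If the sample were random: 64% +/- {100 * (hi - 0.64):.1f} points")

# Hypothetical: true rate 40%; 30% of 'yes' researchers respond vs 15% of 'no' researchers.
biased = respondent_prevalence(0.40, 0.30, 0.15)
print(f"Survey estimate under this self-selection: {biased:.0%}")
```

Under these invented response rates the survey would report about 57% for a true rate of 40%, a bias five times larger than the random-sampling margin of roughly 3 points.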

Failing to report results is common in science for a variety of reasons that have nothing to do with questionable research practices. Many graduate theses contain results that are never published. Does this mean their data are being hidden? Many results are not reported because the study did not find the expected result. This sounds awful until you realize that journals often turn down papers because they are not exciting enough, even though the results are completely reliable. Other results are not reported because the investigator realizes, once the study is complete, that it was not carried on long enough, and the money has run out to do more research. One would need considerable detail about each study to know whether or not these 64% of researchers were “cherry picking”.

Alas, the next problem is more serious. The 42% accused of “p-hacking” may simply have been using sequential sampling, or using a pilot study to obtain the statistical parameters needed for a power analysis. Any study that uses replication in time, a highly desirable attribute of an ecological study, would be vilified by this rule. This complaint echoes the statistical advice not to use p-values at all (Ioannidis 2005, Bruns and Ioannidis 2016) and refers back to complaints about inappropriate uses of statistical inference (Amrhein et al. 2017, Forstmeier et al. 2017). The appropriate solution to this problem is to have a defined experimental design with specified hypotheses and predictions rather than an open-ended observational study.
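The difference between p-hacking and the legitimate use of a pilot study can be sketched in Python. The pilot supplies a variance estimate, and a power analysis then fixes the sample size before the main study begins, so the main data are never inspected to decide when to stop. This is a minimal sketch using the textbook normal approximation for comparing two means; the pilot standard deviation (2.0) and the smallest difference worth detecting (1.5) are hypothetical values, and a t-based calculation would give a slightly larger n. Genuine sequential designs, which do look at the data as they accumulate, are also legitimate, but they need stopping rules (for example alpha-spending boundaries) fixed in advance.

```python
import math
from statistics import NormalDist

def n_per_group(pilot_sd, min_difference, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 * sd**2 / delta**2,
    with sd taken from a pilot study and delta the smallest difference of interest.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for 80% power
    n = 2 * (z_alpha + z_beta) ** 2 * pilot_sd ** 2 / min_difference ** 2
    return math.ceil(n)                   # round up to whole sampling units

# Hypothetical pilot: SD of 2.0 seedlings per plot; a difference of 1.5 is worth detecting.
print(n_per_group(pilot_sd=2.0, min_difference=1.5))   # 28 plots per treatment
```

The sample size is decided once, from the pilot, and then the main study is run to completion regardless of what the p-value looks like along the way.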

The third problem, about unexpected findings, touches on an important aspect of science: the uncovering of interesting and important new results. This point was made long ago by Medawar (1963) and emphasized recently by Forstmeier et al. (2017). The general solution should be that novel results in science must be considered tentative until they can be replicated, so that science becomes a self-correcting process. But the temptation to emphasize a new result is hard to restrain in an era of difficult job searches and media attention to novelty. Perhaps the message is that you should read any “unexpected findings” in Science and Nature with a degree of skepticism.

The article in “The Conversation” goes on to discuss some possible interpretations of these survey results, and the authors bend over backwards to stress that the results do not mean we should distrust the conclusions of science, which unfortunately is exactly what some parts of the public media have emphasized. Distrust of science can be used to justify rejecting climate change data and the value of immunization against disease. In an era of declining trust in science, these kinds of trivial surveys have shock value but are of little use to scientists trying to sort out the details of how ecological and evolutionary systems operate.

A significant source of these concerns flows from the literature on medical fads and ‘breakthroughs’ that are announced every day by media searching for ‘news’ (e.g. “eat butter”, “do not eat butter”). The result is almost a comic caricature of how good scientists actually operate. An essential assumption of science is that scientific results are not written in stone but are always subject to further testing, modification, or rejection. But one consequence is a parody of science that says “you can’t trust anything you read” (e.g. Ashcroft 2017). Perhaps we just need to remind ourselves to remain critical and that good science is evidence-based, and then remember George Bernard Shaw’s comment:

Success does not consist in never making mistakes but in never making the same one a second time.

Amrhein, V., Korner-Nievergelt, F., and Roth, T. 2017. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ 5: e3544. doi: 10.7717/peerj.3544.

Ashcroft, A. 2017. The politics of research-Or why you can’t trust anything you read, including this article! Psychotherapy and Politics International 15(3): e1425. doi: 10.1002/ppi.1425.

Bruns, S.B., and Ioannidis, J.P.A. 2016. p-Curve and p-Hacking in observational research. PLoS ONE 11(2): e0149144. doi: 10.1371/journal.pone.0149144.

Forstmeier, W., Wagenmakers, E.-J., and Parker, T.H. 2017. Detecting and avoiding likely false-positive findings – a practical guide. Biological Reviews 92(4): 1941-1968. doi: 10.1111/brv.12315.

Ioannidis, J.P.A. 2005. Why most published research findings are false. PLOS Medicine 2(8): e124. doi: 10.1371/journal.pmed.0020124.

Medawar, P.B. 1963. Is the scientific paper a fraud? Pp. 228-233 in The Threat and the Glory. Edited by P.B. Medawar. Harper Collins, New York. ISBN 978-0-06-039112-6.

A Modest Proposal for a New Ecology Journal

I read the occasional ecology paper and ask myself how this particular paper ever got published when it is full of elementary mistakes and shows no understanding of the literature. But alas, we can rarely do anything about this as individuals. If you object to a paper's conclusions because of its methods or analysis, it is usually impossible to get the relevant journal to publish your critique. After all, which editor would like to admit that he or she let a hopeless paper through the publication screen? There are some exceptions to this rule, and I list two examples below: the papers by Barraquand (2014) and Clarke (2014). But if you search the Web of Science you will find few such critiques of published ecology papers.

One solution jumped to mind for this dilemma: start a new ecology journal, perhaps entitled Misleading Ecology Papers: Critical Commentary Unfurled. Papers submitted to this new journal would be restricted to a total of 5 pages and 10 references, and all polemics and personal attacks would be forbidden. The key for submissions would be to state a critique succinctly and to suggest a better way to construct the experiment or study, a more rigorous method of analysis, or key papers that were missed because they were published before 2000. These rules would potentially leave a large gap through which some very poor papers could escape criticism: papers that would require a critique longer than the original paper. Perhaps one very long critique each year could be distinguished as the Review of the Year paper. Alternatively, some long critiques could be published in book form (Peters 1991) and not require this new journal. The Editor of the journal would require all critiques to be signed by their authors, but would in exceptional circumstances permit the authors to remain anonymous to prevent job losses or, in more extreme cases, execution by the Mafia. Critiques of earlier critiques would be permitted in the new journal, but an infinite regress would be discouraged. Book reviews could also be the subject of a critique; critical book reviews are another aspect of ecological science that is largely missing from current journals amid the publication blitz. This new journal would of course be electronic, so there would be no page charges, and all articles would be open access. All the major bibliographic databases, like the Web of Science, would be encouraged to catalog the publications, and CrossRef would assign a DOI to each paper.

If this new journal became highly successful, it would no doubt be purchased by Wiley-Blackwell or Springer for several million dollars, and if this occurred, the profits would accrue proportionally to all the authors who had published papers to make this journal popular. The sale of course would be contingent on the purchaser guaranteeing not to cancel the entire journal to prevent any criticism of their own published papers.

At present, criticism of ecological science does not appear until several years after a poor paper is published, and by that time the Donald Rumsfeld Effect has set in and the conclusions of the poor work have come to be treated as truth. For example, most of the papers critiqued by Clarke (2014) were more than 10 years old. By making the feedback loop much tighter, certainly within one year of a poor paper appearing, we could intercept budding ecologists before they are led off course.

This journal would not be popular with everyone. Older ecologists often strive mightily to prevent any criticism of their prior conclusions, and some young ecologists make their careers by pointing out how misleading some of the papers of the older generation are. This new journal would help create a more egalitarian ecological world by instilling humility in older ecologists and a greater sense of achievement in the young ecologists who must build up their status in the science. Finally, the new journal would be a focal point for graduate seminars in ecology by bringing together and identifying the worst of the current crop of poor papers. Progress would be achieved.


Barraquand, F. 2014. Functional responses and predator–prey models: a critique of ratio dependence. Theoretical Ecology 7(1): 3-20. doi: 10.1007/s12080-013-0201-9.

Clarke, P.J. 2014. Seeking global generality: a critique for mangrove modellers. Marine and Freshwater Research 65(10): 930-933. doi: 10.1071/MF13326.

Peters, R.H. 1991. A Critique for Ecology. Cambridge University Press, Cambridge, England. 366 pp. ISBN 0521400171.


On Tipping Points and Regime Shifts in Ecosystems

An important new paper raises red flags about our preoccupation with tipping points, alternative stable states and regime shifts (I’ll call them collectively sharp transitions) in ecosystems (Capon et al. 2015). I do not usually call attention to papers, but this paper and a previous review (Mac Nally et al. 2014) seem to me to be critical for how we think about change in both aquatic and terrestrial ecosystems.

Consider an oversimplified example of how a sharp transition might work. Suppose we dumped fertilizer into a temperate clear-water lake. The clear water soon turns into pea soup with a new batch of algal species, a clear shift in the ecosystem, and this change is not good for many of the invertebrates or fish that were living there. Now suppose we stop dumping fertilizer into the lake. In time, and this could be a few years, the lake can either return to its original clear-water state or remain a pea-soup lake for a very long time, even though the added fertilizer has stopped. This second outcome would be a sharp transition (“you cannot go back from here”), and the question for ecologists is how often this happens. Clearly the answer is of great interest to natural resource managers and restoration ecologists.
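The “you cannot go back from here” outcome can be sketched with a minimal lake-phosphorus model of the kind developed by Carpenter and colleagues; it is swapped in here purely for illustration and is not taken from the reviews discussed below. Phosphorus P in the water changes as dP/dt = a − bP + rP^q/(m^q + P^q), where a is external loading (the fertilizer), b is the loss rate, and the last term is recycling from sediments that switches on steeply once P is high. All parameter values and loading levels below are hypothetical.

```python
def dP_dt(P, a, b=1.0, r=1.0, m=1.0, q=8):
    """Rate of change of phosphorus P in the water column (hypothetical parameters).

    a: external loading; b: loss rate (outflow, burial);
    r * P**q / (m**q + P**q): recycling from sediments, steep when P is high.
    """
    return a - b * P + r * P**q / (m**q + P**q)


def settle(P, a, t_end=100.0, dt=0.01):
    """Euler-integrate at constant loading a and return the (near-)equilibrium P."""
    for _ in range(int(round(t_end / dt))):
        P += dt * dP_dt(P, a)
    return P


P = 0.0                                   # start with a clear lake
history = []
for a in [0.3, 0.5, 0.8, 0.5, 0.3]:       # raise fertilizer loading, then cut it back
    P = settle(P, a)
    history.append((a, round(P, 2)))
    print(f"loading a = {a}: phosphorus settles near P = {P:.2f}")
```

At the same loading (a = 0.5) this model lake sits in a low-phosphorus state (about 0.5) on the way up but stays in a high-phosphorus state (about 1.45) on the way down; only a further cut in loading, to 0.3, brings it back. Whether real lakes behave like this is exactly the empirical question the reviews below address.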

For me, the history of this idea goes back to the 1970s at UBC, when Buzz Holling and Carl Walters were modelling the spruce budworm outbreak problem in eastern Canadian coniferous forests. They produced a model with a manifold surface that tipped the budworm from a regime of high abundance to one of low abundance (Holling 1973). We were all suitably amazed and began to wonder if this kind of thinking might help us understand snowshoe hare and lemming population cycles. The evidence for the spruce budworm was very thin, but the model was fascinating. Then by the 1980s the bandwagon started to roll, and alternative stable states and regime shifts seemed to be everywhere. Many ideas about ecosystem change became entangled with sharp transitions, and the following two reviews help to unravel them.

Of the 135 papers reviewed by Capon et al. (2015), very few showed good evidence of alternative stable states in freshwater ecosystems. The authors highlighted the use and potential misuse of ecological theory by managers trying to predict future ecosystem trajectories, and emphasized the need for a detailed analysis of the mechanisms causing ecosystem change. In a similar paper for estuarine and nearshore marine ecosystems, Mac Nally et al. (2014) showed that of 376 papers that suggested sharp transitions, only 8 seemed to have sufficient data to satisfy the criteria needed to conclude that a transition had occurred and was linkable to an identifiable pressure. Most of the changes described in these studies are examples of gradual ecosystem change rather than dramatic shifts, and the timescale against which change is assessed is critical. As always, the devil is in the details.
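One simple way to ask whether a time series looks more like gradual change or an abrupt shift is to compare a straight-line fit with a single step-change fit. The Python sketch below does this with AIC on two invented 20-year series that share the same small year-to-year wobble. It is an illustration only: it is not the set of criteria used by Mac Nally et al. (2014), it ignores autocorrelation (which real ecological series almost always have), and counting the breakpoint as one ordinary parameter is a simplification.

```python
import math

def rss_linear(t, y):
    """Residual sum of squares for a least-squares straight line (gradual change)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    sxy = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = my - slope * mt
    return sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))

def rss_step(y):
    """Residual sum of squares for the best single step (two means, one breakpoint)."""
    best = math.inf
    for k in range(2, len(y) - 1):            # at least two points on each side
        before, after = y[:k], y[k:]
        mb, ma = sum(before) / len(before), sum(after) / len(after)
        rss = sum((v - mb) ** 2 for v in before) + sum((v - ma) ** 2 for v in after)
        best = min(best, rss)
    return best

def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant."""
    return n * math.log(rss / n) + 2 * n_params

def compare(y):
    t = list(range(len(y)))
    n = len(y)
    aic_line = aic(rss_linear(t, y), n, 3)    # intercept, slope, variance
    aic_step = aic(rss_step(y), n, 4)         # two means, breakpoint, variance
    return ("step" if aic_step < aic_line else "gradual"), aic_line, aic_step

# Hypothetical 20-year series with the same alternating year-to-year wobble.
wobble = [0.3 if yr % 2 == 0 else -0.3 for yr in range(20)]
gradual = [0.5 * yr + w for yr, w in enumerate(wobble)]
abrupt = [(5.0 if yr >= 10 else 0.0) + w for yr, w in enumerate(wobble)]
print(compare(gradual)[0], compare(abrupt)[0])
```

Note that a shorter window cut from the gradual series can look like a step, which is one reason the timescale of assessment matters so much.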

All of this is to recognize that strong ecosystem changes do occur in response to human actions, but, as far as we can tell now, they are not often sharp transitions closely linked to those actions. The general message is clearly to increase rigor in our ecological publications, and to carry out the long-term studies that provide a background of natural variation in ecosystems, so that we have a ruler against which to measure human-induced change. Reviews such as these two papers go a long way toward helping ecologists lift our game.

Perhaps it is best to end with part of the abstract in Capon et al. (2015):

“We found limited understanding of the subtleties of the relevant theoretical concepts and encountered few mechanistic studies that investigated or identified cause-and-effect relationships between ecological responses and nominal pressures. Our results mirror those of reviews for estuarine, nearshore and marine aquatic ecosystems, demonstrating that although the concepts of regime shifts and alternative stable states have become prominent in the scientific and management literature, their empirical underpinning is weak outside of a specific environmental setting. The application of these concepts in future research and management applications should include evidence on the mechanistic links between pressures and consequent ecological change. Explicit consideration should also be given to whether observed temporal dynamics represent variation along a continuum rather than categorically different states.”


Capon, S.J., Lynch, A.J.J., Bond, N., Chessman, B.C., Davis, J., Davidson, N., Finlayson, M., Gell, P.A., Hohnberg, D., Humphrey, C., Kingsford, R.T., Nielsen, D., Thomson, J.R., Ward, K., and Mac Nally, R. 2015. Regime shifts, thresholds and multiple stable states in freshwater ecosystems; a critical appraisal of the evidence. Science of The Total Environment 517(0): in press. doi:10.1016/j.scitotenv.2015.02.045.

Holling, C.S. 1973. Resilience and stability of ecological systems. Annual Review of Ecology and Systematics 4: 1-23. doi:10.1146/annurev.es.04.110173.000245.

Mac Nally, R., Albano, C., and Fleishman, E. 2014. A scrutiny of the evidence for pressure-induced state shifts in estuarine and nearshore ecosystems. Austral Ecology 39: 898-906. doi:10.1111/aec.12162.

On Indices of Population Abundance

I am often surprised at ecological meetings by how many ecological studies rely on indices rather than direct measures. The most obvious cases involve population abundance. Two common criteria for declaring a species as endangered are that its population has declined more than 70% in the last ten years (or three generations) or that its population size is less than 2500 mature individuals. The criteria are many and every attempt is made to make them quantitative. But too often the methods used to estimate changes in population abundance are based on an index of population size, and all too rarely is the index calibrated against known abundances. If an index increases by 2-fold, e.g. from 20 to 40 counts, it is not at all clear that this means the population size has increased 2-fold. I think many ecologists begin their career thinking that indices are useful and reliable and end their career wondering if they are providing us with a correct picture of population changes.

The subject of indices has been discussed many times in ecology, particularly among applied ecologists. Anderson (2001) challenged wildlife ecologists to remember that indices include an unmeasured term, detectability. As Anderson (2001, p. 1295) wrote:

“While common sense might suggest that one should estimate parameters of interest (e.g., population density or abundance), many investigators have settled for only a crude index value (e.g., ‘relative abundance’), usually a raw count. Conceptually, such an index value (c) is the product of the parameter of interest (N) and a detection or encounter probability (p): then c = pN.”

He noted that many indices used by ecologists rest on the strong assumption that the probability of encounter is constant over time, space, and individual observers. Much of the discussion of detectability flowed from these early papers (Williams, Nichols & Conroy 2002; Southwell, Paxton & Borchers 2008). There is an interesting exchange over Anderson’s (2001) paper: a comment by Engeman (2003), followed by a retort by Anderson (2003) that ended with this blast at small mammal ecologists:
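Anderson's relation c = pN shows why a doubled count need not mean a doubled population. In the hypothetical Python sketch below, abundance stays at 100 while detection probability rises from 0.2 to 0.4 (say, a more experienced observer or more open vegetation), and the index doubles anyway. The numbers are invented; in practice p has to be estimated from separate data (for example mark-recapture, distance sampling, or double-observer surveys) before a count can be turned into an abundance estimate.

```python
def index_count(N, p):
    """Expected raw count c for true abundance N and detection probability p (c = pN)."""
    return p * N

def abundance_from_index(c, p_hat):
    """Abundance estimate N_hat = c / p_hat, given an independent estimate of p."""
    return c / p_hat

# Hypothetical: the population is unchanged but detectability doubles.
c_year1 = index_count(N=100, p=0.2)   # about 20 animals counted
c_year2 = index_count(N=100, p=0.4)   # about 40 animals counted
print(f"index ratio: {c_year2 / c_year1:.1f}, true population ratio: 1.0")

# With p estimated in each survey, both counts recover the same abundance.
print(abundance_from_index(c_year1, 0.2), abundance_from_index(c_year2, 0.4))
```

The index is only a valid stand-in for abundance if p is constant across the comparisons being made, which is precisely the assumption Anderson warned about.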

“Engeman (2003) notes that McKelvey and Pearson (2001) found that 98% of the small-mammal studies reviewed resulted in too little data for valid mark-recapture estimation. This finding, to me, reflects a substantial failure of survey design if these studies were conducted to estimate population size. . . . O’Connor (2000) should not wonder ‘why ecology lags behind biology’ when investigators of small-mammal communities commonly (i.e., over 700 cases) achieve sample sizes <10. These are empirical methods; they cannot be expected to perform well without data.” (p. 290)

Take that you small mammal trappers!

The warnings about index data are clear. Indices may be useful in some cases, but they should never be used as estimates of population abundance without careful validation. Even by small mammal trappers like me.
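Anderson's complaint about sample sizes below 10 can be illustrated with the simplest mark-recapture estimator. The Python sketch uses Chapman's bias-corrected form of the Lincoln-Petersen estimator (swapped in for illustration; it assumes a closed population and equal catchability) on two hypothetical trapping efforts for a population of roughly 100 animals: 10 animals marked and 10 caught with 1 recapture, versus 50 marked and 50 caught with 25 recaptures.

```python
import math

def chapman(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen estimate and its standard error.

    n1: animals marked in the first session
    n2: animals caught in the second session
    m2: marked animals among the n2 caught
    """
    N_hat = (n1 + 1) * (n2 + 1) / (m2 + 1) - 1
    var = ((n1 + 1) * (n2 + 1) * (n1 - m2) * (n2 - m2)) / ((m2 + 1) ** 2 * (m2 + 2))
    return N_hat, math.sqrt(var)

# Hypothetical low-effort and high-effort sessions on the same population.
for n1, n2, m2 in [(10, 10, 1), (50, 50, 25)]:
    N_hat, se = chapman(n1, n2, m2)
    print(f"n1={n1}, n2={n2}, m2={m2}: N_hat={N_hat:.1f}, SE={se:.1f}, CV={se / N_hat:.0%}")
```

With one recapture the estimate is about 60 with a coefficient of variation near 50%, which is close to useless; with 25 recaptures it is about 99 with a CV near 10%. As Anderson put it, these methods cannot be expected to perform well without data.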

Anderson, D.R. (2001) The need to get the basics right in wildlife field studies. Wildlife Society Bulletin, 29, 1294-1297.

Anderson, D.R. (2003) Index values rarely constitute reliable information. Wildlife Society Bulletin, 31, 288-291.

Engeman, R.M. (2003) More on the need to get the basics right: population indices. Wildlife Society Bulletin, 31, 286-287.

McKelvey, K.S. & Pearson, D.E. (2001) Population estimation with sparse data: the role of estimators versus indices revisited. Canadian Journal of Zoology, 79, 1754-1765.

O’Connor, R.J. (2000) Why ecology lags behind biology. The Scientist, 14, 35.

Southwell, C., Paxton, C.G.M. & Borchers, D.L. (2008) Detectability of penguins in aerial surveys over the pack-ice off Antarctica. Wildlife Research, 35, 349-357.

Williams, B.K., Nichols, J.D. & Conroy, M.J. (2002) Analysis and Management of Animal Populations. Academic Press, New York.

Citation Analysis Gone Crazy

Perhaps we should stop and look at the evils of citation analysis in science. Citation analysis began some 15 or 20 years ago with the useful thought that it might be nice to know whether one’s scientific papers were being read and used by others working in the same area. But it has now morphed into a Godzilla that has the potential to run our lives. I think the current situation rests on three principles:

  1. Your scientific ability can be measured by the number of citations you receive. This is patent nonsense.
  2. The importance of your research is determined by which journals accept your papers. More nonsense.
  3. Your long-term contribution to ecological science can be measured precisely by your h-score or some variant.
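For readers who have never computed one, the h-score (more often called the h-index) is the largest number h such that h of a person's papers have each been cited at least h times. The Python sketch below, using two invented publication records, shows how little a single number of this kind captures: a steady record of ten modestly cited papers and a record containing one landmark paper cited 2500 times end up with the same score.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Two hypothetical publication records.
steady = [10] * 10
landmark = [2500, 40, 30, 20, 15, 12, 11, 10, 10, 10, 3, 2]

for name, record in [("steady", steady), ("landmark", landmark)]:
    print(f"{name}: h = {h_index(record)}, total citations = {sum(record)}")
```

Both records give h = 10, although one has 100 citations in total and the other 2663, and neither number says anything about whether the work will matter in 30 years.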

These principles appeal greatly to the administrators of science and to many of the people who dole out the money for scientific research: you can justify your decisions with numbers, an excellent way to make the research enterprise quantitative. The contrary view, which I hope is held by many scientists, rests on three different principles:

  1. Your scientific ability is difficult to measure and can only be approximately evaluated by another scientist working in your field. Science is a human enterprise not unlike music.
  2. The importance of your research is impossible to determine in the short term of a few years, and in a subject like ecology probably will not be recognized for decades after it is published.
  3. Your long-term contribution to ecological science will have little to do with how many citations you accumulate.

It will take a good historian to evaluate these alternative views of our science.

This whole issue would not matter except that it is eroding science hiring and science funding. The latest I have heard is that Norwegian universities are now given a large amount of money by the government if they publish a paper in SCIENCE or NATURE, and a very small amount if they publish the same results in the CANADIAN JOURNAL OF ZOOLOGY or – God forbid – the CANADIAN FIELD NATURALIST (or equivalent ‘lower class’ journals). I am not sure how many other universities will adopt this kind of reward-based publication scoring. All of this is done, I think, because we do not wish to involve human judgment in decision making. I suppose you could argue that this is a grand experiment, like climate change, with no controls: use these scores for 30 years and then see whether they worked better than the old system based on human judgment. How does one evaluate such experiments?

NSERC (Natural Sciences and Engineering Research Council) in Canada has been trending in that direction over the last several years. In the eternal good old days, scientists read research proposals and made judgments about the problem, the approach, and the likelihood of success of a research program. They took time to discuss at least some of the issues. But we are now moving toward quantitative scores that replace human judgment, which I believe to be a very large mistake.

I view ecological research and practice much as I think medical research and medical practice operate. We do not know how well certain studies and experiments will work, any more than a surgeon knows exactly whether a particular technique or treatment will work or whether a particular young doctor will become a good surgeon, and we gain by experience in a mostly non-quantitative manner. Meanwhile we should encourage young scientists to try new ideas and studies, and give them opportunities based on judgment rather than on counts of papers or citations. Currently we want to rank everyone and every university like sporting teams and find the winner. This is a destructive paradigm for science. It works for tennis but not for ecology.


Back to p-Values

Alas, ecology has slipped lower on the totem pole of serious sciences because of an article that has captured the attention of the media:

Low-Décarie, E., Chivers, C., and Granados, M. 2014. Rising complexity and falling explanatory power in ecology. Frontiers in Ecology and the Environment 12(7): 412-418. doi: 10.1890/130230.

There is much that is positive in this paper, so you should read it, if only to decide whether to use it in a graduate seminar in statistics or in ecology. Much of what it concludes is certainly true: there are more p-values in papers now than there were some years ago. The question then comes down to what these kinds of statistics mean, whether they justify the conclusion, captured by the media, that explanatory power in ecology is declining over time, and what, if anything, we should do about it. Since, as far as I can see, most statisticians today seem to believe that p-values are meaningless (e.g. Ioannidis 2005), one wonders what the value of showing this trend is. A second point on which most statisticians agree is that R² values are a poor measure of anything beyond the particular data set from which they are calculated. Any ecological paper that analyses data summarizes many tests that provide p-values and R² values, of which only some are reported. It would be interesting to make a comparison with what is recognized as a mature science (like physics or genetics) by asking whether past revolutions in understanding and predictive power in those sciences corresponded with increasing numbers of p-values or R² values.
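Why R² says little beyond the data set at hand can be shown with a small hypothetical Python sketch: the same true relationship (slope 0.5) and exactly the same scatter around it, sampled once over a wide range of x and once over a narrow range. The fitted slope is identical in both cases, but R² changes from about 0.84 to about 0.05, simply because R² depends on how spread out the x values happen to be.

```python
def fit(x, y):
    """Ordinary least-squares slope and R-squared for a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Identical, made-up scatter for both samples (balanced so it does not bias the slope).
noise = [1, -1, -1, 1, 1, -1, -1, 1]

def sample(spacing):
    """Eight x values at the given spacing; true relationship y = 2 + 0.5x plus noise."""
    x = [spacing * i for i in range(8)]
    y = [2 + 0.5 * xi + e for xi, e in zip(x, noise)]
    return x, y

slope_wide, r2_wide = fit(*sample(2.0))
slope_narrow, r2_narrow = fit(*sample(0.2))
print(f"wide range:   slope = {slope_wide:.2f}, R2 = {r2_wide:.2f}")
print(f"narrow range: slope = {slope_narrow:.2f}, R2 = {r2_narrow:.2f}")
```

A trend of falling R² across decades of papers could therefore reflect changes in study designs and sampling ranges as much as any change in how well ecologists understand their systems.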

To ask these questions is to ask: what is the metric of scientific progress? At present we confuse progress with indicators that may have little to do with scientific advancement. As journal editors we race to increase our journals’ impact factors, which are interpreted as measures of importance. For appointments to university positions we ask how many citations a person has and how many papers they have produced. We confuse scientific value with numbers that, ironically, might have a very low R² value as predictors of progress in a science. These numbers make sense as metrics to tell publishing houses how influential their journals are, or to tell Department Heads how fantastic their job choices are, but we fool ourselves if we accept them as indicators of value to science.

If you want to judge scientific progress, you might look at books that have gathered together the most important papers of their time, and examine a sequence of these from the 1950s to the present. What is striking is that papers that seemed critically important in the 1960s or 1970s are now thought to concern relatively uninteresting side issues, and conversely papers that were ignored earlier are now thought to be critical to understanding. A list of these changes might be a useful aid to anyone asking how to judge importance or progress in a science.

Finally, consider why a relatively mature science like geology has completely failed to predict earthquakes in advance, or even to specify the locations of some earthquakes (Stein et al. 2012; Uyeda 2013). Progress in understanding does not necessarily bring progress in prediction. And we ought to be wary of confusing progress with p-values and R² values.

Ioannidis, J.P.A. 2005. Why most published research findings are false. PLoS Medicine 2(8): e124.

Stein, S., Geller, R.J., and Liu, M. 2012. Why earthquake hazard maps often fail and what to do about it. Tectonophysics 562-563: 1-24. doi: 10.1016/j.tecto.2012.06.047.

Uyeda, S. 2013. On earthquake prediction in Japan. Proceedings of the Japan Academy, Series B 89(9): 391-400. doi: 10.2183/pjab.89.391.

On Publishing in SCIENCE and NATURE

We are having an ongoing discussion at the University of Canberra Institute for Applied Ecology about the need to obtain a measure of our strength in research. We have entered the age of quantification of all things, even those that cannot be quantified, and so each of us must get our ranking from our citation rates, h-scores, or journal impact factors. Institutes rise and fall, along with our research grants, on the basis of these numbers. All of this seems to be necessary, but it is quite silly for two reasons. First, the importance of any particular paper or idea can only be judged in the long term, so deciding whether you should have a job on the basis of your citation rate is a cop-out. Second, this quantification undermines the role of scientists’ and administrators’ judgment in adjudicating the relative merits of specific research and specific scientists. The problem is that, as a young scientist in particular, you are caught in a web of nonsense and you have to play the game.

The name of the game is to get a paper in SCIENCE or NATURE. To do this you must shorten the presentation so much that it is nearly unintelligible and violates the staid assumption that a scientific paper must contain enough detail for someone else to repeat the study and test its conclusions. These details are typically relegated to supplementary materials that must be downloaded separately from the published paper. So these papers become like newspaper headlines, giving a grand conclusion with few details of how it was reached. But publication there is the hallmark of success, so one must try. The only rule I can suggest is to have a Plan B for publication, since SCIENCE and NATURE reject well over 90% of the papers submitted to them.

There is a demography at work here that we must keep in mind. If scientific output is doubling approximately every 7 years while SCIENCE and NATURE publish roughly the same number of papers each year, then, on a purely random model of acceptance, getting a paper into SCIENCE or NATURE now is twice as hard as it was 7 years ago. So when your supervisor tells you that he or she got a paper in SCIENCE xx years ago and so should you, you might point out the demographic momentum of science.
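The arithmetic behind this demographic argument is simple exponential growth. If submissions double every T years and the number of accepted papers stays fixed, the chance of acceptance falls by a factor of 2^(t/T) over t years. The Python sketch below assumes, as the random model above does, that submissions grow in step with total output and that acceptance is a random draw; the 7-year doubling time is the rough figure quoted above, not a measured value.

```python
def relative_difficulty(years_ago, doubling_time=7.0):
    """Factor by which acceptance odds have fallen since `years_ago`.

    Assumes (hypothetically) that submissions double every `doubling_time` years
    while the journal accepts a fixed number of papers chosen at random.
    """
    return 2 ** (years_ago / doubling_time)

for years in (7, 14, 21, 30):
    print(f"{years} years ago: about {relative_difficulty(years):.1f} times easier than now")
```

On these assumptions a supervisor's SCIENCE paper from 21 years ago was won against odds about eight times better than a student faces today, and one from 30 years ago against odds nearly twenty times better.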

Editors of any journal especially SCIENCE and NATURE are under great pressure, and if anyone thinks that their decisions are completely unbiased, they probably think that the earth is flat. All of us think some parts of our science are more important than others, and editorial decisions are far from perfect. The important message for young scientists is not to get discouraged when rejection slips appear. Any senior scientist could paper the hallways with letters of rejection from various journals. The important thing is to do good research, test hypotheses, make interesting speculations that can be tested, and move on, with or without a paper in SCIENCE or NATURE.

Finally, if you want an interesting project, you might trace the history of papers that have appeared in SCIENCE and NATURE over the last 50 years and see how many of them have been significant contributions to the ecological science we recognize now. Perhaps someone has done this already, and it was rejected by SCIENCE and is sitting in a filing cabinet somewhere...