Some papers about p values


These papers have nothing much to do with single molecule kinetics. They were written by David Colquhoun after his retirement from the world of single ion channels, as a way to keep him off the streets. They are listed here as a convenient place to keep a record.

The papers concern the misinterpretation of tests of significance. Such tests were hardly ever used in our single ion channel work. They represent a return to the interest in statistical inference that DC had in the 1960s, which culminated in the publication of a textbook, Lectures on Biostatistics (OUP, 1971). The textbook has aged quite well, with the exception of the parts on the interpretation of p values. In the 1960s, I missed entirely the problems of null hypothesis significance testing. But better late than never.

The problem lies in the fact that most people still think that the p value is the probability that your results occurred by chance: see, for example, Gigerenzer et al. (2006) [download pdf]. It is nothing of the sort.

The false positive risk (FPR) is the probability that a result that has been labelled as “statistically significant” is in fact a false positive. It is always bigger than the p value, often much bigger.

My recommendations. In brief, I suggest that p values and confidence intervals should still be cited, but that they should be supplemented by a single number that gives an idea of the false positive risk (FPR). The simplest way to do this is to calculate the false positive risk that corresponds to a prior probability of 0.5 that a real effect exists. This would still be optimistic for implausible hypotheses, but it would be a great improvement on p values. The FPR50, calculated in this way, is just a more comprehensible way of citing the likelihood ratio (see the 2019 paper).
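In symbols, this is just a compact restatement of the relationship given in the 2019 paper. With prior odds of 1 (i.e. a prior probability of 0.5 that a real effect exists),

$$
\mathrm{FPR}_{50} \;=\; \frac{P(\text{observations} \mid H_0)}{P(\text{observations} \mid H_0) + P(\text{observations} \mid H_1)} \;=\; \frac{1}{1 + \mathrm{LR}},
$$

where $\mathrm{LR} = P(\text{observations} \mid H_1)\,/\,P(\text{observations} \mid H_0)$ is the likelihood ratio in favour of there being a real effect.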

Please note: the term “false discovery rate”, which was used in earlier papers, has now been replaced by “false positive risk”. The reasons for this change are explained in the introduction of the 2017 paper.

If you prefer a video to reading, try this, on YouTube.

Original papers about the problem


Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Open access at Royal Society Open Science. This first paper looked at the risk of false positive results by simulation of Student’s t test. The advantage of simulation is that it makes the assumptions very clear without much mathematics. The disadvantage is that the results aren’t very general.
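To get a feel for what the simulations involve, here is a toy version in R (my own sketch, not the script used in the paper). The parameter values assumed here, 16 observations per group, a true effect of one standard deviation and a prior probability of 0.1 that a real effect exists, are of the sort used in the paper's examples.

```r
# Toy simulation of many t tests: what fraction of "significant"
# results are false positives? (My sketch, not the paper's script.)
set.seed(1)
nsim  <- 20000                        # number of simulated experiments
n     <- 16                           # observations per group
prior <- 0.1                          # P(a real effect exists)
es    <- 1                            # true effect size, in SD units
real  <- runif(nsim) < prior          # which experiments have a real effect
pvals <- numeric(nsim)
for (i in seq_len(nsim)) {
  a <- rnorm(n, mean = 0)
  b <- rnorm(n, mean = if (real[i]) es else 0)
  pvals[i] <- t.test(a, b, var.equal = TRUE)$p.value
}
sig <- pvals < 0.05                   # results labelled "significant"
mean(!real[sig])                      # fraction of those with no real effect:
                                      # roughly 0.36 with these parameters
```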

Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of p-values. Open access at Royal Society Open Science. This paper gives, in the appendix, mathematically exact solutions for the false positive risk, calculated by the p-equals method. This allows the false positive risk to be calculated, as a function of the observed p value, for a range of sample sizes. A web calculator is provided that makes the calculations simple to do.
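For the curious, here is a condensed R sketch of a p-equals calculation along the lines of the appendix (my own abbreviated version, which assumes a two-sided, two-sample t test with equal group sizes, a specified true effect size, and a prior probability of 0.5 that a real effect exists):

```r
# False positive risk by the "p-equals" method (condensed sketch).
# Assumptions: two-sided two-sample t test, n per group, true
# normalized effect size es (in SD units), prior P(real effect) = 0.5.
fpr50 <- function(p.obs, n, es = 1) {
  df  <- 2 * (n - 1)                 # degrees of freedom
  tc  <- qt(1 - p.obs / 2, df)       # t value that gives the observed p
  ncp <- es * sqrt(n / 2)            # noncentrality parameter under H1
  L0  <- 2 * dt(tc, df)              # likelihood of H0: density at +/- tc
  L1  <- dt(tc, df, ncp) + dt(-tc, df, ncp)  # likelihood of H1
  L0 / (L0 + L1)                     # FPR for prior odds = 1
}

fpr50(0.05, n = 16)   # about 0.26: an observed p = 0.05 implies an FPR near 26%
```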

Colquhoun, D. and Longstaff, C. (2017). Web calculator for false positive risk http://fpr-calc.ucl.ac.uk/

Update. Thanks to a cock-up by UCL, the web calculator was offline for a while. Thanks to an ingenious solution provided by the wonderful people at our web hosts, Positive Internet, the original link, http://fpr-calc.ucl.ac.uk/, now works again. That link is actually redirected to an html page (http://fpr-calc.positive-dedicated.net/) on the Positive Internet server, in which is embedded the backup copy that is kindly hosted by Daniel Lakens in the Netherlands. The only difference is that this version no longer rearranges on a mobile phone.

There is also a copy of the web calculator at https://davidcolquhoun.shinyapps.io/fpr-calc-ver1-7/

Colquhoun, D. (2019). The false positive risk: a proposal concerning what to do about p values. The American Statistician, 2019 (open access to full text; also available on arXiv). This paper examines more closely than before the assumptions that are made in calculations of the FPR. It makes concrete proposals about how to solve the problem posed by the inadequacy of p values, with examples.

In the same online edition, The American Statistician published 43 papers that were designed to say what should be done about the problem of abuse of p values. One of these makes suggestions that are similar to mine, but based on even simpler calculations: see Benjamin and Berger (2019). The only problem with their approach is that it doesn’t cater for the Jeffreys-Lindley phenomenon, so it is not suitable for very large samples.
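Their calculation, as I understand it, amounts to an upper bound on the Bayes factor that needs nothing but the observed p value. A short sketch in R (variable names are my own):

```r
# Bayes factor bound of the sort used by Benjamin & Berger (2019), as I
# understand it: an upper bound on the evidence against H0 (valid for p < 1/e).
p   <- 0.05
bfb <- 1 / (-exp(1) * p * log(p))    # Bayes factor bound: about 2.46
1 / (1 + bfb)                        # minimum FPR with prior 0.5: about 0.29
```

Because the bound depends only on the p value, and not on the sample size, it can’t reproduce the Jeffreys-Lindley behaviour, which is presumably why it fails for very large samples.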

At the same time, Nature published a comment piece on the p value problem. The gist of this piece was a plea to abandon the term “statistically significant”, because it involves the obviously silly idea that observing p = 0.049 tells you something different from p = 0.051. It was co-signed by 840 people (including me). Nature also published an editorial which half-understood the problem and, sadly, said “Nature is not seeking to change how it considers statistical analysis in evaluation of papers at this time”. This sums up the problem: it is in the interests of both authors and journals to continue to publish too many false positive results. Until this problem is solved, the corruption will continue.

Colquhoun, D. (2019b). A response to critiques of ‘The reproducibility of research and the misinterpretation of p-values’. Royal Society Open Science. This one started life as a response to a critique of my 2017 paper. But it evolved into a more general discussion of the assumptions made in my approach, and concluded with a summary of my present views about what should be done about p values. In brief, I now think that p values and confidence intervals should continue to be given, but they should be supplemented by an estimate of the false positive risk. I suggest the notation FPR50 for the false positive risk that’s calculated on the basis that the prior probability of a real effect existing is 0.5.

Popular accounts of the problem

Colquhoun, D. (2015) False discovery rates and P values: the movie. On YouTube. This slide show is now superseded by the 2018 version.

Colquhoun, D. (2015). The perils of p-values. In Chalkdust magazine. Available at http://chalkdustmagazine.com/features/the-perils-of-p-values/. Chalkdust is a magazine run by students at UCL. This article deals with the principles of randomisation tests as a non-mathematical way to get p values, plus a bit about what’s wrong with p values.

Colquhoun, D. (2015). Randomisation tests: how to get a P value with no mathematics. A short (6 slides, 15 min) video on YouTube. Forget t tests: the randomisation test is at least as powerful, and it makes no assumption of normal distributions. Furthermore, it makes very clear that random allocation of treatments is an essential assumption for all tests of statistical significance. Of course the result is just a p value. It doesn’t tell you the probability that you are wrong: for that, see the other stuff on this page.
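To illustrate just how simple the procedure is, here is a minimal randomisation test in R (my own sketch with made-up numbers, not the code from the video):

```r
# Minimal two-sided randomisation (permutation) test for the difference
# between two group means (made-up example data).
set.seed(1)
a <- c(5.1, 4.8, 6.0, 5.5, 4.9)          # control group (invented numbers)
b <- c(6.2, 5.9, 6.8, 6.1, 6.5)          # treated group (invented numbers)
obs   <- mean(b) - mean(a)               # observed difference between means
pool  <- c(a, b)
na    <- length(a)
nperm <- 10000
diffs <- replicate(nperm, {
  s <- sample(pool)                      # re-allocate observations at random
  mean(s[-(1:na)]) - mean(s[1:na])
})
mean(abs(diffs) >= abs(obs))             # two-sided randomisation p value
```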

Colquhoun, D. (2016). The problem with p-values. Aeon magazine. (This attracted 147 comments.) This essay is about the logic of inductive inference. It is a non-mathematical introduction to the ideas raised in my 2014 paper.

Colquhoun, D. (2017). Five ways to fix statistics: state false positive risk, too. Nature, volume 551. A collection of short comments by five authors on what should be done about p values.

Colquhoun, D. (2018). The false positive risk: a proposal concerning what to do about p-values (version 2). This video is a slightly extended version of a talk that I gave at the Evidence Live meeting, June 2018, at the Centre for Evidence-Based Medicine, Oxford. It supersedes my earlier 2015 video on the same topic. It is an exposition of the ideas that are given in more detail in the 2017 paper and in the 2018 paper. In November 2018 a new version was posted: the content is the same, but the volume of the soundtrack is better.

Why p values can’t tell you what you need to know and what to do about it

2020 version. I gave a talk to the RIOT science club on 1 October 2020, and it has appeared on YouTube. This gave me a chance to update my ideas about what to do about p values. After the talk, Chris F. Carroll kindly sent me a transcript of it. I took the chance to improve a bit on some of the explanations that I’d given in the talk, especially in the Q&A, and I’ve posted the result on my blog, with links to the original talk. It’s here.


2021 version

This is a recording of a Zoom seminar for the UCL Department of Statistical Science, given on 6 May 2021. It’s slightly more technical than earlier versions, and it explains the assumptions that underlie my suggestions better than they did.

It’s remarkable that statisticians are still at war about how best to decide whether the difference between the means of two independent samples is a result of sampling error alone or whether it’s real.

The FPR50: a simple, but rough, solution to the p values war (?)

Latest video: 2022

This is a recording of a talk to the UCL R users’ group, via Zoom, on 8 December 2022. It has a bit more about the web calculator (an R Shiny app) than other talks.
