June 13, 2016 | David Trafimow, Editor, Basic and Applied Social Psychology

The demise of Null Hypothesis Significance Testing: a personal view

In 2015, the Editor of Basic and Applied Social Psychology announced in an editorial that the journal would cease accepting papers that relied on certain statistical methods – especially the null hypothesis significance testing procedure (NHSTP) – with immediate effect.

The editorial went far and wide, sparking a great deal of (mixed) attention within the scholarly community. Tweets were written, blog posts were shared, and publications including Scientific American and the Economist debated the matter.

How has the editorial influenced scientific practice? How have submissions to the journal changed? What has the reaction to the editor been? David Trafimow discusses his decision and recounts his personal experiences and views more than a year down the line.

Read more on the story, and arguments both for and against the veto on the use of null hypothesis testing and p values, here.

From David Trafimow, Editor-in-Chief, Basic and Applied Social Psychology

The null hypothesis significance testing procedure (NHSTP) involves proposing a null hypothesis, collecting data, computing the probability of the data (or data more extreme) given that the null hypothesis is true (this is p), and then rejecting or failing to reject the null hypothesis depending on whether p is below an arbitrary level. For many decades, there have been complaints that the NHSTP is invalid. The main complaint is that although it is possible to compute the probability of the data given the null hypothesis (p), there is no way to make a valid inference in the other direction – an inverse inference – about the probability of the null hypothesis given the data. Thus, the NHSTP forces the researcher into an inverse inference fallacy; the researcher rejects or fails to reject the null hypothesis, based on p, without knowing how likely the null hypothesis is to be true. Despite the many times this complaint has been voiced across the decades, researchers have continued to use and advocate the use of the NHSTP and have continued to commit the inverse inference fallacy. And yet, I believe that change is in the air and has been accelerating rapidly since the beginning of 2015, largely due to the editorial in Basic and Applied Social Psychology (BASP) that I wrote with Michael Marks as coauthor.
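The procedure described above can be made concrete with a minimal sketch. The coin-flip scenario, the function names, the prior values, and the alternative `theta1 = 0.7` are all hypothetical choices made for illustration; none comes from the editorials. The first function computes p, the probability of the data (or data more extreme) given the null hypothesis. The second shows that a probability for the null hypothesis given the data only appears once extra inputs are supplied that the NHSTP does not contain, here a prior and a specific alternative, and that the answer moves with those inputs.

```python
from math import comb


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k successes in n trials with success rate theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def two_sided_p(k: int, n: int, theta0: float = 0.5) -> float:
    """p = P(data at least as extreme as k | null), where "as extreme" means
    any outcome no more probable under the null than the observed one."""
    observed = binom_pmf(k, n, theta0)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        pmf
        for pmf in (binom_pmf(i, n, theta0) for i in range(n + 1))
        if pmf <= observed * tolerance
    )


def posterior_null(k, n, prior_null, theta0=0.5, theta1=0.7):
    """P(null | data) via Bayes' rule against one point alternative theta1.
    Both prior_null and theta1 are assumptions that p does not supply."""
    weight0 = prior_null * binom_pmf(k, n, theta0)
    weight1 = (1 - prior_null) * binom_pmf(k, n, theta1)
    return weight0 / (weight0 + weight1)


p = two_sided_p(15, 20)  # 15 heads in 20 flips; null hypothesis: fair coin
print(f"p = {p:.4f}")  # about 0.0414, below the conventional .05 cutoff

# The same data give different P(null | data) for different assumed priors.
for prior in (0.1, 0.5, 0.9):
    print(f"prior {prior}: P(null | data) = {posterior_null(15, 20, prior):.3f}")
```

The single value of p stays fixed while the inverse probability changes with every assumed prior, which is why p alone cannot say how likely the null hypothesis is to be true.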

This part of the story really begins with my first BASP editorial in 2014. In that editorial, I discouraged the use of the NHSTP but did not ban it. It did not work! The unpleasant fact of the matter is that every empirical manuscript we received that year used the NHSTP in the initial submission. I quickly realized that, given the NHSTP’s entrenchment in scientific practice, the 2014 editorial was akin to using a squirt gun to hunt elephants. I needed an elephant gun. The 2015 editorial, in which we banned the NHSTP outright, served as the elephant gun, or at least as a reasonable facsimile.

The NHSTP ban has been fruitful. There is an obvious trend for initial submissions to BASP not to use the NHSTP. But perhaps more important, the scientific and statistical communities are now discussing the NHSTP and its shortcomings as never before. Many journals have published articles, comments, and interviews about the ban. My favorite, perhaps, was an article in Science News entitled, “P value ban: small step for a journal, giant leap for science.” The 2015 BASP editorial banning the NHSTP was viewed an unprecedented 100,000 times on the BASP website. I believe the genie is out of the bottle, and it is too late to put it back. The NHSTP is in serious trouble, as it ought to be. Scientists’ realization of its flaws is already improving research in many areas of science. I hope these developments will spread, and will in time also lead to improvements in the training of graduate students. Most young scientists are acculturated to viewing the NHSTP as the way science is done. That needs to stop.

In addition, emails from researchers in areas as diverse as biostatistics and economics, among others, have informed me that journal editors are beginning to discourage the NHSTP and that the topic has become a major source of controversy at conferences. For instance, I was recently invited to write an article on the topic for Clinical Orthopaedics and Related Research. Orthopedics is a medical and scientific area that has almost nothing to do with social psychology, and yet our 2015 editorial has had an impact there.

The American Statistical Association has also become involved with the discussion. Executive Director Ron Wasserstein recruited approximately two dozen world-class statisticians to aid in drafting a statement about the NHSTP, and he kindly consulted with me on early versions. Although the statement stops short of endorsing our ban of the NHSTP, it does explain some important shortcomings of p, including that it provides poor evidence for drawing conclusions about the null hypothesis. As the point of the NHSTP is for researchers to use p to reject (or fail to reject) the null hypothesis, the statement that p provides poor evidence in this direction is important.

In fact, as we emphasize in the most recent BASP editorial (February 2016), p serves none of the purposes that it has been touted as serving. Therefore, given that nobody seems to be able to state a logically valid way of using p to draw conclusions, other than the conclusion-by-definition that the data have a particular probability given the null hypothesis, I would prefer that the American Statistical Association had endorsed the ban. Nevertheless, the fact that the statement goes most of the way in the direction of the 2015 editorial is a success.

Another positive development concerns conferences in my own field of psychology. The Society of Experimental Social Psychology invited me to chair a September 2015 symposium of journal editors to discuss the NHSTP. And in June of this year I will give an invited address to the Canadian Psychological Association on the topic. It is wonderful to see that so many scientists, in psychology and in other academic areas, have been inspired by the editorial to question what has until now been virtual dogma about how to draw scientific conclusions.

But none of these developments is as exciting as the opportunity that the demise of the NHSTP is providing for researchers to create new procedures for producing scientific inferences. Sometimes this involves asking new questions. For example, I have a paper “in press” in Educational and Psychological Measurement that asks why scientists collect a sample in the first place instead of doing that which is much easier – collecting only a single case. The answer, of course, is that researchers want to be confident that their sample statistics are close to corresponding population parameters. But this suggests two additional questions: (1) How close is “close?” (2) How confident is “confident?” My new procedure involves researchers committing to answers to these questions before collecting data, and I provide a way, given these a priori commitments, of finding the necessary sample size. Because scientists decide the sample size necessary to trust the data prior to collecting the data, they can trust the descriptive statistics obtained from the data without any further inferential work and without committing the inverse inference fallacy. And I just submitted a new manuscript (with Justin MacDonald) that expands on this idea.
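One way to make the two a priori commitments concrete is sketched below. It rests on assumptions that the text above does not state: the statistic of interest is the sample mean, the population is normal, and "close" is expressed as a fraction of the population standard deviation. Under those assumptions the sample mean of n cases has standard deviation σ/√n, so requiring it to fall within `closeness` standard deviations of the population mean with probability `confidence` gives n = (z / closeness)², where z is the two-sided standard normal quantile for that confidence. The function name and parameters are hypothetical, and this may differ in detail from the procedure in the paper.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(closeness: float, confidence: float) -> int:
    """Smallest n for which the sample mean falls within `closeness`
    population standard deviations of the population mean with
    probability `confidence`, assuming a normal population."""
    if closeness <= 0 or not 0 < confidence < 1:
        raise ValueError("need closeness > 0 and 0 < confidence < 1")
    # Two-sided z-score: bounds the central `confidence` mass of a standard normal.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z / closeness) ** 2)


# Commit before collecting data: within 0.1 standard deviations, 95% confident.
print(required_sample_size(0.1, 0.95))  # 385
```

Both inputs are fixed before any data exist, so the resulting n depends only on the researcher's stated standards for "close" and "confident", not on anything observed in the sample.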

Whether or not my a priori procedure catches on is less important than the more general consideration that, as researchers continue to consider the issue, some procedure or set of procedures will catch on, and constitute an important improvement over the NHSTP to the betterment of science.
