16 August 2018

How to present statistics: a gap in journals’ instructions to authors

I recently reviewed a couple of papers, and was reminded of how bad the presentation of statistics in many papers is. This is true even of veterans with lengthy publication records, who you might think would know better. Here are a few tips on presenting statistics that I’ve gleaned over the years.

Averages


One paper wrote “(7.8 ± 0.8).” I guess this was supposed to be a mean and standard deviation, but I had to guess, because the paper didn’t say. People often report other measures of dispersion around an average (standard errors, coefficients of variation) in exactly the same way.

Curran-Everett and Benos (2004) write, “The ± symbol is superfluous: a standard deviation is a single positive number.” When I tweeted this yesterday, a few people replied that it was silly: Curran-Everett and Benos’s recommended format is a few characters longer, and they worried it would be repetitive and hard to read. This reminds me of fiction writers who try to avoid repeating “He said,” usually with unreadable results. My concern is not so much the ± symbol as the numbers having no description at all.

Regardless, my usual strategy is similar to Curran-Everett and Benos. I usually write something like, “(mean = 34.5, SD = 33.0, n = 49).” Yes, it’s longer, but it’s explicit.
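As a rough illustration, here is a minimal Python sketch that builds a labelled summary in that explicit format. The function name `describe` and the sample values are made up for this example, and it assumes you want the sample standard deviation (n − 1 denominator).

```python
import statistics


def describe(values, decimals=1):
    """Return an explicitly labelled summary: mean, sample SD, and n."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, i.e. n - 1 in the denominator
    return f"(mean = {mean:.{decimals}f}, SD = {sd:.{decimals}f}, n = {n})"


# Made-up measurements, for illustration only
sample = [7.1, 8.4, 7.9, 6.8, 8.8]
summary = describe(sample)
print(summary)
```

The point is only that every number carries its own label, so a reader never has to guess whether the second value is a standard deviation, a standard error, or something else.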

That strategy isn’t the only way, of course. I have no problem with a line saying, “averages are reported as (mean ± S.D.) throughout.” That’s explicit, too.

Common statistical tests


Another manuscript repeatedly just said, “p < 0.05.” It didn’t tell me what test was used, nor any other information that could be used to check that it is correct.

For reporting statistical tests, I write something like, “(Kruskal-Wallis = 70.76, df = 2, p < 0.01).” That makes it explicit:

  • What test I am using.
  • The exact test statistic (i.e., the result of the statistical calculations).
  • The degrees of freedom, or sample size, or any other values that are relevant to checking and interpreting the test’s calculated value.
  • The exact p value, and never just “greater than” or “less than” 0.05. Again, anyone who wants to confirm that the calculations have been done correctly needs an exact p value. People’s interpretations of the data also change depending on the reported p value. People don’t interpret p values of 0.06 and 0.87 the same way, even though both are “greater than 0.05.” Yes, I know that people probably should not put much stake in that exact value, and that p values are less reproducible than people expect, but there it is.
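The checklist above could be turned into a small formatting helper along these lines. This is only a sketch: `report_test` is a hypothetical name, the numbers are made up, and the statistic and p value are assumed to have been calculated already by whatever statistics package you use.

```python
def report_test(test_name, statistic, p_value, df=None, n=None):
    """Format a test result with its name, statistic, df and/or n, and exact p."""
    parts = [f"{test_name} = {statistic:.2f}"]
    if df is not None:
        parts.append(f"df = {df}")
    if n is not None:
        parts.append(f"n = {n}")
    # Report the exact p value; very small values switch to scientific
    # notation rather than collapsing into "p < 0.001".
    if p_value >= 0.001:
        parts.append(f"p = {p_value:.3f}")
    else:
        parts.append(f"p = {p_value:.1e}")
    return "(" + ", ".join(parts) + ")"


# Made-up result, for illustration only
line = report_test("Kruskal-Wallis H", 5.43, 0.066, df=2)
print(line)
```

A reader, reviewer, or meta-analyst can then check the reported statistic against the degrees of freedom and see for themselves how close the p value is to whatever threshold they care about.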

My understanding is that these values particularly matter for people doing meta-analyses.

Journals aren’t helping (much)


I wondered why I keep seeing stats presented in ways that are either uninterpretable or unverifiable. I checked the instructions to authors of PeerJ, PLOS ONE, The Journal of Neuroscience, and The Journal of Experimental Biology. As far as I could find, only The Journal of Neuroscience provided guidance on what its standards for reporting statistics are. The Journal of Experimental Biology’s checklist says, “For small sample sizes (n<5), descriptive statistics are not appropriate, and instead individual data points should be plotted.”

(Additional: PeerJ does have some guidelines. They are under “Policies and procedures” rather than “Instructions for authors,” so I missed them in my quick search.)

This stands in stark contrast to the notoriously detailed instructions practically every journal has for reference formatting. This is even true for PeerJ, which has a famously relaxed attitude towards reference formatting.

In the long run, the proper reporting of statistical tests is probably more important to a paper’s value in the scientific record than the exact reference format.

Judging from how often I see minimal to inadequate presentation of statistics in manuscripts that I’m asked to review, authors need help. Sure, most authors should “know better,” but journals should provide reminders even for authors who should know this stuff.

How about it, editors?

Additional: One editor took me up on this challenge. Alejandro Montenegro gets it. Hooray!

More additional: And Joerg Heber gets it, too. Double hooray!

References

Curran-Everett D, Benos DJ. 2004. Guidelines for reporting statistics in journals published by the American Physiological Society. American Journal of Physiology - Gastrointestinal and Liver Physiology 287(2): G307-G309. http://ajpgi.physiology.org/content/287/2/G307.short
