05 August 2014

Better a deluge than a drought

Another prominent opinion piece is crying again that there is too much low-quality research.

This annoys me so much. It presumes “quality” can be judged immediately and accurately, and that researchers should all be of the same mind about what the “right” research questions to ask are.

I wonder if, in other fields of creative endeavor, people write editorials calling for less work. “People are releasing more songs than ever, but have you noticed that there are still only 52 number one songs on Billboard magazine each year? We haven’t had any increases in number one songs since the 1950s!”

Would anyone ask a musician, “Why don’t you just write hits?” without expecting to get punched in the face? No, because there is some understanding that not everything is going to be a hit.

Let’s look at a few bits of the article here.

Estimates vary wildly, but probably between a quarter and a third of all research papers in the natural sciences go uncited.

“Uncited research” is research that has not been cited yet. This is a great strength of scientific literature: you can go back and look at the old stuff. Some projects have “long tails,” and it’s not possible to know when someone will stumble across something you have created and find it relevant to what they are doing. For example, Hill (1979) was not cited for almost two decades. Failure? Maybe. But because there is no statute of limitations on when we can cite papers, it eventually was cited. (Yes, I’ve cited it.)

Scientific papers are love letters to the future. We write them in hope that not only will they be useful within the first few years of publication, but that they may be useful to researchers living long after we are not.

Some works will only reach a small audience. That does not automatically make them less worthy, or less influential.

To use the music analogy again:

In 1968, The Velvet Underground were releasing records that very few people bought. But their work lasted, and regularly shows up on “Best of all time” lists.

In comparison, the Grammy winner for Record of the Year in 1968 was “Up, Up and Away.”

It’s a breezy, catchy, even memorable tune, but... I bet it doesn’t show up on many “Best of all time” lists now. I wager few people today could name the band.

This in turn leads to the bane of every scientist's existence: far too many papers to read in far too little time.

Not my bane. It has never been easier for me to find papers that are relevant to my interests, thanks to Google Scholar and similar tools.

One reason is the rise of author-pays open-access publishing. This has the benefit of allowing anyone to read the research in question without paying the publisher, but also has the disadvantage of giving publishers a strong commercial incentive to issue as much content as possible. ...
(S)ubscription business models at least help to concentrate the minds of publishers on the poor souls trying to keep up with their journals.

Elsevier has almost 3,000 technical journals, Springer has 2,200, and Wiley has 1,500, most of which are subscription journals. That, to me, does not suggest that subscription-based publishers are trying to keep the literature down to a manageable size.

Subscription publishers have incentives to publish more scientific literature, just like open access publishers do. If each journal tends to be profitable, then publishers have an incentive to create more journals. The more journals they can put in their “big deal” packages, the more they might make.

The incentive to publish is not coming from publishers. The incentive to publish comes from administrations, funding agencies, hiring committees, and tenure and promotion committees. They all count publications. This seems indisputable. Indeed, the article goes on to admit this:

On one hand funders and employers should encourage scientists to issue smaller numbers of more significant research papers. This could be achieved by placing even greater emphasis on the impact of a researcher's very best work and less on their aggregate activity.

How are we going to evaluate “best” work? Unfortunately, the typical way that “very best work” is evaluated now is the journal Impact Factor (van Dijk et al. 2014). The problems of using Impact Factor to assess individual work are many, to put it mildly (Brembs et al. 2013).

In the end, we get a bait and switch! Instead of what the piece initially calls for (publish less), it ends with a call to publish even more. Now we are supposed to publish data in addition to our papers:

On the other they should require scientists to share all of their results as far as practically possible. But most of these should not appear in the form of traditional scholarly papers, which are too laborious for both the author and the reader to fulfil such a role. Rather, less significant work should be a issued in a form that is simple, standardised and easy for computers to index, retrieve, merge and analyse. Humans would interact with them only when looking for aggregated information on very specific topics.

So the issue is filter failure, not information overload.

Whatever the shortcomings of traditional journal articles are, they realize the awesome power of narrative. This is, I think, the reason why scientific journals have never just published the data, as I wrote before:

If science is purely and solely about “the facts,” why do we publish scientific papers at all? Why not just upload methods and datasets? If you have the data and the methods to generate them, isn’t that all you need to assess the “facts” in play?

(T)here is an inherent connection between stories and experimental science: they are both about causes. A satisfying story is built around causal connections. Without those causal connections, you have a series of disconnected events that makes about as much sense as a random inkblot.

If we struggle with too many papers now, we will struggle even more with too many datasets.

Ignorance is a much, much bigger problem than too much knowledge.

Brembs B, Button K, Munafò M. 2013. Deep Impact: Unintended consequences of journal rank. Frontiers in Human Neuroscience 7: 291. http://www.frontiersin.org/Journal/Abstract.aspx?s=537&name=human_neuroscience&ART_DOI=10.3389/fnhum.2013.00291

Hill GW. 1979. Biogenic sedimentary structures produced by the mole crab Lepidopa websteri Benedict. Texas Journal of Science 31(1): 43-51.

van Dijk D, Manor O, Carey LB. 2014. Publication metrics and success on the academic job market. Current Biology 24(11): R516-R517. http://www.cell.com/current-biology/abstract/S0960-9822(14)00477-1

Related posts

I’m a pebble in the avalanche
Balkanizing small universities 
Storytelling is dead, long live narrative

External links

Stop the deluge of science research

Photo by Broo_am (Andy B) on Flickr; used under a Creative Commons license.


Jim Till said...

"So the issue is not filter failure, not information overload."

I assume that the first 'not' is an error, and that the sentence should be: "So the issue is filter failure, not information overload."

Carl said...

Nice post, I quite agree with your perspective here, but I got a rather different impression about what the author's real thesis is -- I think he'd agree with you.

As I read it, the author is not any way decrying 'too much low quality research', but quite the opposite. The piece is really advocating for publishing research in a format that scales, so that we can take better advantage of it.

I believe the success of existing databases like Genbank and methods implementations like we have in R packages prove this point wonderfully. No one would suggest that we ignore the existence of R or Genbank and just rely on Google Scholar to find papers that describe each gene sequence they will blast against or each algorithm subroutine they will then implement by hand. Conversely, every time we use these tools we are benefitting from huge numbers of papers we've never read, couldn't read in 100 lifetimes. Data and methods make much more scalable building blocks than narrative alone.

I think the Guardian piece is advocating exactly this. That if more papers published methods and data, they could have a much larger contribution than they do now. The real thesis is not the first sentence, but the last.

(My one gripe with the piece is the economics -- OA publishers have no more incentive to publish lots of low-quality stuff than Apple does to sell lots of low quality stuff).

Zen Faulkes said...

Jim: Doh. Just fixed that.

Carl: You may be right that I failed to determine the most important bits of the article. It made several points, and they seemed only loosely connected.

I'd have to think longer and harder about issues of publishing datasets and, as you put it, "scaling" research findings.