16 August 2012

Self-archiving science is not the solution

I can’t watch “The Power of the Daleks.”

The British Broadcasting Corporation (BBC), feeling that television was a disposable, ephemeral medium, junked many tapes of television series like Doctor Who. They even junked notable ones, like the first episode of Patrick Troughton’s second Doctor, in which he faces the best-known and most popular villains of the entire series.

Archiving is a tricky thing.

The card game Legend of the Five Rings is about 15 years old. But it’s almost impossible to find material about the game’s influential first couple of years. A lot of key discussion happened on a company email listserv. The one copy someone from the company was keeping was lost when a hard drive crashed.

I went looking for a podcast of a story I’d heard on Quirks and Quarks about singing mice. I wanted to listen to the audio again. But their online archive goes back to 2006, and the story I remembered was from November 2005.

You might think, “Well, that’s just pop culture,” but the same issue plagues science. Physicist Slava Turyshev, who is working on the Pioneer spacecraft, noted:

To study the Pioneer anomaly we needed the probe’s navigational data. But mission tapes were normally saved for only a few months and then thrown away, so you’re lucky if you can find what you need.

Today, some experiments generate so much data that chucking the data away is a necessary part of the process because we just can’t keep it all. This happens at the Large Hadron Collider, for instance.

Just as I posted this, I learned a personal subscription to Nature only gets you papers back to 1997.

I mention these because one of the ideas to reform the scientific publishing enterprise is to remove journals from the picture entirely. Scientific communication should be replaced by blogs and self-archiving, according to some. This is not a vision I can get behind, because I’ve hung around with taxonomists.

Taxonomists are often accused of being too conservative and hidebound in publishing. Botanists required species descriptions to be published on paper, and in Latin, until very recently. But I have to say, as a group, they made me more aware of the importance of archiving than any other. Taxonomists routinely revisit literature that is centuries old. How am I going to make sure someone can read my papers in the twenty-fifth century?

Björn Brembs argues that libraries can effectively take the place of journals. That is certainly an improvement over self-archiving, though I think we have a long way to go on that front. At our institution, one enterprising librarian started an institutional repository for faculty, where we could deposit not only our reprints, but more ephemeral work like conference posters.

But she quit. And since then, I’m not even sure who, if anyone, is in charge of the repository. It’s been a while since I deposited anything there.

While I personally plan to live forever, in the remote chance that some accident occurs where I don’t, how can I make sure my scientific contributions are available to researchers one hundred years from now? Two hundred? Three? Sticking my own PDFs on my own university server is not going to cut it. I can’t control what happens to my papers after my death. Maintaining the scientific record needs to be done by communities and institutions, not individuals.

Hat tip: This grew in part out of a conversation that bubbled up over at Living in an Ivory Basement.


Bjoern Brembs said...

Indeed, it's communities and institutions that should do the archiving and I can't think of any better institution than our own libraries. Clearly, this effort in the future needs to be as much a matter of course as libraries having books is today.
Maybe one day it might happen that the person taking care of the books leaves for another job and then the few people who actually still wander into a library can't find them any more...

Zen Faulkes said...

Indeed, the more I think about this issue, the more in awe of libraries I am.

I am still trying to work out, though, if my own single university library is a sufficient repository.

Bjoern Brembs said...

"I am still trying to work out, though, if my own single university library is a sufficient repository. "

Of course not - torrent-like sharing is the key. "Public Library of Science" and I don't mean the non-profit in San Francisco! :-)

That way, half or more of the libraries can go down and nothing is lost. Also, that system always guarantees the fastest connections.

Of course, the libraries wouldn't just archive the papers: data and software tools need to go there as well, everything you need to track, understand or replicate science. All of this needs to be under our own control and not that of some short-lived company with its own, diverging interests in mind.

ExLabJunkie said...

Our federal government in Canada have been closing numerous libraries because they believe all the info we need can be found on the net. In my work, I need to locate original articles, not just reviews of work done. I know only too well how much cannot be found using only internet searches. Archiving cannot be left to the search engines of the day. We need to support and value our libraries.

Heather Morrison said...

arXiv is a great model - supported by Cornell University Library and many others - with 18 mirror sites. I recommend cross-posting to institutional repositories, too. We need more arXivs - and go talk to your library, tell them they need to commit to providing the service!

Kevin said...

I think the key to this problem is duplication. Self-archive, throw it up on PubMed Central (if in the US) or whatever government server, arXiv.org or a similar non-profit repository, etc. Institutions may have more longevity than people, but they're still transient.

Duplication of medium is important too. As ubiquitous as .pdf is now, who knows if the tools to read that format will be around in 50 or 100 years? A lot of people criticize the fact that we need to submit a hard copy of our PhD theses for reproduction on microfilm, but the fact is that hard drives crash, servers fail, and natural disasters happen. Digital media is still physical at some point, and all are potentially vulnerable.

Disk space is getting cheaper all the time - there's no excuse not to spread this stuff around for a bit of redundancy.