18 March 2024

Contamination of the scientific literature by ChatGPT

Mushroom cloud from atomic bomb
I’ve written a little bit about how often notions of “purity” come up in discussions of scientific publishing.

I see lots of hand waving about the “purity and integrity of the scientific record,” which is not how it has ever been. The scientific literature has always been messy.

But in the last couple of weeks, I’m starting to think that maybe this is a more useful metaphor than it has been in the past. And the reason is, maybe unsurprisingly, generative AI. 

Part of my thinking was inspired by this article about the “enshittification” of the internet. People are complaining about searching for anything online, because so many of the results are dominated by low quality content designed to attract clicks, not to be accurate. And increasingly, that content is being generated by AI. Which was trained on online text. So we have a positive feedback loop of crap.

(G)enerative artificial intelligence is poison for an internet dependent on algorithms.

But it’s not just the big platforms like Amazon and Etsy and Google that are being overwhelmed by AI content. Academic journals are turning out to be susceptible to enshittification.

Right after that article appeared, science social media was widely sharing examples of published papers in academic journals with clear, obvious signs of being blindly pasted from generative AI large language models like ChatGPT. Guillaume Cabanac has provided many examples of ChatGPT giveaways like, “Certainly, here is a possible introduction to your topic:” or “regenerate response” or apologizing that “I am a large language model so cannot...”.

It’s not clear how widespread this problem is, but it is concerning that even the most obvious examples are not getting screened out by routine quality control.

And another preprint making the rounds shows more subtle telltale signs that a lot of reviewers are using ChatGPT to write their reviews.

So we have machines writing articles that machines are reviewing, and humans seem hellbent on taking themselves out of this loop no matter what the consequences. I can’t remember where I first heard the saying, but “It is not enough that a machine knows the answer” feels like an appropriate reminder.

The word that springs to mind with all of this is “contaminated.” Back to the article that started this post:

After the world’s governments began their above-ground nuclear weapons tests in the mid-1940s, radioactive particles made their way into the atmosphere, permanently tainting all modern steel production, making it challenging (or impossible) to build certain machines (such as those that measure radioactivity). As a result, we’ve a limited supply of something called “low-background steel,” pre-war metal that oftentimes has to be harvested from ships sunk before the first detonation of a nuclear weapon, including those dating back to the Roman Empire.

Just as the use of atomic bombs in the atmosphere created a dividing line between “before” and “after” widespread contamination by low-level radiation, the release of ChatGPT is deepening another dividing line. The scientific literature has been contaminated with ChatGPT. Admittedly, this contamination might turn out to be at a level so low that it is not even harmful, just as most of us don’t really think about the atmospheric radiation from years of above-ground testing of atomic bombs.

While I said before that it isn’t helpful to talk about the “purity” of the academic literature, I think this is truly a different situation than we have encountered before. We’re not talking about messiness because research is messy, or because humans are sloppy. We’re talking about an external technology that is impinging on how articles are written and reviewed. It is a different problem, one that might warrant describing it as “contamination.”

(I say generative AI is deepening the dividing line because the problems language AIs are creating were preceded by the widespread release and availability of Photoshop and other image editing software. Both have eroded our confidence that what we see in scientific papers represents the real work of human beings.)

Related posts

How much harm is done by predatory journals? 

External links

Are we watching the internet die?

12 March 2024

Scientists will not “just”: Individual scientists can’t solve systemic problems

Undark has an interview with Paul Sutter about the problems of science. Now before I get into my reaction, I want to say that this interview was conducted because Sutter has a book, Rescuing Science, that covers these topics. I haven’t read the book. Maybe it has more nuance than this short interview.

Sutter has a lot of complaints about the current state of science, but his big one?

(A)n inability for scientists to meaningfully engage with the public.

The interview tries to peel back the layers of why this is. Sutter, like many academics, blames the incentives.

We, as a community of scientists, are so obsessed with publishing papers (that) this is causing an environment where scientific fraud can flourish unchecked.

Sutter goes on to critique journal Impact Factor and h-index and peer review and a lot of the usual suspects. But Sutter’s solution to these big systemic problems might be summed up as, “Scientists need to get out more.” He wants scientists to do more public communication. Lots more.

Scientists should be the face of science. How do we increase diversity and representation within science? How about showing people what scientists actually look like. How do people actually understand the scientific method? What if scientists actually explained how they’re applying it in their everyday job.

This is a very familiar view to me. It’s why I started blogging here more than twenty years ago. Much of what I have achieved professionally, I can credit, in part or in whole, to blogging and otherwise being Very Online. But I try to have a clear-eyed view of what I was able to achieve in building trust in science: probably not much.

I see three problems.

First, it feels like Sutter is saying, “If only scientists would just talk to non-scientists more.” I am reminded of this:

If your solution to a human problem involves the phrase “If only everyone would just...”, you don’t have a solution. Never in the History of Ever has everyone “just” anything... and we’re not likely to start now. 

(This tweet is from Laura Hunter. I remember seeing some version of this on Twitter, but can’t recall if I saw Laura Hunter’s tweet specifically.)

Second, lots of excellent scientists are not great at communicating to non-specialist audiences. They can’t stop using words like “canonical” in interviews. They aren’t trained in this.

Third, Sutter points out that the interests of science journalists are not always aligned with the interests of scientists. But that is a feature, not a bug. Journalists are not supposed to be stenographers, or cheerleaders, for science. And Sutter spends a lot of time criticizing journalism while the profession is practically collapsing in front of our eyes. Local news outlets are vanishing. Online publications are shuttering and laying off hundreds of staffers. Popular Science is gone.

Wait, I have more than three.

Fourth, this sounds like trying to fix traffic jams (systemic problems) by asking people to drive carefully (individual actions). That doesn’t have a good record of success. 

In the interview, Sutter doesn’t come to grips with the systemic issues of money and power that are being leveraged to make coordinated attacks on science.

I’ve said it before: Individual scientists – who are struggling to write grant after grant to keep their labs afloat – are not on a level playing field with international media corporations backed by billions of dollars. Or tech corporations that write recommendation algorithms. Or religions that have a several thousand year head start in winning hearts and shaping culture.

I am guessing that if Sutter were to point to the sort of people who exemplify his preferred method of restoring trust in science by getting out and talking publicly, he might point to Anthony Fauci or Peter Hotez – both of whom were excellent communicators throughout the COVID-19 pandemic. But we have seen the power asymmetry: Fauci and Hotez were physically threatened for publicly talking about science.

Anti-science is large, well-funded, and organized. Single scientists with a blog – or even a few invitations to speak on national platforms – are overmatched.

External links

Paul M. Sutter Thinks We’re Doing Science (and Journalism) Wrong

Rescuing Science

11 March 2024

How Godzilla movies reflect scientific research

Leonard Maltin
I have a very specific memory of film critic Leonard Maltin on Entertainment Tonight reviewing Godzilla 1985 (the first American release of 1984’s Gojira or The Return of Godzilla). Maltin said something like, “Many remakes fail because they stray too far from the original. Godzilla 1985 doesn’t have that problem.”

“It’s still the same cheap Japanese monster movie.”

That stung so much that here I am remembering it almost 40 years later.

I can’t think of anything that better summarized the attitude towards Godzilla for so long.

So last night’s Oscar win for Godzilla Minus One feels like vindication for a lifelong fan like me. 

Godzilla -1 effects team with Oscars

For decades, Godzilla movies were the butt of jokes. And deservedly so, I have to say. As much as I count myself as a Godzilla fan, I have no desire to watch Son of Godzilla or All Monsters Attack (Godzilla’s Revenge) ever again. (Shudder.)

But fandom is a funny thing. You love the things you love, and it still kind of stings when you hear them derided.

But somewhere in the years after Maltin snubbed the 30th anniversary movie, something shifted in people’s attitude towards Godzilla.

Those of us who watched a few dubbed movies as kids remembered Godzilla as we grew up. I heard in the 1990s that there was a new “high tech” series of Godzilla movies being made in Japan. The Internet removed friction for finding out fannish stuff. You could find retrospectives about the making of the series in English.

And say what you will about the American Godzilla made in 1998, Hollywood wouldn’t have shelled out the cash to try to make that movie a big summer blockbuster if there wasn’t some sort of name recognition.

After all those years in the wilderness, what films could show was finally catching up with the visions of Godzilla that fans held in their heads.

And it occurred to me that this is sometimes how science works.

You have an idea. You get it out there. 

Maybe it’s derided as cheap and mostly dismissed. But you try again.

And other people pick up some aspect of it. And maybe sometimes the results are embarrassing, with offshoots that are not as good as the original.

And sometimes, if you wait and keep trying, that original idea somehow stands the test of time. Other people come around and start to recognize that it was a good idea. And you end up with something that gets better than you ever thought.

Related posts

“Why do you love monsters?”

10 March 2024

How can we fix Daylight Saving Time?

I’ve mentioned this on social media before, but I might as well get it into the blog for archival purposes.

We just switched to Daylight Saving Time overnight. Nobody likes this switch, because you lose an hour, it messes with sleep, and you have to change the microwave clock, which we always forget how to do.

It occurred to me that the fundamental problem with the change is that an hour is too big a step to take in one go. But nobody notices when timekeepers throw in a leap second. (Yes, that is a real thing I learned about years ago.)

Instead of hour-long time shifts twice a year, why not make more frequent but smaller changes? Say, 10 minutes every month. It comes to the same amount of clock adjustment over the course of a year.
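
A quick back-of-the-envelope check (just my own arithmetic, with the 10-minute monthly step as an assumption, not any official proposal):

```python
# Toy arithmetic check: total clock adjustment per year under each scheme.

current_scheme = 2 * 60    # two shifts of 60 minutes each (spring and fall), in minutes
gradual_scheme = 12 * 10   # twelve shifts of 10 minutes each, in minutes

print(current_scheme, gradual_scheme)  # 120 120 -- same total adjustment per year
print(12 - 2)                          # 10 extra clock changes per year
```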

“Zen, that may be easier on sleep cycles, but that means ten more annoying clock changes every year.”

Use new technology instead of being stuck with old technology. More and more clocks set themselves automatically using an Internet signal or a radio signal from an atomic clock. We could get even more if we mandated self-setting clocks in new equipment.

Build the future, not the past.

21 February 2024

I told you transcript changes didn’t affect grade inflation

Way back when, I blogged about a Texas proposal to include average course grades next to a student’s earned grades on the student transcript. The argument was that this could be a way to curb grade inflation. I was skeptical. 

This never came to pass in Texas, but what I didn’t know at the time was that this was already the practice at Cornell University.

A practice they just stopped.

It turned out that – surprise! – showing average class grades didn’t stop grade inflation. In fact, showing class averages probably increased grade inflation. Because with easy access to average course grades, students preferentially took the classes seen to be “easy A’s”.

I have to admit I didn’t see that possibility, but it tracks.

Related posts

The “Texas transcript” is a good idea, but won’t solve grade inflation

External links

Cornell Discontinues Median Grade Visibility on Transcripts 15 Years After Inception  


19 February 2024

Rats, responsibility, and reputations in research, or: That generative AI figure of rat “dck”

Say what you will about social media, it is a very revealing way to learn what your colleagues think.

Last week, science Twitter could not stop talking about this figure:

Figure generated by AI showing a rat with impossibly large genitalia. The figure has labels, but the letters do not spell actual words.

There were two more multi-part figures that are less obviously comical but equally absurd.

The paper these figures were in has now been retracted, but I found the one above in this tweet by CJ Houldcroft. You can also find them in Elizabeth Bik’s blog.

This is clearly a “cascade of fail” situation with lots of blame to go around. But the discussion made me wonder where people put responsibility for this. I ran a poll on Twitter and a poll on Mastodon asking who came out looking the worst. The combined results from 117 respondents were:

Publisher: 31.6%
Editor: 30.8%
Peer reviewers: 25.6%
Authors: 12.0% 

I can both understand these results to some degree and have these results blow my mind 🤯 a little. 

People know the name of the publisher, and many folks have been criticizing Frontiers as a publisher for a while. Critics will see this as more confirmation that Frontiers is performing poorly. So Frontiers looks bad.

The editor and peer reviewers look bad because, as the saying goes, “You had one job.” They are supposed to be responsible for quality control and they didn’t do that. (Though one reviewer said he didn’t think the figures were his problem, which will get its own post over on the Better Posters blog later.)

But I am still surprised that the authors are getting off so lightly in this discussion. It almost feels like blaming the fire department instead of the arsonist.

At the surface level, the authors did nothing technically wrong. The journal allows AI-generated figures if they are disclosed, and the authors disclosed it. But the figures are so horribly and obviously wrong that even submitting them feels to me more like misconduct than sloppiness.

And as is so often the case, when you pull at one end of a thread, it’s interesting to see what starts to unravel.

Last author Ding-Jun Hao (whose name also appears in papers as Dingjun Hao) has had multiple papers retracted before this one (read PubPeer comments on one retracted paper), which a pseudonymous commenter on Twitter claimed was the work of a papermill. Said commenter further claimed that another paper is from a different papermill.

Lead author Xinyu Guo appears to have been an author on another retracted paper.

I’ve been reminded of this quote from a former journal editor:

“Don’t blame the journal for a bad paper. Don’t blame the editor for a bad paper. Don’t blame the reviewers for a bad paper. Blame the authors for having the temerity to put up bad research for publication.” - Fred Schram in 2011, then editor of Journal of Crustacean Biology

Why do people think the authors don’t look so bad in this fiasco?

I wonder if other working scientists relate all too well to the pressure to publish, and think, “Who among us has not been tempted to use shortcuts like generative AI to get more papers out?”

I wonder if people think, “They’re from China, and China has a problem with academic misconduct.” Here’s an article from nine years ago about China trying to control its academic misconduct issues.

I wonder if people just go, “Never heard of them.” Hard to damage your reputation if you don’t have one.

But this strategy may finally be too risky. China has announced new measures to address academic integrity issues, which could include requiring an explanation for any retracted paper. And the penalties listed could be severe. Previous investigations of retractions in China resulted in “salary cuts, withdrawal of bonuses, demotions and timed suspensions from applying for research grants and rewards.”

Related posts

The Crustacean Society 2011: Day 3

References

[Retracted] Guo X, Dong L, Hao D, 2024. Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway. Frontiers in Cell and Developmental Biology 11:1339390. https://doi.org/10.3389/fcell.2023.1339390

Retraction notice for Guo et al.

External links

Scientific journal publishes AI-generated rat with gigantic penis in worrying incident

Study featuring AI-generated giant rat penis retracted, journal apologizes 

The rat with the big balls and the enormous penis – how Frontiers published a paper with botched AI-generated images

China conducts first nationwide review of retractions and research misconduct (2024)

China pursues fraudsters in science publishing (2015)

29 January 2024

A JIF myth

“Why do zebras have stripes?”

“Why did T. rex have such small arms?”

In evolutionary biology, functional questions like those are notoriously tricky to answer, because people tend to mix up two separate questions: the origin of the feature, and the current use of the feature.

People often have difficulty grasping that those two things are different. This leads to questions like, “What good is half a wing?” The implication is that because wings are used for flying now, they must always have been used for flying, so how could they have evolved?

The answer is that the bits that make up a wing can be used for lots of things, and that you can have a functional shift. Something that’s great for insulation or display like feathers proves incidentally useful for gliding. The incidental use eventually becomes the primary use.

Clarivate Analytics Journal Impact Factor
I mention this because I stumbled across a myth about the Journal Impact Factor™️. You will find statements here and there that Eugene Garfield created the Impact Factor to help libraries decide which journals to buy. This was back in the day when journal subscriptions were sold individually rather than in bundled “big deals.”

But Garfield’s own account of the history of the Impact Factor shows this is not true (link to edited version; full version can be found in external links).

Garfield was involved in creating the Genetics Citation Index in the 1950s, and needed to decide which journals to include in the index. They first tried just counting citations to journals, which favoured journals that simply published a lot of papers. They realized that “total citations” missed journals that published fewer, but highly cited, papers.
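
To make the contrast concrete, here is a toy comparison in Python (the journal names and numbers are invented purely for illustration):

```python
# Toy illustration (invented numbers): a raw citation count and a per-article
# rate can rank the same two journals in opposite orders.

journals = {
    # name: (papers published, citations received)
    "Big Journal":   (5000, 6000),  # publishes a lot, collects many citations in total
    "Small Journal": (50, 200),     # publishes little, but each paper is cited often
}

for name, (papers, citations) in journals.items():
    rate = citations / papers  # citations per article
    print(f"{name}: {citations} total citations, {rate:.1f} citations per article")

# "Big Journal" wins on total citations (6000 vs 200),
# but "Small Journal" wins on citations per article (4.0 vs 1.2).
```

(The modern Journal Impact Factor is essentially this kind of per-article rate computed over a two-year window: citations in one year to a journal’s items from the previous two years, divided by the number of citable items published in those two years.)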

Impact Factor helped solve this problem. It wasn’t about libraries at all. So where did the “Impact Factor was created for libraries” belief come from?

In the 1960s, the Genetics Citation Index broadened out to become the Science Citation Index. Garfield did lots of research on citation patterns, and published a 1972 paper in Science about citation analysis. Garfield talks about all the findings using this measure, Impact Factor, that he created more than a decade earlier.

But at the end of the paper, there’s a section that begins, “Some applications,” and the first sentence is:

The results of this type of citation analysis would appear to be of great potential value in the management of library journal collections.

And that’s the origin of the myth that Impact Factor was created for libraries.

That 1972 Science paper went on to become the one that many people referred to in discussions of Impact Factor. And it’s easy to see how Garfield’s suggestion that libraries could use Impact Factor to make purchasing decisions could morph into, “Garfield created Impact Factor for libraries.” Because people don’t always read the original papers of the things they cite in detail.

The Impact Factor not only shows how easily origins get muddled, it also shows the concept of functional shift. Because libraries did use it for purchasing decisions, and then administrators started using it to make hiring, tenure, and promotion decisions. DORA is trying to provide a new selection pressure that could cause another functional shift, this time away from using the Impact Factor to evaluate researchers.

External links

Essays about Impact Factor by Eugene Garfield

Garfield E. 1972. Citation analysis as a tool in journal evaluation. Science 178: 471-479. https://doi.org/10.1126/science.178.4060.471