13 July 2020

Who am I citing?

There is research that indicates that women scholars are cited less than men. There is research that indicates that Black and brown scholars are cited less than white ones. So I see, and am sympathetic to, calls for people to check who they are citing. Citing only white guys perhaps means you are not capturing the full range of scholarship that exists.

This is harder than it sounds.

[Image: Star Trek title card showing "Written by D.C. Fontana". The "D" was for "Dorothy".]

Some authors want to obscure their gender or background, sometimes to reduce bias. So they use only their initials.

Some journals show only author initials, particularly in the references.

Some names are used by both genders. Names like Terry, Kelly, Zen...

Once you get past your own language and culture, trying to work out gender from the name alone becomes much more difficult. I wonder how well most English speakers would do at guessing the gender associated with Chinese or Indian names.

So when someone asks, “What percent of your reference list in your papers are white men?”, my answer is, “I don’t know.”

I am not sure what the solution here is.

In theory, this kind of demographic data might be registered by ORCID. Eventually, I could imagine a system where you downloaded ORCID into a citation manager, which could then do an analysis on a reference list. Or you could have a plug-in or webpage that did that. But ORCID currently doesn’t capture anything like that. I don’t think any academic database does.

Otherwise, the only answer I can think of is doing a lot of googling, which will probably not lead to definitive answers in many cases.

Update, 14 June 2020: Thanks to Beth Lapour for alerting me to this work. This paper tries to examine citation bias in neuroscience journals. Excerpt from the abstract:

Using data from five top neuroscience journals, we find that reference lists tend to include more papers with men as first and last author than would be expected if gender were unrelated to referencing. Importantly, we show that this imbalance is driven largely by the citation practices of men and is increasing over time as the field diversifies.

They used a couple of automated techniques to try to infer the gender of the authors. Using two databases, they assigned an author as male or female if their confidence was 70% or better. One was an R stats package. I seem to recall reading criticisms of this package on Twitter, but can’t find them now.
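To make the 70% rule concrete, here is a minimal sketch of how a threshold like that might work. This is my own illustration, not the paper's code or the R package: the function name, the idea of a single "probability the name belongs to a woman", and the numbers are all made up.

```python
from typing import Optional

THRESHOLD = 0.70  # the confidence cutoff described in the post


def assign_label(p_woman: Optional[float], threshold: float = THRESHOLD) -> str:
    """Return 'woman', 'man', or 'unknown' for one name.

    p_woman is a hypothetical probability from some name database;
    None means the database had no usable entry (for example, initials only).
    """
    if p_woman is None:
        return "unknown"
    if p_woman >= threshold:
        return "woman"
    if 1.0 - p_woman >= threshold:
        return "man"
    # Neither label reaches the cutoff, so the name stays unassigned.
    return "unknown"


# Made-up probabilities for illustration only; these are not real data.
example_probabilities = {
    "Dorothy": 0.99,
    "Terry": 0.25,
    "Kelly": 0.60,
    "D.C.": None,
}

labels = {name: assign_label(p) for name, p in example_probabilities.items()}

# Share of names the rule declines to label.
unassigned = sum(1 for label in labels.values() if label == "unknown") / len(labels)

print(labels)
print(f"Unassigned: {unassigned:.0%}")
```

Names that sit near the middle, or that are only initials, fall into the "unknown" group, which is the same kind of group the paper reports as unassigned. And a label from a rule like this is still only a guess about a name, not a fact about a person.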

They failed to assign gender for 12% of authors: 7% because there wasn’t high enough confidence by their criteria, and 5% because no author name was available for the paper. I’m not sure what the latter group could be. Unsigned editorials, maybe?

They then tried to find an independent way to check the accuracy for the 88% of authors they assigned a gender. They did this by sampling 200 authors and Google stalking them for pronoun use. And according to that, their algorithmic assignment was about 96% accurate.

So according to this, the problem is currently small. But this is just a snapshot of one field. I wonder if the difficulty will get larger or smaller over time for reasons mentioned in the main post.

Update, 28 July 2020: Marie Rivas articulates some of the reasons I am uncomfortable with using software to assess gender for research purposes: 

1. You cannot “identify” or “verify” gender on behalf of someone else; you can only guess.
2. Guessing gender is often inaccurate, offensive, and exceptionally harmful.
3. You don’t actually need to know peoples’ genders for most use cases; and if you really must know, just ask.

Reference

Dworkin JD, Linn KA, Teich EG, Zurn P, Shinohara RT, Bassett DS. 2020. The extent and drivers of gender imbalance in neuroscience reference lists. Nature Neuroscience: in press. https://doi.org/10.1038/s41593-020-0658-y
