26 September 2017

Microsoft Academic: Second impressions

By happenstance, I thought of Microsoft’s imitation of Google Scholar, Microsoft Academic Research, yesterday. I reviewed it years ago, but hadn’t thought of it much since then. I wondered what happened to it.

Quite a lot, as it turned out. The original website was decommissioned, and the name was shortened to Microsoft Academic. Version 2 launched in July 2017. I thought it was worth a new look.

Sadly, the second look is not much more promising than the first one.

One of the changes is that you can create a profile of your papers. That could be good. I’ve found profiles in other similar sites to be kind of useful. Okay, you have to create an account. No problem, I do that all the time... and you hit the first oddity.

Weirdly, you must use another site to set up your account. You can’t just give an email address and pick a password, like pretty much every other website on the planet. You have to use Twitter, Google, Facebook, Microsoft, or LinkedIn.

I thought, “My institutional ID is handled by Microsoft, and I use that to log in to Office 365, so that should work.” Nope. So I logged in with Google.

I discovered a possible reason why Microsoft Academic won’t let me use my institutional email as I start building a profile. It asks for institution, and snootily insists, “An affiliation must be selected from the list of suggestions.” Except that, according to it, the University of Texas System consists of just five institutions, not fourteen. And my university was not one of those five.

Building out a profile was weird, too. I thought that this next-gen scholarly database would support ORCID, so you could enter one number and have it gather all your publications. Nope.

Microsoft Academic seems to identify authors by some mystic combination of name, institution, and... something else. For example, it considers the Zen Faulkes who was at the University of Melbourne a different author than the Zen Faulkes at the University of Texas-Pan American.

So you have to go in and “claim” your papers from however many random ways Microsoft Academic has parsed your name. I have a very distinctive name, and my papers were still split across something like ten different authors. I cannot imagine how many ways publications might be split if you have a common name. Or if you changed your name.

I found some of my new papers by searching the database for their titles. Adding each one to my profile meant clicking the paper’s hyperlink, clicking “Claim,” then going to another page and clicking “Submit claim” again. That’s a lot of clicking.

The profile lists “top papers,” but the metric Microsoft Academic uses to determine “top” papers is not clear. It’s not citations, because the citation counts for the papers in my list are: 11, 28, 17, 9, 16.

Maybe the profile has a few weak spots, but good coverage of the scientific literature might still make the site valuable. I searched for “Emerita benedicti” (on my mind since the publication of my newest paper last Friday). That gives 15 hits in Microsoft Academic, but over 2,300 in Google Scholar. Even if I search for that exact combination of words in Google Scholar, I am still left with over 32 hits, more than double Microsoft Academic’s yield.

Microsoft makes much of Academic’s “semantic search,” so it may be that I will find it more useful as I try other, more complex searches, rather than something as simple as looking up a species name.

The home page for all this provides a customized dashboard with a calendar of scientific conferences, research news, recent publications, and recent citations of your papers (not visible in the screenshot below).

Google Scholar gives you a couple of alerts on its home page, a sparse approach that leaves you with no doubt as to what its job is: Google Scholar is a search engine. It’s not quite clear what Microsoft Academic wants to be. Its home page feels more like a science news feed; more the science section of Google News than Google Scholar.

Perhaps the kiss of death in all this is that practically everything on this site feels like it’s moving through molasses. It’s sloooooooow. I spent a lot of time looking at a screen like this, waiting for it to populate:

Even while writing this blog post, I got a “You do not have permission to view this directory or page” error message when I tried to go to the home page. Google Scholar feels like it’s using telepathy in comparison. (Update, 26 September 2017: A bot attack may have been responsible for this slow performance.)

I will keep trying Microsoft Academic for a while to see if I learn more. But this project is now at least six years old. And darn it, it still feels like a clunky beta version, just as it did back in 2011, not the modern 2.0 version of an academic search engine it claims to be.

Related posts

Microsoft Academic Research: First impressions

External links

Zen Faulkes’s Microsoft Academic profile

1 comment:

Mihaela V (@mihaela_v) said...

Hi Dr. Zen! I work for Microsoft Academic (MA) and would like to provide a bit of context to your observations. I am also a tenured professor at Purdue University, but I found the potential of MA so great that I moved to Microsoft to help improve the user experience.

First, thank you for thinking of MA and taking the time to write about it. It seems you looked at it just as one of our engineers was deploying a fix against an attack that had been slowing the site down. While the fix was being deployed, the site went down for a few minutes and you got the error message.

As you noticed, there's room for improving the user experience on MA. We are working on that. We are a small team of scientists, and MA was born as a by-product of our research work. People found it useful, so we are happy to improve it and make it into something that's not only useful, but also highly usable.

MA is created by a group inside Microsoft Research working on semantic search. Our main objective is to teach machines to read and understand content meant for humans, in order to empower people to discover and explore knowledge. We use academic publications as a way to conduct and advance research in semantic search. We could have used some other data corpus, but as scientists ourselves, we found this area particularly interesting.

One of the reasons we don't use ORCID is that we want to conduct research on and improve semantic search. ORCID is meant for machines, not humans. We want to challenge the machine to figure out different authors just like a human would: by looking at the content of the papers (which our artificial intelligence engine reads and understands), at author institutional affiliations, collaborators, etc. This research problem is known as author disambiguation. I, too, have a very distinctive name, yet the machine splits my publications among different authors that I have to claim manually. (I agree that the claiming process can, and will be, streamlined.) Most of the time, this is not an error. A human being looking at the same publication set might have a hard time figuring out whether the papers were written by one person or several.

Our team read your review and we are learning from your feedback. I'd like to briefly respond to a few other points you make:

- Office 365 login - agreed, known, on our to-do list;

- "top papers"- yes, different than citation count. We have a paper coming out in Nature soon that explains our ranking approach, and the website will be updated once that is out. As a researcher yourself, I am sure you understand the situation.

- coverage - we have more than 172 million papers, and the count grows every week. If a paper is online, or an online paper cites it, chances are we have it or will soon. The search you conducted was actually a keyword search, so unfortunately you didn't get to use the power of Microsoft Academic. When you search for a specific term that is not identified as one of our 53K+ fields of study, we fall back on Bing keyword search. So that one query is not necessarily indicative of coverage, or of what our semantic search can do, as you graciously acknowledge in the post. For a more in-depth analysis of coverage, please see https://harzing.com/blog/2017/06/microsoft-academic-is-one-year-old-the-phoenix-is-ready-to-leave-the-nest.

- indeed, our semantic search is very powerful. It enables users to do things that are simply not possible with keyword-based search. See, for example: https://sway.com/jMpEM89o3nb1x06b?ref=Link and https://twitter.com/MSFTAcademic/status/898255609518112772

Once again, thank you for the time you took to review our work!