Querying & Accessing Scholarly Literature Metadata: Using rcrossref, rorcid, and roadoi


Librarians are increasingly called on to gather and analyze metadata from the scholarly literature. This work may include understanding open access publishing at their own institutions, publication patterns in specific disciplines or journals, citation analysis, and much more. Over the last several years, software developers have created a number of R packages for accessing the scholarly literature, among them rcrossref, rorcid, and roadoi. These packages use the APIs of their respective systems to let users execute specific queries and pull the structured data into R, where it can be reshaped, merged with other data, and analyzed.

While some experience with R will be helpful, this session assumes no knowledge of R. The session will therefore begin with a brief introduction to what R is, what it can do, and how to operate in the RStudio environment. In advance of the workshop, attendees will receive full instructions for installing R and preparing their computers for the session. They will also receive pre-written R scripts, as well as step-by-step instructions for each section of the course. These materials will help ease them into using R and will serve as a resource they can use in the future as they make their own queries.
Three R packages will be introduced that allow us to access the scholarly literature. rcrossref interfaces with the Crossref API, allowing users to pull article metadata based on DOIs, keywords, funders, authors, and more. This can be immensely powerful for collecting citation data, conducting literature reviews, and understanding publication patterns. rorcid interfaces with the ORCID API, allowing users to pull publication data based on a specific ORCID iD, or to input names and other identifying information to find a specific individual’s identifier. Finally, roadoi interfaces with the Unpaywall API, allowing users to input a set of DOIs and return publication information along with potential locations of open access versions.
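
As a rough illustration of the three packages described above, the following minimal sketch wraps one call to each in a single helper. It is not taken from the workshop scripts: the function name `fetch_scholarly_metadata` and its arguments are hypothetical, and it assumes that rcrossref, rorcid, and roadoi are installed and that ORCID authentication has already been set up for rorcid.

```r
# Hypothetical helper (not from the workshop materials): gather metadata
# for a set of DOIs and for one ORCID iD.
# Assumes rcrossref, rorcid, and roadoi are installed. Calls are
# namespace-qualified, so no library() calls are needed.
fetch_scholarly_metadata <- function(dois, orcid_id, email) {
  # Crossref: article metadata for the given DOIs (the $data element is a data frame)
  crossref <- rcrossref::cr_works(dois = dois)$data

  # Unpaywall: open access information for the same DOIs;
  # the Unpaywall API asks callers to supply an email address
  unpaywall <- roadoi::oadoi_fetch(dois = dois, email = email)

  # ORCID: works listed on one researcher's record
  # (assumes an ORCID token is already configured for rorcid)
  orcid <- rorcid::orcid_works(orcid = orcid_id)

  # Return all three results together for later reshaping or merging
  list(crossref = crossref, unpaywall = unpaywall, orcid = orcid)
}
```

A call would pass a character vector of DOIs, a single ORCID iD, and the caller's own email address. The three results share DOIs as a natural key, which is what makes merging them in R straightforward.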

By the conclusion of the session, attendees will be able to work with and analyze data in R on a basic level, and will be familiar with some of the major functions in each of the listed packages. On a deeper level, they will have more powerful tools for gathering subsets of the scholarly literature, in clean and structured formats, based on specific parameters.

Furthermore, because the session is designed to provide basic competence in R, attendees will come away able to use a tool far more powerful than spreadsheet software such as Excel. As librarians are increasingly required to manage and make sense of data, R offers many more paths for analysis and visualization, and therefore for understanding that data.


Presented by Oklahoma State University, Pre-conference Workshop, at TCDL 2019.