What's Behind Door Number 2? Discovering and Using Hidden APIs to Automate Repetitive Tasks

Date

2023-05-17

Authors

Hoover, Susan

Publisher

Texas Digital Library

Abstract

At my institution, we have been working on a project to digitize approximately 19,000 theses and dissertations from 1940 to 2010. We sorted these into three batches based on the copyright laws in effect in the year of publication. For the oldest and newest theses, determining copyright status was straightforward. The interesting period is 1978 to 1988, for which we needed to check each of 3,700 theses to see whether it had been registered in the copyright database. Even at an optimistic rate of one lookup per minute, that is about 62 hours, or a week and a half of person-time, to check the copyright status of this batch.

In this presentation I will show how I solved the volume problem by using browser developer tools to locate and explore an undocumented API on the copyright website and by creating a Ruby script to automate the copyright lookup. I will also show how I modified the lookup as I learned the quirks of the copyright website.
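As a rough illustration of the approach described above, the following Ruby sketch shows what an automated lookup against an endpoint found in the browser's network tab might look like. It is not the script from the presentation. The endpoint URL, the "title" query parameter, and the {"results": [...]} response shape are all placeholder assumptions, since the abstract does not describe the real API.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical endpoint: a placeholder, not the real copyright site's API.
SEARCH_ENDPOINT = "https://copyright.example/api/search"

# Build the lookup URL for one thesis title.
# The "title" parameter name is an assumption for illustration.
def build_lookup_uri(title, endpoint: SEARCH_ENDPOINT)
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(title: title)
  uri
end

# Decide whether a JSON response reports at least one registration.
# Assumes a body shaped like {"results": [...]}; the real shape may differ.
def registered?(json_body)
  results = JSON.parse(json_body).fetch("results", [])
  !results.empty?
end

# Look up one title. The fetcher is injectable so the logic can be tried
# offline; by default it performs a plain HTTP GET.
def lookup(title, fetcher: ->(uri) { Net::HTTP.get(uri) })
  registered?(fetcher.call(build_lookup_uri(title)))
end

# Check a batch of titles, pausing between requests to avoid
# overloading the server. Returns a hash of title => true/false.
def check_batch(titles, delay: 1.0, fetcher: ->(uri) { Net::HTTP.get(uri) })
  titles.each_with_object({}) do |title, statuses|
    statuses[title] = lookup(title, fetcher: fetcher)
    sleep(delay) if delay.positive?
  end
end
```

In a sketch like this, calling check_batch with a list of thesis titles would return a hash mapping each title to true or false. In practice, the request parameters and the parsing step would need to be adapted to whatever the developer tools reveal about the actual site, including any quirks in how it matches titles.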

Description

TCDL 2023 Session 2D, Wednesday, 5/17/2023, 11:00 am to 12:00 pm | Moderated by Gabrielle Hernandez, University of Texas Rio Grande Valley | Lightning Talk | Research, ScholComm, & Digital Humanities

Citation