OpenRefine: or How I Learned to Stop Worrying and Love Data Transformation




Long, Kara
Stuhr, Darryl


The Baylor University Library launched its first digitization project in 1999 with the Spencer Collection of American Popular Sheet Music. The first phase of the project was to scan and place online 1,000 pieces of music out of nearly 30,000 pieces in the print collection. The online collection now comprises over 7,000 titles of American sheet music from the 18th to the 20th centuries. A major challenge throughout the project has been generating rich metadata for the digital objects. In 2008, the Library contracted with Flourish Music Cataloging to outsource the creation of MARC records describing the print collection. These records are also transformed into metadata describing the digital collection.

This presentation will cover the history of the project and its evolving workflow, and will demonstrate the most recent change: incorporating OpenRefine into the workflow to quality-check and transform the metadata from the catalog records. The transformed metadata is then used to generate a CONTENTdm load file. The presentation will be relevant to metadata librarians, especially those interested in OpenRefine, and to CONTENTdm administrators.


Presentation for the 2016 Texas Conference on Digital Libraries (TCDL).