Introducing MAGPIE (Metadata Assignment GUI Providing Ingest and Export)
The Libraries at Texas A&M University have curated immense output from graduate programs for many decades. Since the advent of the Vireo ETD (Electronic Thesis and Dissertation) submission system, dissertations have been submitted in digital format and made available for download from TAMU’s OAKTrust institutional repository. However, many older dissertations are discoverable only through TAMU’s Voyager-based online card catalog and are available to visiting researchers only in print. A current digitization effort will make these dissertations available online at OAKTrust. The tool being developed for this purpose is MAGPIE (Metadata Assignment GUI Providing Ingest and Export). For the dissertation use case, librarians specified that the tool should display scanned PDF files and OCR (optical character recognition) text output from a file system. The tool then presents these data to annotators (typically student workers), who augment and amend the metadata. The presentation interface reads metadata, in this case MARC records, from TAMU’s Voyager card catalog database, thereby pre-populating important fields such as the title and author name. However, a number of other fields, such as the abstract and the names of committee members, do not exist in the card catalog but are available in the document itself. The annotator can simply copy and paste these character strings from the source document into a metadata input form specifically configured for the legacy dissertation digitization and preservation project. The MAGPIE workflow allows a manager to amend, reject, or approve these metadata entries, and to push approved documents into the OAKTrust repository with a single click.
The MAGPIE tool has been developed using the Weaver framework, an open-source web-development front end and web-service codebase from TAMU Libraries. The web service is built on top of Spring Boot, a popular framework backed by a large and growing community that provides documentation and support. The front end of the web stack consists of AngularJS and Bootstrap. The Weaver framework offers certain advantages, such as automatic updates of document status in the browser window without a page reload. The MAGPIE tool has also been developed with future projects in mind: the importation of content is modular and customizable, as are the metadata import service, the metadata form, and the export/push functionality. We anticipate that the MAGPIE tool will find use for metadata enhancement and automatic repository deposit of newspapers, images, and other institutional collections with or without existing metadata. In this talk, we will examine the initial use case of scanned legacy dissertations, provide some background on the MAGPIE software and its development, demonstrate the functionality of the tool, and conclude with an overview of future ambitions.