Browsing by Subject "ingest"
Item: Introducing MAGPIE (Metadata Assignment GUI Providing Ingest and Export) (2015-05-26)
Welling, William; Elmquist, Stephanie; Creel, James; Huff, Jeremy; Savell, Jason; Mathew, Rincy; Hahn, Doug; Bolton, Michael; Texas A&M University
The Libraries at Texas A&M University have curated immense output from graduate programs for many decades. With the advent of the Vireo ETD (Electronic Thesis and Dissertation) submittal system, dissertations have been submitted in digital format and made available for download from TAMU’s OAKTrust institutional repository. However, many older dissertations are discoverable only through TAMU’s Voyager-based online card catalog and are available to visiting researchers only in print. A current digitization effort will make these dissertations available online at OAKTrust. The tool being developed for this purpose is designated MAGPIE (Metadata Assignment GUI Providing Ingest and Export). For the dissertation use case, librarians specified that the tool should display scanned PDF files and OCR (optical character recognition) text output from a file system. The tool then presents these data to annotators (typically student workers), who augment and amend the metadata. The presentation interface reads metadata, in this case MARC records, from TAMU’s Voyager card catalog database, thereby pre-populating important fields such as the title and author name. However, a number of other fields, such as the abstract and the names of committee members, do not exist in the card catalog but are available in the document itself. The annotator can simply copy and paste these character strings from the source document into a metadata input form specifically configured for the legacy dissertation digitization and preservation project. The MAGPIE workflow allows a manager to amend, reject, or approve these metadata entries, and to push approved documents into the OAKTrust repository with a single click.
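The annotate, review, and push steps described above can be pictured as a small state machine. The following Python sketch is illustrative only and is not MAGPIE's code (MAGPIE's web service is built on Spring Boot); the `Document` class, the `Status` values, and the functions `prepopulate`, `annotate`, `review`, and `push` are all hypothetical names, and the repository is stood in for by a plain list.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"            # awaiting annotation
    SUBMITTED = "submitted"  # annotated, awaiting manager review
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Document:
    name: str
    metadata: dict = field(default_factory=dict)
    status: Status = Status.OPEN


def prepopulate(doc, catalog_record):
    """Copy fields the catalog already knows (e.g. title, author)."""
    doc.metadata.update(catalog_record)


def annotate(doc, **fields):
    """Annotator adds fields found only in the document itself."""
    if doc.status not in (Status.OPEN, Status.REJECTED):
        raise ValueError("document is not open for annotation")
    doc.metadata.update(fields)
    doc.status = Status.SUBMITTED


def review(doc, approve, amendments=None):
    """Manager optionally amends the metadata, then approves or rejects."""
    if doc.status is not Status.SUBMITTED:
        raise ValueError("document has not been submitted for review")
    doc.metadata.update(amendments or {})
    doc.status = Status.APPROVED if approve else Status.REJECTED


def push(doc, repository):
    """Deposit an approved document; `repository` is a plain list here."""
    if doc.status is not Status.APPROVED:
        raise ValueError("only approved documents can be pushed")
    repository.append((doc.name, dict(doc.metadata)))
```

In this sketch a rejected document returns to the annotator, and `push` refuses anything that has not been approved, mirroring the single-click deposit of approved documents only.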
The MAGPIE tool has been developed using the Weaver framework, an open source web-development front-end and web service code-base from TAMU Libraries. The web service is built on top of Spring Boot, a popular framework backed by a large and growing community, documentation, and support. The front-end of the web-stack consists of AngularJS and Bootstrap. The Weaver framework offers certain advantages, such as automatic updates of document status in the browser window without a page reload. The MAGPIE tool has also been developed with future projects in mind: the importation of content is modular and customizable, as are the metadata import service, the metadata form, and the export/push functionality. We anticipate that the MAGPIE tool will find use for metadata enhancement and automatic repository deposit of newspapers, images, and other institutional collections with or without existing metadata. In this talk, we will examine the initial use case of scanned legacy dissertations, provide some background on the MAGPIE software and its development, demonstrate the functionality of the tool, and conclude with an overview of future ambitions.

Item: Managing Digital Assets from Curation to Exhibition (2017-05-25)
Welling, William; Creel, James; Huff, Jeremy; Savell, Jason; Frazier, Simon; Hahn, Douglas; Bolton, Michael
As the Libraries at Texas A&M University continue to accumulate digital assets and cultivate a Digital Asset Management Ecosystem (DAME), an urgent need is developing for metadata annotation workflows and the marshalling of documents from digitization to exhibition. Last year the Libraries presented an alpha version of MAGPIE (Metadata Assignment GUI Providing Ingest and Export) and demonstrated the curation of a scanned dissertation and its export to a DSpace repository. Since then we have built upon MAGPIE to integrate it into the DAME more broadly.
MAGPIE is positioned to serve multiple projects including scanned legacy dissertations, historic and modern agricultural serials, and special image collections destined for Spotlight exhibits. The original use case of MAGPIE was to shepherd previously cataloged historic dissertations from scanning and OCR through additional curation and finally into the OAKTrust institutional repository (IR). However, as additional digitization and curation projects have emerged, MAGPIE has been a natural fit to accommodate the varying types of IR, metadata authorities, suggestion providers, and exporters. The application includes a repository interface that enables publishing metadata and assets into an IR as a single item or as a batch. Current implementations include Fedora and DSpace, both via their REST APIs. The interface for consuming authoritative metadata has been implemented for Voyager (again, over a REST API) and for CSV spreadsheets on disk. The application also has an implementation that interfaces with a metadata suggestion service using the National Agricultural Library Thesaurus. Implementations of the exporter interface provide CSV metadata spreadsheets for download, and write Archivematica metadata spreadsheets and DSpace Simple Archive Format (SAF) output directly to the server filesystem. As applied to the scanned legacy dissertation project, MAGPIE prepopulates document MARC metadata from a Voyager authority, enables enhancement of the metadata by curators who can read the PDF or extracted text, and facilitates publication into a DSpace repository. Batch publication can be done with an exported SAF, or item by item in the UI with a RESTful push. The Agricultural Research Bulletins have their metadata prepopulated from a provided CSV and further enhanced with suggestions from semi-automatic indexing. This project has the same ability to publish into DSpace.
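The pluggable authority and exporter interfaces described above can be sketched as abstract base classes with one concrete CSV implementation each. This is a minimal Python illustration under stated assumptions, not MAGPIE's actual API: the class names `MetadataAuthority`, `Exporter`, `CsvAuthority`, and `CsvExporter`, the `filename` key column, and the in-memory CSV text are all hypothetical.

```python
import csv
import io
from abc import ABC, abstractmethod


class MetadataAuthority(ABC):
    """Source of authoritative metadata used to prepopulate a document."""

    @abstractmethod
    def lookup(self, document_name: str) -> dict: ...


class Exporter(ABC):
    """Serializes curated metadata into a target format."""

    @abstractmethod
    def export(self, items: list) -> str: ...


class CsvAuthority(MetadataAuthority):
    """Reads one row per document, keyed by a (hypothetical) filename column."""

    def __init__(self, csv_text, key_column="filename"):
        reader = csv.DictReader(io.StringIO(csv_text))
        self._key_column = key_column
        self._rows = {row[key_column]: row for row in reader}

    def lookup(self, document_name):
        row = self._rows.get(document_name, {})
        # Return every column except the key itself.
        return {k: v for k, v in row.items() if k != self._key_column}


class CsvExporter(Exporter):
    """Writes a metadata spreadsheet with one row per item."""

    def export(self, items):
        # Union of all field names, sorted for a stable column order.
        columns = sorted({key for item in items for key in item})
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(items)
        return out.getvalue()
```

Under this design, a new project would swap in a different authority or exporter without touching the curation workflow, which is the kind of modularity the abstract describes.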
The Spotlight exhibit project is accommodated with the following workflow: MAGPIE prepopulates image metadata via a CSV authority, RESTfully pushes items in batch to Fedora, and allows CSV export for ingest into Spotlight. In this talk, we will examine how MAGPIE is accommodating rapid growth of our architecture, provide some background on continuing software development and improvements, demonstrate the functionality with multiple repositories and the Spotlight exhibit software, and conclude with future directions.