Approaches developed to ensure accuracy and consistency of metadata for TRAIL reports

Date

2017-05-24

Authors

Rosenbeck, Craig

Abstract

The TRAIL (Technical Report Archive and Image Library) collection is a compilation of technical reports from government-funded research on a variety of topics, published primarily from the 1920s to the 1980s. It consists of National Advisory Committee for Aeronautics (NACA) and non-NACA objects. The collection holds approximately twenty thousand objects spanning thirteen decades and covering fifty states and forty-seven countries.

The University of North Texas (UNT) libraries harvested records for NACA reports, along with their existing metadata, from the NASA website. Some reports were harvested from other TRAIL initiatives. Most of the scanning of the non-NACA reports was done at the UNT libraries. In the early phases of the project, around 2010, partial records were created with the intention of completing them at a later date. After the initial phase, metadata was based on MARC records, and editing was done at the UNT libraries to meet the UNT libraries’ standards. Because the metadata came from different sources and had received different levels of remediation, we needed a way to evaluate which records were most in need of editing and to identify the specific problems we wanted to target, instead of editing every record.

Specific quality-control measures were implemented to safeguard accuracy and consistency in the digital library. The UNT libraries chose two primary issues in the TRAIL project that could be identified and quantitatively measured. The first issue is incomplete records, meaning records that do not have all eight required fields. The UNT libraries metadata guidelines define a minimally viable record as one that has values for each of the eight required fields: main title, language, content description, subject, collection, institution, resource type, and format. Since field values can be measured by the system, we can easily find records that are not “complete” based on that criterion and keep track of the number of records completed. The second issue is records that have creation dates on the first of the month. This criterion matters because the UNT libraries discovered discrepancies in the frequency of dates falling at the beginning of the month, mostly in records harvested from NACA. A possible cause is a requirement or a problem in entering dates into the database. Because these records are identifiable and measurable, minor changes can be made to increase their overall accuracy. Priority is given to completing and fixing records of scanned objects.
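As an illustration only, the two measurable criteria could be checked with a short script like the one below. This is a minimal sketch, not the UNT libraries' actual tooling. It assumes each record is a Python dict, that the eight required fields use the hypothetical snake_case keys shown, and that the creation date is stored under a hypothetical `creation_date` key as a `YYYY-MM-DD` string.

```python
from datetime import date

# The eight required fields named in the abstract.
# The snake_case keys are hypothetical, chosen for this sketch.
REQUIRED_FIELDS = (
    "main_title", "language", "content_description", "subject",
    "collection", "institution", "resource_type", "format",
)


def missing_fields(record):
    """Return the required fields that have no non-blank value."""
    missing = []
    for field in REQUIRED_FIELDS:
        values = record.get(field) or []
        if not any(str(value).strip() for value in values):
            missing.append(field)
    return missing


def is_first_of_month(creation_date):
    """True only for a full YYYY-MM-DD date whose day is 01."""
    try:
        return date.fromisoformat(creation_date).day == 1
    except (TypeError, ValueError):
        # Missing, partial (e.g. "1935"), or malformed dates are not flagged.
        return False


def summarize(records):
    """Count how many records fail each of the two checks."""
    total = incomplete = first_of_month = 0
    for record in records:
        total += 1
        if missing_fields(record):
            incomplete += 1
        if is_first_of_month(record.get("creation_date")):
            first_of_month += 1
    return {
        "total": total,
        "incomplete": incomplete,
        "first_of_month": first_of_month,
    }
```

Running `summarize` over the collection at intervals would give the counts needed to chart progress over time. A first-of-month date is not necessarily wrong, so in this sketch the check only flags a record for review.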

The graphs will display measured progress in improving the consistency and completeness of metadata in the collection. An explanation will be given of why the UNT libraries chose these criteria and of other problems that occurred but are not measurable. A description will be given of how this work improves findability and user experience.

This poster may assist other institutions in identifying measurable problems in their metadata related to accuracy, consistency, or completeness. It shows that the UNT libraries designed a plan, justified the plan, and were able to show measurable results, which may allow other institutions to begin implementing consistent steps to improve their records.

Description

Poster presentation for the 2017 Texas Conference on Digital Libraries (TCDL).
