Preparing Linguists’ Field Recordings and Datasets for Ingest into a Digital Library System: Lessons from Creating the Lamkang Language Resource at the UNT-DL
Language documentation is the subfield of linguistics dedicated to preserving linguistic diversity. A core concern of the field is the long-term preservation of, and universal access to, language data, which may tap into typologically rare phenomena and which represents the intangible and irreplaceable heritage of indigenous communities. This language data can take the form of audio, video, images, and various other digital formats. To prepare the source files of a documentation project for ingest into a digital library, linguists and digital librarians must develop a common procedure that takes into consideration (1) the provenance and varying formats of materials and (2) the metadata specific to linguistic datasets. Digital collections produced by language documentation projects may be created by a single collector, but more typically (and optimally) they are produced by multiple contributors over several years. The materials include contributions from speakers of the languages being documented, donated legacy materials, and materials collected for linguistic analysis by a lead researcher and team. In creating the Lamkang Language Resource for the UNT Digital Library (https://digital.library.unt.edu/explore/collections/SAALT/), we found that, because of the personalized file naming of the many collaborators, files presented for ingest included instances of the same material (1) in multiple formats (e.g., .mp3, .wav, .wma) and (2) at different stages of editing (trimmed and untrimmed). Lessons learned from depositing this dataset in a digital repository will be used to streamline file naming and organizational structures for future projects. Another challenge is the mismatch between the elements present in common digital library metadata schemes and linguistic datasets, which include audio, video, and various derivatives of the source files (e.g., acoustic analyses, linguistic analyses).
Information crucial for linguists and community members is not represented optimally within the elements provided, suggesting the need for further discussions between field linguists and digital repository staff to better tailor metadata tools and guidelines for linguistic datasets. For example, metadata guidelines should allow for non-Western naming conventions (e.g., caste names and varied order of given and family names) and more granular representation of contributor roles (e.g., speaker, participant in a linguistic experiment, elicitor). In this presentation, we provide an overview of the steps and missteps in creating the Lamkang Language Resource, with specific examples of file naming, file versioning, and metadata concerns. We offer suggestions for a workflow and data management practices that could improve the experience for both depositors and digital librarians.
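
The file naming problem described above (the same material arriving in several formats and at several stages of editing) can be illustrated with a minimal sketch. It is not the procedure used for the Lamkang Language Resource. It assumes, purely for illustration, that editing stage is marked by a hypothetical `_trimmed` or `_untrimmed` suffix on the file name, and the function names `normalized_key` and `group_candidate_duplicates` are invented here. Only the three audio extensions come from the abstract.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Audio extensions named in the abstract.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".wma"}

# Hypothetical editing-stage markers (an assumption for this sketch);
# a real project would substitute the conventions its collaborators used.
STAGE_SUFFIX = re.compile(r"[-_ ](trimmed|untrimmed)$")


def normalized_key(filename: str) -> str:
    """Lower-case the file stem and drop a trailing editing-stage marker."""
    stem = PurePath(filename).stem.lower()
    return STAGE_SUFFIX.sub("", stem)


def group_candidate_duplicates(filenames):
    """Group audio files whose normalized names match.

    Returns only groups with more than one member, i.e. files that a
    depositor or librarian may want to review as possible duplicates.
    """
    groups = defaultdict(list)
    for name in filenames:
        if PurePath(name).suffix.lower() in AUDIO_EXTENSIONS:
            groups[normalized_key(name)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    sample = [
        "story01.wav",
        "Story01_trimmed.mp3",
        "story01_untrimmed.wma",
        "wordlist.wav",
        "notes.txt",
    ]
    for key, names in group_candidate_duplicates(sample).items():
        print(key, names)
```

Matching on names alone only flags candidates for human review. Files with unrelated personalized names would be missed, which is one reason an agreed naming convention at the start of a project is preferable to cleanup at ingest.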