Machine Learning Tools for Audio-Visual Transcriptions, Captions, and Text Analysis in Digital Libraries

Hicks, William

Date

2023-05-17

Authors

Hicks, William

Publisher

Texas Digital Library

Abstract

Rapid advances in inexpensive or free-to-use artificial intelligence and text-processing applications now make it possible for digital libraries to produce affordable, relatively high-quality text derivatives (captions, transcripts, subtitles, translations, etc.) of many audio-visual (AV) materials held in repositories and to expose these materials to a wider audience than would otherwise be possible. While not perfect, recently released systems produce outputs that often meet or exceed the accuracy of text-based OCR, and natural language processing on these outputs holds promise for generating metadata or performing other research-oriented tasks. Members of the UNT digital libraries team will discuss recent work they have explored in this area, comparing output quality, costs relative to other creation methods, and resource commitments, and will share other lessons learned along the way.

Description

TCDL 2023 Session 2C, Wednesday, 5/17/2023, 10:00 am to 10:45 am | Moderated by Susan Elkins, Sam Houston State University | Presentation | Technology & Software Development