Programming for Open Access: Using Python to Promote Open File Formats in the Texas Data Repository

Date

2022-05-23

Authors

Goodale, Ian

Publisher

Texas Digital Library

Abstract

The prevalence of proprietary file formats in scholarly work is an obstacle to the truly open dissemination of information. This was one of the key points identified by a working group I participated in at the University of Texas at Austin, whose members explored ways to improve metadata and reduce the use of proprietary file formats in the Texas Data Repository. As a result of my work with the group, I created a set of Python scripts designed to promote the use of open file formats in the repository. These include scripts that automatically convert specified proprietary file formats to open ones, and scripts that search through uploads to the Texas Data Repository within a specified date range and output a .xlsx or .csv file listing the dataverses and their files, flagging files with non-open extensions. My poster will describe and demonstrate this evolving resource, which is hosted on GitHub and freely available for others to modify and contribute to, and explain how it aims to make dataverse content more openly accessible to all.
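
The flagging step described in the abstract might look roughly like the following. This is only a minimal sketch and not the project's actual code: the function names, the column names, and the list of non-open extensions are all assumptions made for illustration, and the records are passed in directly rather than retrieved from the repository.

```python
import csv
import io
from pathlib import PurePosixPath

# Assumed, illustrative set of proprietary extensions; the real scripts
# may use a different list.
NON_OPEN_EXTENSIONS = {".doc", ".xls", ".ppt", ".sav", ".dta"}


def flag_non_open(records):
    """Build one report row per (dataverse, filename) pair.

    A file is flagged when its extension, compared case-insensitively,
    appears in NON_OPEN_EXTENSIONS.
    """
    rows = []
    for dataverse, filename in records:
        extension = PurePosixPath(filename).suffix.lower()
        rows.append(
            {
                "dataverse": dataverse,
                "file": filename,
                "non_open": extension in NON_OPEN_EXTENSIONS,
            }
        )
    return rows


def report_as_csv(rows):
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["dataverse", "file", "non_open"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample records, not real repository content.
    sample = [
        ("Example Dataverse", "survey.SAV"),
        ("Example Dataverse", "survey.csv"),
    ]
    print(report_as_csv(flag_non_open(sample)))
```

Keeping the extension list in a single set makes it easy for contributors to adjust which formats count as non-open without touching the reporting logic.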
