The Archivo Histórico de la Policía Nacional Digital Preservation Project at UT Libraries
This poster will detail the digital preservation workflow developed for the Archivo Histórico de la Policía Nacional de Guatemala (AHPN) Digital Archive. The AHPN is a collection of approximately 80 million records of police activity in Guatemala from the 19th century to the 1990s. It contains information vital to the study of Guatemalan history, particularly the National Police's role in human rights abuses that took place during the Guatemalan Civil War, which lasted from 1960 to 1996. After the records were discovered in 2005, work began quickly to digitize and preserve them in Guatemala, and in 2011 UT Austin entered into a formal agreement with the AHPN to preserve and publish the digital collection. The sheer size of the collection means creating a digital surrogate is a complex undertaking. As of 2018, the AHPN has digitized approximately 21 million of the full archive's 80 million documents. The digital collection consists of more than 8 TB of small TIFF images with arbitrary file names and directory structures, both of which are generated automatically during the scanning process. The structure of the physical collection is recreated digitally via a complex SQL database, rather than in the file or directory names. As a result, the digital collection cannot be easily broken into discrete intellectual units; rather, it must be kept together even as it grows past 8 TB. Together, this opaque structure and the collection's size present a challenge for digital preservation. This poster will describe the collection and the heightened need to digitally preserve it in light of recent developments in Guatemala. It will then detail the digital preservation work on the collection undertaken at UT Libraries beginning in spring 2018. This preservation process involved several months of continuous technical metadata extraction, bagging, and writing to tape.
The poster will also outline a proposed workflow for future preservation of the AHPN digital archive, which uses a combination of BagIt payload manifests and OpenRefine processing to identify and copy only new additions to the collection, obviating the need to write a complete copy of the archive to tape every time an update is delivered from Guatemala.
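The manifest comparison at the heart of the proposed workflow can be illustrated with a minimal sketch. It is not the project's actual implementation: it substitutes plain Python for the OpenRefine step, and the function names and sample paths are hypothetical. It assumes only the standard BagIt payload manifest layout, in which each line holds a checksum followed by whitespace and a file path.

```python
def parse_manifest(text):
    """Map each payload path to its checksum, given BagIt manifest text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<checksum> <path>". Split on the first run of
        # whitespace only, because file paths may themselves contain spaces.
        checksum, path = line.split(None, 1)
        entries[path] = checksum.lower()
    return entries


def find_additions(previous_manifest, delivered_manifest):
    """Return paths that are new, or whose checksum changed, in the delivery."""
    previous = parse_manifest(previous_manifest)
    delivered = parse_manifest(delivered_manifest)
    return sorted(
        path
        for path, checksum in delivered.items()
        if previous.get(path) != checksum
    )


if __name__ == "__main__":
    # Hypothetical manifests: checksums and paths are placeholders.
    already_on_tape = "aaa111 data/0001/img_a.tif\nbbb222 data/0001/img_b.tif\n"
    new_delivery = already_on_tape + "ccc333 data/0002/img_c.tif\n"
    for path in find_additions(already_on_tape, new_delivery):
        print(path)
```

Only the paths returned by such a comparison would need to be bagged and written to tape. Files whose checksum differs from the preserved copy are flagged too, so they can be reviewed rather than silently skipped.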