Repository Interoperability in the Texas Digital Library Through the Use of OAI-ORE




Maslov, Alexey
Creel, James
Mikeal, Adam
Phillips, Scott



One of the more prominent projects undertaken by the Texas Digital Library (TDL) is the creation and maintenance of a federated collection of Electronic Theses and Dissertations (ETDs) from its member institutions. Currently, this collection is maintained manually, which scales poorly as the collection grows. The DSpace OAI Harvesting project was started with the aim of improving current federation methods. It relies on integrating two key technologies into the DSpace repository platform: OAI-PMH and OAI-ORE.

The Open Archives Initiative’s (OAI) Protocol for Metadata Harvesting (OAI-PMH) is a well-established mechanism for harvesting metadata between repository systems. The DSpace platform supports metadata dissemination through OAI-PMH, allowing collections to be regularly harvested by external agents such as Google or the NDLTD’s Union Catalog of ETDs. This protocol’s ubiquity is well deserved: it is simple and flexible, allowing for selective harvest by date ranges and sets, as well as by specific metadata formats. Although dissemination through OAI-PMH has been a feature of DSpace for some time, harvesting support was missing and was added as part of this project.
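
As an illustration of selective harvesting, the following minimal sketch builds an OAI-PMH ListRecords request restricted by date range, set, and metadata format. The verb and the parameter names (metadataPrefix, from, until, set) come from the OAI-PMH specification; the function name, the base URL, and the set identifier are hypothetical and do not describe the project's actual implementation.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a selective harvest.

    base_url is the repository's OAI-PMH endpoint (hypothetical here).
    Optional arguments are only added to the query when they are given.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date    # lower bound of the datestamp range
    if until_date:
        params["until"] = until_date  # upper bound of the datestamp range
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set identifier, for illustration only.
url = build_list_records_url(
    "https://repository.example.org/oai/request",
    from_date="2009-01-01",
    set_spec="hdl_1234_5678",
)
print(url)
```

A harvester would issue this request over HTTP and, for large result sets, follow the resumption tokens returned by the server; that part is omitted here.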

As its name implies, OAI-PMH is concerned with metadata; it cannot transmit actual content. This need is addressed by another standard from OAI, called Object Reuse and Exchange (OAI-ORE). This standard allows us to describe abstract sets of Web resources as nested groups called aggregations. The second part of the DSpace OAI Harvesting project was to make DSpace “ORE-aware”, so that when the harvesting engine encounters ORE descriptions, it is able to fetch the content from the remote repository and create a new local copy.
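
To make the idea of an aggregation concrete, the sketch below reads an ORE resource map in its Atom serialization and lists the resources it aggregates, which is the information a harvester needs before fetching content. The link relation URI is the ORE "aggregates" term; the sample entry, the URLs, and the function name are hypothetical, and this is not the code used in DSpace.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Hypothetical resource map for one ETD with two files.
SAMPLE_RESOURCE_MAP = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>https://repository.example.org/handle/1234/5678/ore.xml#atom</id>
  <title>Example thesis</title>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="https://repository.example.org/bitstream/1234/5678/thesis.pdf"
        type="application/pdf"/>
  <link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="https://repository.example.org/bitstream/1234/5678/data.zip"
        type="application/zip"/>
  <link rel="alternate"
        href="https://repository.example.org/handle/1234/5678"/>
</entry>
"""


def aggregated_resources(atom_xml):
    """Return the URLs of all resources aggregated by an Atom resource map."""
    entry = ET.fromstring(atom_xml.strip())
    urls = []
    for link in entry.findall(f"{{{ATOM_NS}}}link"):
        # Only links carrying the ORE "aggregates" relation name content files.
        if link.get("rel") == ORE_AGGREGATES:
            urls.append(link.get("href"))
    return urls


for resource_url in aggregated_resources(SAMPLE_RESOURCE_MAP):
    print(resource_url)
```

An ORE-aware harvester would then download each listed URL and attach the files to the newly created local item.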

This presentation will describe the OAI Harvesting project and discuss its impact on the various TDL repositories, all of which use the DSpace platform. For the federated ETD collection, this technology will enable the maintenance of the collection to move from a manual process to an automatic one. It also opens up interesting possibilities for specializing various repositories for specific tasks; for example, a DSpace instance could be used solely for ETD workflow and management, with the results then harvested into the main repository. Finally, we will discuss the impact of this project on repository architectures in general.


Presentation slides for the 2009 Texas Conference on Digital Libraries (TCDL).