2009 Texas Conference on Digital Libraries

Permanent URI for this collection: https://hdl.handle.net/2249.1/56865

Recent Submissions

Now showing 1 - 12 of 12
  • Item
    The Texas Digital Library Preservation Network
    (2009-05-27) Bolton, Michael; Texas A&M University
    The TDL is developing a Preservation Network to support the repositories and collections of the scholarly output of the State of Texas. The Preservation Network will be a network of redundant repository systems, located initially across the State of Texas, able to manage and co-locate data for future research endeavors as well as preserve data against potential failure at any one location. The Preservation Network is now in its first stages, with testing underway between the Austin and College Station locations. TDL will describe the project, provide an update on its current status, and outline goals for the near and long term.
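    As a rough illustration of the cross-site integrity checking such a redundant network makes possible, the sketch below compares fixity checksums for the same item held at two locations; the node URLs, item path, and verification routine are hypothetical and are not the TDL Preservation Network's actual interfaces.

      # Hypothetical sketch: verify that replicas of an item at two preservation
      # nodes still match. Node URLs and the item path are invented for
      # illustration, not the Preservation Network's real endpoints.
      import hashlib
      import urllib.request

      NODES = [
          "https://preservation-austin.example.org",          # hypothetical
          "https://preservation-collegestation.example.org",  # hypothetical
      ]

      def sha256_of(url):
          """Stream a remote file and return its SHA-256 checksum."""
          digest = hashlib.sha256()
          with urllib.request.urlopen(url) as resp:
              for chunk in iter(lambda: resp.read(8192), b""):
                  digest.update(chunk)
          return digest.hexdigest()

      def replicas_match(item_path):
          """True if every node serves an identical copy of the item."""
          checksums = {node: sha256_of(f"{node}/{item_path}") for node in NODES}
          return len(set(checksums.values())) == 1

      if __name__ == "__main__":
          ok = replicas_match("collections/etd/12345/thesis.pdf")  # hypothetical path
          print("replicas consistent" if ok else "replica mismatch detected")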
  • Item
    Repository Interoperability in the Texas Digital Library Through the Use of OAI-ORE
    (2009-05-27) Maslov, Alexey; Creel, James; Mikeal, Adam; Phillips, Scott; Texas A&M University
    One of the more prominent projects undertaken by the Texas Digital Library is the creation and maintenance of a federated collection of Electronic Theses and Dissertations (ETDs) from its member institutions. Currently, the maintenance of this collection is performed via a manual process, leading to scalability issues as the collection grows. The DSpace OAI Harvesting project was started with the aim of improving current federation methods. It relies on integrating two key technologies into the DSpace repository platform: OAI-PMH and OAI-ORE. The Open Archives Initiative’s (OAI) Protocol for Metadata Harvesting (OAI-PMH) is a well-established mechanism for harvesting metadata between repository systems. The DSpace platform supports metadata dissemination through OAI-PMH, allowing collections to be regularly harvested by external agents such as Google, or the NDLTD’s Union Catalog of ETDs. This protocol’s ubiquity is well-deserved: it is simple and flexible, allowing for selective harvest by date ranges and sets, as well as specific metadata formats. Although dissemination through OAI-PMH has been a feature of DSpace for some time, harvesting support was missing, and was added as part of this project. As its name implies, OAI-PMH is concerned with metadata; it cannot transmit actual content. This need is addressed by another standard from OAI, called Object Reuse and Exchange (OAI-ORE). This protocol allows us to describe abstract sets of Web resources as nested groups called aggregations. The second part of this DSpace OAI Harvesting project was to make DSpace “ORE-aware”, so that when the harvesting engine encounters ORE descriptions, it is able to fetch the content from the remote repository and create a new local copy. This presentation will describe the OAI Harvesting project, and discuss its impact on the various TDL repositories, all of which use the DSpace platform. For the federated ETD collection, this technology will enable the maintenance of the collection to move from a manual process to an automatic one. It also opens up interesting possibilities for specializing various repositories for specific tasks; for example using a DSpace instance solely for ETD workflow and management and then harvesting the results into the main repository. Finally, we will discuss the impact of this project on repository architectures in general.
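    To make the harvesting mechanics concrete, the sketch below issues a raw OAI-PMH ListRecords request with Python's standard library and prints each record header, the first step a harvester takes before following any ORE aggregation it finds to fetch the aggregated content; the repository URL and set name are placeholders, and this is not the DSpace harvester's own code.

      # Minimal OAI-PMH harvesting sketch using only the standard library.
      # The base URL and set name are placeholders, not real TDL endpoints.
      import urllib.parse
      import urllib.request
      import xml.etree.ElementTree as ET

      OAI = "{http://www.openarchives.org/OAI/2.0/}"
      BASE_URL = "https://repository.example.edu/oai/request"  # placeholder

      def list_records(metadata_prefix="oai_dc", set_spec=None):
          """Yield (identifier, datestamp) pairs from one ListRecords response.
          A full harvester would also follow resumptionToken paging."""
          params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
          if set_spec:
              params["set"] = set_spec
          url = BASE_URL + "?" + urllib.parse.urlencode(params)
          with urllib.request.urlopen(url) as resp:
              tree = ET.parse(resp)
          for header in tree.iter(OAI + "header"):
              yield header.findtext(OAI + "identifier"), header.findtext(OAI + "datestamp")

      if __name__ == "__main__":
          for identifier, datestamp in list_records(set_spec="col_etd"):  # hypothetical set
              print(datestamp, identifier)
              # An "ORE-aware" harvester would next request the item's ORE map
              # and fetch each aggregated resource it describes.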
  • Item
    Metadata Quality Assurance: The University of North Texas Libraries Experience
    (2009-05-27) Alemneh, Daniel; Tarver, Hannah; University of North Texas
    Libraries exist to connect users with resources and information. In a traditional library, a card or online catalog is merely one aspect of finding and accessing holdings, albeit an essential one. In the same way, for digital libraries, metadata records allow users to gain access to the digital resources. Successful metadata adds value, allowing users to click and delve deeper than they’d be able to with a traditional static representation. Maintaining usable and sustainable digital collections necessitates maintaining high-quality metadata about those digital objects. The two aspects of digital library data quality are the quality of the data in the objects themselves and the quality of the metadata associated with the objects. Because poor metadata quality can result in ambiguity, poor recall, and inconsistent search results, the existence of robust quality assurance mechanisms is a necessary feature of a well-functioning digital library.
    Metadata Quality at UNT: The University of North Texas (UNT) Libraries participate in a number of collaborative digital initiatives. Recognizing the critical role of quality metadata in digital resource life cycle management, the UNT Libraries employ a number of metadata quality assurance procedures and tools at each stage (pre-ingest and post-ingest).
    Pre-Ingest: From the start, we provide intensive training (both face-to-face and online tutorials) and supplement it with detailed documentation. While providing continuous support, we also instruct metadata creators and editors on proper formatting and on the tools available for checking their own work. Among other tools and procedures, the following pre-ingest activities facilitate the metadata creation process:
      - A metadata creation template (a web-based form for creating records) partially automates data population, validates metadata values, and checks formats, links, etc.
      - The UNTL controlled vocabularies and dropdown lists draw different terms and concepts into a single preferred word or phrase to ensure consistency.
      - Template readers provide firsthand visual checking capability.
    Post-Ingest: Our web-based metadata analysis tools allow us to compare field values across a particular collection or our entire holdings to easily identify errors, including misspellings, incorrect formatting, empty (null) values, and other likely mistakes. The following tools, among others, allow us to view, analyze, and check for errors in uploaded records:
      - Lists and browse views of metadata values: all values for each element (refined or enhanced by qualifiers, with Use/Ignore and Highlighter On/Off options), null values (e.g. for mandatory elements), and authority values.
      - Other visualization and graphical reporting tools: clickable maps by institution and collection, word clouds by element, and records added over time.
    Quality services depend on good metadata. Incorrect information (errors, omissions, or ambiguities in the metadata) affects the consistency of search results and can limit the ability of the service provider to include special functions and creative services. In order for end users to benefit fully from the development of digital libraries, responsible and viable service providers need to address metadata quality issues. Based on the UNT Libraries' experiences, this presentation will discuss issues related to metadata quality management and demonstrate a number of tools, workflows, and quality assurance mechanisms.
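    A minimal sketch of the kind of post-ingest value analysis described above, assuming records have already been exported as simple field-to-values dictionaries; the field names, export step, and heuristics are hypothetical and are not UNT's actual tools.

      # Hypothetical post-ingest metadata audit: tally all values per field,
      # flag empty mandatory fields, and surface rare values that are often typos.
      from collections import Counter, defaultdict

      MANDATORY_FIELDS = {"title", "date", "language"}  # example requirement

      def audit(records):
          """records: iterable of dicts mapping field name -> list of values."""
          values_by_field = defaultdict(Counter)
          problems = []
          for i, record in enumerate(records):
              for field in MANDATORY_FIELDS:
                  if not record.get(field):
                      problems.append((i, field, "missing or null value"))
              for field, values in record.items():
                  values_by_field[field].update(values)
          # Values occurring only once across the collection are worth a human
          # look: they are frequently misspellings or formatting slips.
          for field, counts in values_by_field.items():
              for value, n in counts.items():
                  if n == 1:
                      problems.append((None, field, f"singleton value: {value!r}"))
          return values_by_field, problems

      if __name__ == "__main__":
          sample = [
              {"title": ["Denton County map"], "date": ["1887"], "language": ["eng"]},
              {"title": ["Denton County mpa"], "date": [], "language": ["eng"]},
          ]
          _, issues = audit(sample)
          for issue in issues:
              print(issue)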
  • Item
    Digital Repository 2.0: Lessons Learned and Applied
    (2009-05-27) Nordstrom, Kurt; Fredericks, Brandon; University of North Texas
    The Portal to Texas History at The University of North Texas Libraries is a comprehensive system for storing and providing access to a very large number of digital objects of historical significance to the state of Texas. The Portal is currently in its second iteration of development, and in this presentation we hope to examine some of the lessons learned from our initial efforts and how they shaped the decisions made in the current system and for the future. We will briefly overview the first system that was put in place and the goals that we had in mind for it. We will cover some of the ways that it was successful and some of the limitations that we encountered. The things that we will highlight about the former system include:
      - Data Model: an overview of the old format and the limitations encountered.
      - Technologies Utilized: issues with products that have a small user and developer base; our XSLT experience.
      - Architecture: a look at the single-machine model and how it relates to scalability and redundancy.
      - Development and Workflow: the “learning project management as we go” adventures.
    Next, we will cover the goals behind the current version of the Portal. As there are several aspects of the system, we’ll be looking at different areas of importance that drove our decisions. The topic areas will parallel those covered for the previous system:
      - Data Model: the improvements made to our current data format, based on the limitations of the old format.
      - Technologies Utilized: the technologies used to build the new system, our reasons for selecting them, and a brief overview of the potential of web frameworks, focusing on Django (a short illustrative sketch follows this abstract).
      - Architecture: an explanation of the “shared-nothing” approach and its benefits for scalability and deployment.
      - Development and Workflow: moving from one developer to several; content management and workflow tools and procedures; division of tasks and collaboration; the motivation behind “rolling our own”.
    Our goal in this presentation is not so much to present a “this is how everybody should be doing it” argument, but rather to highlight some of the issues that we encountered and our approaches to resolving them. It is our hope that other groups can learn from our mistakes and successes as they seek to implement their own Digital Repository systems.
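    As a small illustration of the web-framework direction mentioned above, a Django model for a digital object record might look like the sketch below; the model and field names are hypothetical and do not reflect the Portal's actual schema.

      # Hypothetical Django models for a digital object and its files; field
      # names are invented for illustration, not the Portal's real data model.
      from django.db import models

      class DigitalObject(models.Model):
          identifier = models.CharField(max_length=64, unique=True)  # persistent ID
          title = models.CharField(max_length=512)
          partner = models.CharField(max_length=256)   # contributing institution
          date_added = models.DateTimeField(auto_now_add=True)

          def __str__(self):
              return f"{self.identifier}: {self.title}"

      class DigitalFile(models.Model):
          """One of possibly many files (e.g. page images) belonging to an object."""
          parent = models.ForeignKey(DigitalObject, on_delete=models.CASCADE,
                                     related_name="files")
          sequence = models.PositiveIntegerField()
          path = models.CharField(max_length=1024)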
  • Item
    Train to Share: Statewide Interoperability Training for Cultural Heritage Institutions
    (2009-05-28) Plumer, Danielle; Frizzell, Karen; Texas State Library and Archives Commission
    In 2008, the Texas State Library and Archives Commission, working with the University of North Texas Libraries, Amigos Library Services, and a variety of additional partners and participants, was awarded an IMLS Laura Bush 21st Century Librarian Grant to develop “Train to Share: Interoperability Training for Cultural Heritage Institutions,” a project of the Texas Heritage Digitization Initiative (THDI). In this three-year project, we will address the need, identified nationally but equally evident at the local level, for quality sharable metadata, metadata produced within specific traditions of practice that can nonetheless be shared to create rich experiences for both today’s user and the user of tomorrow. Through activities including outreach, observation, education, and production, the “Train to Share” project will assist metadata specialists in envisioning, developing, and sustaining digital products that can be combined seamlessly to provide a rich experience for the ultimate audience of the project, the end user community consisting of students, teachers, and researchers interested in Texas history and heritage. In this presentation, we will review our project goals and objectives, introduce the ten participant teams that will be involved in the training, and invite feedback from conference attendees to assist us as we develop our training workshops and supplemental materials. The “Train to Share” project activities will include three phases. In the first phase, outreach and observation, we will work with separate communities of practice from libraries, archives, museums, government agencies, and other cultural heritage institutions. Our goal will be to identify training needs and to establish the depth of resources, skills, and knowledge already available. In the second phase, education, trainers from TSLAC and Amigos Library Services will adapt the “Digital Library Environment” workshop series from the Library of Congress to incorporate the needs and traditions of the separate communities of practice. Participant teams and other interested individuals will be trained using the adapted workshop series, which will require a minimum of five two-day workshops offered at locations across the state, plus two additional online-only offerings. In the final phase, production and evaluation, our participant teams will put what they have learned into practice through the development of a total of ten digital products. The three-phase structure of the project is designed to provide maximum support to learners as they acquire new skills and develop trust in the partnerships that will be fostered as a consequence of this project. The intended outcomes of the “Train to Share” project will be significant increases in knowledge by and among participating metadata specialists, as measured by improved metadata quality and consistency; improved access to the rare and unique materials held by cultural heritage institutions, as measured by the number and type of objects available from project participants at the end of the project; and new and sustainable partnerships vital to the ongoing development of digital projects across the state.
  • Item
    Effective Tools for Digital Object Management
    (2009-05-27) Moore, Jeremy; Fisher, Sarah Lynn; University of North Texas
    The organization of digital files during the development of digital collections is as important as the organization of physical objects on library shelves. Digital objects can be lost as quickly as they are created without an established system for file naming and digitization workflow, resulting in lost time and productivity. The UNT Digital Projects Unit (DPU) has developed several methods for the digital object management of book and image collections, including, but not limited to, standardizing the process of object file naming and using logical folder organization matched to an internal wiki. These methods, while unique to our lab, are based on simple principles that can be implemented at any institution. A digitization workflow begins with organizing the physical objects in a logical and traceable way so that retrieving the correct item to digitize does not interfere with production. In an environment where multiple technicians are creating digital content from the same source material, it is key that everyone knows the current status of the project so that material does not get scanned repeatedly or not get scanned at all. The DPU utilizes a combination of physical tags, numbered carts, and an internal wiki, allowing for a parallel work environment. The internal wiki also matches folder hierarchies in the digital realm and provides a layer of redundancy for when — not if — a file is misplaced. Before the first digital object is created, it is important to have identifiers assigned to each object. A unique and persistent identifier is used throughout the digitization process for file naming, structuring folders, and linking metadata records with digital objects. Upon digitization, the digital object is moved through a series of folders stacked in order of process from 1 to 7 (a short sketch of this staging follows this abstract):
      1. ToQC
      2. ToDeskew
      3. ToResize
      4. ToOCR
      5. ToMetadata
      6. ToUpload
      7. Uploaded
    Ordering files into folders based upon the action that needs to be applied allows more time to be spent on processing the files than on determining what needs to be done next. Additionally, this workflow is broad enough to allow for the multifarious projects undertaken by the DPU. Books require different handling than image collections. Photographic image collections can be more straightforward to manage as digital objects, since each physical photograph usually has only one digital constituent (or two, if the back of the photo is scanned), but when a book is digitized many images are created. The DPU uses MagickNumbering, an internal naming schema for books that logically handles both object order and pagination while offering a robust quality control method. Through the use of MagickNumbers, an entire book can be managed in a single folder as master TIFF files. By applying these methods the DPU is able to work on multiple projects at a given time while maintaining a high level of organization, quality control, and output. The methods mentioned are also scalable from a project consisting of only a handful of photographs to a 100-year run of yearbooks to 10,000 negatives.
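    A minimal sketch of the staged-folder movement described above; the root path, numeric prefixes, and identifier are hypothetical and are not the DPU's actual setup, while the stage names follow the list in the abstract.

      # Hypothetical sketch: advance a digitized item's files to the next
      # processing stage by moving them between ordered folders. The root path
      # and numeric prefixes are invented; stage names follow the abstract.
      import shutil
      from pathlib import Path

      ROOT = Path("/data/digitization")  # hypothetical
      STAGES = ["01_ToQC", "02_ToDeskew", "03_ToResize", "04_ToOCR",
                "05_ToMetadata", "06_ToUpload", "07_Uploaded"]

      def advance(identifier):
          """Move the folder named after the item's persistent identifier from
          its current stage to the next one and return the new stage name."""
          for current, nxt in zip(STAGES, STAGES[1:]):
              src = ROOT / current / identifier
              if src.exists():
                  dest = ROOT / nxt / identifier
                  dest.parent.mkdir(parents=True, exist_ok=True)
                  shutil.move(str(src), str(dest))
                  return nxt
          raise FileNotFoundError(f"{identifier} not found in any stage that can be advanced")

      if __name__ == "__main__":
          print(advance("ark-67531-metapth12345"))  # hypothetical identifier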
  • Item
    Redesigning the Portal to Texas History: A User-centered Design Approach Involving the Genealogical Community
    (2009-05-28) Murray, Kathleen; University of North Texas
    In 2007, the Digital Projects Unit of the University of North Texas Libraries began a two-year effort to redesign the interface to the Libraries’ Portal to Texas History. The Portal provides a digital gateway to collections of historical and cultural materials from Texas libraries, museums, archives, historical societies, and private collections. It contains primary source materials, including maps, books, manuscripts, diaries, photographs, and letters. The Portal went online in 2004 with five collaborative partners contributing to four collections of materials related to Texas history and culture. These collections comprised 489 objects represented by 6,688 digital files. By 2008, the Portal included 34 collections from 91 contributing partners, with 40,089 objects and 324,023 digital files. Likewise, usage has grown from approximately 1,000 unique visitors per month in 2004 to 105,000 per month in 2008. Keeping pace with this growth occupied the Portal’s support staff during the early years and precluded major enhancements to the user interface. By 2007, the original interface of the Portal needed to be refurbished; it was dated both in terms of its look-and-feel and its feature-functionality. Additionally, the Portal’s underlying infrastructure had increased its capability to support features not possible in the original implementation. It was decided to redesign the interface using a user-centered design approach. Previous research indicated that family history researchers are a significant, growing, and under-studied user group of online cultural heritage collections. Believing this to be applicable to the Portal’s user population, funding was obtained from the Institute of Museum and Library Services to study the information-seeking behavior of persons conducting family history research in order to identify their functional requirements with regard to the Portal to Texas History. These requirements would serve as the basis for redesigning the Portal’s user interface. Members of local genealogical societies were recruited for the redesign effort. The user studies included individual interviews, focus group discussions, and usability tests of the existing Portal. Analysis of the data from these studies informed a set of functional requirements for the redesign effort. The requirements are specific to typical functional areas of a digital library, such as searching, browsing, evaluating search results, and navigating, but also include requirements in the areas of metadata practice, obtaining objects, getting help, and contributing comments. Users’ requirements were classified as feasible in the near term, feasible in the long term, or not feasible. This classification highlighted the gaps between user expectations and realistic satisfaction of those expectations by the Portal to Texas History, which is typical of many online cultural heritage collections. These insights are a direct result of the user studies and demonstrate the importance of such studies for digital libraries. This presentation will report the major findings of the research and the current status of the redesign effort.
  • Item
    Matching a Digitization Project’s Workflow to the Collection and Its Owner
    (2009-05-28) Logan, Tim; Stuhr, Darryl; Baylor University
    While there is a general pattern to the workflow for digitization projects, each collection and its owner-sponsor offer a different set of challenges and opportunities for managing the collection processing, especially regarding the creation of metadata for the collection. This presentation will include a general description of the workflow for digitization projects (intake, digitization, file processing, metadata, and destinations) and then focus on options for the workflow of individual digitization projects, with particular emphasis on metadata creation. Additionally, there will be samples from various example project instruction manuals, each of which is adapted to the specific needs of the collection and its materials. The Digitization Projects Group of Baylor’s Electronic Library serves in part as a digitization service provider for other campus libraries and collections at the university. The Digitization Projects Group is relatively new, and the number and complexity of digitization projects it manages continue to expand. Through experience with previous projects, the group has learned that a clearly defined workflow is critical to success, with special emphasis placed on the creation of effective metadata. We have learned that one size does not fit all: each collection is unique, and the workflow should be adapted to take advantage of the knowledge and skills of the library or entity that owns the physical materials, or other provisions must be made for cataloging. We continuously revise and adapt our project workflow model to match the characteristics of a given collection with the time and skill sets of the personnel involved with the project. Since the Electronic Library has no physical holdings of its own, candidate projects are brought to the EL from a variety of sources: donors, potential donors, campus libraries, and other university collections. After a description of the general workflow model, the presentation will use as examples five different digitization projects that are currently in progress or have been recently completed. Each project illustrates a different schema for the distribution of metadata responsibilities, leveraging the resources available from various sources to accomplish the necessary work. Examples will include:
      - the Guthrie Civil War letters, which had descriptive data assigned by a skilled technician who was not a trained librarian;
      - the Gospel Music Restoration Project, which involves a complex metadata schema in XML created by a trained and experienced metadata librarian;
      - the 19th Century Women Poets Collection, which pulls existing catalog information from the university’s integrated library system;
      - the Oral History transcripts, which merge data from the ILS and a stand-alone FileMaker database, managed by an MLS librarian employed outside the library;
      - the Spencer Sheet Music collection, for which cataloging has been outsourced to a professional company that works from scanned images of the shelf list cards and of the original materials.
    The various projects illustrate that there are multiple solutions for managing the responsibility for the creation of metadata, and that the best method is often determined by the nature of the collection and the skill set of the sponsoring collection owner.
  • Item
    Collection Development for an Environmental Science Digital Library
    (2009-05-27) Hall, Nathan; University of North Texas
    This presentation will focus on the University of North Texas Libraries’ strategies for creating digital collections and services from datasets for users outside of formal education and research, in support of a proposed international digital library for environmental science. Some collection development for the Environmental Science Digital Library (ESDL) will stem from harvested web content from the government domain (.gov). These materials will include environmental policy and documentation from the websites of various federal agencies and departments from before the 2008 election, after the 2008 election, and following the 2009 inauguration. As a result, the collection will allow users to see how federal policies changed during the transition from the Bush administration to the Obama administration. The ESDL will also host content contributed by institutional partners, including white papers, datasets, images, video, simulations, and applications. Much of the ESDL content will be born digital. This provides the opportunity, and the challenge, of generating new content and services by compiling information from discrete data sources to create new applications. An example of such a service would be a map that imports soil, water, and air quality measurements from one source and layers them over a second map whose regions are encoded by environmental policy, which would be useful for examining how environmental policy affects measured environmental quality. The team developing the ESDL believes that the values and consequences of environmental science matter to a broader range of users than those of many other academic disciplines. The target audience of the digital library will be citizens and policy makers, given these groups’ ongoing need for reliable information about environmental science and policy. This presentation will address the TCDL 2009 topics of interest by discussing how the ESDL project will create digital library services for a broad range of users and how the digital library will add value to its collections through the use of imported datasets.
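    A toy sketch of the data-layering idea described above; the measurements, region identifiers, and policy categories are invented purely for illustration and are not ESDL data.

      # Toy illustration of combining discrete data sources: join point
      # measurements of air quality to policy regions and compare averages per
      # policy category. All data below are invented for the example.
      from statistics import mean

      # Source 1: monitoring stations with a measurement and a region.
      measurements = [
          {"station": "A1", "region": "north", "air_quality_index": 42},
          {"station": "A2", "region": "north", "air_quality_index": 55},
          {"station": "B1", "region": "south", "air_quality_index": 81},
      ]

      # Source 2: regions encoded by the environmental policy in force there.
      policy_by_region = {"north": "strict-emissions", "south": "baseline"}

      def average_by_policy(points, policies):
          grouped = {}
          for point in points:
              policy = policies.get(point["region"], "unknown")
              grouped.setdefault(policy, []).append(point["air_quality_index"])
          return {policy: mean(values) for policy, values in grouped.items()}

      if __name__ == "__main__":
          for policy, avg in average_by_policy(measurements, policy_by_region).items():
              print(f"{policy}: mean AQI {avg:.1f}")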
  • Item
    Toward Automatic Metadata Assignment in the Texas A&M University Digital Repository
    (2009-05-27) Creel, James; Maslov, Alexey; Mikeal, Adam; Phillips, Scott; Texas A&M University
    We are researching and developing tools to automatically assign high-quality metadata to documents in the Texas A&M University Digital Repository, a DSpace repository. A reliable automation of the metadata assignment process confers several advantages. First, the submission workflow may be streamlined by providing the user with pre-filled input fields. Second, existing collections may have their items automatically augmented with new metadata fields, usually a lengthy and painstaking process for human catalogers. Finally, the repository may be systematically reviewed for metadata errors. Metadata in a DSpace repository exist at several levels and in various forms. Some forms are suitable for automatic assignment, whereas others remain beyond the scope of modern artificial intelligence and natural language processing. Though we typically think of metadata in a DSpace repository as being applied at the item level, metadata are also applied at the bitstream, bundle, collection, and community levels. Among these levels, we are currently interested in the bitstream and item levels. At the bitstream level, the metadata are restricted to name (always a filename), type (typically a MIME type), and description. The type is currently determined on the basis of filename extensions. We propose to employ the UNIX file command to achieve less error-prone type assignment (a minimal sketch follows this abstract). The description field, consisting of unrestricted text, poses greater challenges and will be the focus of future research. At the item level, metadata are restricted only by schemas registered in the repository, typically including Dublin Core (DC). Among the unqualified DC fields, some are highly amenable to automatic assignment, while others pose formidable difficulties in light of the high degree of accuracy required. At present, we find it appropriate to tailor metadata assignment tools on a collection-by-collection basis. This is because the necessary metadata fields vary between collections; whereas geographic metadata may be relevant to a collection of maps, committee member metadata may be relevant to a collection of theses. Additionally, by focusing on a particular collection, the software may assume particular structural consistencies between documents. For example, our experimental software can accurately assign title, author, committee member, degree level, abstract, and subject area metadata to TAMU ETDs by virtue of the fact that the locations of these strings are virtually guaranteed within these documents. Furthermore, stylistic and subject-matter consistencies between documents in a collection can facilitate statistical-linguistically informed inferences about those documents. Experimentally, we are employing named entity recognition software to identify references to people, places, and things within documents by using statistical linguistic clues. These identifications may be used to provide subject metadata, geographic metadata, and other types of metadata fields depending on the collection. In the future, we plan to integrate named entity recognition with a formal knowledge base of entities, events, and relations, which can dynamically grow with the repository. This knowledge-base integration is the first step in a long journey toward automatic assignment of complex fields like descriptions and summaries of documents.
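    A minimal sketch of the MIME-type idea mentioned above, calling the UNIX file utility from Python; the bitstream path is a placeholder and this is not the project's actual implementation.

      # Sketch: determine a bitstream's MIME type with the UNIX `file` command
      # instead of trusting the filename extension. The path is a placeholder.
      import subprocess

      def mime_type(path):
          """Return the MIME type reported by `file --brief --mime-type`."""
          result = subprocess.run(
              ["file", "--brief", "--mime-type", path],
              capture_output=True, text=True, check=True,
          )
          return result.stdout.strip()

      if __name__ == "__main__":
          print(mime_type("thesis.pdf"))  # e.g. "application/pdf"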
  • Item
    ETD Management in DSpace
    (2009-05-27) Mikeal, Adam; Creel, James; Maslov, Alexey; Phillips, Scott; Texas A&M University
    The Texas Digital Library (TDL) is a consortium of public and private educational institutions from across the state of Texas. Founded in 2005, TDL exists to promote the scholarly activities of its members. One such activity is the collection and dissemination of electronic theses and dissertations (ETDs). A federated collection of ETDs from multiple institutions was created in 2006 and has since grown into an all-encompassing ETD Repository project that is partially supported by a grant from the Institute of Museum and Library Services (IMLS). This project seeks to address the full life cycle of ETDs, providing tools and services from the point of ingestion, through the review process, and finally to dissemination in the central federated repository. A primary component of this project was the development of Vireo, a web application for ETD submission and management. Built directly into the DSpace repository, Vireo provides a customized submission process for students and a rich, “Web 2.0”-style management interface for graduate and library staff. Because it is built directly into the DSpace repository, it can scale from a single department or college up to a multiple-institution consortium. In 2008, we reported the results of a demonstrator deployment at Texas A&M University. Vireo has since replaced the legacy application and is now the single point of entry for all theses and dissertations at that university. Rollout to other schools will follow a gradual, phased approach. This presentation examines the challenges faced as Texas A&M transitioned to a new ETD management system and the architectural issues involved in scaling such a system to a statewide consortium. Finally, it will discuss the application’s release to the ETD community under an open-source license.
  • Item
    Texas Digital Repository Services Update
    (2009-05-28) Steans, Ryan; Texas Digital Library
    The Texas Digital Library (TDL) is a multi-university consortium formed to provide and support a fully online scholarly community for institutions of higher learning in Texas. The TDL is developing applications such as open-access institutional repositories, collections management tools, and an electronic thesis and dissertation submission and management system. The organization also promotes new scholarly communication models through faculty services such as blogs, wikis, and peer-reviewed electronic journals. The TDL has employed numerous open-source technologies in its toolkit, from implementing DSpace as its platform for member repositories to enabling cross-institution single sign-on for TDL members using the Shibboleth distributed authentication system. Other open-source platforms used by the TDL include WordPress, Open Journal Systems, and MediaWiki. Future plans include the implementation of Open Monograph Press and Open Conference Systems. This presentation will discuss the ways in which the TDL, by leveraging its commitment to open-source technologies and the principle of open access, is empowering its members to collect, preserve, and promote the scholarly output of Texas universities. By providing an update on the current status of TDL services for faculty and institutions, highlighting recent developments, and outlining plans for the expansion of services, we will describe how TDL is connecting users with digital libraries across the state, and how that effort is connecting the Texas Digital Library with the world.