2007 Texas Conference on Digital Libraries

Permanent URI for this collection: https://hdl.handle.net/2249.1/4514


Recent Submissions

  • Item
    Shibboleth in the Texas Digital Library
    (2007-05-30) Paz, Jay; Texas Digital Library
    The Texas Digital Library (TDL) is taking advantage of the Shibboleth architecture to authenticate and authorize the users of its services. Since the beginning of this year we have worked with UT Austin and Texas A&M University to create a TDL Federation and set up Identity and Service Providers at both locations. Our goal is for each participating institution to join the TDL Federation and install an Identity Provider. This will allow each of the TDL services to be accessible to the entire set of member institutions. The presentation will describe the federation, identity providers, and service providers, and their current implementation within the TDL Repositories Project.
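    As a rough illustration of the pattern the abstract describes (not TDL's actual code): a Shibboleth Service Provider typically makes user attributes available in the application's request environment, so authorization reduces to inspecting those attributes. The attribute names and values below are assumed for illustration only.

```python
# Hypothetical sketch: a Shibboleth SP (e.g. mod_shib) injects user
# attributes into the request environment; the application reads them.
# "eppn" (eduPersonPrincipalName) and "affiliation" are common eduPerson
# attribute names, assumed here for illustration.

def authorize(environ, allowed_affiliations={"member", "faculty", "student"}):
    """Return (user_id, ok) based on Shibboleth-supplied attributes."""
    user_id = environ.get("eppn")  # e.g. jdoe@tamu.edu
    affiliations = set(environ.get("affiliation", "").split(";")) - {""}
    ok = user_id is not None and bool(affiliations & allowed_affiliations)
    return user_id, ok

# Example: a request from a federated identity provider
request_env = {"eppn": "jdoe@tamu.edu", "affiliation": "member;student"}
print(authorize(request_env))  # -> ('jdoe@tamu.edu', True)
```

    Because the identity provider at each campus asserts these attributes, the service never handles passwords itself, which is what makes federation across member institutions workable.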
  • Item
    Developing a Common Submission System for ETDs in the Texas Digital Library
    (2007-05-30) Mikeal, Adam; Brace, Tim; Texas A&M University; University of Texas at Austin
    The Texas Digital Library (TDL) is a consortium of universities organized to provide a single digital infrastructure for the scholarly activities of Texas universities. The four current Association of Research Libraries (ARL) universities and their systems comprise more than 40 campuses, 375,000 students, 30,000 faculty, and 100,000 staff; non-ARL institutions represent another sizable addition in both students and faculty. TDL's principal collection is currently its federated collection of ETDs from three of the major institutions: The University of Texas, Texas A&M University, and Texas Tech University. Since the ARL institutions in Texas alone produce over 4,000 ETDs per year, the growth potential for a single state-wide repository is significant. To facilitate the creation of this federated collection, the schools agreed upon a common metadata standard represented by a MODS XML schema. Although this creates a baseline for metadata consistency, ambiguity in the interpretation of the schema creates usability and interoperability challenges. Name resolution issues are not addressed by the schema, and certain descriptive metadata elements need consistency in format and level of significance so that common repository functionality will operate intuitively across the collection. It was determined that a common ingestion point for ETDs was needed to collect metadata in a consistent, authoritative manner. A working group was formed consisting of representatives from five universities, and a state-wide survey of ETD programs was conducted, with varied levels of engagement reported. Many issues were identified, including policy questions such as open access publishing, copyright considerations, and the collection of release authorizations; the role of infrastructure development, such as a Shibboleth federation for authentication; and interoperability with third-party publishers such as UMI.
    ETD workflows at six schools were analyzed, and a meta-workflow was identified with three stages: ingest, verification, and publication. It was decided that Shibboleth would be used for authentication and identity management within the application. This paper reports the results of the survey and describes the system and submission workflow that was developed as a consequence. A functional prototype of the ingest stage has been built, and a full prototype with Shibboleth integration is slated for completion in June 2007. Demonstration deployments of the application are expected at three schools in fall 2007.
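    The common MODS baseline mentioned above can be sketched as a minimal record built with Python's standard library. The fields shown are illustrative only, not the actual TDL ETD MODS profile.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def minimal_mods(title, author, degree_date):
    """Build a bare-bones MODS record for an ETD (illustrative fields only)."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    name = ET.SubElement(mods, f"{{{MODS_NS}}}name", type="personal")
    ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = author
    origin = ET.SubElement(mods, f"{{{MODS_NS}}}originInfo")
    ET.SubElement(origin, f"{{{MODS_NS}}}dateIssued").text = degree_date
    return mods

record = minimal_mods("A Study of Repositories", "Doe, Jane", "2007-05")
print(ET.tostring(record, encoding="unicode"))
```

    A common ingestion point that emits records like this at submission time is what closes the door on the interpretation ambiguities the abstract describes: one piece of software, one reading of the schema.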
  • Item
    A Manakin Case Study: Visualizing geospatial metadata and complex items
    (2007-05-30) Maslov, Alexey; Green, Cody; Mikeal, Adam; Phillips, Scott; Weimer, Kathy; Leggett, John; Texas A&M University
    Increasingly, repositories are responsible for preserving complex items and items with specialized metadata, such as geospatial metadata. These collections present unique challenges for the repository interface, and traditional approaches often fail to provide adequate visualization mechanisms. This presentation is a case study of a particular collection that demonstrates a Manakin solution to both of these challenges. The Geologic Atlas of the United States is a series of 227 folios published by the USGS between 1894 and 1945. Each folio consists of 10 to 40 pages of mixed content -- including maps, text, and photographs -- with an emphasis on the natural features and economic geology of the coverage area.

    Complex Items: The current visualization model in DSpace offers a cumbersome browsing experience for complex items, as the default item view in DSpace is not optimized for items that contain more than a few bitstreams. The logical organization of the folio collection was as a single DSpace collection with 227 items, where each item contained multiple bitstreams representing each page of the folio. The result was an uninformative list of filenames, each linking to a very large (approximately 100 MB) image file. Manakin allowed us to create a new detail view for the folio items using an image gallery-style viewing interface. This new view has thumbnails for each page and lower-resolution surrogates for screen viewing. It also allows a viewer to download either the full archival-quality TIFF or a reduced-quality JPEG. The combination of thumbnail surrogates and the ability to see all pages of a folio at once increases the ease with which the collection is navigated and understood.

    Unique Metadata: The current DSpace interface is unable to leverage the potential of atypical metadata, such as the geospatial metadata attached to the folio collection. Although geographic elements were added to the DSpace metadata registry following Dublin Core Metadata Initiative (DCMI) recommendations, the only visualization mechanism DSpace could offer was a flat listing of the metadata values. Manakin allowed us to exploit the unique geospatial properties of the folio collection. It was determined that a map-based interface for browsing and searching would help a user quickly determine the coverage area of a particular folio visually, as well as place the title in its geographic context.

    Both of the challenges presented by this case study could have been addressed using the existing JSP interface. However, such an implementation would be impractical to create and maintain; furthermore, no mechanism exists to restrict such changes to an individual collection. Manakin's modular architecture made the creation of this interface achievable by a small team in a matter of days. Currently, the interface is available online at http://handle.tamu.edu/1969.1/2490, and has been featured as an Editor's Pick on Yahoo.com for its use of the Yahoo! Maps API.
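    A map-based browse like the one described ultimately rests on a simple test: does a folio's bounding box contain a point of interest? A minimal sketch, with invented folio data:

```python
def folios_covering(folios, lat, lon):
    """Return titles of folios whose bounding box contains (lat, lon).
    The folio records below are invented for illustration."""
    hits = []
    for f in folios:
        if f["south"] <= lat <= f["north"] and f["west"] <= lon <= f["east"]:
            hits.append(f["title"])
    return hits

folios = [
    {"title": "Austin folio", "north": 30.5, "south": 30.0,
     "west": -98.0, "east": -97.5},
    {"title": "El Paso folio", "north": 32.0, "south": 31.5,
     "west": -106.75, "east": -106.25},
]
print(folios_covering(folios, 30.27, -97.74))  # -> ['Austin folio']
```

    With bounding boxes stored as metadata, the same check can drive marker placement on a slippy map or answer "which folio covers this city?" queries.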
  • Item
    The Texas Digital Library Preservation Network
    (2007-05-30) Maslov, Alexey; Texas A&M University
    The Texas Digital Library is a collaborative project between public and private institutions across Texas that aims to provide curation, preservation, and access to digital scholarly information for the State. The preservation component of this mission means that TDL is committed to the long-term maintenance of its digital assets. Accomplishing this goal necessitates the creation of a TDL-wide preservation network. An effective preservation solution would encompass the following characteristics:
    • No single point of failure: by sharing copies of the same data between multiple geographically distributed locations, we ensure that failure of any one location does not result in permanent data loss.
    • Local allocation of resources: any member institution that joins the network retains full control over the utilization of the resources it commits to the network.
    • Shared responsibility: responsibility for preserving digital assets is shared across all members of the network, eliminating reliance on any one institution’s resources.
    • Architectural flexibility: new locations can efficiently be added to the network, allowing for unforeseen growth.
    The TDL Preservation Network is a current project that seeks to address these issues. To accomplish these goals, we have designed a system with the following layered architecture:
    • User layer: represents the pool of users that have access to the preservation network system. This pool will be determined by the established policies and submission agreements at the institution level.
    • Application layer: contains the set of applications that can generate data for the network, such as institutional repositories, e-journals, courseware management systems, and faculty archives.
    • Service layer: consists of a federation of data locations that implement preservation policies. This is the layer where the actual replication of data is performed and agreements between locations are brokered and recorded.
    • Storage layer: responsible for maintaining the individual copies of the preserved artifacts; can be implemented with any number of standard technologies.
    This presentation will describe the current progress toward the implementation of the TDL Preservation Network, and the long-term goals for data preservation in the Texas Digital Library.
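    The "no single point of failure" and "shared responsibility" goals imply periodically auditing replicas across locations. A toy sketch (not TDL's actual design) that flags any location whose copy disagrees with the majority checksum:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 digest used as a content fingerprint."""
    return hashlib.sha256(data).hexdigest()

def audit_replicas(replicas):
    """Given {location: bytes}, return locations whose copy disagrees
    with the majority fingerprint. A simple audit sketch only."""
    digests = {loc: fingerprint(blob) for loc, blob in replicas.items()}
    counts = {}
    for d in digests.values():
        counts[d] = counts.get(d, 0) + 1
    canonical = max(counts, key=counts.get)  # majority vote on the digest
    return [loc for loc, d in digests.items() if d != canonical]

# Three locations hold copies; one copy has silently diverged.
replicas = {"tamu": b"artifact-v1", "ut": b"artifact-v1", "ttu": b"artifact-v1x"}
print(audit_replicas(replicas))  # -> ['ttu']
```

    A flagged location would then re-fetch its copy from a healthy peer, which is how geographically distributed replication turns into actual durability.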
  • Item
    Digital Archive Services (DASE) at UT Austin
    (2007-05-30) Keane, Peter; University of Texas at Austin
    Digital Archive Services (DASE) is a web-based application for managing digital images, sound files, video, documents, and web resources. In addition to the search interface and built-in presentation tools, DASE includes a set of simple, dynamic web services on top of which new applications can be built. Developed by Liberal Arts Instructional Technology Services at UT Austin, the initial goal for DASE was to provide a way for faculty members to get web-based access to image collections (both physical and digital) scattered around departments within the College of Liberal Arts. As we worked through specific issues regarding metadata schemes and desired search, browse, and save functionality, it became clear that other more general issues related to the management of digital content could be addressed as well. Every attempt was made to keep the DASE application architecture and data model as simple as possible. Towards that end, and given the diversity of collections involved, we allowed each collection to define its own set of metadata attributes. Attributes are simply flagged as 'searchable' (or not), thus allowing efficient cross-collection searching. For more fine-tuned searching, users can go to an individual collection and quickly and easily browse all of the available attributes. For those instances when a standard metadata schema IS necessary for proper interoperability with other systems, as in the case of RSS feeds, we simply "map" the attributes in the collection to the appropriate attributes in the RSS specification. Thus, collections that include audio or video files (as many now do) have a built-in means to provide "podcasting" functionality. In addition, by defining a simple set of web services (both RSS feeds and DASE-specific XML-over-HTTP) we have found new uses for DASE collections. DASE can easily serve the functions of a database-in-a-box for web site developers who would like to add simple dynamic capabilities to media-rich web sites. 
    In working on the DASE project, we have seen the same questions arise time and again: How do we get our digital content on the web so as to share it with students and colleagues? How do we manage the huge amount of new content being produced and discovered every day? How do we maximize the opportunities for "repurposing" our content? How do we organize and preserve our digital assets? All of these are questions that we have attempted to address with DASE. While DASE does not pretend to provide a single comprehensive solution, it does provide solutions to a myriad of immediate problems and minimizes the risk involved in two ways. One, DASE is simple and offers a low barrier to entry. The technologies are all free and open source, and therefore can be implemented quickly and inexpensively. Even aside from the actual DASE application, the principles and architecture underlying DASE can be applied wholesale or in part to address the challenges of content management. Two, DASE is based on well-defined and open standards and exposes a clear and transparent architecture. Moving from DASE to another system in the future should be a simple and straightforward process.
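    The attribute-mapping approach described above (each collection's free-form attributes mapped onto a standard schema such as RSS when interoperability demands it) can be sketched as follows. The attribute names and mapping are invented for illustration, not DASE's actual configuration.

```python
import xml.etree.ElementTree as ET

def item_to_rss(item, attribute_map):
    """Map a collection's free-form attributes onto RSS <item> child
    elements. Attributes with no value in the record are simply omitted."""
    rss_item = ET.Element("item")
    for rss_field, local_attr in attribute_map.items():
        if local_attr in item:
            ET.SubElement(rss_item, rss_field).text = item[local_attr]
    return rss_item

# Hypothetical per-collection mapping: RSS element -> local attribute name
mapping = {"title": "Image Title", "description": "Notes", "link": "Media URL"}
record = {"Image Title": "Blue Door, Oaxaca", "Notes": "35mm slide scan"}
print(ET.tostring(item_to_rss(record, mapping), encoding="unicode"))
```

    Because the mapping lives in configuration rather than code, each collection keeps its own vocabulary while still producing standard feeds, which is the trade-off the abstract describes.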
  • Item
    Digital Initiatives at the University of North Texas Libraries
    (2007-05-30) Hartman, Cathy Nelson; University of North Texas
  • Item
    Map and GIS Resources in an Institutional Repository: Issues and Recommendations
    (2007-05-30) Weimer, Kathy; Texas A&M University
    Map librarians are increasingly digitizing maps and making the scanned images available over the internet. These digitized map collections are growing quickly in size and number. Access to and long-term preservation of these scanned map collections are still in the early stages. Libraries suffer from a communication gap between the groups actively scanning maps and their IR staff. This is evident in the number of map scanning registries that are not part of an IR or a larger digital library initiative. The registries are increasing in number and both overlap and compete with each other. The benefit of an IR over both a basic web presentation and a digitized map registry is clear: IRs are searchable through Google Scholar, and those configured as OAI-PMH data providers expose metadata for free harvesting. CNI conducted a survey to assess the deployment of IRs in the United States; among its findings was that nine repositories had map materials in their IR, and twelve planned to include maps by 2008. One example of a successful collaboration between a map librarian and IR staff is the Geologic Atlas project at Texas A&M University Libraries. In 2004, the Texas A&M University Libraries deployed DSpace. The Libraries digitized and uploaded the complete 227-folio set of the Geologic Atlas of the United States to DSpace. The atlas was published by the USGS between 1894 and 1945, and contains text, photographs, maps, and illustrations. This collection serves as a pilot project to study scientific map and GIS resources in an IR generally, and specifically the use of geographic coordinates in metadata to build a map-based search interface, as well as the addition of GIS files in an IR environment. For this set, geographic coordinates were added to the metadata, including “coverage.spatial,” “coverage.box,” and “coverage.point.” Fortunately, the maps in this set are very regular rectangles and coordinates were readily available. The map coordinates supported the creation of a Yahoo! Maps interface. Each folio is located on a map of the US and can be readily found with a visual interface. The digitized maps are being converted into GIS files, which will be used to assess the feasibility of GIS resources in the IR. Several advanced geospatial data libraries can serve as models: NGDA (National Geospatial Digital Archive, UCSB and Stanford libraries), NCGDAP (North Carolina Geospatial Data Archiving Project), CUGIR (Cornell University Geospatial Information Repository), and the GRADE (Geospatial Repository for Academic Deposit and Extraction) project. These groups and others are tackling the issue of long-term preservation of GIS data in digital libraries. There are increasing numbers of map resources in digital libraries and IRs. The maps serve an important role in communicating scholarly information. Map librarians should collaborate on scanning standards and metadata creation. Map librarians and digital libraries staff should increase their communication and collaborate in order to improve access to these collections.
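    The "coverage.box" values mentioned above are commonly recorded in the DCMI Box encoding, a semicolon-separated list of name=value pairs. A small parser sketch (the sample coordinates are invented):

```python
def parse_dcmi_box(value):
    """Parse a DCMI Box encoding such as
    'northlimit=36.0; southlimit=35.0; westlimit=-102.0; eastlimit=-101.0'
    into a dict, converting numeric values to float where possible."""
    box = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, val = part.partition("=")
        try:
            box[name.strip()] = float(val)
        except ValueError:
            box[name.strip()] = val.strip()  # e.g. name=..., units=...
    return box

coords = parse_dcmi_box(
    "northlimit=36.0; southlimit=35.0; westlimit=-102.0; eastlimit=-101.0")
print(coords)
```

    Once the limits are numbers rather than strings, plotting a folio's footprint on a web map or filtering folios by region becomes straightforward.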
  • Item
    University of Texas at Austin’s Texas Digital Library Bridge Group
    (2007-05-30) Thompson-Young, Lexie; University of Texas at Austin
  • Item
    To Stream the Impossible Stream: Liberating the Texas Tech University Libraries' Sound Recording Collection
    (2007-05-30) Thomale, Jason; Starcher, Christopher; Texas Tech University
    The Texas Tech University Libraries’ sound recording collection consists of more than 4,000 compact discs that feature art music, jazz, and folk music from around the world. The collection sees substantial use from students and faculty alike, but the medium on which the recordings exist is not optimally accessible—it requires patrons to come to the library building and allows only one patron to listen to a recording at a time. For this reason the collection was a prime candidate for being incorporated into the Texas Tech digital library; as a digital library collection, it would be accessible anytime, anywhere via the web. Thus, the concept for the Streaming Sound Collection (SSC) was born. In implementing the SSC, the project team faced a wide variety of challenges that are common to many digital-collections-building projects—the ways in which the team overcame these challenges are instructive for others embarking on similar journeys. The initial complication was the most obvious: copyright. In an age when corporations feel compelled to prosecute children and the elderly for relatively minor offenses, it was hardly an issue that a large state university could ignore. It was imperative that the content be protected. There were two areas of concern for which we could not equivocate—who shall have access and what type of access they shall have. These two issues drove many of the decisions that were made, including such crucial decisions as format and delivery mechanism of the content. The objects that make up the SSC are not simple. Providing access to them so that they would be both findable and usable was a key consideration in building the collection. The initial step toward this end was to decide on the system where the objects would reside. 
    The project team first considered putting them in the catalog and later toyed with contracting a programmer to invent a completely customized web application, but both of these solutions proved untenable because neither comprehensively served the complete set of library needs, digital library needs, and collection needs. In the end, the project team developed a solution that successfully balanced all of these sets of needs. Efficiently creating quality metadata for the collection was the third major challenge. Jane Greenberg, E. D. Liddy, and others have deftly described this as the “metadata bottleneck.” Indeed, if one views metadata creation similarly to library cataloging, in which a trained expert must carefully examine an object and use an arcane syntax to record minute details about it, then the process quickly gums up what might otherwise be an efficient project. The SSC project, however, by the way it leverages existing catalog records and workflows, serves as an example of how creative automatic metadata processing can help widen the bottleneck. It also demonstrates how an early understanding of the collection’s metadata needs, and foresight about how existing data might be processed, helped the resulting metadata become more than the sum of its MARC.
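    Leveraging existing catalog records, as the abstract describes, amounts to deriving collection metadata from MARC fields automatically. A simplified sketch, assuming records are already parsed into tag-to-field mappings (245, 505, and 700 are the standard MARC title, formatted contents note, and added personal name fields; the parsing step itself is elided, and the sample record is invented):

```python
def marc_to_track_metadata(marc):
    """Derive sound-recording metadata from a simplified MARC record,
    given as {tag: [field strings]}. Real MARC parsing (leaders,
    subfields, indicators) is deliberately out of scope here."""
    title = marc.get("245", [""])[0].rstrip(" /")   # strip ISBD trailing " /"
    contributors = marc.get("700", [])
    contents = marc.get("505", [""])[0]
    # A formatted contents note typically separates track titles with " -- "
    tracks = [t.strip() for t in contents.split("--") if t.strip()]
    return {"title": title, "contributors": contributors, "tracks": tracks}

record = {
    "245": ["Piano works /"],
    "700": ["Debussy, Claude,", "Ravel, Maurice,"],
    "505": ["Clair de lune -- Pavane -- Jeux d'eau"],
}
print(marc_to_track_metadata(record))
```

    Batch transformations like this are what widen the "metadata bottleneck": the cataloging effort already spent on 4,000 discs is reused instead of repeated.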
  • Item
    Manakin Architecture: Understanding Modularity in Manakin
    (2007-05-30) Phillips, Scott; Green, Cody; Maslov, Alexey; Mikeal, Adam; Leggett, John; Texas A&M University
    Manakin is the second release of the DSpace XML UI project. Manakin introduces a modular interface layer, enabling an institution to easily customize DSpace according to the specific needs of a particular repository, community, or collection. Manakin’s modular architecture enables developers to add new features to the system without affecting existing functionality. This presentation will introduce Manakin’s modular architecture from a technical perspective, with an emphasis on extending Manakin’s feature set to meet local needs. First, the project’s goals will be introduced, followed by a discussion of Manakin’s relationship with DSpace. Next, an architectural overview of the primary components will be given:
    • DRI: The Digital Repository Interface (DRI) is an XML schema defining a language that allows aspects and themes to communicate. Manakin uses DRI as the abstraction layer between the repository’s business logic and presentation. The schema is adapted for digital repositories through the use of embedded METS-based metadata packages.
    • Aspects: Manakin aspects are components that provide features for the digital repository. These modular components can be added, removed, or replaced through simple configuration changes, enabling Manakin’s features to be extended to meet the needs of specific repositories. Aspects are linked together to form an “aspect chain”; this chain defines the set of features of a particular repository.
    • Themes: Manakin themes stylize the look-and-feel of the repository, community, or collection. The modular characteristics of themes enable them to encapsulate all the resources necessary to create a unique look-and-feel into one package. Themes may be configured to apply to a range of objects, from an entire repository down to a single page.
    Finally, the presentation will close with a walkthrough of the Manakin architecture detailing how these components work together to form the Manakin framework.
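    Manakin itself is implemented in Java on Cocoon, but the "aspect chain" idea can be pictured as a pipeline of functions, each contributing its features to the DRI document before a theme renders it. The sketch below is an analogy only, not the real implementation:

```python
# Analogy: each "aspect" transforms the in-progress DRI document,
# and the chain's composition determines the repository's feature set.

def browse_aspect(dri):
    dri.setdefault("features", []).append("browse")
    return dri

def search_aspect(dri):
    dri.setdefault("features", []).append("search")
    return dri

def run_chain(dri, aspects):
    """Apply each aspect in order; the output of one is the input of the next."""
    for aspect in aspects:
        dri = aspect(dri)
    return dri

page = run_chain({"object": "collection"}, [browse_aspect, search_aspect])
print(page["features"])  # -> ['browse', 'search']
```

    Swapping, removing, or reordering the list changes the repository's behavior without touching the aspects themselves, which is the modularity the abstract emphasizes.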
  • Item
    Interoperability Options: Lessons Learned from an IMLS National Leadership Grant
    (2007-05-30) Plumer, Danielle Cunniff; Texas State Library and Archives Commission
    In 2005, the Texas State Library and Archives Commission received an IMLS National Leadership grant to develop a multi-component federated search tool on behalf of the Texas Heritage Digitization Initiative that can search across digital collections of cultural heritage materials in Texas libraries, archives, and museums. Successful digitization projects in other states have focused on creating one or more centralized repositories of electronic resources and associated metadata. In contrast, the THDI project provides a single interface to decentralized repositories across the state. This interface, which will be available June 1, 2007 at http://www.texasheritageonline.org, has three components: a federated or broadcast search application, which uses Z39.50 to interact with library systems in real time; an OAI harvester operated by the University of North Texas Libraries to harvest metadata from institutions that do not have a Z39.50 front end; and increasing support for other APIs such as those used by A9 and Yahoo! In the process of developing this application and connecting collections, the THDI development team has learned some useful lessons. In particular, this presentation will focus on OAI-PMH implementation issues and the need for sharable metadata. Many projects, including the National Science Digital Library, have reported on the difficulty of combining metadata from multiple institutions, even when common standards and controlled vocabulary sources are required. The solutions we have developed, including automated segmentation of OAI harvests and development of custom XSL transformations to map harvested metadata into common formats, are relevant to institutional repositories as well as to participants from the cultural heritage sector. The THDI development team has also gained experience working with lightweight search protocols, including SRU and RESTful APIs such as those from Yahoo! and A9, which are remarkably simple to implement when contrasted with the Z39.50 protocol still widely used in library catalogs. REST, or Representational State Transfer, is a stateless, cacheable client/server architecture that allows collections to share data over HTTP. Because THDI has worked with a wide variety of institutions, we are confident that this approach is both scalable and transferable to other types of digital library architectures. In a state the size of Texas, digital libraries cannot be "one size fits all." Instead, they must be flexible, adaptable, and offer institutions local control. The lessons learned from the THDI IMLS National Leadership Grant can help institutions develop new models of collaboration and distributed interaction in digital library development.
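    An OAI-PMH harvest like the one described starts with a ListRecords request, and the protocol's parameter rules are easy to encode: metadataPrefix is required on an initial request, while a resumptionToken must be the only other argument on follow-up requests. The base URL below is hypothetical.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL per the protocol's rules."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # A resumptionToken is an exclusive argument: no other parameters
        # besides the verb may accompany it.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hypothetical data provider; set name invented for illustration.
print(list_records_url("https://repository.example.edu/oai", set_spec="thdi"))
```

    A harvester loops on this: fetch, store records, read the resumptionToken from the response, and request again until no token is returned; the harvested XML is then fed to XSL transformations of the kind the abstract mentions.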
  • Item
    From books to bytes: Accelerating digitization at TTU Libraries with Kirtas BookScan APT 2400
    (2007-05-30) Lu, Jessica; Callender, Donell; Texas Tech University
    In 2006, the Texas Tech University Libraries purchased from Kirtas Technologies Inc. a high-speed book scanner capable of scanning over 2,000 pages per hour. Funded in part with a $130,000 grant from the Lubbock-based Helen Jones Foundation, the purchase of the BookScan APT 2400 is the first of its kind by a university in the United States. The Kirtas scanner turns book pages with a vacuum head, delivering puffs of air that lift and separate pages. Books are secured on a cradle that uses laser technology to maintain focus for dual 16-megapixel cameras that capture high-resolution page images in color. Because it uses photographic technology rather than scanning technology, it operates faster than its scanning counterparts. Because the book cradle and automatic page-turning device are a mechanical setup that requires a clamp to hold down the pages, only books within a certain size range can take full advantage of the high-speed machine. However, the TTU Libraries have successfully digitized tiny books with manual page turning, which still saves significant time and facilitates post-processing because of the dual-camera setup. The companion software (APT Manager) enables automated processing in which the user can set up templates to crop pages, remove clamps, de-skew, center, sharpen images, etc., for an entire book. The default output of the book scanner is JPEG; the template can specify the file formats the software should convert to. Once templates are set for each book, the processing instructions are saved to an XML file and all the files can be run through super batch without human intervention, significantly increasing production. Technical metadata is automatically generated during the operation and saved to an XML file. Descriptive metadata can be retrieved through a catalog search or manual data entry and output to the designated content management system. The software package also includes an OCR (Optical Character Recognition) Manager that outputs to a variety of file types: PDF, Word, text, and XML. To preserve the look of the original, the “image over text” option lets users see the PDF document as the original photographed image while enabling full-text searching of the underlying OCRed text. Super batch can be applied to the OCR process to speed up production. All the scanned files and associated metadata files are saved directly to SAN storage via a fiber connection. This acquisition has significantly accelerated the TTU Libraries’ digitization efforts. The Libraries already have a variety of book scanning projects in the queue, ranging from rare books to theses and dissertations. Currently the digital lab is testing the new workflow with a pilot project featuring donor materials. We look forward to sharing lessons learned with anyone interested.
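    The per-book template workflow described above (processing instructions saved to an XML file for unattended batch runs) might look schematically like this. The element names are invented for illustration; APT Manager's real template format is not given in the abstract.

```python
import xml.etree.ElementTree as ET

def build_template(book_id, output_format="TIFF", deskew=True, crop=True):
    """Serialize per-book processing instructions as XML. All element
    and attribute names here are hypothetical, not APT Manager's schema."""
    tpl = ET.Element("processingTemplate", bookId=book_id)
    ET.SubElement(tpl, "outputFormat").text = output_format
    ET.SubElement(tpl, "deskew").text = str(deskew).lower()
    ET.SubElement(tpl, "crop").text = str(crop).lower()
    return ET.tostring(tpl, encoding="unicode")

# One template per book; a batch runner would then apply each template
# to its book's page images without operator intervention.
print(build_template("folio-0001"))
```

    Saving settings per book, rather than re-entering them per page, is what makes the "super batch" step possible: the operator's decisions are captured once and replayed by the machine.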