2012 Texas Conference on Digital Libraries

Permanent URI for this collection: https://hdl.handle.net/2249.1/56867


Recent Submissions

  • Item
    Outreach & Collaboration: Strategies for Digital Repositories
    (2012-05-25) Waugh, Laura; University of North Texas
    The University of North Texas (UNT) launched the UNT Scholarly Works repository in October 2010. Since that time, UNT Scholarly Works has continued to grow as a tool for promoting access to the research, scholarship, and creative activities from the university's community. This digital repository was built into an existing infrastructure and its increasing growth has relied heavily on successful faculty outreach and collaboration within the UNT community. This presentation traces the development of our digital repository and discusses strategies for reaching faculty, developing relationships within an organization and beyond, and collaborating to support digital repositories and promote open access.
  • Item
    Digital Collaboration: Effective Partnerships & Repository Management
    (Texas Digital Library, 2012-05-25) Tarver, Hannah; Moore, Jeremy; University of North Texas
    The UNT Libraries Digital Projects Unit regularly collaborates with other departments, campus entities, and external institutions. We currently have over two hundred partners of various kinds contributing to the more than 260,000 digital objects in our system. Our presentation will discuss procedures and techniques that can help to streamline collaborative projects, and outline some of the concerns that institutions may want to keep in mind when starting similar projects. We will focus on providing suggestions to help others have more successful collaborative digital projects including: considerations at the initial point of contact, managing the practical aspects of the process to make digitization run smoothly, and the benefits of collaborative projects for participants and the users that access their digital items.
  • Item
    The Browning Letters Online
    (Texas Digital Library, 2012-05-25) Stuhr, Darryl; Baylor University
    The Baylor University Electronic Library Digitization Group partnered with the special collection Armstrong Browning Library in summer 2011 to digitize and place online 2,800 letters written to and from Robert Browning and Elizabeth Barrett. Wellesley College also joined the partnership and offered to share 573 of its digitized letters with Baylor to help develop the virtual collection of Browning Letters. Baylor was excited to partner with Wellesley because it owns the original love letters of Robert Browning and Elizabeth Barrett. The presentation will share the Digitization Group’s experience and will cover the collaborative component of the project, in-house and outsourced digitization, project workflow including data migration between systems, batch loading metadata and objects into the Baylor Library’s digital collection access system, CONTENTdm, and the handling of full-text transcripts for the digital objects. The target audience is libraries interested in mounting digital letter collections and those interested in collaborative digital projects.
  • Item
    TDL Systems and Support: Developing Our Connected Future
    (2012-05-25) Steans, Ryan; Texas Digital Library
    This presentation discusses Texas Digital Library (TDL) staffing changes, the helpdesk, software, and systems.
  • Item
    Student Scholarship: A Valuable Asset and Explaining Downloads
    (2012-05-25) Shields, Patricia M.; Rangarajan, Nandhini; Texas State University
    This presentation draws on the experience of a group of 400 Texas State student papers that have been downloaded over 370,000 times, an average of over 900 downloads per paper. These extraordinary download rates are a product of department and library collaboration. Using these Master of Public Administration capstone papers as a kind of best practice, the presentation will examine 1) compelling ethical arguments for widespread distribution of well-vetted student papers; 2) successful library/academic department collaboration; 3) factors that influence downloads (using regression analysis); 4) ways student papers are being used; and 5) suggestions for ways to enhance inter-university collaboration and the use of student papers.
  • Item
    Archiving on the Go: Facilitating Auto-Archiving of Evolving Digital Collections
    (2012-05-25) Scott, Bethany; University of Houston
    For more than 10 years now, archivists have proclaimed the importance of early intervention during the records creation process in order to assure their long-term preservation. In the academic context, the infrastructure and services needed are still being defined: while librarians are ready to provide instructional materials and guidance to implement metadata management plans, ongoing support for researchers creating their collections is not in place. Moreover, typical institutional repositories do not provide storage services for working/ongoing collections, or widespread support for issues like bulk uploads, overall amount of storage space, metadata creation, or privacy protection that such a collection requires. This project presents a case study of guiding an evolving digital collection that has expanded beyond the creator’s (and the IR’s) capability to easily manage and preserve it. In this presentation, we will first describe a unique collection of digital fine art photography, the working process and information management actions of the creator, and his needs for a digital archive system to easily store, search, and retrieve files for further editing. Through a detailed interview, we gained information about the artist’s process of working, from taking photographs to digitally processing them and storing them on external hard drives. The current collection is very large, both in the number of individual files and in the typical file size, often over 3 GB per image. Because the collection currently spans over 100 individual hard drives, it is unwieldy to search and manage, and it is more vulnerable to data loss through hardware failure. A secure and easy-to-use remote storage solution will allow him to organize and view his entire collection at once, and this improvement will save time in processing activities, so that the artist can devote more time to creating new images.
The Texas Advanced Computing Center (TACC) provides the computational resources and research and development expertise to implement this system. We will discuss the benefits (such as the ability to work with developers to improve bulk uploads and metadata mapping) and the limitations of this case study (such as the slow transfer speeds encountered through some networks, and the problems of human error in applying file naming conventions for automated metadata extraction). By becoming involved in the artist’s information management processes during this early point in the file life cycle, we not only allow him to manage his own time more efficiently, but also ensure that the files are accessible and well organized for the archivists and researchers who may be dealing with the collection in the future. As digital collections continue to evolve, it will be crucial to provide long-term, secure storage and preservation. The increased high-performance storage resources now available facilitate this goal. More proactively approaching the creators of research collections to provide data management services complements the storage availability, allowing researchers to continue to create, curate, and preserve their own collections.
  • Item
    The Wiki Method: All the Promises of Web 2.0 for Minimal Fuss
    (2012-05-25) Bueno, Natalie; Perrin, Joy Marie; Texas Tech University
    Since 2009, we have been looking for a way to revamp Texas Tech University’s digital collections. In December 2010, we started discussing the idea of using a wiki to display the collections. Using MediaWiki (the same software that Wikipedia uses) and the MediaWiki Semantic Extension, we processed a few key digital collections into the wiki. We started testing the wiki digital collections with faculty in spring 2012. The benefits of the method have been immediately apparent. We have more control over how our collections are displayed than we have ever had. We have more options for collaboration, since wikis are designed to be edited from the browser. This has allowed us to experiment with crowdsourcing OCR correction and with semantic web concepts in digital collections. In this presentation, we will review the work we’ve done at Texas Tech. We will discuss the many options for organizing digital collections in MediaWiki, describe some interesting uses of the Semantic Extension, and share some of the responses we’ve gotten from faculty during our testing period.
  • Item
    Streaming Texas - A Case Study of the Texas Archive of the Moving Image
    (2012-05-25) Peck, Megan; Texas Archive of the Moving Image
    In an online environment saturated with video, but in which few organizations independently stream their own content, the Texas Archive of the Moving Image (TAMI) has rapidly developed as a leader in the field. TAMI is an independent 501(c)(3) organization dedicated to promoting the preservation of and access to Texas’ moving image heritage. The organization’s focus is to digitize and provide easy access to these materials via the web, communicating Texas history across the state, nation, and world. Since the kickoff of its main program, The Texas Film Round Up, in 2008, the archive has digitized nearly 14,000 moving image items, of which over 1,500 have been described and uploaded for free public access in the Online Video Library. Elizabeth Hansen, Director of Outreach and Education, and Megan Peck, Digital Librarian, will present a case study of TAMI’s approach to connecting with communities, both online and in the real world. The discussion will address a number of considerations used to develop a holistic strategy for connecting with users. This strategy incorporates social media and other tools used to invite the public to the library, as well as measures to shape the library user’s experience, such as building and curating relevant collections and providing a crowdsourcing tool to foster user participation and contribution. The panel will report on successes achieved and challenges faced in implementing and managing this strategy, and show off some selections from the collection.
  • Item
    Texas Digital Library: Building a Connected Texas
    (2012-05-25) McFarland, Mark; Texas Digital Library
    The ability of scholars and institutions to provide access to research data, publications, and special collections of materials has become a necessity for all institutions of higher learning. This need provides a tremendous opportunity for libraries to provide leadership on their campuses, and the services and organization created by the Texas Digital Library consortium have become more relevant than ever. The membership of the Texas Digital Library has recently adopted a strategic plan that sets goals and provides structure to the paths of member institutions as they work to build a presence for digital libraries on their campuses and build connections that reach well beyond the borders of Texas. In order to achieve this vision, the Strategic Plan lays out four ambitious initiatives aimed at: 1. growing scholarly content hosted by TDL; 2. marketing TDL services effectively to our member institutions; 3. developing collaborative special collections; and 4. mining the talents and skills of members to augment the talent and expertise of TDL staff. In this presentation, we will discuss how TDL is working to execute these initiatives, as well as provide updates on ongoing software development projects, collaborations with partners outside of Texas, and ongoing efforts to provide TDL members with the highest level of quality of service and organizational development.
  • Item
    Corral: A Texas-scale Repository for Digital Research Data
    (2012-05-25) Jordan, Chris; Texas Advanced Computing Center
    At the end of 2010, the Regents of the University of Texas funded an investment of $23 million in research cyberinfrastructure, including a replicated, multi-petabyte research data repository and a dedicated research network linking all 15 UT campuses. The Texas Advanced Computing Center has led the effort to design and deploy the research data repository, which is now operational and supporting researchers from Austin to El Paso. This presentation will discuss the human and technical challenges of deploying large-scale research data infrastructure, and provide an overview of the design from a non-technical standpoint, explaining the concept of operation for the project, how it is situated relative to projects like TDL, and how staff at TACC have worked to engage and educate researchers from all UT campuses to make use of this valuable resource. Future directions for the project, including long-term preservation, will also be addressed.
  • Item
    Enhancing Educational Access to Art
    (2012-05-25) Higgins, Jessica; Karadkar, Unmil P.; Pavelka, Karen; Zinser, Catherine; University of Texas at Austin
    Art museums are an important unit on several university campuses. These museums bring value to the university community by serving as custodians of paintings, sculptures, prints, and drawings. These museums serve as a resource of unparalleled importance in education related to art, architecture, language, and culture by providing instructors with access to rare artifacts of cultural significance. While the museum staff is committed to helping faculty locate items of interest, they are hard pressed for time and do not always possess the domain-specific vocabulary used by instructors in diverse disciplines. Artifacts in the museums are organized and described by museum professionals, while they are used by academics. The resulting disconnect between the expectations of both groups affects the use of these artifacts. We aim to address this issue by enhancing a collection of prints and drawings at the Blanton Museum of Art with a rich, domain-specific description that meets the expectations of a multi-disciplinary faculty. Instructors in several departments at UT Austin use the Prints and Drawings Collection as a teaching tool. This collection includes over 13,000 artifacts, which were executed over four centuries. This is a closed collection and the collection manager provides access to specific prints and drawings upon request. The metadata related to the prints can be accessed only through computers situated in the museum, further limiting access to it. Thus, instructors are unable to browse the collection at their convenience and rely heavily on the Blanton staff to provide suggestions for relevant works. This practice results in a small pool of items being viewed repeatedly, while other prints of interest go unnoticed. We take a user-centered design approach to create a prototype of a richly described repository of artifacts from this collection.
We started by conducting interviews of faculty in the areas of Art, Art History, French, and Architecture to gain an understanding of their challenges in accessing the collection and their needs for effectively locating items of interest. Based on the responses from these instructors, we have made two modifications to the infrastructure. First, we populated a repository using CollectiveAccess, an open-source repository software, with representative samples of prints used by these instructors to enable long-distance, internet-based access. Second, we augmented the metadata contained in the museum’s proprietary cataloging software to include fields and content desired by the instructors using the Getty Institute’s CDWA Lite schema. The resulting repository is thus based on open standards, improving the potential for its use by various demographics on campus, as well as improving its visibility for remote users and repositories through interoperability protocols. We are currently evaluating this prototype repository. In the first stage, we are evaluating our design with the help of the instructors who set the expectations for this repository. This evaluation will help us fine-tune the interface features, repository architecture, and our use of the CDWA Lite schema.
  • Item
    International Collaboration and Digital Archives: The Guatemalan National Police Historical Archive Project at UT Austin
    (2012-05-25) Diaz, Jade; Norsworthy, Kent; University of Texas at Austin
    This presentation will showcase a large-scale collaborative digital initiative undertaken at UT Austin in 2011: the Guatemalan National Police Historical Archive (AHPN) project. We believe the AHPN project provides a compelling example of how libraries can leverage the power of collaborative relationships—both across campus and globally—to build digital research resources that are transformative in nature. The presentation will provide a description and overview of the processes used by the University of Texas Libraries to construct a universally accessible digital archive from a collection of over 10 million digitized pages of records from Guatemala. The AHPN project story begins in 2005 when Guatemalan investigators fortuitously discovered nearly 8,000 linear meters of documents created by the Guatemalan National Police in a series of rat- and cockroach-infested abandoned buildings. The documents included hundreds of thousands of identification cards, vehicle license plates, photographs, and police logs. More importantly, they included loose files on kidnappings, murders, and assassinations created during nearly four decades of intense civil conflict beginning in the 1960s. The government and police had long denied the existence of this National Police archive, particularly during truth commission investigations in the 1990s. After this discovery, the Human Rights Ombudsman office assumed custody of the Archive under an order issued by the nation’s Civil Court. In 2009, responsibility for the AHPN was transferred to the Ministry of Culture where it is under the direction of the Archivo General de Centroamérica (AGCA), Guatemala’s national archive. With over 80 million pages of documents, the AHPN represents the largest single repository of documents ever made available to human rights investigators. 
Following years of painstaking work to clean, identify, classify, organize, describe and digitize the documents, in 2009 the AHPN opened a professionally-staffed public reading room to provide access to the digitized documents for anyone who could visit the Archive in person. Staff continue to digitize around the clock and, as of March 2012, they had completed scanning of over 12.5 million documents, predominantly those from the most intense years of conflict, 1975-1985. In December 2011, the AHPN and UT Austin took the bold and unprecedented step of putting the entirety of the digitized collection online with unfettered universal access. At ahpn.lib.utexas.edu, created at UT through a campus-wide partnership, users can now search or browse the entire contents of the digital archive. In this way, an important part of the nation’s historical patrimony has been preserved and opened up for all citizens to consult as they work to discover and make sense of their own history. In addition to detailing the background on the AHPN itself, this presentation will cover the following areas:
      • Collaborative nature of the project
      • Technologies used for the digital archive
      • Process used to build the web presence
      • Challenges in the areas of design, development, and the project itself
      • Lessons learned
  • Item
    Connecting with Users beyond Language Boundaries through Multilingual Information Access for Digital Collections
    (2012-05-25) Chen, Jiangping; University of North Texas
    Very few digital collections in the United States support multilingual information access (MLIA) that enables non-English users to search, browse, recognize, and use information from multilingual digital objects. In the increasingly global knowledge society, libraries and museums need to design and implement effective and efficient MLIA in order to serve broader user groups and to share information with global societies. This presentation will discuss and demonstrate a research project titled "Enabling Multilingual Information Access to Digital Collections: An Investigation of Metadata Records Translation," which is a collaboration of four entities: the Department of Library and Information Sciences in the College of Information at the University of North Texas (UNT); the UNT Libraries Digital Projects Unit (DPU); the School of Information Management at Wuhan University, China; and the Autonomous University of the State of Mexico (UAEM) in Mexico. The project is jointly funded by the U.S. Institute for Museum and Library Services (IMLS: http://www.imls.gov/) and UNT. It aims to evaluate the extent to which current machine translation technologies generate adequate translation for metadata records, and to identify the most effective metadata records translation strategies for digital collections. During the first year of this project, the research team developed HeMT (http://txcdk-v10.unt.edu/HeMT/): a multilingual participatory platform for human evaluation of machine translation. HeMT is used by three types of users: translators, evaluators, and reviewers. It consists of six major modules: User Management, Manual Translation, User Training, Evaluation, Result Visualization, and Multilingual Lexicon Management. HeMT can be used by digital libraries and machine translation communities for conducting manual translation and machine translation evaluation tasks. Usability testing was conducted during the development of HeMT.
Evaluators recruited from China and Mexico have used HeMT to perform the evaluation of metadata records machine translation. The evaluation results can be visually presented and viewed in real time. The second phase of this project will focus on exploring effective Multi-engine Machine Translation (MEMT) strategies in order to provide guidance for digital libraries that are interested in implementing MLIA. In order to train our MEMT system, we are seeking collaborations with libraries in China and Mexico through our partners in these two countries. Specifically, we expect to obtain metadata records in Chinese and Spanish to develop the language models for metadata records for English-Chinese and English-Spanish machine translation. Our future work will focus on evaluating and implementing the metadata records translation strategies identified from this project through collaborating with 1-2 digital collections in different subject domains. Digital libraries should connect effectively with their users and collaborators for sustainable development and services. This presentation will also discuss challenges and benefits of crowdsourcing and collaboration based on our experience in this project.
  • Item
    A Digital Repository for the World of Physical Culture
    (2012-05-25) Caldwell, Lesley; Sipes, Brent; H.J. Lutcher Stark Center for Physical Culture and Sports; University of Texas at Austin
    The H.J. Lutcher Stark Center is a burgeoning special collection library, archive, and museum that celebrates the world of physical culture and sports. The collection includes hundreds of thousands of items, some cataloged but most uncataloged, ranging from historical nutrition texts to decades-old weightlifting equipment. Led by scholar-athletes Jan and Terry Todd, the Stark Center opened its doors in 2008. One of the largest donors to the Stark Center was Ottley Coulter, who is recognized as the first historian of bodybuilding. A circus strongman, writer, and one of the founding fathers of weightlifting in the United States, Coulter was a hobbyist collector who saved thousands of newspaper clippings, correspondence, and weightlifting publications. The Todds met Coulter in 1964, and eventually acquired his personal collection in 1975. Housing the Coulter collection and other rare materials, the Stark Center’s collection is the largest of its kind, and the staff recognized the need to make their collection accessible to researchers unable to travel to their University of Texas location. The presentation will begin with the assessment process we went through when selecting our software solution. As a new institution with one dedicated librarian and no in-house technical staff, we had to consider our limited funding and staff size in our planning. Ultimately we chose to go with the open-source solution DSpace. Next the presentation will cover the steps necessary to create a digital repository, from testing and training to data transfer. With that we will share the issues we faced throughout the development cycle. For example, though Ottley Coulter was a collector, he was not a librarian. The items in his collection are briefly labeled if labeled at all. Hence, issues with controlled vocabulary stemmed from the fractured and chaotic nature of the Coulter clippings.
Features of the presentation will include our transfer of 3,500 files from FileMaker Pro into DSpace, the benefits and limits of optical character recognition (OCR) for archival documents, our rationale for and design of a one-page external submission form, and a tour of our DSpace collection. Our presentation will give insight into the struggles and successes of setting up a new library and establishing best practices for building an information repository in a start-up institution. The platform that DSpace provides for organizing rare materials and making them accessible to sport historians across the world is something Ottley Coulter could never have imagined when he started clipping newspaper stories one hundred years ago.
  • Item
    The Story of the Realia Collections at UT Austin: How Three-Dimensional Teaching Objects Can Intersect With Digital Libraries
    (2012-05-25) Buckley, Annette; University of Texas at Austin
    In the theme of collaborative digital projects between cross-campus university entities, I will present the story of The Realia Collections (TRC) at the University of Texas at Austin, spanning the following topics:
      • Survey of how comparable Association of American Universities members promote their physical object collections online
      • TRC’s purpose and creation process
      • Potential for expansion and evolution
      • Possible replication at other institutions
    TRC was entirely conceived and built within one semester (Fall 2011) by a group of ambitious master’s candidates. Not only was the project an interdisciplinary effort between UT’s School of Information and Art History Department, but as evidenced by the over two dozen people on its Acknowledgements page, it exemplifies the enthusiasm of departments for contributing to new digital tools to aid in scholarship. TRC is an online directory-type finding aid that lets individuals locate the realia (three-dimensional objects used for teaching or research) scattered throughout the dauntingly large UT Austin system. Individuals may then follow up with those collections’ administrators to conduct research. The website does not discriminate by housing institution; collections are in university departments, cultural centers, research labs, museums, etc. It also allows for group maintenance via both a chief administrator and individual collection managers, who may log in and update relevant metadata about their collections. Overall, TRC serves as a viable model for uniting physical objects through a digital environment, which not only promotes the respective collections and departments, but also helps differentiate the university and its holdings from comparable research institutions in the AAU. Besides giving the back story of TRC, I will also highlight the ways that my group dealt with particular challenges we faced, e.g. determining how much metadata to include, choosing a controlled vocabulary for finding similar types of collections, and customizing the back-end of the website to offer long-term administrative usability.
  • Item
    The Sissy Farenthold Papers Digitization Project: Creating an Online Exhibit Through Cross-Departmental Collaboration
    (2012-05-25) Bastone, Gina; Bernard and Audre Rapoport Center for Human Rights and Justice; Dolph Briscoe Center for American History; University of Texas at Austin
    Frances T. “Sissy” Farenthold is a well-known, important figure in Texas politics and the national women’s movement. She also spent much of the last four decades working on global human rights issues. The Bernard and Audre Rapoport Center for Human Rights and Justice partnered with the Dolph Briscoe Center for American History to identify, organize, and digitize Farenthold’s papers relating to human rights. The Rapoport Center has created an online exhibit focusing on Farenthold’s human rights work, placing it in the context of her life as a lawyer, legislator, activist, and important mentor to numerous women. It is aimed at human rights and women’s rights researchers and historians, as well as a more general audience that is interested in Farenthold and the issues she is so passionate about. The Rapoport Center has completed a website featuring a selection of scanned documents from Farenthold’s papers, as well as video interviews with Farenthold and Genevieve Vaughan, her collaborator on a number of projects. The scanned documents and interviews focus on Farenthold’s work with the anti-nuclear peace movement of the 1980s, particularly her efforts with women’s groups for nuclear disarmament and for women’s human rights. The website includes contextual and historical background information about Farenthold, the organizations with which she worked, and the larger historical events of the time (such as significant peace movement protests and the Reagan-Gorbachev Summits). You can view the website here: http://www.utexas.edu/law/centers/humanrights/farenthold/ In order to bring this project to life, multiple departments across the University of Texas have played a role, making it a truly collaborative effort. The project initially started as a capstone project for a Master’s student in the School of Information. Faculty and staff from the School of Information and the Center for Women’s and Gender Studies advised and consulted with the student.
Support staff from the Rapoport Center, the Briscoe Center, and the School of Law had a hand in making the website, through help with back-end site design, photo and document scanning, and video editing. Several undergraduate interns worked on every step of the project, from an initial inventory of the papers to the final proofreading of the website copy. Our presentation will focus on the history of our project and a short demo of the site. We will also share some lessons we learned from the process. Here are a few of those lessons:
      • Know what you are capable of and what you cannot do
      • Do not be afraid to ask for help
      • Be sensitive to others’ workloads but also be assertive
      • Be open to change – every draft and iteration can be improved
      • When working in a non-profit setting (such as in archives and the human rights sector), utilize free resources as much as possible
  • Item
    10 Weeks to Success: How to Quickly and Effectively Build a Collaborative Digital Collection
    (2012-05-25) Allen, Christy; University of Texas at Arlington
    Creating a digital collection typically requires a lot of time and thoughtful planning. But what would happen if you only had 10 weeks to plan and build a digital collection from scratch? That was the dilemma faced by the University of Texas at Arlington when the Center for Greater Southwestern Studies, the Library, and the Department of Modern Languages collaborated on the digital collection “A Continent Divided: The U.S. – Mexico War.” This ambitious effort involved scanning and describing dozens of items, writing detailed essays and biographies, translating Spanish language materials, and designing and building a MySQL database and website to access the collection. All of this and more was accomplished in less than 3 months. Digital Projects Librarian Christy Allen will discuss the project and offer insights, guidance, and lessons learned that are relevant to anyone who may be implementing a digital collection in a brief period of time.
  • Item
    “Mapping the Southwest”: UNT-UTA Collaborative Project
    (2012-05-25) Alemneh, Daniel; Jones, Jerrell; Hodges, Ann; University of North Texas; University of Texas at Arlington
    Mapping the Southwest is a 3-year project (2010 to 2013) funded by a National Endowment for the Humanities (NEH) We the People grant. For this project, the University of North Texas (UNT) Libraries partner with the University of Texas at Arlington (UTA) Library’s Special Collections to digitize 5,000 historically-significant (mostly) rare maps. The collection includes maps dating from 1493 to the present and features noted cartographers. While containing maps of all parts of the world, the collection particularly emphasizes the region of the Gulf Coast and the Greater Southwest, which has been defined as the area comprising the state of Texas and the other southwestern states annexed by the United States after the U.S. War with Mexico of 1846-1848. All of the materials digitized for this grant project will be available online for free public access through The Portal to Texas History. More than 1,000 items are already available at http://texashistory.unt.edu/explore/collections/UTAM/browse/. We have registered almost 20,000 uses, and as we complete the project, we expect even more users around the world to access this new collection. In addition to showcasing the cartography of the region, the Mapping the Southwest project seeks to promote best practices and to advance the capacity of academic libraries to reliably curate, preserve, and provide seamless access to historic maps, atlases, and related wide-format items. This panel brings together diverse stakeholders and provides information on the project’s background, deliverables, workflow, and major areas of activity. 
The participants on this panel will discuss a number of issues from both institutions’ perspectives: • The UTA group will discuss the importance of the collection, selection criteria, cataloging and metadata operations (including workflow for maps without existing MARC records), preparation of the maps for transporting, and the possible impact of the project in facilitating access to such unique and valuable resources. • The UNT team will discuss organization and management of collaborative activities, workflow for capturing and processing digital images of the maps, assessments and enhancements of the quality of the digital images and metadata records, ensuring long-term access and key lessons learned along the way. As we are now starting the second half of the Mapping the Southwest project lifecycle, the project team looks forward to sharing its progress at the upcoming 2012 TCDL Conference.
  • Item
    Web Archives and Large-Scale Data: Preliminary Techniques for Facilitating Research
    (2012-05-25) Woodward, Nicholas; Norsworthy, Kent; Texas Advanced Computing Center; University of Texas at Austin
    The Latin American Government Documents Archive (LAGDA) is a collaborative project of the University of Texas Libraries, The Nettie Lee Benson Latin American Collection, and the Latin American Network Information Center (LANIC) at The University of Texas at Austin that seeks to preserve and facilitate access to a wide range of ministerial and presidential documents from 18 Latin American and Caribbean countries. Web crawling is conducted quarterly using the Internet Archive’s Archive-It application. The resulting Archive contains copies of the Websites of approximately 300 government ministries and presidencies between 2005 and the present. Currently, LAGDA is comprised of approximately 66.6 million documents archived from the Internet, totaling 5.6 terabytes of data. The collection increases in size by an additional 250 gigabytes with each quarterly crawl. Content in the Archive includes not only the full-text versions of official documents, but also original video and audio recordings of key regional leaders, all archived in the ARC file format produced by the Heritrix web crawler. Archive contents include thousands of annual and "state of the nation" reports, plans and programs, and speeches by presidents and government ministers. The data include HTML-formatted pages, Microsoft Word documents, Adobe PDF files and RTF documents, as well as various audio and video formats. The collection includes only sparsely populated metadata. Promoting research of the collection is a central component of the LAGDA project, and towards those ends staff has collaborated with researchers at the Texas Advanced Computing Center (TACC) using the LAGDA data to develop text-mining methods for document representation and classification. This includes implementing several strategies to mechanically classify and categorize information contained in the Archive in order to facilitate search and browse capabilities. 
Additionally, LANIC and TACC have worked together to create methods for research on sub collections in the Archive, e.g. presidential speeches or ministerial documents. Preliminary results of these efforts have been encouraging, and they are the initial steps on the path towards solutions that will make large-scale data more accessible to researchers. The challenges presented in LAGDA are similar to those faced by academic libraries across the country as they are increasingly faced with “big data” collections that necessitate new strategies for data analysis tools. Nascent projects such as LAGDA provide some initial insights into how academic libraries can work collaboratively to facilitate research on the types of large-scale collections that are increasingly prevalent in today’s digital world. The presentation will focus on the following components: • Challenges presented by Web archived data • “Big data” and data-driven research • The role of libraries in data analysis • The future of “big data” and libraries