Browsing by Subject "Metadata"
Now showing 1 - 13 of 13
Item ePADD: Email Archiving for Beginners (Texas Digital Library, 2023-05-17)
Banuelos, Chris

As electronic communication becomes increasingly ubiquitous, what steps are organizations taking to archive and provide access to email? Here at Rice University, the University Archives have been mandated to preserve all of the email correspondence of our newly outgoing university president. Since we have not done this before, we have started experimenting with software called ePADD. This open-source software allows digital archivists and librarians to process email collections, preview the content for personal information that may need redaction, provide metadata, map the metadata to a local finding aid, and act as a point of contact for patrons requesting access to the content. After testing the software, we are almost ready to start archiving the collection. I'd like to take this opportunity to share with the community what I've learned from our ePADD tests. Additionally, since I've yet to formally begin the project, I'd like to ask the community to share with me any and all experiences they have had with email archiving. My hope is that this session will be informational not only for the attendees but for the presenter as well.

Item Evaluation of a System Layer Design for the Visual Knowledge Builder (2012-02-14)
Gomathinayagam, Arun Bharath

When users are searching for documents, they must sift through a collection of potentially relevant documents, assessing, categorizing, and prioritizing them based on the task at hand, a process we refer to as document triage. Since users' time is precious, as much information as possible should be presented to them to aid document triage. This thesis presents a simple visualization and a set of features that can help users identify information of interest. As part of this thesis, the System Layer of the Visual Knowledge Builder (VKB) was developed as a tab strip container.
Each of the tabs presents a different type of information about Web Documents. The types of information currently included in VKB are: a summary of the Web Document, keywords based on users' interests provided by the Interest Profile Manager (IPM), popular keywords from a social bookmarking site, metadata about the Web Document, a list of the Web Document's outgoing links, and the history of the Web Document. We performed a heuristic evaluation to assess the usefulness of the new visualization and features. During the evaluation, participants were asked to rate the usefulness of each of the new Web Document features on a scale of 1 to 7, where 1 indicated strong disagreement and 7 indicated strong agreement. Our results indicate that the document summary, the keywords from IPM, popular tags, and the history of the Web Document are expected to be most useful during document triage.

Item Geospatial metadata and an ontology for water observations data (2009-05)
Marney, Katherine Anne; Maidment, David R.; McKinney, Daene

Work has been successfully performed by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) to synthesize the nation's hydrologic data. Through the building of a national Hydrologic Information System, the organization has demonstrated a successful structure that promotes data sharing. While data access has been improved by the work completed thus far, the resources available for discovering relevant datasets are still lacking. To improve data discovery among existing data services, a model for the storage and organization of metadata has been created. This includes the creation of an aggregated table of relevant metadata from any number of sources, called a Master SeriesCatalog.
Using this table, data layers are easily organized by theme, thereby simplifying concept-based data discovery.

Item Identifying, selecting, and organizing the attributes of Web resources (2004)
Pasch, Grete; Miksa, Francis L.

The basic human approach to referring to the real world is to represent observed objects by their attributes, be it in natural language or in formal data models. Library cataloging is no different in using attributes to represent information resources, but its approach to data modeling is implicit and does not provide methodologies for attribute analysis. This is a critical problem when representing web resources, since they differ significantly from the kinds of resources typically handled by library catalogs. The purpose of this dissertation is to systematically identify, select, and organize the attributes of web resources by means of an alternative to the traditional, library-based bibliographic model. Here, an alternative methodology is explored that combines data modeling principles from information systems theory, concepts from bibliographic modeling, and Gerard Genette's paratextual theory. The proposed methodology is applied to a working collection of 300 web resources listed by the LANIC (Latin American Network Information Center) center of the Institute of Latin American Studies, University of Texas at Austin. A semi-automatic process is used to extract attributes from the HTML code's HEAD and BODY sections, the information provided by the browser, the data about each locally stored file, and the LANIC directory pages. Attributes are also manually marked up and extracted from each pageview. As a result, a total of 290 attributes were identified and selected. The attributes were then organized according to two methods.
First, a direct mapping into Dublin Core (DC) highlights the shortcomings of the traditional approach: two thirds of the attributes found do not match any DC elements, and questions about the structure and meaning of the DC elements are underscored. Second, matching each attribute to its parent entity resulted in a model with 35 entities grouped into four categories: agents, binders, components, and original documents. These entities highlight the origin of each attribute, help model the life cycle of the information entities, and offer an alternative source for attribute values. The 37 unmatched attributes (expressive, navigational, and directive attributes) hint at the possible application of a social relativist approach for modeling them further.

Item Out of the Woods: Charting Metadata with Digital Tools (Texas Digital Library, 2022-05-23)
Ramirez, Ada Laura; Smith, Marian; Bowaniya, Salima; Weidner, Andrew

In the fall of 2021, a metadata working group was created and charged with streamlining the process of evaluating and refining metadata for a retrospective theses and dissertations digitization (TDD) project at the University of Houston Libraries. The group took to their task by improving existing workflows and reworking for scalability through the introduction of an updated automated tool kit created for the team by another member involved with the TDD project. Using MARC records as an existing foundation, metadata was transformed into Dublin Core formatted records with MARCEdit and OpenRefine. Group members then evaluate each Dublin Core metadata record, editing and enhancing the metadata as needed. As part of the workflow, copyright status is also evaluated and noted in the metadata record. The automated tool kit aids in scaling production by allowing for batch metadata verification, file sorting, and writing EXIF data to the PDF files.
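The MARC-to-Dublin-Core transformation mentioned above can be illustrated with a minimal, stdlib-only sketch. This is not the team's actual MARCEdit/OpenRefine workflow; the dict-based record shape and the particular field mappings (245 to title, 100 to creator, 650 to subject) are simplified stand-ins drawn from the standard LC crosswalk, and the sample values are invented.

```python
# Hypothetical, simplified MARC-to-Dublin-Core crosswalk (stdlib only).
# Each MARC record is modeled here as {tag: [subfield dicts]}; a real
# workflow (e.g. MARCEdit + OpenRefine, as in the abstract) operates on
# actual MARC records.

CROSSWALK = {
    "245": ("dc.title", ["a", "b"]),    # title statement
    "100": ("dc.creator", ["a"]),       # main entry, personal name
    "650": ("dc.subject", ["a"]),       # topical subject heading
}

def marc_to_dc(record: dict) -> dict:
    """Map a dict-based MARC record to Dublin Core elements."""
    dc = {}
    for tag, fields in record.items():
        if tag not in CROSSWALK:
            continue
        element, wanted = CROSSWALK[tag]
        for subfields in fields:
            value = " ".join(subfields[c] for c in wanted if c in subfields)
            if value:
                dc.setdefault(element, []).append(value)
    return dc

# Invented sample record for illustration.
sample = {
    "245": [{"a": "Out of the Woods:", "b": "Charting Metadata with Digital Tools"}],
    "100": [{"a": "Ramirez, Ada Laura"}],
    "650": [{"a": "Metadata"}, {"a": "Digitization"}],
}
print(marc_to_dc(sample))
```

Unmapped tags simply drop out, which mirrors why crosswalked records usually need the kind of human evaluation and enhancement the abstract describes.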
This poster highlights the MARC to Dublin Core metadata transformation and the use of the automation tool kit to streamline the metadata process, a necessary step in a large-scale digitization project that promotes accessibility to scholarly materials.

Item Separating data from metadata for robustness and scalability (2014-12)
Wang, Yang, Ph.D.; Alvisi, Lorenzo; Dahlin, Michael

When building storage systems that aim to simultaneously provide robustness, scalability, and efficiency, one faces a fundamental tension: higher robustness typically incurs higher costs and thus hurts both efficiency and scalability. My research shows that an approach to storage system design based on a simple principle, separating data from metadata, can yield systems that address that tension elegantly and effectively in a variety of settings. One observation motivates our approach: much of the cost paid by many strong protection techniques is incurred to detect errors. This observation suggests an opportunity: if we can build a low-cost oracle to detect errors and identify correct data, it may be possible to reduce the cost of protection without weakening its guarantees. This dissertation shows that metadata, if carefully designed, can serve as such an oracle and help a storage system protect its data with minimal cost.
This dissertation shows how to effectively apply this idea in three very different systems: Gnothi, a storage replication protocol that combines the high availability of asynchronous replication with the low cost of synchronous replication for small-scale block storage; Salus, a large-scale block store with unprecedented guarantees in terms of consistency, availability, and durability in the face of a wide range of server failures; and Exalt, a tool to emulate a large storage system with 100 times fewer machines.

Item Session 2B | Achieving Unified Search across Digital Repository Platforms (2022-05-24)
Creel, James; Huff, Jeremy; Savell, Jason; Welling, William; Laddusaw, Ryan; Day, Kevin

Texas A&M University Libraries have begun going live with production exhibits in the new open-source Solr AGgregation Engine (SAGE). SAGE has been in development since 2019 and consists of two complementary feature sets: (1) the aggregation of multiple Solr indices into a target index with arbitrary fields, and (2) curator-configurable views of any Solr index with custom filters, facets, and display fields. The ubiquity of Solr indices in library applications like VuFind, Blacklight/Spotlight, DSpace, Fedora, and others, juxtaposed with the tantalizing prospect of one search interface across the myriad library holdings, led readily to the concepts behind SAGE. Once the development team implemented the aggregation functionality, it proved straightforward to make a configurable display of the fields of the bespoke Solr documents SAGE was writing. Soon, the development team was demonstrating rough views of synthesized collections of DSpace, Fedora, and Spotlight documents. However, when it came time to prepare these views for curatorial management and public display, numerous issues arose for the product owners.
Among other things, the views posed problems with formatting and normalization of metadata, uniquely identifying objects, and providing viewers for content like images, PDFs, and A/V. Resolving these issues to the satisfaction of the product owners has yielded the first production hybrid collection in SAGE, the Apfelbaum collection of World War I postcards. In this presentation, we will describe the means whereby this content from DSpace and Fedora repositories was brought together in a harmonious view.

Item Session 2C | Wrangling Serial Titles and Place Names in the UNT Libraries' Digital Collections (2022-05-24)
Phillips, Mark

The UNT Libraries' Digital Collections have grown to include over 3 million unique digital resources, including maps, newspapers, photographs, audio, and video records. These digital collections use the UNTL metadata format, which is based on Dublin Core and includes qualifiers that allow more specificity about a field to be represented. While the UNTL metadata format works well for describing a wide range of digital resources held in our collections, one thing that has historically not been modeled well is the concept of a "Title," such as a serial title for a newspaper like the Austin American-Statesman, or a "Place," such as Denton, Texas. This past year we have taken the first steps to manage titles and place names in a more robust way in the UNT Libraries' Digital Collections. This involved the creation of a system to model the concepts of a Title and a Place that could be populated with information providing the description and specificity needed to adequately represent these concepts. Trying not to reinvent the wheel, this approach leveraged data from Library of Congress databases to link title records with existing LCCN and OCLC numbers. Likewise, places are linked with Geonames and Wikidata to provide equivalences between systems.
Finally, appropriate user interface elements were integrated into the system to expose this information to end users so that they are able to make use of this effort in identifying and disambiguating these concepts. This presentation will present the problem we were facing, explain the approach, and provide examples of next steps in this space.

Item Session 3A | An attempt at metadata enhancement through machine learning (2022-05-25)
Peters, Todd; Long, Jason

This presentation will share what we learned about machine learning and its applicability to generating metadata to enhance discoverability during a pilot project. Object detection through neural networks is a rapidly developing field. Using machine learning, large sets of images can be analyzed and objects detected and classified. We used the pretrained models COCO, Inception, ResNet, VGG19, and Xception to classify objects in images in our San Marcos Daily Record newspaper negative collection. Our initial use of these models did not yield usable metadata; however, it did provide a useful first step into machine learning and knowledge to develop future research.

Item Session 3D | Evaluation and Adaptation: How Change Allowed Us to Thrive (Texas Digital Library, 2022-04-25)
Speer, Elizabeth

As a participant in the Program for Cooperative Cataloging (PCC) Wikidata Pilot, Texas A&M University (TAMU) created linked data in the Wikidata platform for a selected sample of mechanical engineering students, their doctoral dissertations, and faculty advisors. This presentation will provide an overview of how an inter-departmental team of four cataloging/metadata librarians and one curator used tools such as OpenRefine and Mix'n'Match to populate Wikidata with metadata from OAKTrust, the TAMU institutional repository, as well as Scholars.tamu.edu, its VIVO database of faculty member profiles.
It will also describe efforts to manually enhance the items that were created and the issues that were encountered, as well as experimentation with SPARQL queries to demonstrate the value of the transformed data. Finally, this presentation will cover potential implications that Wikidata may have for library workflows regarding the management and disambiguation of persons and other entities in the TAMU Libraries' catalog and institutional repository.

Item Session 3D | Giving CRediT Taxonomy its Due (Texas Digital Library, 2022-05-25)
Barba, Shelley; Chapman Tripp, Hannah; Kapacinskas, Natalia; Lowe, David; Thompson, Santi

A subgroup of the TDL-sponsored Research Integrity Working Group has been meeting monthly to discuss curriculum development and planning for a patron-focused workshop themed around authority issues in research and publishing, with a focus on the CRediT taxonomy. Join us as we discuss the path of our work and what we've learned thus far about the use, implementation, and scholarly controversy of the CRediT taxonomy, which defines 14 roles related to creating and authoring research-related works. As an academic publishing topic, the CRediT taxonomy has implications for scholarly communication. As a tactic for managing power relationships (such as between a graduate student and a tenured faculty member), it has implications for equity, diversity, and inclusion initiatives. As a bibliographic feature, it has implications for metadata and indexing specialists. As a crediting mechanism, it has implications stretching from evaluative processes to the integrity of academic research. We invite you to help us think through effective ways to introduce this seldom-discussed topic to our faculty and students.

Item The Obama Administration and digital content: a case study of Healthcare.gov (2016-05)
Gant, Alia; Wickett, Karen M.; Towery, Stephanie

The United States government has made enormous strides to adapt and evolve with the digital era in the 21st century.
Initially, the Clinton Administration in the 1990s showed a sense of acceptance and willingness to work with the changing times with regard to technology. Subsequent administrations also continued to support platforms that utilized digital programs such as the Internet. This Master's Report will examine government websites under the Obama Administration, in particular Healthcare.gov, through the perspective of information professionals. The report will describe and analyze the information pertinent to users accessing health insurance plans. The report will discuss and apply frameworks from information studies, including metadata, digital libraries, and community informatics. Lastly, the report will provide critiques, suggestions, and ways to research this topic in the future.

Item Themes in videogame research: a content analysis of scholarly articles (2010-08)
Broussard, Ramona Lindley; Geisler, Gary; Feinberg, Melanie

In trying to provide access to videogame materials for scholars, collecting organizations must build standards for building and structuring collections, and in turn information professionals must assess the information needs of users. To begin this assessment, this paper presents a content analysis of scholarly videogame articles. The results of the analysis will provide the basis for structuring videogame archives, libraries, or databases. Metadata schemas are important to access and to collecting. That metadata will aid patrons is widely accepted, but too often schemas and vocabularies are based only on experts' opinions without taking into account patrons' ideas of what is important. To address this dearth, the content analysis presented in this paper combines historical ideas of metadata standards from expert archivists with an analysis of which themes are important, common, and sought after in the literature of videogame scholars, who are the likely users of videogame collections.
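The theme-tallying step at the heart of a content analysis like the one described above can be sketched very simply. This is a hypothetical illustration, not the paper's method or data; the theme codes and articles are invented.

```python
# Hypothetical sketch of a content-analysis tally: each article is coded
# with the themes it discusses, and theme frequencies across the corpus
# suggest which concepts a videogame metadata schema should cover.
# Theme codes and article data are invented for illustration.
from collections import Counter

coded_articles = [
    {"id": "a1", "themes": ["narrative", "player-experience", "genre"]},
    {"id": "a2", "themes": ["genre", "education"]},
    {"id": "a3", "themes": ["player-experience", "genre"]},
]

theme_counts = Counter(
    theme for article in coded_articles for theme in article["themes"]
)
# The most common themes are candidates for dedicated metadata fields.
print(theme_counts.most_common(2))
```

Frequencies like these give a patron-grounded complement to expert-chosen vocabularies when deciding which fields a schema should include.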