Browsing by Subject "Computer science"
Now showing 1 - 9 of 9
Item: Computer applications in developing countries (Texas Tech University, 1991-08). Umerah, Gabriel Azuka.

Training in computer applications initially started in the United States (without planning) as a result of local concerns for citizens to become computer literate. It was later coordinated by state and federal concerns to allow for more orderly planning. Developing countries generally have embraced the introduction of new technologies without planning. It is not advantageous for countries struggling to service their debts to developed nations to embark on computer applications without planning. To do so means forgoing the experiences and benefits that can be derived from over two decades of research on computer applications, and it might mean economic suicide for developing countries. It is important that developing nations take advantage of such benefits as software portability, CAI, CMI, programming, results of research on students, and computer planning. Nigeria, a developing country, seems to be following the path of the early days of computer applications in the United States and has not trained teachers in order to introduce computers to schools. The research focused on the design of a computer application model for teacher training colleges in the state of Anambra, Nigeria, by: (1) using a general survey to determine the needs and attitudes about computer applications; (2) using Delphi techniques to establish factors that affect computer applications; (3) utilizing planning; and (4) evaluating the model by the Nigerian Delphi panel of experts.

Item: Development of mobile platform for inventory and inspection applications in nuclear environments (2015-12). Anderson, Robert Blake; Landsberger, Sheldon; Pryor, Mitchell Wayne.

This thesis details the efforts made towards deploying a mobile robotic system at Los Alamos National Laboratory. The platform's application is non-contact tasks related to inspection, inventory, and radiation surveying. It is intended for a Special Nuclear Material storage facility featuring a high-radiation environment and a variety of storage modes. New robotic capabilities have been developed using several mobile platforms to address the requirements of this application. Many of the challenges are common to any warehouse application, such as autonomous task planning, vision, navigation, and inventory data management. Others are specific to a nuclear laboratory environment, such as radiation measurement and analysis, response to radioactive contamination, criticality safety, and restrictive security measures. This thesis describes the progress made towards meeting these challenges, the outstanding issues, and the future work that is necessary to complete the project. Nuclear facilities are under ever-increasing demands to reduce worker radiation exposure. Since the vault is a high-radiation area, it is one of the first targets at Los Alamos for the application of novel solutions. The deployment of this system promises to enhance worker safety by reducing workers' presence inside the vault and therefore their total occupational dose. As robotic systems become more trusted in the nuclear weapons complex, this system also has the potential to reduce total operator labor by performing time-consuming tasks autonomously.
Item: Impact of the LZW-based common subexpression elimination algorithm on SAT-solving efficiency (2012-05). Jn Charles, Jeriah; Zhang, Yuanlin; Gelfond, Michael; Watson, Richard.

The Satisfiability (SAT) problem is the problem of finding an assignment that satisfies a given propositional formula. SAT is effective in solving many important problems in areas such as automated reasoning, computer-aided design, and planning in Artificial Intelligence. The need to solve these problems in a reduced amount of time has driven considerable research into improving the performance of SAT solvers, resulting in many solver algorithms being created or modified. This research investigates how the removal of common subexpressions from a formula via a Lempel-Ziv-Welch (LZW)-based approach can affect the efficiency of SAT solving. By substituting new variables for common subexpressions in the original formula, we compare the results of passing the original formula and the new, equivalent formula through a SAT solver. In this LZW-based approach, we modify the Lempel-Ziv-Welch data compression algorithm to find and substitute the common subexpressions in the formula.
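To make the substitution step concrete, the sketch below replaces a frequently repeated pair of literals in a CNF formula with a fresh defined variable. It uses a simple frequency count as a stand-in for the thesis's LZW dictionary pass, and the clause encoding, threshold, and naming scheme are illustrative assumptions rather than the algorithm actually evaluated in the work.

```python
from collections import Counter
from itertools import combinations

def eliminate_common_pairs(clauses, min_count=2):
    """Replace frequently co-occurring literal pairs with fresh variables.

    `clauses` is a CNF formula in DIMACS-style form: a list of lists of
    non-zero ints, where -x denotes the negation of variable x.
    """
    next_var = max(abs(lit) for c in clauses for lit in c) + 1

    # Count how often each unordered pair of literals co-occurs in a clause.
    pair_counts = Counter()
    for clause in clauses:
        for pair in combinations(sorted(set(clause)), 2):
            pair_counts[pair] += 1

    new_clauses = [list(c) for c in clauses]
    definitions = []
    for (a, b), count in pair_counts.most_common():
        if count < min_count:
            break
        v = next_var
        next_var += 1
        # Define v <-> (a OR b), so the rewritten formula is equivalent
        # over the original variables.
        definitions += [[-v, a, b], [v, -a], [v, -b]]
        # Substitute v wherever both literals still appear together.
        for clause in new_clauses:
            if a in clause and b in clause:
                clause.remove(a)
                clause.remove(b)
                clause.append(v)
    return new_clauses + definitions

# (x1 v x2 v x3) & (x1 v x2 v -x4) & (x1 v x2 v x5): the shared (x1 v x2)
# is factored out into a single new variable.
print(eliminate_common_pairs([[1, 2, 3], [1, 2, -4], [1, 2, 5]]))
```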
Item: Inducing grammars from linguistic universals and realistic amounts of supervision (2015-05). Garrette, Daniel Hunter; Baldridge, Jason; Mooney, Raymond J. (Raymond Joseph); Ravikumar, Pradeep; Scott, James G.; Smith, Noah A.

The best-performing NLP models to date are learned from large volumes of manually annotated data. For tasks like part-of-speech tagging and grammatical parsing, high performance can be achieved with plentiful supervised data. However, such resources are extremely costly to produce, making them an unlikely option for building NLP tools in under-resourced languages or domains. This dissertation is concerned with reducing the annotation required to learn NLP models, with the goal of opening up the range of domains and languages to which NLP technologies may be applied. In this work, we explore the possibility of learning from a degree of supervision that is at or close to the amount that could reasonably be collected from annotators for a particular domain or language that currently has none. We show that just a small amount of annotation input, even that which can be collected in just a few hours, can provide enormous advantages if we have learning algorithms that can appropriately exploit it. This work presents new algorithms, models, and approaches designed to learn grammatical information from weak supervision. In particular, we look at ways of intersecting a variety of different forms of supervision in complementary ways, thus lowering the overall annotation burden. Sources of information include tag dictionaries, morphological analyzers, constituent bracketings, and partial tree annotations, as well as unannotated corpora. For example, we present algorithms that are able to combine faster-to-obtain type-level annotation with unannotated text to remove the need for slower-to-obtain token-level annotation. Much of this dissertation describes work on Combinatory Categorial Grammar (CCG), a grammatical formalism notable for its use of structured, logic-backed categories that describe how each word and constituent fits into the overall syntax of the sentence. This work shows how linguistic universals intrinsic to the CCG formalism itself can be encoded as Bayesian priors to improve learning.

Item: A new approach to detecting failures in distributed systems (2015-08). Leners, Joshua Blaise; Alvisi, Lorenzo; Aguilera, Marcos K.; Shmatikov, Vitaly; Walfish, Michael; Witchel, Emmett.

Fault-tolerant distributed systems often handle failures in two steps: first, detect the failure and, second, take some recovery action. A common approach to detecting failures is end-to-end timeouts, but using timeouts brings problems. First, timeouts are inaccurate: just because a process is unresponsive does not mean that process has failed. Second, choosing a timeout is hard: short timeouts can exacerbate the problem of inaccuracy, and long timeouts can make the system wait unnecessarily. In fact, a good timeout value, one that balances the choice between accuracy and speed, may not even exist, owing to the variance in a system's end-to-end delays. This dissertation posits a new approach to detecting failures in distributed systems: use information about failures that is local to each component, e.g., the contents of an OS's process table. We call such information inside information, and use it as the basis for the design and implementation of three failure reporting services for data center applications, which we call Falcon, Albatross, and Pigeon. Falcon deploys a network of software modules to gather inside information in the system, and it guarantees that it never reports a working process as crashed by sometimes terminating unresponsive components. This choice helps applications by making reports of failure reliable, meaning that applications can treat them as ground truth. Unfortunately, Falcon cannot handle network failures, because guaranteeing that a process has crashed requires network communication; we address this problem in Albatross and Pigeon. Instead of killing, Albatross blocks suspected processes from using the network, allowing applications to make progress during network partitions. Pigeon renounces interference altogether, and reports inside information to applications directly and with more detail to help applications make better recovery decisions. By using these services, applications can improve their recovery from failures both quantitatively and qualitatively. Quantitatively, these services reduce detection time by one to two orders of magnitude over the end-to-end timeouts commonly used by data center applications, thereby reducing the unavailability caused by failures. Qualitatively, these services provide more specific information about failures, which can reduce the logic required for recovery and can help applications better decide when recovery is not necessary.
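As a toy illustration of the inside-information idea, a local monitor can consult the operating system's own process table rather than waiting out an end-to-end timeout. The sketch below is a minimal, Linux-oriented stand-in and is not the Falcon, Albatross, or Pigeon implementation described in the dissertation.

```python
import os

def process_state(pid: int) -> str:
    """Report a process's state from the kernel's process table (Linux).

    /proc/<pid>/status is local "inside information": if the entry is gone,
    the process has certainly exited, with no timeout guesswork involved.
    """
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("State:"):
                    return line.split(":", 1)[1].strip()  # e.g. "R (running)"
    except FileNotFoundError:
        return "crashed (no process table entry)"
    return "unknown"

def is_alive(pid: int) -> bool:
    """Portable liveness probe: signal 0 checks existence without side effects."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False
    except PermissionError:  # the process exists but belongs to another user
        return True

if __name__ == "__main__":
    pid = os.getpid()
    print(pid, is_alive(pid), process_state(pid))
```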
Item: Ontology as a means for systematic biology (2011-05). Tirmizi, Syed Hamid Ali; Miranker, Daniel P.; Batory, Don; Grauman, Kristen; Gutell, Robin; Porter, Bruce.

Biologists use ontologies as a method to organize and publish their acquired knowledge. Computer scientists have shown the value of ontologies as a means for knowledge discovery. This dissertation makes a number of contributions to enable systematic biologists to better leverage their ontologies in their research. Systematic biology, or phylogenetics, is the study of evolution. "Assembling a Tree of Life" (AToL) is an NSF grand challenge to describe all life on Earth and estimate its evolutionary history. AToL projects commonly include a study of a taxon (organism) to create an ontology that captures its anatomy. Such anatomy ontologies are manually curated based on the data from morphology-based phylogenetic studies. Annotated digital imagery, morphological characters, and phylogenetic (evolutionary) trees are the key components of morphological studies. Given the scale of AToL, building an anatomy ontology for each taxon manually is infeasible. The primary contribution of this dissertation is the automatic inference, and concomitant formalization, required to compute anatomy ontologies. New anatomy ontologies are formed by applying transformations to an existing anatomy ontology for a model organism. The conditions for the transformations are derived from observational data recorded as morphological characters. We automatically created the Cypriniformes Gill and Hyoid Arches Ontology using the morphological character data provided by the Cypriniformes Tree of Life (CTOL) project. The method is based on representing all components of a phylogenetic study as an ontology using a domain meta-model. For this purpose we developed Morphster, a domain-specific knowledge acquisition tool for biologists. Digital images often serve as proxies for natural specimens and are the basis of many observations. A key problem for Morphster is the treatment of images in conjunction with ontologies. We contributed a formal system for integrating images with ontologies, where images capture either observations of nature or scientific hypotheses. Our framework for image-ontology integration provides opportunities for building workflows that allow biologists to synthesize and align ontologies. Biologists building ontologies often had to choose between two ontology systems: Open Biomedical Ontologies (OBO) or the Semantic Web. It was critical to bridge the gap between the two systems to leverage biological ontologies for inference. We created a methodology and a lossless round-trip mapping for OBO ontologies to the Semantic Web. Using the Semantic Web as a guide to organize OBO, we developed a mapping system which is now a community standard.
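For readers unfamiliar with the two formats, one direction of such an OBO-to-Semantic-Web mapping can be sketched with rdflib as below. The term data and the translation rules shown are illustrative only; they are not the community-standard mapping produced by the dissertation.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

# A hypothetical OBO [Term] stanza, already parsed into a dict.
obo_term = {
    "id": "TAO:0000001",          # made-up identifier for illustration
    "name": "hyoid arch skeleton",
    "is_a": ["TAO:0000000"],
}

OBO = Namespace("http://purl.obolibrary.org/obo/")

def term_to_owl(term: dict, graph: Graph):
    """Translate one OBO term stanza into OWL class axioms."""
    cls = OBO[term["id"].replace(":", "_")]
    graph.add((cls, RDF.type, OWL.Class))
    graph.add((cls, RDFS.label, Literal(term["name"])))
    for parent in term.get("is_a", []):
        graph.add((cls, RDFS.subClassOf, OBO[parent.replace(":", "_")]))
    return cls

g = Graph()
term_to_owl(obo_term, g)
print(g.serialize(format="turtle"))
```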
Item: Parallelization Methods for the Distribution of High-Throughput Bioinformatics Algorithms (2011-05). Rees, Eric; Youn, Eunseog; Dowd, Scot E.; San Francisco, Michael.

The development of high-throughput bioinformatics technologies has caused a massive influx of biological data over the course of the past decade. During this same span of time, computational hardware has been rapidly increasing in speed while decreasing in price, multi-core processors have become standard in home and office environments, and distributed and cloud-based computing has become affordable and readily available to researchers through implementations such as Amazon's S3, Microsoft's Azure, Google's App Engine, and the 3Tera Cloud. Bioinformatics software tools such as BLAST, a tool for finding local alignments between a set of unknown genetic sequences and a set of known genetic sequences, have simple interfaces and few installation requirements, so biologists can often use them easily in the laboratory without needing an in-depth knowledge of how computer systems work. This, however, is rarely the case for distributed implementations of bioinformatics tools, which often require the user to first set up and configure the underlying program that will handle the distribution, such as the Message Passing Interface (MPI). Once the underlying distribution algorithm is chosen, many of the software tools require the user to then configure the program to work with their chosen method and, in some cases, write the necessary source code to link the program with the underlying service. These are difficult steps for most computer scientists and are nearly impossible for the average biologist. By constructing a modularized set of methods that can connect to, broadcast to, and read from a multicast created by those methods, future bioinformatics software developers will be able to construct the underlying message passing system without requiring the end user, often a biologist, to set up and configure one of their own. Using these multicast methods allows any program to seek out and track the nodes on the network that will be used in the distributed system. This communication method allows the program to easily scale up and down depending on available nodes, without direct user intervention to alter the size of the system. The system is then tested by creating a program that connects NCBI's Basic Local Alignment Search Tool to the multicast system, allowing the BLAST algorithm to be distributed across multiple nodes. This new system demonstrates how future programs could connect stand-alone tools, such as BLAST, to the multicast system to create programs that execute on a distributed system and automatically scale with the network size, without altering the tool's source code.
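The node-discovery idea can be illustrated with standard IP multicast from the Python standard library. The group address, port, and message format below are arbitrary choices for this sketch rather than the interface the thesis builds; a coordinator would use something like `listen()` to find workers and then hand each one a slice of the query sequences to run through a local BLAST binary.

```python
import socket
import struct

# Illustrative group address and port; any administratively scoped
# multicast address would work.
MCAST_GRP, MCAST_PORT = "224.1.1.1", 5007

def announce(message: bytes) -> None:
    """Announce this node to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.sendto(message, (MCAST_GRP, MCAST_PORT))
    sock.close()

def listen(timeout: float = 5.0):
    """Collect announcements from nodes offering to run distributed jobs."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_PORT))
    membership = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
    sock.settimeout(timeout)
    nodes = []
    try:
        while True:
            data, addr = sock.recvfrom(1024)
            nodes.append((addr[0], data.decode()))
    except socket.timeout:
        pass
    finally:
        sock.close()
    return nodes
```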
Item: Randomness extractors for independent sources and applications (2007-05). Rao, Anup, 1980-; Zuckerman, David I.

The use of randomized algorithms and protocols is ubiquitous in computer science. Randomized solutions are typically faster and simpler than deterministic ones for the same problem. In addition, many computational problems (for example in cryptography and distributed computing) are impossible to solve without access to randomness. In computer science, access to randomness is usually modeled as access to a string of uncorrelated, uniformly random bits. Although it is widely believed that many physical phenomena are inherently unpredictable, there is a gap between the computer science model of randomness and what is actually available: it is not clear where one could find such a source of uniformly distributed bits. In practice, computers generate random bits in ad hoc ways, with no guarantees on the quality of their distribution. One aim of this thesis is to close this gap and identify the weakest assumption on the source of randomness that would still permit the use of randomized algorithms and protocols. This is achieved by building randomness extractors ... Such an algorithm would allow us to use a compromised source of randomness to obtain truly random bits, which we could then use in our original application. Randomness extractors are interesting in their own right as combinatorial objects that look random in strong ways. They fall into the class of objects whose existence is easy to check using the probabilistic method (i.e., almost all functions are good randomness extractors), yet finding explicit examples of a single such object is non-trivial. Expander graphs, error-correcting codes, hard functions, epsilon-biased sets, and Ramsey graphs are just a few examples of other such objects. Finding explicit examples of extractors is part of the bigger project of derandomization: constructing such objects, which can be used to reduce the dependence of computer science solutions on randomness. These objects are often used as basic building blocks to solve problems in computer science.

The main results of this thesis are:

Extractors for Independent Sources: The central model that we study is the model of independent sources. Here the only assumption we make (beyond the necessary one that the source of randomness has some entropy/unpredictability) is that the source can be broken up into two or more independent parts. We show how to deterministically extract true randomness from such sources as long as a constant number of sources (as small as 3) is available, each with a small amount of entropy.

Extractors for Small Space Sources: In this model we assume that the source is generated by a computationally bounded process: a bounded-width branching program or an algorithm that uses small memory. This seems like a plausible model for sources of randomness produced by a defective physical device. We build on our work on extractors for independent sources to obtain extractors for such sources.

Extractors for Low Weight Affine Sources: In this model, we assume that the source gives a random point from some unknown low-dimensional affine subspace with a low-weight basis. This model generalizes the well-studied model of bit-fixing sources. We give new extractors for this model that have exponentially small error, a parameter that is important for an application in cryptography. The techniques that go into solving this problem are inspired by the techniques behind our extractors for independent sources.

Ramsey Graphs: A Ramsey graph is a graph that has no large clique or independent set. We show how to use our extractors, together with many other ideas, to construct new explicit Ramsey graphs that avoid cliques and independent sets of the smallest size to date.

Distributed Computing with Weak Randomness: Finally, we give an application of extractors for independent sources to distributed computing. We give new protocols for Byzantine Agreement and Leader Election that work when the players involved have access only to defective sources of randomness, even in the presence of completely adversarial behavior at many players and limited adversarial behavior at every player. In fact, we show how to simulate any distributed computing protocol that assumes each player has access to private, truly random bits, with the aid of defective sources of randomness.
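For reference, the independent-source model mentioned above rests on the standard definitions of min-entropy and of a two-source extractor; the following is the textbook formulation rather than a statement of the thesis's specific constructions.

```latex
\[
  H_\infty(X) \;=\; \min_{x \in \{0,1\}^n} \log_2 \frac{1}{\Pr[X = x]}
\]
A function $\mathrm{Ext}\colon \{0,1\}^n \times \{0,1\}^n \to \{0,1\}^m$ is a
\emph{two-source extractor} for min-entropy $k$ with error $\varepsilon$ if,
for all independent sources $X, Y$ over $\{0,1\}^n$ with $H_\infty(X) \ge k$
and $H_\infty(Y) \ge k$, the output is close to uniform:
\[
  \Delta\bigl(\mathrm{Ext}(X, Y),\, U_m\bigr) \;\le\; \varepsilon,
\]
where $U_m$ is the uniform distribution on $\{0,1\}^m$ and $\Delta$ denotes
statistical (total variation) distance.
```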
Item: Supervision for syntactic parsing of low-resource languages (2016-05). Mielens, Jason David; Baldridge, Jason; Erk, Katrin; Mooney, Ray; Dyer, Chris; Beavers, John.

Developing tools for doing computational linguistics work in low-resource scenarios often requires creating resources from scratch, especially when considering highly specialized domains or languages with few existing tools or research. Due to practical considerations in project costs and sizes, the resources created in these circumstances are often different from large-scale resources in both quantity and quality, and working with these resources poses a distinctly different set of challenges than working with larger, more established resources. There are different approaches to handling these challenges, including many variations aimed at reducing or eliminating the annotations needed to train models for various tasks. This work considers the task of low-resource syntactic parsing, and looks at the relative benefits of different methods of supervision. I will argue here that the benefits of doing some amount of supervision almost always outweigh the costs associated with doing that annotation; unsupervised or minimally supervised methods are often surpassed with surprisingly small amounts of supervision. This work is primarily concerned with identifying and classifying sources of supervision that are both useful and practical in low-resource scenarios, along with analyzing the performance of systems that make use of these different supervision sources and the behaviors of the minimally trained annotators that provide them. Additionally, I demonstrate several cases where linguistic theory and computational performance are directly connected. Maintaining a focus on the linguistic side of computational linguistics can provide many benefits, especially when working with languages where the correct analysis for various phenomena may still be very much unsettled.
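To ground the notion of type-level supervision that both of the parsing entries above draw on, the toy sketch below shows a tag dictionary constraining a tagger's choices. The words, tags, and scoring function are invented for illustration; neither dissertation uses this exact scheme.

```python
from typing import Dict, List, Set

# Type-level supervision: each known word type lists the tags it may take,
# so an otherwise weakly supervised tagger only chooses among licensed tags.
TAG_DICT: Dict[str, Set[str]] = {
    "the": {"DET"},
    "dog": {"NOUN"},
    "runs": {"VERB", "NOUN"},
}
ALL_TAGS: Set[str] = {"DET", "NOUN", "VERB"}

def allowed_tags(word: str) -> Set[str]:
    """Unknown words fall back to the full tag set."""
    return TAG_DICT.get(word, ALL_TAGS)

def tag_greedily(sentence: List[str], score) -> List[str]:
    """Pick the highest-scoring licensed tag for each token."""
    return [max(allowed_tags(w), key=lambda t: score(w, t)) for w in sentence]

# With a uniform score every licensed tag ties; a real system would plug in
# probabilities estimated from unannotated text (for example, via EM).
print(tag_greedily(["the", "dog", "runs"], score=lambda w, t: 0.0))
```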