Browsing by Subject "Electronic data processing--Distributed processing"
Now showing 1 - 7 of 7
Item: Algorithms for distributed caching and aggregation (2007-12)
Tiwari, Mitul; Plaxton, Greg

In recent years, there has been an explosion in the amount of distributed data due to the ever-decreasing cost of both storage and bandwidth, and there is a growing need for automatic distributed data management techniques. This dissertation addresses three main areas in dealing with distributed data: (1) cooperative caching, (2) compression caching, and (3) aggregation.

First, we address cooperative caching, in which caches cooperate to locate and cache data objects; the benefits of such cooperation have been demonstrated by various studies. We address a hierarchical generalization of cooperative caching in which caches are arranged as leaf nodes in a hierarchical tree network, a variant we call Hierarchical Cooperative Caching. We present a deterministic hierarchical generalization of LRU that is constant-competitive when the capacity blowup is linear in d, the depth of the cache hierarchy. Furthermore, we show that any randomized hierarchical cooperative caching algorithm with capacity blowup b has competitive ratio Ω(log_d b) against an oblivious adversary. Thus we establish that there is no resource-competitive algorithm for the hierarchical cooperative caching problem.

Second, we address a class of compression caching problems in which a file can be cached in multiple formats with varying sizes and encode/decode costs. We address three problems in this class. The first problem assumes that the encode cost and decode cost associated with any format of a file are equal; for this problem we present a resource-competitive online algorithm.
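As background for the hierarchical generalization described above, classical single-cache LRU can be sketched in a few lines (a minimal illustration of the base policy, not code from the dissertation):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal least-recently-used cache; the dissertation's algorithm
    generalizes this policy to a tree of cooperating caches."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> object, oldest first

    def get(self, key):
        if key not in self.entries:
            return None  # miss: caller fetches from the origin
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```

In the hierarchical setting, a miss at a leaf cache can be served by a sibling or ancestor before falling back to the origin, which is where the competitive-ratio analysis above applies.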
To explore the existence of resource-competitive online algorithms for compression caching with arbitrary encode and decode costs, we address two other natural problems in this class, and for each we show a non-constant lower bound on the competitive ratio of any online algorithm, even if the algorithm is given an arbitrary factor capacity blowup. Thus, we establish that there is no resource-competitive algorithm for compression caching in its full generality.

Third, we address the problem of aggregation over trees, with the goal of adapting aggregation aggressiveness. Consider a distributed network with nodes arranged in a tree, each node holding a local value. We consider the problem of aggregating values (e.g., summing values) from all nodes to the requesting nodes in the presence of writes, with the goal of minimizing the total number of messages exchanged. The key challenges are to define a notion of "acceptable" aggregate values, and to design algorithms with good performance that are guaranteed to produce such values. We formalize the acceptability of aggregate values in terms of certain consistency guarantees. We propose a lease-based aggregation mechanism, and evaluate algorithms based on this mechanism in terms of consistency and performance. With regard to consistency, we adapt the definitions of strict and causal consistency to the aggregation problem, and show that any lease-based aggregation algorithm provides strict consistency in sequential executions and causal consistency in concurrent executions. With regard to performance, we propose an online lease-based aggregation algorithm, and show that, for sequential executions, it is constant-competitive against any offline algorithm that provides strict consistency.
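The tree-aggregation setting can be illustrated with a much-simplified sketch in which a lease is reduced to a cache-validity bit; the class and method names are hypothetical, and the dissertation's actual protocol is fully distributed and message-based:

```python
class Node:
    """Tree node that caches the aggregate (here: a sum) of its subtree.
    A real lease-based protocol grants revocable leases on cached values;
    this sketch collapses a lease to a simple validity flag."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []
        self.cached_sum = None  # invalid until first read
        if parent is not None:
            parent.children.append(self)

    def read_aggregate(self):
        # A valid cache answers locally; otherwise recurse into the subtree.
        if self.cached_sum is None:
            self.cached_sum = self.value + sum(
                c.read_aggregate() for c in self.children)
        return self.cached_sum

    def write(self, value):
        # A write invalidates cached aggregates on the path to the root,
        # so the next read recomputes only along that path; untouched
        # subtrees keep serving their cached aggregates.
        self.value = value
        node = self
        while node is not None:
            node.cached_sum = None
            node = node.parent
```

The message-count trade-off the dissertation studies is visible even here: caching aggressively saves messages on reads but forces invalidation traffic on every write.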
Our online lease-based aggregation algorithm is presented in the form of a fully distributed protocol, and the aforementioned consistency and performance results are formally established with respect to this protocol. We also present experimental results showing that the algorithm performs well under various workloads.

Item: Essays on market-based information systems design and e-supply chain (2005-12)
Guo, Zhiling, 1974-; Whinston, Andrew B.

Item: Mechanisms and algorithms for large-scale replication systems (2004)
Venkataramani, Arunkumar; Dahlin, Michael

Item: SAR: semantic-aware replication (2005)
Gao, Lei; Dahlin, Mike

This dissertation presents a replication framework that facilitates semantic-aware data replication (SAR) in wide area networks (WANs). WAN data replication is fundamentally difficult; as a result, generic replication algorithms must make compromises among Consistency, Availability, Response time, and Partition resilience (CARP) when used in WANs. This dissertation designs algorithms around the specific semantics of the shared data sets (e.g., data properties, workload characteristics, and update patterns) to achieve optimized CARP trade-offs. Integrating a set of semantic-aware algorithms using distributed objects to form the SAR framework, we implement a practically important e-commerce application, the distributed TPC-W benchmark. Our prototype evaluations show significant improvements in system availability and response time while preserving the consistency guarantees required by the TPC-W benchmark. The primary focus of the dissertation is the development of the SAR framework.
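The semantic-aware idea of choosing a replication strategy per class of data can be sketched as follows; the policy names and the eager/lazy split are illustrative assumptions, not the dissertation's actual algorithms:

```python
class EagerReplication:
    """Push every write to all replicas synchronously: strong consistency
    at the cost of write latency (e.g., for inventory-like data)."""
    def __init__(self, replicas):
        self.replicas = replicas
    def on_write(self, obj_id, value):
        for r in self.replicas:
            r[obj_id] = value

class LazyReplication:
    """Apply writes locally and queue propagation for the background:
    better response time, weaker consistency (e.g., best-seller lists)."""
    def __init__(self, local, queue):
        self.local = local
        self.queue = queue
    def on_write(self, obj_id, value):
        self.local[obj_id] = value
        self.queue.append((obj_id, value))  # drained asynchronously

class SemanticAwareStore:
    """Routes each write to the policy registered for the object's
    semantic class, echoing SAR's per-data-set CARP trade-offs."""
    def __init__(self):
        self.policies = {}
    def register(self, semantic_class, policy):
        self.policies[semantic_class] = policy
    def write(self, semantic_class, obj_id, value):
        self.policies[semantic_class].on_write(obj_id, value)
```

The point of the dispatch is that no single policy is globally best: each semantic class of data gets the CARP compromise its workload can tolerate.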
Within the framework, contributions include (a) exploiting application semantics using the object-oriented approach, (b) employing a hybrid method that integrates a number of novel replication algorithms to make an important class of applications work, (c) proposing a novel replication algorithm for the multi-writer/multi-reader replication scenario with high access locality, and (d) outlining a general-purpose replication library that uses semantic-aware objects for building other distributed applications in WANs.

Item: A scalable information management middleware for large distributed systems (2005)
Yalagandula, Praveen; Dahlin, Michael

Item: Techniques for analyzing distributed computations (2002)
Mittal, Neeraj; Garg, Vijay K. (Vijay Kumar), 1963-

Item: Transparent replication (2006)
Nayate, Amol Pramod; Dahlin, Mike

Increasing user expectations and demands have driven the evolution of web services away from single-server systems and toward distributed systems, which provide improved throughput, improved availability, and reduced response times. However, for a service to run on a distributed system, each running instance must be able to access data that are shared among the instances. Although existing off-the-shelf replication systems (e.g., distributed file systems [52, 61, 32, 38, 41], replicated databases [64, 75], and distributed hash tables [58, 59, 63, 34]) simplify access to shared data by exporting well-researched interfaces, their implementations are typically not engineered for the unique environments presented by many web services. For example, replication systems that require synchronization across multiple nodes to handle modified data [38, 12], or systems that require all nodes to keep a copy of all data [64, 75], may not be practical for use in such services.
Although the problem of general replication is impossible to solve [11, 62, 33], we focus our study on Information Dissemination Services, a restrictive but important class of single-writer web services. Our research makes two key contributions. First, we show that TRIP, our replication system, replicates dynamic data for Information Dissemination Services in a manner that is nearly transparent to the service. We (1) develop a novel dual-channel replication algorithm for TRIP that utilizes spare network capacity, in the form of background traffic, to speculatively replicate data in a safe, non-interfering fashion, (2) show how to integrate safe speculative replication with mechanisms that use invalidations to provide consistency, and (3) demonstrate how our combination of consistency and safe speculative replication provides near-ideal consistency, performance, and availability for Information Dissemination Services. Second, we show that the core principles behind TRIP can be extended to build a new replication framework and a more general replication toolkit. In particular, we show that it is possible to extend the dual-queue mechanisms developed for TRIP to a multi-writer environment in which nodes synchronize multiple incoming streams of data and consistency information. Our extension provides various forms of consistency over arbitrary topologies, two key properties of the PRACTI [6] (Partial Replication, Arbitrary Consistency, Topology Independence) architecture.
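A much-simplified, single-replica sketch can illustrate the dual-channel idea: small invalidations travel on the synchronous consistency channel so stale data is never served, while update bodies travel as background traffic and are applied only when spare capacity exists. All names here are hypothetical; the actual TRIP protocol is distributed and network-driven:

```python
import collections

class Replica:
    """Illustrative sketch of dual-channel replication. The demand path
    (read) fetches from the origin on a miss or invalidation; the
    background queue holds speculative updates pushed by the writer."""

    def __init__(self, origin):
        self.origin = origin          # authoritative data at the writer
        self.cache = {}               # locally replicated objects
        self.invalid = set()          # invalidated but not yet refreshed
        self.background = collections.deque()  # speculative update queue

    def invalidate(self, key):
        # Consistency channel: a small invalidation arrives synchronously,
        # so the replica never serves the stale version.
        self.invalid.add(key)

    def push_update(self, key, value):
        # Speculative channel: the new body travels as background traffic.
        self.background.append((key, value))

    def drain_background(self):
        # Applied only when the network is idle (simulated by this call),
        # so prefetching does not interfere with demand traffic.
        while self.background:
            key, value = self.background.popleft()
            self.cache[key] = value
            self.invalid.discard(key)

    def read(self, key):
        if key in self.invalid or key not in self.cache:
            self.cache[key] = self.origin[key]  # demand fetch
            self.invalid.discard(key)
        return self.cache[key]
```

When the speculative body arrives before the next read, the demand fetch is avoided entirely, which is the source of the near-ideal performance claimed above.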