Browsing by Subject "Byzantine fault tolerance"

Now showing 1 - 2 of 2

Highly available storage with minimal trust
(2012-05) Mahajan, Prince; Dahlin, Michael
Storage services form the core of modern Internet-based services spanning commercial, entertainment, and social-networking sectors. High availability is crucial for these services as even an hour of unavailability can cost them millions of dollars in lost revenue. Unfortunately, it is difficult to build highly available storage services that provide useful correctness properties. Both benign (system crashes, power out- ages etc.) and Byzantine faults (memory or disk corruption, software or configuration errors etc.) plague the availability of these services. Furthermore, the goal of high availability conflicts with our desire to provide good performance and strong correctness guarantees. For example, the Consistency, Availability, and Partition- resilience (CAP) theorem states that a storage service that must be available despite network partitions cannot enforce strong consistency. Similarly, the tradeoff between latency and durability dictates that a low-latency service cannot ensure durability in the presence of data-center wide failures. This dissertation explores the theoretical and practical limits of storage services that can be safe and live despite the presence of benign and Byzantine faults. On the practical front, we use cloud storage as a deployment model to build Depot, a highly available storage service that addresses the above challenges. Depot minimizes the trust clients have to put in the third party storage provider. As a result, Depot clients can continue functioning despite benign or Byzantine faults of the cloud servers. Yet, Depot provides stronger availability, durability, and consistency properties than those provided by many of the existing cloud deployments, without incurring prohibitive performance cost. For example, in contrast to Amazon S3’s eventual consistency, Depot provides a variation of causal consistency on each volume, while tolerating Byzantine faults. On the theoretical front, we explore the consistency-availability tradeoffs. Tradeoffs between consistency and availability have proved useful for designers in deciding how much to strengthen consistency if high availability is desired or how much to compromise availability if strong consistency is essential. We explore the limits of such tradeoffs by attempting to answer the question: What are the semantics that can be implemented without compromising availability? In this work, we investigate this question for both fail-stop and Byzantine failure models. An immediate benefit of answering this question is that we can compare and contrast the consistency provided by Depot with that achievable by an optimal implementation. More crucially, this result complements the CAP theorem. While, the CAP theorem defines a set of properties that cannot be achieved, this work identifies the limits of properties that can be achieved.
UpRight fault tolerance
(2010-12) Clement, Allen Grogan; Alvisi, Lorenzo; Dahlin, Mike; Druschel, Peter; Walfish, Michael; Witchell, Emmett
Experiences with computer systems indicate an inconvenient truth: computers fail and they fail in interesting ways. Although using redundancy to protect against fail-stop failures is common practice, non-fail-stop computer and network failures occur for a variety of reasons including power outage, disk or memory corruption, NIC malfunction, user error, operating system and application bugs or misconfiguration, and many others. The impact of these failures can be dramatic, ranging from service unavailability to stranding airplane passengers on the runway to companies closing. While high-stakes embedded systems have embraced Byzantine fault tolerant techniques, general purpose computing continues to rely on techniques that are fundamentally crash tolerant. In a general purpose environment, the current best practices response to non-fail-stop failures can charitably be described as pragmatic: identify a root cause and add checksums to prevent that error from happening again in the future. Pragmatic responses have proven effective for patching holes and protecting against faults once they have occurred; unfortunately the initial damage has already been done, and it is difficult to say if the patches made to address previous faults will protect against future failures. We posit that an end-to-end solution based on Byzantine fault tolerant (BFT) state machine replication is an efficient and deployable alternative to current ad hoc approaches favored in general purpose computing. The replicated state machine approach ensures that multiple copies of the same deterministic application execute requests in the same order and provides end-to-end assurance that independent transient failures will not lead to unavailability or incorrect responses. An efficient and effective end-to-end solution covers faults that have already been observed as well as failures that have not yet occurred, and it provides structural confidence that developers won't have to track down yet another failure caused by some unpredicted memory, disk, or network behavior. While the promise of end-to-end failure protection is intriguing, significant technical and practical challenges currently prevent adoption in general purpose computing environments. On the technical side, it is important that end-to-end solutions maintain the performance characteristics of deployed systems: if end-to-end solutions dramatically increase computing requirements, dramatically reduce throughput, or dramatically increase latency during normal operation then end-to-end techniques are a non-starter. On the practical side, it is important that end-to-end approaches be both comprehensible and easy to incorporate: if the cost of end-to-end solutions is rewriting an application or trusting intricate and arcane protocols, then end-to-end solutions will not be adopted. In this thesis we show that BFT state machine replication can and be used in deployed systems. Reaching this goal requires us to address both the technical and practical challenges previously mentioned. We revisiting disparate research results from the last decade and tweak, refine, and revise the core ideas to fit together into a coherent whole. Addressing the practical concerns requires us to simplify the process of incorporating BFT techniques into legacy applications.

Browsing by Subject "Byzantine fault tolerance"

Results Per Page

Sort Options