Operating system transactions
Abstract
Applications must be able to synchronize accesses to operating system (OS) resources in order to ensure correctness in the face of concurrency and system failures. This thesis proposes system transactions, with which the programmer specifies atomic updates to heterogeneous system resources and the OS guarantees atomicity, consistency, isolation, and durability (ACID).
This thesis provides a model for system transactions as a concurrency control mechanism.
System transactions efficiently and cleanly solve long-standing
concurrency problems that are difficult to address with other
techniques.
For example, malicious users can exploit
race conditions between distinct system calls in privileged applications,
gaining administrative access to a system.
Programmers can eliminate these vulnerabilities by eliminating these
race conditions with system transactions.
Similarly, failed software installations can leave a system unusable.
System transactions can roll back an unsuccessful software installation
without disturbing concurrent, independent updates to the file system.
This thesis describes the design and implementation of TxOS,
a variant of Linux 2.6.22 that implements
system transactions. The thesis contributes new implementation
techniques that yield fast, serializable transactions
with strong isolation and fairness between system transactions and
non-transactional activity.
Using system transactions,
programmers can build
applications with better performance or stronger correctness guarantees
from simpler code. For instance, wrapping an installation of
OpenSSH in a system transaction guarantees that a failed installation
will be rolled back completely. These atomicity properties are
provided by the OS, requiring no modification to the installer itself
and adding only 10% performance overhead. The prototype implementation of system transactions also
minimizes non-transactional overheads. For instance, a non-transactional
compilation of Linux incurs negligible (less than 2%) overhead on TxOS.
Finally, this thesis describes a new lock-free linked list algorithm, called OLF, for optimistic, lock-free lists. OLF addresses key limitations of prior algorithms, which sacrifice functionality for performance. Prior lock-free list algorithms can safely insert or delete a single item, but cannot atomically compose multiple operations (e.g., atomically move an item between two lists). OLF provides both arbitrary composition of list operations as well as performance scalability close to previous lock-free list designs. OLF also removes previous requirements for dynamic memory allocation and garbage collection of list nodes, making it suitable for low-level system software, such as the Linux kernel. We replace lists in the Linux kernel's directory cache with OLF lists, which currently requires a coarse-grained lock to ensure invariants across multiple lists. OLF lists in the Linux kernel improve performance of a filesystem metadata microbenchmark by 3x over unmodified Linux at 8 CPUs.
The TxOS prototype demonstrates that a mature
OS running on commodity hardware can provide system transactions at a
reasonable performance cost.
As a practical OS abstraction for application developers,
system transactions facilitate
writing correct application code in the presence of
concurrency and system failures.
The OLF algorithm demonstrates that application developers can have both
the functionality of locks and the performance scalability of a lock-free linked list.