Fast error detection with coverage guarantees for concurrent software
Coons, Katherine Elizabeth
Concurrency errors are notoriously difficult to debug because they may occur only under unexpected thread interleavings that are difficult to identify and reproduce. These errors are increasingly important as recent hardware trends compel developers to write more concurrent software and to provide more concurrent abstractions. This thesis presents algorithms that dynamically and systematically explore a program's thread interleavings to manifest concurrency bugs quickly and reproducibly, and to provide precise incremental coverage guarantees. Dynamic concurrency testing tools should provide (1) fast response -- bugs should manifest quickly if they exist, (2) reproducibility -- bugs should be easy to reproduce, and (3) coverage -- precise correctness guarantees when no bugs manifest. In practice, most tools provide either fast response or coverage, but not both. These goals conflict because a program's thread interleavings exhibit exponential state-space explosion, which inhibits fast response. Two approaches from prior work alleviate state-space explosion. (1) Partial-order reduction provides full coverage by exploring only one interleaving of independent transitions. (2) Bounded search provides bounded coverage by enumerating only interleavings that do not exceed a bound. Bounded search can additionally provide guarantees for cyclic state spaces for which dynamic partial-order reduction provides no guarantees. Without partial-order reduction, however, bounded search wastes most of its time exploring executions that reorder only independent transitions. Fast response with coverage guarantees requires both approaches, but prior work failed to combine them soundly. We combine bounded search with partial-order reduction and extensively analyze the space of dynamic, bounded partial-order reduction strategies. First, we prioritize with a best-first search and show that heuristics that combine these approaches find bugs quickly. Second, we restrict partial-order reduction to combine approaches while maintaining bounded coverage. We specialize this approach for several bound functions, prove that these algorithms guarantee bounded coverage, and leverage dynamic information to further reduce the state space. Finally, we bound the partial order on a program's transitions, rather than the total order on those transitions, to combine these approaches without sacrificing partial-order reduction. This algorithm provides fast response, incremental coverage guarantees, and reproducibility. We manifest bugs an order of magnitude more quickly than previous approaches and guarantee incremental coverage in minutes or hours rather than weeks, helping developers find and reproduce concurrency errors. This thesis makes bounded stateless model checking for concurrent programs substantially more efficient and practical.
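The two prior-work ideas named in the abstract can be illustrated on a toy program. The sketch below is not the thesis's algorithm: it is a brute-force enumeration written only for illustration. It assumes a hypothetical two-thread program in which no thread ever blocks, takes the number of preemptions as the bound function, and treats two transitions as dependent when they belong to the same thread or access the same variable with at least one write. All names (THREADS, interleavings, preemptions, trace_key, within_bound, classes) are invented for this example.

```python
# Hypothetical two-thread program; each transition is (kind, variable),
# where kind is "r" (read) or "w" (write).
THREADS = [
    [("w", "x"), ("w", "y")],  # thread 0
    [("w", "z"), ("w", "x")],  # thread 1
]


def interleavings(threads):
    """Yield every schedule as a list of (thread, index) events."""
    def extend(done, prefix):
        if all(done[t] == len(ops) for t, ops in enumerate(threads)):
            yield list(prefix)
            return
        for t, ops in enumerate(threads):
            if done[t] < len(ops):
                prefix.append((t, done[t]))
                done[t] += 1
                yield from extend(done, prefix)
                done[t] -= 1
                prefix.pop()
    yield from extend([0] * len(threads), [])


def preemptions(schedule, threads):
    """Count switches away from a thread that still has transitions left."""
    done = [0] * len(threads)
    count, prev = 0, None
    for t, _ in schedule:
        if prev is not None and t != prev and done[prev] < len(threads[prev]):
            count += 1
        done[t] += 1
        prev = t
    return count


def dependent(e1, e2, threads):
    """Assumed dependence: same thread, or same variable with a write."""
    (t1, i1), (t2, i2) = e1, e2
    (k1, v1), (k2, v2) = threads[t1][i1], threads[t2][i2]
    return t1 == t2 or (v1 == v2 and "w" in (k1, k2))


def trace_key(schedule, threads):
    """Identify a schedule by the order of its dependent event pairs.

    Schedules with the same key differ only in the order of independent
    transitions, so partial-order reduction needs just one of them.
    """
    return frozenset(
        (e1, e2)
        for i, e1 in enumerate(schedule)
        for e2 in schedule[i + 1:]
        if dependent(e1, e2, threads)
    )


all_schedules = list(interleavings(THREADS))


def within_bound(bound):
    """Schedules a preemption-bounded search with this bound would visit."""
    return [s for s in all_schedules if preemptions(s, THREADS) <= bound]


def classes(schedules):
    """Distinct partial orders (equivalence classes) among the schedules."""
    return {trace_key(s, THREADS) for s in schedules}


if __name__ == "__main__":
    print("all interleavings:", len(all_schedules))
    for bound in range(3):
        visited = within_bound(bound)
        print("bound", bound, "->", len(visited), "schedules,",
              len(classes(visited)), "distinct partial orders")
```

On this toy input the six interleavings fall into only two equivalence classes, which is the redundancy that partial-order reduction removes, and the schedules admitted by each preemption bound grow from two to four to six. That both classes already appear at bound 0 is a property of this small example and should not be read as a general claim.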