Scaling and certifying symbolic execution
Symbolic execution is a powerful, systematic program analysis approach that has received much visibility in the last decade. The key idea in symbolic execution is to explore all execution paths up to a bound on the path length, build path conditions that represent constraints on inputs that execute the corresponding paths, and solve the constraints using off-the-shelf constraint solvers to determine path feasibility (where possible). While systematic path exploration enables symbolic execution to find subtle bugs, scaling the approach remains a key challenge. Our thesis is that novel compositional, certifying, and distribution techniques can enhance the efficacy of symbolic execution. This dissertation designs, develops, and evaluates three techniques based on the primitives of composition, certification, and distribution in program analysis to enhance symbolic execution. Our composition technique CompoSE allows the overall symbolic execution results to be computed by composing intermediate results with respect to individual methods, rather than treating the entire program monolithically as is done traditionally. CompoSE first summarizes each method as a memoization tree that represents the key elements of symbolic execution of that method, and then uses these trees to efficiently replay the symbolic execution of the corresponding methods with respect to their calling contexts. The key novelty of CompoSE is that it allows composition in the presence of complex operations on the program heap. Our certification technique CertifiedSE allows symbolic execution analysis to be performed by one party, the producer, and utilized by another party, the consumer. The producer creates a certificate that can be checked efficiently by the consumer to validate the correctness of symbolic execution results. The key novelty of CertifiedSE is that it introduces the idea of certification in the context of symbolic execution, which enables a number of ways to enhance how symbolic execution is performed and used. Our distribution technique SynergiSE enhances symbolic execution in a novel two-fold integration approach. One, it integrates distributed analysis and constraint re-use to enhance symbolic execution using feasible ranges, which allows sharing of constraint solving results among different workers without communicating or sharing potentially large constraint databases (as required traditionally). Two, it integrates complementary techniques for test input generation, e.g., search-based generation and symbolic execution, for creating higher quality tests using unexplored ranges, which allows symbolic execution to re-use tests created by another technique for effective distribution of exploration of previously unexplored paths. The key novelty of Synergise is that it significantly reduces the amount of communication among different symbolic execution workers and enables an effective integration of heuristics-based and systematic approaches for test generation. We embody our techniques into prototypes based on the Symbolic PathFinder tool for analyzing Java programs. Experimental results using a variety of subjects show that our techniques can significantly improve symbolic execution.