Evaluating the Impact of Program Characteristics on Static Analysis Design Tradeoffs: A Java Numeric Analysis Case Study
Abstract
The behavior of a static analysis is affected by the characteristics of its target program. Understanding this relationship is essential for developers seeking to improve static analysis tools and for users seeking to adopt them. However, the task is challenging because the spaces of program characteristics and analysis configurations are large and their interactions are complex. For example, how method invocations are implemented in a target program may significantly affect the precision and performance of a context-sensitive analysis. In this thesis, we present a systematic approach to studying the impact of program characteristics on static analysis design choices. We instantiate the approach on an important Java numeric analysis built with multiple design choices. First, following a rigorously designed procedure, we performed a manual investigation that collected 57 relevant method-level characteristics and yielded six observations on important data points. Second, we developed a new statistical analysis that selected 5 independent analysis options, 14 independent program characteristics, and 50 interactions between program characteristics and analysis options. The selection was based on a statistical model that minimizes the maximum prediction error over all data points. Combining this statistical model with data visualization, we drew six insights from over 350,000 data points (i.e., analysis performance on individual methods). Key observations include that the performance of different heap abstractions is affected by several program characteristics, and that a context-sensitive analysis using allocation-site abstraction is slower than other design choices when analyzing methods with a moderate number of callers (2-5).