Program Analysis and Transformation


Projects

Our group comprises several projects in the area of Program Analysis and Transformation, which are summarized below:

- Analysis framework: WALA
- Bug finding and test case generation: Snugglebug, Miniatur
- Software engineering: Change Impact Analysis, Safe-Commit Analysis
- Concurrency control and race detection tools: Atomic Sets
- Language constructs for object equality: Relation Types

WALA

The T. J. Watson Libraries for Analysis (WALA) provide static and dynamic analysis capabilities for Java bytecode and related languages. WALA has been developed over several years at the IBM T.J. Watson Research Center. It has been used both in several research projects and in commercial products. In 2006, IBM donated WALA to the community under the open source Eclipse Public License. WALA features include: a bytecode analysis library (Shrike), pointer analyses and call graph construction, a context-sensitive slicing framework, interprocedural dataflow analysis, a source code analysis framework, currently supporting Java and JavaScript, a dynamic load-time instrumentation library (Dila) and support for analyzing Eclipse projects. We welcome new users and contributors. For more information, see WALA.

Contact: Stephen Fink  

Snugglebug

Snugglebug is a new project at IBM Research applying program analysis to improving software quality for large Java applications. We are building a tool that infers partial specifications and involves the user in a specification feedback loop, in order to acquire richer specifications to drive bug finding and test case generation. The tool relies on symbolic underapproximate analysis to derive candidate specifications, find bugs, and generate JUnit tests as concrete witnesses.

Contact: Stephen Fink  

Miniatur

We have developed an approach for checking code against rich specifications, based on existing work that consists of encoding the program in a relational logic and using a constraint solver to find specification violations. We improve the efficiency of this approach with a new encoding of the program that effectively slices it at the logical level with respect to the specification. We have also developed new encodings for integer values and arrays, enabling the verification of realistic fragments of code that manipulate both. Our technique can handle integers of much larger ranges than previously possible, and permits large sparse arrays to be handled efficiently. We present a soundness proof for our slicing algorithm and a general condition under which relational formulae may be sliced. We implemented our technique and evaluated it by checking data structure invariants of several classes taken from the Java Collections Framework. We also checked for violations of Java's equality contract in a variety of open-source programs, and found several bugs. For more information, see our FSE'07 paper.

Contact: Mandana Vaziri

Change Impact Analysis

We are developing techniques and tools that help programmers with understanding the impact of changes, and helping them quickly track down the reason for test failures. We developed a tool, called Chianti, that determines: the (unit or regression) tests affected by a set of changes, and for each affected test, the subset of changes that affected it. For more details, please see our PASTE'01 and OOPSLA'04 papers. This is a joint project with Barbara Ryder, Xiaoxia Ren, and Ophelia Chesley at Rutgers University, and with Maximilian Stoerzer at the Technical University of Passau. The work is supported by NSF grant CCR-0204410.

Contact: Frank Tip

Safe-Commit Analysis to Facilitate Team Software Development

Software development teams exchange source code in shared repositories. These repositories are kept consistent by having developers follow a commit policy, such as "Program edits can be committed only if all available tests succeed.'' Such policies may result in long intervals between commits, increasing the likelihood of duplicative development and merge conflicts. Furthermore, commit policies are generally not automatically enforceable.

We have developed an analysis-based algorithm to identify committable changes that can be released early, without causing failures of existing tests, even in the presence of failing tests in a developer's local workspace! The algorithm can support relaxed commit policies that allow early release of changes, reducing the potential for merge conflicts. In experiments using several versions of Daikon with failing tests, 3 newly enabled commit policies were shown to allow a significant percentage of changes to be committed.

Note: this is a collaborative project with Jan Wloka and Xiaoxia Ren (Rutgers University) and Barbara Ryder (Virginia Tech). A paper about this work will be presented at ICSE'09.

Contact: Frank Tip

Atomic Sets

Concurrency-related bugs may happen when multiple threads access shared data and interleave in ways that do not correspond to any sequential execution. Their absence is not guaranteed by the traditional notion of ``data race'' freedom. We have developed a new definition of data races in terms of 11 problematic interleaving scenarios, and proved that it is complete by showing that any execution not exhibiting these scenarios is serializable for a chosen set of locations. Our definition subsumes the traditional definition of a data race as well as high-level data races such as stale-value errors and inconsistent views. We have also developed a language feature called atomic sets of locations, which lets programmers specify the existence of consistency properties between fields in objects, without specifying the properties themselves. We use static analysis to automatically infer those points in the code where synchronization is needed to avoid data races under our new definition. An important benefit of this approach is that, in general, far fewer annotations are required than is the case with existing approaches such as synchronized blocks or atomic sections. For more details, see our POPL'06 paper.

We have developed a dynamic tool for detecting data races under our new definition (ICSE'08), and a static tool based on model checking (VMCAI'09).

Contact: Mandana Vaziri

Relation Types

Object-oriented languages define the identity of an object to be an address-based object identifier. The programmer may customize the notion of object identity by overriding the equals() and hashCode() methods following a specified contract. This customization often introduces latent errors, since the contract is unenforced and at times impossible to satisfy, and its implementation requires tedious and error-prone boilerplate code. Relation types are a programming model in which object identity is defined declaratively, obviating the need for equals() and hashCode() methods. This entails a stricter contract: identity never changes during an execution. We formalize the model as an adaptation of Featherweight Java, and implement it by extending Java with relation types. Experiments on a set of Java programs show that the majority of classes that override equals() can be refactored into relation types, and that most of the remainder are buggy or fragile. For more information see our ECOOP'07 paper.

Contact: Mandana Vaziri



Last updated 16 Dec 2008

Research labs involved