Static Specification Mining and Verification


Ensuring that a program uses an Application Programming Interface (API) correctly may be a challenging task. It requires having a specification of that API and verifying that the program uses that API according to that specification. Unfortunately, specifications are not always available or, if they are, they may be embedded into the documentation of the API among other information and not easily accessible. Assuming that a specification is indeed available, it becomes then necessary to verify that the code of the program does not violate that specification. This also is a difficult task because programs can be arbitrarily large and complex, and a specification violation may be the result of subtle interactions of different parts of the program code.

As part of the SAFE project, we have designed, developed and published two static analysis tools:

  1. Effective Typestate Verifier (ETV). ETV is a novel framework for verification of typestate properties. It includes several new techniques to precisely treat aliases without undue performance costs. In particular, it includes a flow-sensitive, context-sensitive, integrated verifier that utilizes a parametric abstract domain combining typestate and aliasing information. To scale to real programs without compromising precision, ETV employs a staged verification system in which faster verifiers run as early stages which reduce the workload for later, more precise, stages. ETV was publised in the proceedings of the ACM SIGSOFT 2006 International Symposium on Software Testing and Analysis (ISSTA 2006) (where it won the ACM SIGSOFT Distinguished Paper Award), and in the 2008 issue of the ACM Transactions on Software Engineering and Methodology (TOSEM) journal.
  2. Static Specification Miner (SSM). SSM is a novel approach to client-side mining of temporal API specifications based on static analysis. SSM consists of an interprocedural analysis over a combined domain that abstracts both aliasing and event sequences for individual objects. The analysis uses a new family of automata-based abstractions to represent unbounded event sequences, designed to disambiguate distinct usage patterns and merge similar usage patterns. Additionally, SSM includes an algorithm that summarizes abstract traces based on automata clusters, and effectively rules out spurious behaviors. SSM was published in the proceedings of the ACM SIGSOFT 2007 International Symposium on Software Testing and Analysis (ISSTA 2007) (where it won the ACM SIGSOFT Distinguished Paper Award), and in the 2008 issue of the IEEE Transaction on Software Engineering (TSE) journal.

Although this work is more general than just security, it has a strong applicability to security APIs, as we showed in our papers. The reason for this is that using security (and, in particular, cryptography) APIs requires following well-defined specifications. For example, after instantiating a Java java.security.Signature object, a program can use that object to either sign or verify a document. The specification of the Signature API establishes veru precise usage patterns. Specifically, if the Signature object is to be used for signing, then its sign method be invoked, but that can only be done after invoking initSign. Similarly, if the Signature object is to be used for verifying, then its verify method must be invoked, but that can only be done after invoking initVerify. Any sequence of operations that does not respect one of these two patterns will inevitably generate a run-time error. Unfortunately, the compiler will not detect deviations from this specification. Our work can be used to detect this type of API specifications by mining client programs that use a given API, and by then verifying whether a certain client program uses an API according to the specification that was mined.

Please visit the SAFE Web site for more details on this work.