String analysis is a static program analysis technique that infers the string values a program may produce at run time. The analysis is useful for statically detecting security problems in Web applications, such as cross-site scripting (XSS) vulnerabilities and Structured Query Language (SQL) injection attacks: without executing the program, it can detect whether strings supplied by a malicious user can take an illegal format. The analysis is also useful for automatically determining and validating access-control policies, since the security-sensitive resources that a program accesses (such as files and network connections) are represented as strings.
We have designed and implemented a String Analysis Framework for Java, PHP, and JavaScript using the T.J. Watson Libraries for Analysis (WALA) engine. Our implementation has the following features:
- Analysis based on formal language theory. String values are approximated by either a Context-Free Grammar (CFG) or an automaton. These string representations are then transformed by functions on CFGs or automata. Such functions are called transducers and correspond to built-in string operations.
- Path-sensitive analysis. In many cases, user input strings are validated by string comparison and regular-expression pattern matching. To treat this kind of validation code precisely, our implementation interprets the conditions of such checks in addition to string operations.
- Labeling. Our implementation can be extended with a novel labeling feature that annotates each character with a label. Such a label can represent the code location where a string value is created and manipulated. Thus, with this feature, we can verify a program more precisely by distinguishing user input strings from constant string values in the program.
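The transducer idea in the first feature can be illustrated with automata alone. The sketch below assumes a program that builds `"Hi " + input` and then HTML-escapes the result: the possible values are approximated by a nondeterministic automaton, the escaping routine is modeled as a one-state transducer, and its image is computed with a product construction. All names (`NFA`, `apply_transducer`, `may_contain`, the tiny alphabet) are hypothetical illustrations, not the framework's API, and the CFG-based representation is not covered.

```python
from collections import defaultdict

EPS = None  # symbol used for epsilon (empty-string) transitions


class NFA:
    """Nondeterministic automaton: trans[state] is a list of (symbol, next_state)."""

    def __init__(self):
        self.n = 0
        self.trans = defaultdict(list)
        self.accept = set()
        self.start = self.new_state()

    def new_state(self):
        self.n += 1
        return self.n - 1

    def add(self, p, sym, q):
        self.trans[p].append((sym, q))

    def edges(self):
        return [(p, sym, q) for p, out in self.trans.items() for sym, q in out]

    def closure(self, states):
        seen, stack = set(states), list(states)
        while stack:
            for sym, q in self.trans.get(stack.pop(), ()):
                if sym is EPS and q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    def accepts(self, word):
        cur = self.closure({self.start})
        for ch in word:
            cur = self.closure({q for s in cur for sym, q in self.trans.get(s, ()) if sym == ch})
        return bool(cur & self.accept)


def prefix_then_star(prefix, chars):
    """Language prefix . chars*: a constant concatenated with unconstrained input."""
    m = NFA()
    cur = m.start
    for ch in prefix:
        nxt = m.new_state()
        m.add(cur, ch, nxt)
        cur = nxt
    for ch in chars:
        m.add(cur, ch, cur)
    m.accept.add(cur)
    return m


def apply_transducer(m, fst, fst_start, fst_accept):
    """Image of L(m) under a transducer given as fst[(t, ch)] -> [(output, t')]."""
    out = NFA()
    ids = {(m.start, fst_start): out.start}
    work = [(m.start, fst_start)]
    while work:
        pair = work.pop()
        q, t = pair
        src = ids[pair]
        if q in m.accept and t in fst_accept:
            out.accept.add(src)
        for sym, q2 in m.trans.get(q, ()):
            moves = [("", t)] if sym is EPS else fst.get((t, sym), [])
            for w, t2 in moves:
                if (q2, t2) not in ids:
                    ids[(q2, t2)] = out.new_state()
                    work.append((q2, t2))
                cur = src  # spell the output string w along a fresh chain of states
                for ch in w[:-1]:
                    nxt = out.new_state()
                    out.add(cur, ch, nxt)
                    cur = nxt
                out.add(cur, w[-1] if w else EPS, ids[(q2, t2)])
    return out


def may_contain(m, ch):
    """True if some accepted string contains ch (edge is reachable and co-reachable)."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for p, _, q in m.edges():
        fwd[p].add(q)
        bwd[q].add(p)
    from_start, to_accept = _reach({m.start}, fwd), _reach(m.accept, bwd)
    return any(sym == ch and p in from_start and q in to_accept for p, sym, q in m.edges())


def _reach(starts, succ):
    seen, stack = set(starts), list(starts)
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical program: out = escapeHtml("Hi " + input), input over {a, <}.
ALPHABET = set("Hi ") | set("a<>")
ESCAPE = {"<": "&lt;", ">": "&gt;"}
escape_fst = {(0, c): [(ESCAPE.get(c, c), 0)] for c in ALPHABET}

raw = prefix_then_star("Hi ", "a<")
escaped = apply_transducer(raw, escape_fst, 0, {0})
print(may_contain(raw, "<"), may_contain(escaped, "<"))  # True False
```

A check such as "can the output contain `<`?" then becomes a reachability question on the transformed automaton, which is why the unescaped value is flagged and the escaped one is not.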
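The path-sensitive treatment of validation code can be sketched as follows, assuming a hypothetical check `if (s.matches("[a-z]*")) sink(s);` where `s` is unconstrained input. On the true branch the abstract value is intersected with the regular language of the pattern; on the false branch it is intersected with the complement. The alphabet is shrunk to `{a, b, <}` to keep the automata small, and the names `DFA`, `intersect`, `complement`, and `may_contain` are illustrative assumptions rather than the framework's interface.

```python
class DFA:
    """Complete DFA over a finite alphabet: delta[(state, ch)] -> state."""

    def __init__(self, alphabet, delta, start, accept):
        self.alphabet = frozenset(alphabet)
        self.delta = dict(delta)
        self.start = start
        self.accept = frozenset(accept)

    def states(self):
        return {self.start} | {s for s, _ in self.delta} | set(self.delta.values())

    def accepts(self, word):
        s = self.start
        for ch in word:
            s = self.delta[(s, ch)]
        return s in self.accept


def intersect(a, b):
    """Product construction: the result accepts exactly L(a) & L(b)."""
    start = (a.start, b.start)
    delta, seen, work = {}, {start}, [start]
    while work:
        pair = work.pop()
        p, q = pair
        for ch in a.alphabet:
            nxt = (a.delta[(p, ch)], b.delta[(q, ch)])
            delta[(pair, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    accept = {s for s in seen if s[0] in a.accept and s[1] in b.accept}
    return DFA(a.alphabet, delta, start, accept)


def complement(d):
    """Valid because d is complete: swap accepting and non-accepting states."""
    return DFA(d.alphabet, d.delta, d.start, d.states() - d.accept)


def may_contain(d, ch):
    """True if some string accepted by d contains the character ch."""
    fwd, bwd = {}, {}
    for (p, _), q in d.delta.items():
        fwd.setdefault(p, set()).add(q)
        bwd.setdefault(q, set()).add(p)
    from_start, to_accept = _reach({d.start}, fwd), _reach(d.accept, bwd)
    return any(c == ch and p in from_start and q in to_accept
               for (p, c), q in d.delta.items())


def _reach(starts, succ):
    seen, stack = set(starts), list(starts)
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


SIGMA = {"a", "b", "<"}
any_input = DFA(SIGMA, {(0, c): 0 for c in SIGMA}, 0, {0})
# "[a-z]*" restricted to SIGMA: state 1 is a dead state reached on '<'.
lowercase = DFA(SIGMA, {(0, "a"): 0, (0, "b"): 0, (0, "<"): 1,
                        **{(1, c): 1 for c in SIGMA}}, 0, {0})

true_branch = intersect(any_input, lowercase)               # s inside the if
false_branch = intersect(any_input, complement(lowercase))  # s in the else
print(may_contain(true_branch, "<"), may_contain(false_branch, "<"))  # False True
```

A path-insensitive analysis would report that `s` may contain `<` at the sink; interpreting the condition shows that only the else-branch can carry such strings.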
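The labeling feature can be pictured as an automaton whose transitions carry a (character, label) pair instead of a bare character. In the minimal sketch below, the labels `"const@L3"` and `"input@L1"` stand for hypothetical program points where a literal is defined and where user input is read; the classes and functions are illustrative assumptions, not the framework's implementation. The example shows why labels add precision: a label-insensitive check flags any `<` in the result, including the one from the literal `"<b>"`, while a labeled check flags only a `<` that originates from user input.

```python
EPS = None  # epsilon move used to join automata


class LabeledNFA:
    """NFA whose transitions carry (character, label) pairs instead of bare characters."""

    def __init__(self, n, start, accept, edges):
        self.n = n                # states are 0 .. n-1
        self.start = start
        self.accept = set(accept)
        self.edges = list(edges)  # (src, (ch, label) or EPS, dst)


def constant(text, label):
    """A string literal: every character carries the literal's label."""
    edges = [(i, (ch, label), i + 1) for i, ch in enumerate(text)]
    return LabeledNFA(len(text) + 1, 0, {len(text)}, edges)


def any_of(chars, label):
    """Unknown input over chars (chars*); every character carries label."""
    return LabeledNFA(1, 0, {0}, [(0, (ch, label), 0) for ch in chars])


def concat(a, b):
    """Concatenation: shift b's states by a.n and link a's final states with epsilon."""
    off = a.n
    edges = a.edges + [(p + off, sym, q + off) for p, sym, q in b.edges]
    edges += [(f, EPS, b.start + off) for f in a.accept]
    return LabeledNFA(a.n + b.n, a.start, {f + off for f in b.accept}, edges)


def may_emit(m, pred):
    """True if some accepted string contains a labeled character satisfying pred."""
    fwd, bwd = {}, {}
    for p, _, q in m.edges:
        fwd.setdefault(p, set()).add(q)
        bwd.setdefault(q, set()).add(p)
    from_start, to_accept = _reach({m.start}, fwd), _reach(m.accept, bwd)
    return any(sym is not EPS and pred(sym) and p in from_start and q in to_accept
               for p, sym, q in m.edges)


def _reach(starts, succ):
    seen, stack = set(starts), list(starts)
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def any_lt(sym):
    """Label-insensitive check: any '<' at all."""
    return sym[0] == "<"


def input_lt(sym):
    """Labeled check: a '<' that came from user input."""
    return sym == ("<", "input@L1")


safe = concat(constant("<b>", "const@L3"), any_of("ab", "input@L1"))
unsafe = concat(constant("<b>", "const@L3"), any_of("a<", "input@L1"))
# prints: True False True
print(may_emit(safe, any_lt), may_emit(safe, input_lt), may_emit(unsafe, input_lt))
```

Without labels, `safe` would be reported as a false positive; with labels, only `unsafe` is flagged.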
Currently, we are exploring the following enhancements:
- Modular string analysis
- Backward inference
Contributors
Takaaki Tateishi, Marco Pistoia and Yinnon Haviv.
Publications
Patents
- Marco Pistoia and Takaaki Tateishi. System, Method and Apparatus for Statically Mapping Possible String Values to Their Corresponding Definitions and Operating Program Points. Filed in the United States Patent and Trademark Office, April 2009.
- Takaaki Tateishi, Naoshi Tabuchi, Kohichi Ono and Mika Koganeyama. Systems, Methods and Computer Program Products for String Analysis with Security Labels for Vulnerability Detection. Filed in the United States Patent and Trademark Office, December 2008.
- Julian Dolby, Emmanuel Geay, Marco Pistoia, Barbara Ryder, and Takaaki Tateishi. System, Method, and Apparatus for Modular, String-Sensitive, Access Rights Analysis with Demand-Driven Precision. Filed in the United States Patent and Trademark Office, August 2008.