Programming Languages Day 2008

The ninth annual Programming Languages Day will be held at the IBM Thomas J. Watson Research Center on Thursday, August 28, 2008. The day will be held in cooperation with the New Jersey and New England Programming Languages and Systems Seminars. The main goal of the event is to increase awareness of each other's work, and to encourage interaction and collaboration.

This year's Programming Languages Day features a keynote presentation and 12 regular presentations. Dr. Saman Amarasinghe, MIT, will deliver the keynote presentation this year.

You are welcome from 9AM onwards, and the keynote presentation will start at 10AM sharp. We expect the program to run until 5:15PM. The Programming Languages day will be held in room GN-F15 in the Hawthorne-1 building in Hawthorne, New York.

If you plan to attend the Programming Languages Day, please register by sending an e-mail with your name, affiliation, and contact information to rfuhrer@us.ibm.com so that we can plan for lunch and refreshments to be available.

Summary of Important Dates:
* July 23rd - talk titles/abstracts due
* July 30th - acceptance notification
* August 28th - PL Day 2008!

Program committee:
* Stephen Freund, Williams College
* Robert Fuhrer, IBM T.J. Watson Research Center
* David Walker, Princeton University

Program:

TimeTitleAuthors
9:00 - 10:00BREAKFAST
10:00 - 11:15KEYNOTE: StreamIt - A Programming Language for the Era of MulticoresSaman Amarasinghe (MIT)
11:15 - 11:35Experiences coding non-uniform parallelism using the CUDA GPGPU architectureB. Lerner (Univ. Washington), Y. Mandelbaum & T. Jim (Princeton Univ.)
11:35 - 11:55EventJavaK.R. Jayaram & P. Eugster (Purdue)
11:55 - 12:15Quotient LensesN. Foster & B. Pierce (Univ. Pennsylvania), A. Pilkiewicz (Ecole Polytechnique and INRIA)
12:15 - 1:30LUNCH
1:30 - 1:50Evidence-based AuditJ. Vaughan & L. Jia & K. Mazurak & S. Zdancewic (Univ. Penn)
1:50 - 2:10Expressive Sound Object Initialization with Masked TypesX. Qi & A. Myers (Cornell Univ.)
2:10 - 2:30Writing Chaotic Compilers with Higher-order RewritingK. Rose (IBM)
2:30 - 2:45BREAK
2:45 - 3:05Semantics-based Code SearchS. Reiss (Brown Univ.)
3:05 - 3:25No Bit Left Behind: The Limits of Heap Data CompressionJ. Sartor (UT Austin), M. Hirzel (IBM), K. McKinley (UT Austin)
3:25 - 3:45GC Assertions: Using the Garbage Collector to Check Heap PropertiesE. Aftandilian & S. Guyer (Tufts Univ.)
3:45 - 4:15BREAK
4:15 - 4:35A Constraint-based Approach to Program Analysis and Property InferenceS. Srivastava (Univ. Maryland)
4:35 - 4:55Miniatur: Finding Bugs in Code with a SAT SolverJ. Dolby & M. Vaziri & F. Tip (IBM)
4:55 - 5:15Efficient Runtime Invariant Checking: A Framework and Case StudyM. Gorbovitski & T. Rothamel & Y. A. Liu & S. D. Stoller (SUNY Stony Brook)

Abstracts:

10:00 - 11:15 KEYNOTE: StreamIt - A Programming Language for the Era of Multicores

S. Amarasinghe (MIT)

For the last four decades, a byproduct of Moore's Law has been the continuous and dramatic increase in the performance of sequential applications. Unfortunately, in the current and future generations of processors, doubling the number of transistors is not leading to any increase in sequential performance due to power and complexity issues. Thus, all major processor vendors are moving towards multicore processors. While architects have known how to build parallel processors for over a half a century, the main stumbling block for their wider acceptance has been the difficulty in programming them. In the first part of the talk I will discuss the path to multicores, address why parallel programming has been such a difficult problem to solve and speculate on our ability to crack it this time around.

One promising approach to parallel programming is the use of novel programming language techniques -- ones that reduce the burden on the programmers, while simultaneously increasing the compiler's ability to get good parallel performance. In the second part of the talk, I will introduce StreamIt: a language and compiler specifically designed to expose and exploit inherent parallelism in "streaming applications" such as audio, video, and network processing. StreamIt provides novel high-level representations to improve programmer productivity within the streaming domain. By exposing the communication patterns of the program, StreamIt allows the compiler to perform aggressive transformations and effectively utilize parallel resources. StreamIt is ideally suited for multicore architectures; recent experiments on a 16-core machine demonstrate an 11x speedup over a single core.

11:15 - 11:35 Experiences coding non-uniform parallelism using the CUDA GPGPU architecture

B. Lerner (Univ. Washington), Y. Mandelbaum & T. Jim (Princeton Univ.)

Recent years have seen a surge of interest in General Purpose Graphics Processor (GPGPU) computation as an inexpensive way to provide highly-parallel hardware for speeding up programs; Nvidia's CUDA framework is one of the cutting-edge efforts to simplify programming for these architectures. Graphics cards, even "general purpose" ones, are heavily optimized for small, uniform, and highly-independent kernels. So far, CUDA has been designed as a small extension to C/C++, to simplify the low-level boilerplate needed to get such code running on the card while retaining the low-level flexibility and performance of C. However, the language provides little higher-level support for problems that differ from the expectations above. In this talk, I will discuss some of the PL challenges I have encountered for CUDA programming in general, and some that arise particularly when attempting to implement programs with non-uniform parallelism. As a case study, I have been examining Earley parsing, a well-known algorithm for parsing any context-free grammar (including ambiguous ones), and trying to optimize its execution using CUDA. Potential parallelizations of this algorithm have been known for years, but the parallelism is data-dependent, making it a challenging fit for GPGPU-style computation. I use this example to highlight several promising areas where language support could greatly improve the programming experience, from small but effective usability enhancements to larger semantic improvements.

11:35 - 11:55 EventJava

K.R. Jayaram & P. Eugster (Purdue)

Many programming paradigms and abstractions as well as design - and architectural patterns are built on the idiom of application "events". However, programming support for event-based applications is still rather specific and rudimentary. This talk presents EventJava, an extension of Java with advanced support for event-based distributed programming. EventJava promotes strongly (1) asynchronous interaction seamlessly integrated with traditional synchronous method invocations, and unified with (2) support for multicasting of events; provides increased expressiveness by supporting (3) reactions to event collections, and especially (4) correlation predicates guarding those reactions. Last but not least, EventJava is generic in that it includes a (5) framework for customization of event semantics, to adapt event propagation and dispatching to various application types. The foundation for this customizability consists in a flexible notion of event "context". We illustrate EventJava through examples and elaborate on its syntax and semantics. We present its implementation, based on a compiler transforming specific primitives to Java, along with a default implementation of the framework classes. We also illustrate the benefits of EventJava's primitives through performance evaluation.

11:55 - 12:15 Quotient Lenses

N. Foster & B. Pierce (Univ. Pennsylvania), A. Pilkiewicz (Ecole Polytechnique and INRIA)

There are now a number of bidirectional programming languages, where every program can be read both as a forward transformation mapping one data structure to another and as a reverse transformation mapping an edited output back to a correspondingly edited input. Besides parsimony---the two related transformations are described by just one expression---such languages are attractive because they promise strong behavioral laws about how the two transformations fit together---e.g., their composition is the identity function. It has repeatedly been observed, however, that such laws are actually a bit too strong: in practice, we do not want them "on the nose," but only up to some equivalence, allowing inessential details, such as whitespace, to be modified after a round trip. Some bidirectional languages loosen their laws in this way, but only for specific, baked-in equivalences. In this work, we propose a general theory of quotient lenses--bidirectional transformations that are well behaved modulo equivalence relations controlled by the programmer. Semantically, quotient lenses are a natural refinement of lenses, which we have studied in previous work. At the level of syntax, we present a rich set of constructs for programming with canonizers and for quotienting lenses by canonizers. We track equivalences explicitly, with the type of every quotient lens specifying the equivalences it respects. We have implemented quotient lenses as a refinement of the bidirectional string processing language Boomerang. We present a number of useful primitive canonizers for strings, and give a simple extension of Boomerang's regular-expression-based type system to statically typecheck quotient lenses. The resulting language is an expressive tool for transforming real-world, ad-hoc data formats. We demonstrate the power of our notation by developing an extended example based on the UniProt genome database format and illustrate the generality of our approach by showing how uses of quotienting in other bidirectional languages can be translated into our notation.

1:30 - 1:50 Evidence-based Audit

J. Vaughan & L. Jia & K. Mazurak & S. Zdancewic (Univ. Penn)

Authorization logics provide a principled and flexible approach to specifying access control policies. One of their compelling benefits is that a proof in the logic is evidence that an access-control decision has been made in accordance with policy. Using such proofs for auditing reduces the trusted computing base and enables the ability to detect flaws in complex authorization policies. Moreover, the proof structure is itself useful, because proof normalization can yield information about the relevance of policy statements. Untrusted, but well-typed, applications that access resources through an appropriate interface must obey the access control policy and create proofs useful for audit. This paper presents AURA, an authorization logic based on a dependently-typed variant of DCC and proves the metatheoretic properties of subject-reduction and normalization. It shows the utility of proof-based auditing in a number of examples and discusses several pragmatic issues that must be addressed in this context.

1:50 - 2:10 Expressive Sound Object Initialization with Masked Types

X. Qi & A. Myers (Cornell Univ.)

This paper presents a type-based solution to the long-standing problem of object initialization. To support inheritance, conventional constructors either initialize fields with default values, as in Java, or limit how partially constructed objects may be used. The results are often surprising to programmers and lead to bugs. This paper introduces masked types, a type-state mechanism that explicitly tracks the initialization state of objects, preventing access to uninitialized fields. In the resulting language, constructors are ordinary methods that operate on uninitialized objects, and no special default value (null) is needed. Initialization of cyclic data structures is achieved with the use of conditionally masked types. Masked types are modular and compatible with data abstraction. The type system is presented in a simplified object calculus and is proved to soundly prevent access to uninitialized fields. Masked types have been implemented as an extension to Java, implemented using erasure. The paper reports on experience using this extension.

2:10 - 2:30 Writing Chaotic Compilers with Higher-order Rewriting

K. Rose (IBM)

In this talk I'll show how a simple compiler can be entirely implemented as a simple parser into higher-order terms, followed by generic normalization by higher-order rewriting rules that simultaneously type-check, analyze, optimize, and translate to generated code. I'll show how analysis in the form of inference rules is translated into rewrite rules, and how destructive interference is avoided while constructive interference is facilitated (such as eliminating dead code during type checking when appropriate). I'll demonstrate how the rewriter and parser/unparser generator from my net.sourceforge.crsx project can be used to implement such compilers as well as reason about their correctness using standard higher-order rewriting techniques. I'll comment on how the approach has been used to implement an IBM XPath/XQuery compiler.

2:45 - 3:05 Semantics-based Code Search

S. Reiss (Brown Univ.)

Our goal is to use the vast repositories of available open source code to generate specific functions or classes that meet a programmer's specifications. The key words here are specifications and generate. Current code search approaches use primarily keywords and provide the programmer with a large set of alternatives that need to be read, understood, and further evaluated. A better approach is to let the programmer specify what they are looking for as precisely as possible in the first place. To this end our framework lets the programmer specify keywords, class or method signatures, test cases, contracts, and security constraints. Retrieved open source code is unlikely to be an exact match to the programmer's specifications. Therefore our system defines an open set of program transformations that are used internally to transform the original open source code into a form that is more likely to do exactly what the programmer wants. These mappings range from the simple, for example changing the name of a method to match the name given by the programmer, to the relatively complex, for example finding a subset of the statements of a method that compute a value of the desired target type given only values of the parameter types. This approach is implemented in a prototype system for Java. The system provides a simple web-based user interface for defining the specifications. It uses various Internet and desktop code-search engines to find initial candidates. It then applies the set of transformations repeatedly to expand this set to get a large number of possible candidates. These candidates are then restricted to those matching the programmers specifications using signature checking, JML to test contracts, JUNIT to handle test cases, and a special Java security model to deal with security constraints. The system finally formats the results and presents them to the user. The prototype system has proven its utility over a variety of examples and is available at http://conifer.cs.brown.edu/s6.

3:05 - 3:25 No Bit Left Behind: The Limits of Heap Data Compression

J. Sartor (UT Austin), M. Hirzel (IBM), K. McKinley (UT Austin)

On one hand, the high cost of memory continues to drive demand for memory efficiency on embedded and general purpose computers. On the other hand, programmers are increasingly turning to managed languages like Java for their functionality, programmability, and reliability. Managed languages, however, are not known for their memory efficiency, creating a tension between productivity and performance. This talk examines the sources and types of memory inefficiencies in a set of Java benchmarks. Although prior work has proposed specific heap data compression techniques, they are typically restricted to one model of inefficiency. This talk generalizes and quantitatively compares previously proposed memory saving approaches and idealized heap compaction. It evaluates a variety of models based on strict and deep object equality, field value equality, removing bytes that are zero, and compressing fields and arrays with a limited number and range of values. The results show that substantial memory reductions are possible in the Java heap. For example, removing bytes that are zero from arrays is particularly effective, reducing the application's memory footprint by 41% on average. We are the first to combine multiple savings models on the heap, which very effectively reduces the application by up to 86%, on average 58%. These results demonstrate that future work should be able to combine a high productivity programming language with memory efficiency.

3:25 - 3:45 GC Assertions: Using the Garbage Collector to Check Heap Properties

E. Aftandilian & S. Guyer (Tufts Univ.)

This work introduces GC assertions, a system interface that programmers can use to check for errors, such as data structure invariant violations, and to diagnose performance problems, such as memory leaks. GC assertions are checked by the garbage collector, which is in a unique position to gather information and answer questions about the lifetime and connectivity of objects in the heap. We introduce several kinds of GC assertions, and we describe how they are implemented in the collector. We also describe our reporting mechanism, which provides a complete path through the heap to the offending objects. We show results for one type of assertion that allows the programmer to indicate that an object should be reclaimed at the next GC. We find that using this assertion we can quickly identify a memory leak and its cause with negligible overhead.

4:15 - 4:35 A Constraint-based Approach to Program Analysis and Property Inference

S. Srivastava (Univ. Maryland)

Traditionally, verification of program properties has been encoded as the task of discovering inductive program invariants. Unquantified invariants are useful for verifying properties of programs that do not manipulate the heap. On the other hand, quantified invariants allow reasoning about programs that manipulate unbounded data-structures such as lists and arrays. Some techniques exist for inferring unquantified invariants over restricted domains while no satisfactory techniques exist for quantified invariants. We present a constraint-based approach to program analysis that reduces the task of verification as well as inference of a wide variety of program properties to that of solving a set of SAT/SMT constraints. The user specifies the form of program properties or invariants by means of templates over some unknowns. We then generate constraints over the unknowns that appear in the user-provided templates, and solve for the unknowns in these constraints using off-the-shelf SAT/SMT solvers. This allows us to leverage the state-of-the-art advances in constraint solving technology to facilitate practical and efficient computation of program invariants. Specifically, we present approaches to infer unquantified invariants over the domain of linear arithmetic and predicate abstraction by reducing the problem to that of solving SAT constraints. We also present approaches to infer quantified invariants over the domain of predicate abstraction by reducing the problem to that of solving SMT constraints. We also show how our framework extends naturally to discovering weakest preconditions and strongest postconditions. This allows us to reason not only about verification of safety properties but also about other interesting applications like termination, and finding most-general counterexamples to both safety and termination properties. Previous applications of SAT/SMT solvers to program analysis have been mostly limited to verification (as opposed to inference) in the presence of user provided loop invariants. On the other hand, our technique uses SMT solvers to automatically infer required loop invariants (requiring the user to only specify invariant templates). We have implemented a prototype of our framework and used it to prove interesting properties of several sophisticated programs, including programs that have not been automatically analyzed before.

4:35 - 4:55 Miniatur: Finding Bugs in Code with a SAT Solver

J. Dolby & M. Vaziri & F. Tip (IBM)

We present an approach for checking code against rich specifications, based on existing work that consists of encoding the program in a relational logic and using a SAT solver to find specification violations. We improve the efficiency of this approach with a new encoding of the program that effectively slices it at the logical level with respect to the specification. We also present new encodings for integer values and arrays, enabling the verification of realistic fragments of code that manipulate both. Our technique can handle integers of much larger ranges than previously possible, and permits large sparse arrays to be handled efficiently. We implemented our technique and evaluated it by checking data structure invariants of several classes taken from the Java Collections Framework. We also checked for violations of Java's equality contract in a variety of open-source programs, and found several bugs. This talk is based on an FSE'07 paper.

4:55 - 5:15 Efficient Runtime Invariant Checking: A Framework and Case Study

M. Gorbovitski & T. Rothamel & Y. A. Liu & S. D. Stoller (SUNY Stony Brook)

This paper describes a general and powerful framework for efficient runtime invariant checking. The framework supports (1) declarative specification of arbitrary invariants using high-level queries, with easy use of information from any data in the execution, (2) powerful analysis and transformations for automatic generation of instrumentation for efficient incremental checking of invariants, and (3) convenient mechanisms for reporting errors, debugging, and taking preventive or remedial actions, as well as recording history data for use in queries. We demonstrate the advantages and effectiveness of the framework through implementations and case studies with abstract syntax tree transformations, authentication in a SMB client, and the BitTorrent peer-to-peer file distribution protocol. A paper describing this work appeared in WODA'08.