IBM Research welcomes members of the research community to our seminars. To ensure compliance with IBM security guidelines, we request you to contact the seminar host in advance. When you arrive at the Research lab, please provide the host's name to the receptionist.
| The Surprising Versatility of Trace Tree Compilation: How We Set Out To Build A Better Just-In-Time Compiler And Suddenly Found Ourselves At The Center Of The Mozilla-Google Browser War | ||||
| Prof. Michael Franz | On: | 4-Jun-2009 10:00 AM - 11:30 AM | ||
| UC Irvine | At: | Watson Research Center (Hawthorne), Room 1S-F40 | ||
| Host: | Calin Cascaval | |||
Abstract: Except for the fact that they operate at run-time and use smaller compilation units, just-in-time compilers have otherwise been quite similar to traditional batch compilers: they all first construct a model (graph) of a program's control flow, which is then traversed while generating code. My research group at UCI has been investigating an alternative technique in which the compiler never even attempts to build a complete control flow graph. Instead, our technique uses a novel (and patent pending) intermediate representation, the Trace Tree, which is constructed lazily on-demand while the program is simultaneously executed, incrementally compiled, and optimized. Our compilation technique is surprisingly competitive at much lower implementation complexity. Our academic prototype Java compiler has been able to attain a performance similar to commercial production compilers while using only about 1/7th of the memory footprint, 1/30th of the compile time, and 1/100th of the actual compiler size. Early experiments showed an even greater benefit for dynamically-typed languages such as JavaScript. Our academic experiments are now being validated in one of the largest "real world" trials imaginable. Mozilla recently selected our Trace Tree compiler as the new JavaScript engine for Firefox 3.1, with an expected 400 Million installations to come. Even in the current beta version, the Trace Tree compiler's JavaScript performance is a surprising 700% higher than that of FireFox's 3.0, and a staggering 15 times (1500%) faster than that of Internet Explorer. On the other hand, Google recently launched their Chrome browser, which uses a superbly well engineered but "traditional" control-flow based JavaScript compiler. As both browsers mature, the performance competition between them is likely to settle the question whether or not one should base future compilers on Trace Trees. Speaker biography: Michael Franz is a Professor of Computer Science in the Donald Bren School of Information and Computer Science at the University of California, Irvine (UCI) and has an additional courtesy appointment as a Professor of Electrical Engineering and Computer Science in the Henry Samueli School of Engineering. Franz received a Dr. sc. techn. degree in Computer Science (advised by Niklaus Wirth) and a Dipl. Informatik-Ing. degree, both from the Swiss Federal Institute of Technology, ETH Zurich. | ||||
| Interprocedural Extended Static Checking | ||||
| Domagoj Babic | On: | 4-May-2009 10:00 AM - 11:30 AM | ||
| University of British Columbia | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | Vijay Saraswat | |||
Abstract: Automatic software verification and bug finding have been a long-held goal in software engineering. Many techniques exist, trading off varying levels of automation, coverage, precision, and scalability. Extended Stating Checking (ESC), a combination of static checking and decision procedures, has emerged as a powerful technique for improving software reliability. The major limitation of the ESC paradigm is that it requires programmers to write tedious pre- and post-conditions for every function. One way around this restriction is to perform a whole-program interprocedural analysis. Unfortunately, previously known scalable interprocedural analyses are not precise enough for verification purposes. This talk presents a practical new technique for scalable interprocedural path-sensitive analysis, named structural abstraction, prototyped in the Calysto extended static checker. While interprocedural analysis eliminates the need for manual code annotation with pre- and post-conditions, it imposes a heavy burden on decision procedures because it requires checking the validity of large, complex logical formulas (i.e., verification conditions). The second topic of this talk is a novel approach to solving such large verification conditions. The presented approach gave a 100-fold improvement in performance over the best previously known approaches, significantly improving the usability of modern ESC tools. Thanks to these advancements, Calysto discovered dozens of bugs, completely automatically, in hundreds of thousands of lines of production, open-source applications, with a low rate of false error reports. Speaker biography: Domagoj Babic received his Dipl.Ing. in Electrical Engineering and M.Sc. in Computer Science from the Zagreb University (Faculty of Electrical Engineering and Computing) in 2001 and 2003. He received his Ph.D. in Computer Science in 2008 from the University of British Columbia. Domagoj was a recipient of Microsoft Research Graduate Fellowship in 2005. Domagoj's research focuses on software verification, tools for software engineering, and decision procedures. Lately, he is spending most of his free time researching verification and design of concurrent programs. | ||||
| Measuring Perceptible Performance with Listener Latency Profiling | ||||
| Prof. Matthias Hauswirth | On: | 29-Apr-2009 10:00 AM - 11:30 PM | ||
| University of Lugano, Switzerland | At: | Watson Research Center (Hawthorne), Room 1S-F40 | ||
| Host: | Peter Sweeney | |||
Abstract: Characterizing the performance of modern interactive applications is difficult for two main reasons. First, measuring the time from start to end of the application is pointless because that time is dominated by the user's think time, not by the execution time of the application. So, how should one characterize application performance? Second, modern interactive applications are often built on a framework, such as IBM's Eclipse or Sun's NetBeans, and consist of hundreds of plugins. Different users running these applications will install and use different sets of plugins. So, which plugins should one include when assessing the performance of an application? In this talk we address these two issues. We present an approach, listener latency profiling, to characterize the perceptible performance of interactive applications, and we describe an implementation, LiLa, which is lightweight enough so it can be deployed in production settings to gather the performance of the configurations installed by the actual users. While traditional profilers capture a profile that summarizes the total time spent in each method, a listener latency profiler identifies individual user-initiated operations with perceptible latency, and it does so on the level of listeners (also called observers) that form the basis of object-oriented interactive applications. LiLa, our lightweight listener latency profiler, produces accurate profiles while incurring minimal runtime overhead. Thanks to its low overhead it can be deployed in production settings to gather latency profiles in the field and report them back to the developers. We describe the different aspects of our approach, evaluate how they impact accuracy, and present case studies using our profiler on several complex object-oriented interactive applications. Speaker biography: Matthias Hauswirth is an assistant professor at the University of Lugano in Switzerland. Matthias is interested in approaches for measuring, understanding, and improving the performance of modern, complex systems. He is particularly intrigued by the intricate interactions between the different system layers, from the hardware, over the operating system, virtual machine, application frameworks, all the way to the applications and their users. | ||||
| The Lively Kernel, and Others I Have Known | ||||
| Dan Ingalls | On: | 28-Apr-2009 02:00 PM - 03:00 PM | ||
| Distinguished Engineer | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Sun Microsystems | Host: | Brent Hailpern | ||
Abstract: The first half of this talk will survey a number of the projects Dan has masterminded, including language design, virtual machine design, graphics and user interfaces. Each of these has involved a certain amount of thinking outside the box, fitting systems into a box, and in some cases programming with boxes, and all have turned out to be a lot of fun because of their aliveness. In the second part, Dan will demonstrate his latest work, the Lively Kernel, a rethinking of the Web as a universe of active objects. The Lively Kernel starts with JavaScript and browser graphics and builds a complete, reflective, self-supporting programming environment from scratch. This is a live web application with no download or installation required -- it is simply a web page. That's what Dan likes to do, but the idea is that if you can do that, you can build lots of other useful Internet applications in the same manner. Speaker biography: Dan Ingalls, a Distinguished Engineer at Sun Microsystems, is best known for his work on the Smalltalk programming environment, which revolutionized computing for both users and developers through human- computer interaction, the object-oriented paradigm, and development in integrated environments. He also invented pop-up menus and advanced the state of graphics with BitBlt and its variations that support rotation and antialiasing. For his noteworthy contributions, he has received the ACM Grace Murray Hopper Award and the ACM Software System Award. His most recent work takes these ideas to the World Wide Web in the Lively Kernel Project. | ||||
| Finding and Exploiting Parallelism in Irregular Applications | ||||
| Dr. Milind Kulkarni | On: | 20-Apr-2009 10:00 AM - 11:30 AM | ||
| UT Austin | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | David Grove | |||
Abstract: With the advent of multicore processors, the challenge of increasing program performance has become one of parallelization. While much research over the past three decades has focused on parallelizing dense-array and matrix programs, far less attention has been paid to irregular programs, which operate over pointer-based data structures such as trees and graphs, where traditional approaches have largely failed to uncover significant amounts of parallelism. In this talk, I will show that irregular programs do, indeed, have large amounts of parallelism, and that this type of parallelism, which I call "amorphous data-parallelism," can be exploited efficiently and easily. I will describe the Galois system, which uses high-level abstractions to expose amorphous data-parallelism in sequential irregular programs, and uses semantic properties of these programs to perform automatic parallelization. I will then present a number of optimizations which allow programmers to exploit locality in pointer-based data structures to improve scalability and reduce overheads. I will show that the Galois approach is able to extract significant amounts of parallelism from amorphous data-parallel programs with little programmer effort. Speaker biography: Milind Kulkarni is a Postdoctoral Research Associate at the Institute for Computational Engineering and Sciences at the University of Texas at Austin. His research focuses on developing language features, compiler techniques and runtime systems that can be used to easily and efficiently parallelize applications for multicore processors. He received his Ph.D. in Computer Science from Cornell University in 2008, where he worked with Keshav Pingali. | ||||
| Dynamically Fighting Bugs: Prevention, Detection, and Elimination | ||||
| Shay Artzi | On: | 16-Apr-2009 10:00 AM - 11:30 AM | ||
| At: | Watson Research Center (Hawthorne), Room GN-F15 | |||
| Host: | Frank Tip | |||
Abstract: In this talk, I will present three techniques, each offering a solution to one of the tasks mentioned in the title (prevention, detection, and elimination of bugs), and each based on dynamic analysis or a combination of static and dynamic analysis. In addition, I will present a combined static and dynamic analysis for classifying the mutability of references. This mutability classification increases the effectiveness of the prevention technique and the performance of the elimination technique. I will spend the bulk of my time on the detection technique. This technique leverages dynamic testing and explicit state model-checking to automatically expose and localize crashes and malformed dynamically-generated Web pages in dynamic Web applications. These common problems seriously impact the usability of Web applications and are not addressed adequately by current tools for Web-page validation, which cannot handle the dynamically-generated pages that are ubiquitous on today's Internet. My technique generates tests automatically, runs the tests capturing logical constraints on inputs, and minimizes the conditions on the inputs to failing tests, so that the resulting bug reports are small and useful. The detection technique was implemented in Apollo, a tool that automates my technique. Apollo found 304 failures in six open source web applications. | ||||
| Interprocedural Analysis and the Verification of Concurrent Programs | ||||
| Akash Lal | On: | 13-Apr-2009 10:00 AM - 11:30 AM | ||
| University of Wisconsin-Madison | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | Olivier Tardieu | |||
Abstract: In the modern world, not only is software getting larger and more complex, it is also becoming pervasive in our daily lives. On the one hand, the advent of multi-core processors is pushing software towards becoming more concurrent. On the other hand, software is everywhere, inside nuclear reactors, space shuttles, cars, traffic signals, cell phones, etc. In meeting this demand for software, reliability will be a key factor, for which program verification plays an important role. My research addresses both of these challenges. First, I designed new techniques for the verification of concurrent programs. My thesis explores the problem of context-bounded verification, where only those concurrent behaviors that have a bounded number of context switches between threads are considered. We show that context-bounded verification provides a viable alternative to full verification because it can be very efficient and most bugs can be found in a few context switches. Second, along with colleagues at Wisconsin, I designed a model checker for machine code. In many scenarios, including those with malicious software, source code or debugging information may not be available. In those cases, it is necessary to have a tool that can understand machine code. My thesis shows how we went about systematically building an infrastructure for machine code for use inside a verification tool called McDash. We give evidence that McDash can understand the intricacies of low-level code, while competing well with source-level tools. | ||||
| Abstraction Mechanisms for Modular Software Construction | ||||
| Shan Shan Huang | On: | 6-Apr-2009 09:30 AM - 11:00 AM | ||
| Georgia Tech | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | Erik Altman | |||
Abstract: The complexity of modern software can be better managed through modular software construction, where software is built through the composition of modular, reusable pieces of code. Language abstraction mechanisms are key in writing reuseable code: procedural abstraction, developed in the 1960's, allows code to be reused over sets of concrete values; type abstraction, developed in the 1970's, allows code to be reused over sets of concrete types. In this talk, I introduce a possible next step in the development of abstraction mechanisms: structural abstraction. Structural abstraction supports an even higher level of reuse, by allowing code to abstract over the concrete structure of other code. I introduce structural abstraction through my techniques, ""static type conditions"" and ""morphing""; I demonstrate the benefits of these techniques through large case studies; and I show that they are the first to offer this higher level of reusability, while maintaining the high level safety guarantees of separate type-checking. I also briefly touch upon my work in raising the abstraction level used in programming hardware (e.g., FPGAs), from gates, wires, and registers to high-level Object-Orientation, with features such as data encapsulation, dynamic dispatch, and automatic memory management. Speaker biography: Shan Shan Huang is a Ph.D. candidate at the Georgia Institute of Technology, advised by Prof. Yannis Smaragdakis. Her research focuses on increasing programmer productivity by raising the level of abstraction offered by programming languages. Shan Shan is the recipient of the National Science Foundation Graduate Fellowship and the Intel Corporation Ph.D. Fellowship. She has conducted language design research as a intern at IBM T.J. Watson Research Center and at Sandia National Laboratory. She received her Bachelors of Science in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. | ||||
| Supporting Liquid Metal with a Flexible FPGA Environment | ||||
| Prof. Krste Asanovic | On: | 6-Apr-2009 01:00 PM - 02:30 PM | ||
| UC Berkeley | At: | Watson Research Center (Hawthorne), Room GNF15 | ||
| Host: | Rodric Rabbah | |||
Abstract: In this talk, I will describe work at UC Berkeley as part of the Liquid Metal OCR (Open Collaborative Research) project. The Liquid Metal project seeks to map code written in the LiME language to execute efficiently on a system with both FPGAs and traditional processors. The LiMe language is essentially a superset of Java, while FPGAs are traditionally programmed at the logic gate level. Bridging this large ""semantic gap"" requires a combination of extensive compiler transformations and an effective runtime system bridging both conventional processors and attached FPGAs. UC Berkeley has developed several generations of software and gateware layers that make it easier to use FPGAs to accelerate a range of computing tasks, including architectural simulation in the RAMP project. I will describe how we are adapting and extending these layers for use with LiMe, and also highlight several related developments at UC Berkeley in the areas of high-performance reconfigurable computing. Speaker biography: Krste Asanovic is an Associate Professor in the Computer Science Division of the EECS Department at the University of California at Berkeley. He received a B.A. Electrical Engineering and Information Sciences from Cambridge University in 1987 and a PhD in Computer Science from UC Berkeley in 1998. His main research areas are computer architecture, VLSI design, parallel programming, and operating system design. He was one of the original instigators of the multi-university RAMP project, which aims to make it easier to use FPGAs for architecture research. He helped found the Par Lab at UC Berkeley, which is tackling the design and programming of manycore architectures. He also leads the Architecture Group at the International Computer Science Institute, and holds a joint appointment with the Lawrence Berkeley National Laboratory. Previously at MIT, he led the SCALE group, investigating advanced architectures for energy-efficient high-performance computing. | ||||
| Social Software Engineering: Individual Communication, Coordination and Congruence | ||||
| Patrick Wagstrom | On: | 30-Mar-2009 10:00 AM - 11:30 AM | ||
| CMU | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | Clay Williams | |||
Abstract: Often utilizing formal definitions, new processes, and alternative programming paradigms designed to increase success, software engineering is not typically portrayed as collaborative social process. Yet, a growing number of researchers and practitioners recognize that social interactions amongst developers, firms and foundations are a key element of building and maintaining high quality software in an open collaborative ecosystem. This talk examines one method of understanding how social interactions, task assignments, and dependencies between tasks interact at a team and individual level and their relation to overall team productivity through the lens of socio-technical congruence (STC). I explain how this organizational level view can be modified to provide increased relevance for individual team members. These modifications have the benefit that they allow the team member or manager to differentiate between general communication and communication that addresses dependencies within the project. I further address the practical implications of building tools around STC by evaluating the sensitivity of such networks to noise and architectural changes. | ||||
| Double-header talk: (1) Automatically Finding Patches Using Genetic Programming, (2) The Road Not Taken: Estimating Path Execution Frequency Statically | ||||
| Wes Weimer and Ray Buse | On: | 12-Mar-2009 10:00 AM - 11:30 AM | ||
| University of Virginia | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | Mark Wegman | |||
Abstract: Talk 1 (Wes Weimer, ~20 minutes): Automatically Finding Patches Using Genetic Programming The automatic repair of programs has been a longstanding goal in software engineering, yet debugging remains a largely manual process. We introduce a fully automated method for locating and repairing bugs in software. The approach works on off-the-shelf legacy applications and does not require formal specifications, program annotations or special coding practices. Once a program fault is discovered, an extended form of genetic programming is used to evolve program variants until one is found that both retains required functionality and also avoids the defect in question. Standard test cases are used to exercise the fault and to encode program requirements. After a successful repair has been discovered, it is minimized using structural differencing algorithms and delta debugging. We describe the proposed method and report results from an initial set of experiments demonstrating that it can successfully repair a dozen different C programs totaling over 100,000 lines in under 200 seconds, on average. Talk 2 (Ray Buse, ~20 minutes): The Road Not Taken: Estimating Path Execution Frequency Statically A variety of compilers, static analyses, and testing frameworks rely heavily on path frequency information. Uses for such information range from optimizing transformations to bug finding. Path frequencies are typically obtained through profiling, but that approach is severely restricted: it requires running programs in an indicative environment, and on indicative test inputs. We present a descriptive statistical model of path frequency based on features that can be readily obtained from a program's source code. Our model is over 90% accurate with respect to several benchmarks, and is sufficient for selecting the 5% of paths that account for over half of a program's total runtime. We demonstrate our technique's robustness by measuring its performance as a static branch predictor, finding it to be more accurate than previous approaches on average. Finally, our qualitative analysis of the model provides insight into which source-level features indicate "hot paths." Speaker biography: Wes Weimer research interests: static and dynamic analyses, automatic program repair, (formal) specification mining, defect reporting and triage, program-level security, model checking, regression testing for web applications. Ray Buse research interests: large-scale program analyses, imprecise static analyses, software readability, automatic documentation inference (e.g., javadoc), statistical models, "tools that are useful for people in the trenches" (e.g., architect's workbench). | ||||
| Compositional Shape Analysis by means of Bi-Abduction | ||||
| Dr. Dino Distefano | On: | 9-Mar-2009 10:00 AM - 11:30 AM | ||
| Queen Mary, University of London | At: | Watson Research Center (Hawthorne), Room 1S-F40 | ||
| Host: | Eran Yahav | |||
Abstract: This talk describes a compositional shape analysis, where each procedure is analyzed independently of its callers. The analysis uses an abstract domain based on a restricted fragment of separation logic, and assigns a collection of Hoare triples to each procedure; the triples provide an over-approximation of data structure usage. Compositionality brings its usual benefits --increased potential to scale, ability to deal with unknown calling contexts, graceful way to deal with imprecision -- to shape analysis, for the first time. The analysis rests on a generalized form of abduction (inference of explanatory hypotheses) which we call bi-abduction. Bi-abduction displays abduction as a kind of inverse to the frame problem: it jointly infers anti-frames (missing portions of state) and frames (portions of state not touched by an operation), and is the basis of a new interprocedural analysis algorithm. We have implemented our analysis algorithm and we report case studies on smaller programs to evaluate the quality of discovered specifications, and larger programs (e.g., an entire Linux distribution) to test scalability and graceful imprecision. | ||||
| Protecting Java from Native Code | ||||
| Prof. Gang Tan | On: | 12-Feb-2009 10:00 AM - 11:15 AM | ||
| Lehigh University | At: | Watson Research Center (Hawthorne), Room 1NF28 | ||
| Host: | Martin Hirzel | |||
Abstract: Most software systems are multilingual; that is, they consist of components developed in multiple programming languages. For example, Sun's Java Development Kit 1.6 (JDK) contains around 2 million lines of Java code as well as 800,000 lines of C/C++ code for implementing native methods. It is well known that native methods in Java defeat Java's guarantees of safety and security and are constant sources of software bugs and vulnerabilities. This was confirmed by our recent empirical security study, which identified O(100) bugs in the native code of JDK 1.6. In this talk, we present language-based techniques for ensuring safety and security of Java-C interoperation. The first system, SafeJNI, employs static and dynamic checks to enforce security policies on C code (such as protecting the integrity of the JVM heap). The second system, ILEA, enables existing Java analyzers to understand foreign C code, by automatically extracting an approximate Java specification from C code. Finally, we discuss some ongoing work, including binary rewriting for enforcing security policies on native code. Speaker biography: Gang Tan is an assistant professor of Computer Science and Engineering at Lehigh University. He received his B.E. degree in Computer Science from Tsinghua University in 1999, and his Ph.D. degree from Princeton University in 2005. His research interests are in the areas of programming languages and computer security. | ||||
| Parrot VM: dynamic features for modern virtual machines | ||||
| Allison Randal | On: | 5-Feb-2009 10:00 AM - 11:30 AM | ||
| O'Reilly | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | Michael Hind | |||
Abstract: Dynamic languages are a hot area in commercial and academic research today. Early R&D in modern dynamic languages (such as Perl, Ruby, Python, and PHP) was often marked by a conservative approach, taking dynamic features in tightly controlled doses to make them fit existing models developed for static languages. That attitude has begun to change in recent years, and the major commercial virtual machines are now seeking to support dynamic features. Parrot is unique in the field of virtual machines targeting modern dynamic languages. It incorporates an object-oriented assembly language, is register-based rather than stack-based, and employs continuations as the core means of flow control. It hosts a powerful suite of compiler tools tailored to dynamic languages and a next generation regular expression engine. This talk outlines the overall architecture of Parrot with emphasis on the features that set it apart. Speaker biography: Allison Randal is chief architect and lead developer of the open source project Parrot. In over 25 years as a programmer, she has developed everything from games to linguistic analysis tools, e-commerce websites, shipping fulfillment, compilers, and database replication systems, worked as a language designer, project manager, conference organizer, editor, and consultant, been president of an open source software foundation, written two books, and founded a tech publishing company. | ||||
| Automated Fault Localization: Current Research and Future Directions | ||||
| Prof. Mary Jean Harrold | On: | 15-Jan-2009 10:00 AM - 11:30 AM | ||
| Georgia Tech | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | Frank Tip | |||
Abstract: The high cost of locating faults in software motivates the development of techniques that assist in fault localization--the task of finding the places in the code that must be changed to fix the faults. Fault-localization techniques attempt to automate this task by modeling the software using runtime information, and using these models to compute the suspiciousness of parts of the software. The effectiveness of these techniques depends on the type of information gathered, the way in which that information is used to compute the suspiciousness, and the composition of the test suite that is used to produce the runtime information. We have developed techniques that gather information at runtime, such as simple coverage, and techniques that gather more complex information, such as dependencies and state. Our techniques then use this runtime information to build statistical models on which we compute suspiciousness. We have shown that our techniques are useful for locating both single and multiple faults in the software. We have studied empirically the effectiveness of these different types of runtime information on the fault localization, and shown that our techniques outperform other existing techniques. We have also investigated the way in which the composition of the test suite affects the effectiveness of the fault localization, and shown that the test-suite composition is an important factor in the fault localization. In this talk, I will describe our fault-localization techniques and discuss the results of the empirical studies that demonstrate the effectiveness of the techniques relative to other techniques, the impact of using different types of coverage, and the implications of the test-suite composition. I will also discuss remaining problems in the area of fault localization and discuss next steps in assisting in debugging, such as fault comprehension, fault diagnosis, and fault repair. Speaker biography: Mary Jean Harrold is the ADVANCE Professor of Computing and the Associate Dean for Faculty Affairs in the College of Computing at Georgia Tech. She performs research in analysis and testing of large evolving software, in fault localization and failure identification using statistical analysis, machine learning, and visualization, and in monitoring deployed software to improve quality. She has received funding for her research from government agencies, such as NSF and NASA, and industries, such as Boeing Commercial Airplanes, Tata Consultancy Services, IBM, and Microsoft. She served on the editorial boards of ACM TOPLAS and TOSEM and JSTVR, served as program chair for the ACM ISSTA 2000, program co-chair of the ACM/IEEE ICSE 2001, and general chair for ACM SIGSOFT FSE 2008. She also serves as a member of the Board of Directors for the Computing Research Association (CRA). Professor Harrold actively works to increase the participation of women in computing. She is a member, and past co-chair, of CRA-W and a member of the Leadership Team of the National Center for Women and Information Technology (NCWIT). Professor Harrold received an NSF NYI Award and was named an ACM Fellow. She received the Ph.D. in computer science from the University of Pittsburgh. | ||||
| A tag-based approach for the design and composition of information processing applications | ||||
| Anand Ranganathan | On: | 12-Jan-2009 10:00 AM - 11:30 AM | ||
| IBM Research | At: | Watson Research Center (Hawthorne), Room 1S-F40 | ||
Abstract: This talk is based on a paper in the OOPSLA'08 Onward! track by Eric Bouillet, Mark Feblowitz, Zhen Liu, Anand Ranganathan, and Anton Riabov.   In the realm of component-based software systems, pursuers of the holy grail of automated application composition face many significant challenges. In this paper we argue that, while the general problem of automated composition in response to high-level goal statements is indeed very difficult to solve, we can realize composition in a restricted context, supporting varying degrees of manual to automated assembly for specific types of applications. We propose a novel paradigm for composition in flow-based information processing systems, where application design and component development are facilitated by the pervasive use of faceted, tag-based descriptions of processing goals, of component capabilities, and of structural patterns of families of application. The facets and tags represent different dimensions of both data and processing, where each facet is modeled as a finite set of tags that are defined in a controlled folksonomy. All data flowing through the system, as well as the functional capabilities of components are described using tags. A customized AI planner is used to automatically build an application, in the form of a flow of components, given a high-level goal specification in the form of a set of tags. End-users use an automatically populated faceted search and navigation mechanism to construct these high-level goals. We also propose a novel software engineering methodology to design and develop a set of reusable, well-described components that can be assembled into a variety of applications. With examples from a case study in the Financial Services domain, we demonstrate that composition using a faceted, tag-based application design is not only possible, but also extremely useful in helping end-users create situational applications from a wide variety of available components. | ||||
| Programming in Prose | ||||
| Prof. Jerry L. Potter | On: | 12-Jan-2009 01:00 PM - 02:15 PM | ||
| Colorado State University | At: | Watson Research Center (Yorktown), Room 20-001 | ||
| Host: | Craig Stunkel | |||
Abstract: As a major component of the CS community, IT deserves its own computing paradigm. Tasks such as spreadsheets, browsers, databases and data mining are very time consuming and require large volumes of data, fast I/O and benefit greatly from multi-core massive data parallelism, human direction and interaction. Yet the von Neumann/Fortran paradigm with its central processing unit bottleneck, which was developed in the 1950s for scientific and engineering applications with significantly different needs, still dominates. The prose model of computing is easy to use and understand. It supports user interaction with simple tabular data structures and natural language like commands. The model supports the use of data parallelism and can easily handle inputting, processing and outputting large volumes of data in real time without incurring intractable multi-processor scheduling and communication overheads. The pseudo natural language syntax, based on the use of multi-word identifiers, verbs and prepositions, has several advantages. First, the code is easy to read and is self documenting, making it easier to debug and therefore more efficient and reliable. Second, the commands can be verbalized so that Plain English can be used where keyboards and mice can not be used or are not otherwise needed, i.e. PDAs, cell phones, table PCs, etc. Third, programs do not need to be compiled off line so that programming is dynamic. That is, the user can generate Plain English commands as needed, based on the data being retrieved and processed. New sequences of commands can be saved and used to generate new commands, allowing the users to dynamically generate and update their own applications. | ||||
| High-Level Low-Level Programming and Immix Garbage Collection | ||||
| Prof. Steve Blackburn | On: | 11-Dec-2008 01:00 PM - 02:30 PM | ||
| Australian National University | At: | Watson Research Center (Hawthorne), Room GN-F15 | ||
| Host: | David Grove | |||
Abstract: I will present two short talks. The first explores high level languages for systems programming (VEE'09) and the second describes the Immix garbage collector (PLDI'08). The power of high level languages lies in their abstraction over hardware and software complexity, leading to greater security, better reliability and lower development costs. However, opaque abstractions are often show-stoppers for systems programmers, forcing them to either break the abstraction, or more often, simply give up and use a different language. My second short talk addresses the challenge of opening up a high-level language to allow practical low-level programming without forsaking integrity or performance. The contribution of this work is three-fold: 1) we draw together common threads in a diverse and piece-meal literature, 2) we identify a framework for extending high-level languages for low-level programming, and 3) we illustrate the power of this approach through two concrete case studies. The simple framework we describe allows a broad family of low-level extensions to high-level languages via just three core ideas: extending semantics via intrinsic methods, extending types via unboxing and architectural-width primitives, and controlling semantics via scoped semantic regimes. We develop these ideas through the context of a rich literature and substantial practical experience. We show that they provide the power necessary to implement substantial artifacts such as a high-performance virtual machine, while preserving the software engineering benefits of the host language. This is part of Daniel Frampton's PhD at ANU, and joint work with Dave Grove, Perry Cheng, Robin Garner, Eliot Moss and Sergey Salishev. A garbage collector can directly determine program performance by making a classic space-time tradeoff that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors: semi-space, mark-sweep, and mark-compact each sacrifice one objective. My first short talk describes a collector family, called Mark-Region, and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement Immix, a novel high performance garbage collector that achieves all three performance objectives. We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g. it improves jbb by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design. This is joint work with Kathryn McKinley | ||||
| Towards a Science of Parallel Programming: From Backus to Codd | ||||
| Prof. Keshav Pingali | On: | 11-Dec-2008 01:00 PM - 02:30 PM | ||
| UT Austin | At: | Watson Research Center (Yorktown), Room 20-001 | ||
| Host: | Calin Cascaval | |||
Abstract: When parallel programming started in the 70's and 80's, it was mostly art: languages such as functional and logic programming languages were designed and appreciated mainly for their elegance and beauty. More recently, parallel programming has become engineering: conventional languages like FORTRAN and C++ have been extended with constructs such as OpenMP, and we now spend our time benchmarking and tweaking large programs no one understands to obtain performance improvements of 5-10%. In spite of all this activity, we have few insights into how to write parallel programs to exploit the performance potential of multicore processors. To break this impasse, we need a science of parallel programming. In this talk, I will argue that this requires replacing our current world view, which we owe to an IBM Turing award winner, John Backus, with that of another IBM Turing award winner, Ted Codd. I will then introduce a concept called "amorphous data-parallelism" that provides a simple, unified picture of parallelism in a host of diverse applications ranging from mesh generation/refinement/partitioning to SAT solvers, maxflow algorithms, stencil computations and event-driven simulation. Finally, I will present a natural classification of these kinds of algorithms that provides insight into the structure of parallelism and locality in these algorithms, and into appropriate language and systems support for exploiting this parallelism. Speaker biography: Keshav Pingali is the W.A."Tex" Moncrief Chair of Computing in the Department of Computer Sciences at the University of Texas, Austin. | ||||
| Practical Checking of Heap Assertions | ||||
| Bard Bloom | On: | 9-Dec-2008 10:00 AM - 11:30 AM | ||
| IBM Research | At: | Watson Research Center (Hawthorne), Room GN-K35 | ||
Abstract: Unrestricted use of heap pointers complicates understanding software systems. Incidental and accidental pointer aliasing results in unexpected side effects of seemingly unrelated operations, and are a major source of system failures. It is difficult or impossible to test and debug such failures with existing tools, especially when programs are concurrent or the assertions may inspect the entire heap. In this paper, we present a practical solution that enables the programmer to check ownership, sharing, reachability and other heap properties during program runtime. Technically, we allow the programmer to add expressive heap assertions into her code. Our approach uses a specialized virtual machine that efficiently evaluates the assertions using the components of a parallel garbage collector. We have implemented our approach on top of a production virtual machine. In this paper we demonstrate its usefulness by describing numerous real-world usage scenarios in which we found expressive heap assertions to be valuable. | ||||
| Dynamic Program Understanding | ||||
| Prof. Steven P. Reiss | On: | 5-Dec-2008 10:00 AM - 11:30 AM | ||
| Brown University | At: | Watson Research Center (Hawthorne), Room 1S-F40 | ||
| Host: | John Field | |||
Abstract: We are interested in understanding the dynamic behavior of large-scale, long-running software systems. This talk will describe our current approaches to dynamic program understanding. Performance analysis, in the broadest sense, is a central issue in understanding the dynamic behavior of a system. Understanding the performance of server-based software that runs continually for long periods of time under widely varying loads presents a number of challenges. To address these issues we have developed a methodology consisting of a monitoring framework along with a set of specialized monitoring agents called proflets. The system gathers performance data with a guaranteed maximum overhead that is settable by the user by doing appropriate sampling and scheduling of the proflets. This makes the system suitable for use on production systems. The data is displayed dynamically through either a graphical or a web-based interface. Many server applications are event-based, with the bulk of the server execution listening for events and then processing them. Performance analysis of such applications should concentrate on event processing rather than on traditional performance measures. As part of our performance analysis tool we have developed a proflet that finds and then monitors event handlers. We are currently in the process of extending this to actually track the flow of events through the system. To fully understand the dynamics of a complex application, one needs to be able to ask and answer specific questions about the dynamics in terms of the application. Here we have developed a visualization system that lets the programmer define a application-specific dynamic visualization in terms of program events and an underlying data model and then view the result dynamically and historically. Our eventual goal is to combine these efforts into a comprehensive system that provides detailed dynamic analysis of just the parts of an application that are of interest to the programmer and does so in terms of the application and the expressed interests. We conclude by describing how this approach might work. | ||||
| QVM: An Efficient Runtime for Detecting Defects in Deployed Systems | ||||
| Matthew Arnold | On: | 4-Dec-2008 01:00 PM - 02:30 PM | ||
| IBM Research | At: | Watson Research Center (Hawthorne), Room | ||
Abstract: This talk is based on an OOPSLA 2008 paper by Matthew Arnold, Martin Vechev, and Eran Yahav. Coping with software defects that occur in the post-deployment stage is a challenging problem: bugs may occur only when the system uses a specific configuration and only under certain usage scenarios. Nevertheless, halting production systems until the bug is tracked and fixed is often impossible. Thus, developers have to try to reproduce the bug in laboratory conditions. Often the reproduction of the bug consists of the lion share of the debugging effort. In this paper we suggest an approach to address the aforementioned problem by using a specialized runtime environment (QVM, for Quality Virtual Machine). QVM efficiently detects defects by continuously monitoring the execution of the application in a production setting. QVM enables the efficient checking of violations of user-specied correctness properties, e.g., typestate safety properties, Java assertions, and heap properties pertaining to ownership. QVM is markedly different from existing techniques for continuous monitoring by using a novel overhead manager which enforces a user-specied overhead budget for quality checks. Existing tools for error detection in the field usually disrupt the operation of the deployed system. QVM, on the other hand, provides a balanced trade off between the cost of the monitoring process and the maintenance of sufficient accuracy for detecting defects. Specifically, the overhead cost of using QVM instead of a standard JVM is low enough to be acceptable in production environments. We implemented QVM on top of IBM's J9 Java Virtual Machine and used it to detect and fix various errors in real world applications. | ||||
