I classify myself as a software engineering researcher, but my research tends to fall at the intersection of several areas, most commonly software engineering, programming languages, and databases. As this odd collection probably suggests, my research interests are fairly diverse: software engineering environments, and language, tool, process, and methodology support for the whole software engineering lifecycle; separation and integration of concerns (and integration technologies as a whole) and aspect-oriented software development; traceability; consistency and inconsistency management; advanced object management and concurrency control mechanisms; and most recently, software development governance. Believe it or not, there is a common theme here: I work on tools and technologies to facilitate the development and long-term evolution of complex software.
- Multi-Dimensional Separation of Concerns using Hyperspaces
- Morphogenic Software
- Subject-Oriented Software Engineering
- Consistency and Inconsistency Management for Complex Applications (with researchers at University of Massachusetts)
- Event-Based Integration Framework (with researchers at University of Massachusetts)
- Software Engineering Environments (part of the Arcadia Consortium)
- Object Management for Complex Applications (PhD work)
Multi-Dimensional Separation of Concerns using Hyperspaces
Multi-dimensional separation of concerns is a new subfield of software engineering. Its goals are to enable:
- Encapsulation of all kinds of concerns in a software system, simultaneously, in a way that permits dynamic (re)selection of concerns.
- Overlapping and interacting concerns.
- On-demand remodularization, to encapsulate new concerns as they are identified during the software lifecycle.
Separation of concerns is a concept that is at the core of software engineering, and it has been since it was introduced in Parnas' seminal work nearly 30 years ago. It refers to the ability to identify, encapsulate, and manipulate those parts of software that are relevant to a particular concern (concept, goal, purpose, etc.). Concerns are the primary motivation for organizing and decomposing software into manageable and comprehensible parts. Many kinds of concerns may be relevant to different developers in different roles, or at different stages of the software lifecycle. For example, the prevalent concern in object-oriented programming is the class, which encapsulates data concerns. Feature concerns, like printing, persistence, and display capabilities, are also common, as are concerns like aspects, roles, variants, and configurations. Appropriate separation of concerns has been hypothesized to reduce software complexity and improve comprehensibility; promote traceability; facilitate reuse, non-invasive adaptation, customization, and evolution; and simplify component integration.
These goals, while laudable, have not yet been achieved in practice. We believe this is because the set of relevant concerns varies over time and is context-sensitive: different development activities, stages of the software lifecycle, developers, and roles often involve concerns of dramatically different kinds. Thus, any criterion for decomposition will be appropriate for some contexts, but not for all. Further, multiple kinds of concerns may be relevant simultaneously, and they may overlap and interact, as features and classes do.
We use the term multi-dimensional separation of concerns (MDSOC) to refer to flexible and incremental separation, modularization, and integration of software artifacts based on any number of concerns. It overcomes limitations of existing mechanisms by permitting clean separation of multiple, potentially overlapping and interacting concerns simultaneously, with support for on-demand remodularization to encapsulate new concerns at any time. Realizations of MDSOC can permit incremental identification and encapsulation of concerns, without requiring the use of new languages or formalisms. MDSOC promotes reuse, improves comprehension, reduces the impact of change, eases maintenance and evolution, improves traceability, and opens the door to system refactoring and reengineering. Thus, it addresses some fundamental limitations in software engineering.
Hyperspaces are our approach to achieving MDSOC. Hyperspaces also provide a powerful composition mechanism that facilitates non-invasive integration, adaptation, and "plug-and-play." The approach has a low entry barrier, since it does not affect programming languages or processes: developers can continue to use their programming languages, development processes, development environments, and compilers of choice, while still reaping the benefits of multi-dimensional separation of concerns. We have developed a tool, called Hyper/J(TM), which provides support for hyperspaces in Java(TM). This tool is now available on IBM's alphaWorks site, free of charge.
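The core idea can be illustrated with a small Java sketch, offered only as an illustration under our own assumptions: it is not Hyper/J, it does not use Hyper/J's specification syntax, and the names `Hyperspace`, `Unit`, `classify`, and `select` are hypothetical. Each unit (here, a method) is placed at a concern in each of several dimensions at once, so a feature concern such as printing can be selected even though its units are scattered across classes, and a new dimension can be added later without changing the code being classified.

```java
import java.util.*;

/** Hypothetical sketch (not Hyper/J): units classified along several concern dimensions at once. */
public class Hyperspace {
    /** A unit of software, e.g. a method, identified by class and member name. */
    public record Unit(String className, String member) {}

    // dimension -> (unit -> the concern it belongs to within that dimension)
    private final Map<String, Map<Unit, String>> dimensions = new LinkedHashMap<>();

    /** Place a unit at a concern within a dimension; new dimensions can be added at any time. */
    public void classify(Unit unit, String dimension, String concern) {
        dimensions.computeIfAbsent(dimension, d -> new LinkedHashMap<>()).put(unit, concern);
    }

    /** Select every unit that belongs to the given concern in the given dimension. */
    public List<Unit> select(String dimension, String concern) {
        List<Unit> result = new ArrayList<>();
        dimensions.getOrDefault(dimension, Map.of()).forEach((unit, c) -> {
            if (c.equals(concern)) result.add(unit);
        });
        return result;
    }

    public static void main(String[] args) {
        Hyperspace hs = new Hyperspace();
        Unit exprPrint = new Unit("Expression", "print");
        Unit exprEval = new Unit("Expression", "evaluate");
        Unit varPrint = new Unit("Variable", "print");

        // Dimension 1: the class decomposition the code was written in.
        hs.classify(exprPrint, "Class", "Expression");
        hs.classify(exprEval, "Class", "Expression");
        hs.classify(varPrint, "Class", "Variable");

        // Dimension 2: a feature decomposition identified later, without editing the code.
        hs.classify(exprPrint, "Feature", "Printing");
        hs.classify(varPrint, "Feature", "Printing");
        hs.classify(exprEval, "Feature", "Kernel");

        // The Printing concern cuts across both classes.
        System.out.println(hs.select("Feature", "Printing"));
    }
}
```

Selecting `Printing` returns the `print` methods of both classes, the kind of overlapping, crosscutting concern that a single class decomposition cannot encapsulate on its own.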
Co-conspirators in this work:
- Harold Ossher, who co-invented Hyperspaces and co-leads this research effort with me.
- Vincent Kruskal and Matt Kaplan, who have done much work on concepts, design, and implementation.
- Prof. Gail Murphy and her group at University of British Columbia, who are collaborating with us on exploration and evaluation of MDSOC and Hyperspaces. This work will include the development of tools to aid in the analysis and identification of concerns, integrating Hyper/J and UBC's existing tool support for conceptual modules.
Morphogenic Software
As software has become more and more pervasive and its life expectancy has increased, it has been subject to greater pressures to integrate and interact with other pieces of software, and to evolve and adapt to uses in all manner of new and unanticipated contexts, both technological (e.g., new hardware, operating systems, software configurations, standards) and sociological (e.g., new domains, business practices, processes and regulations, users). Unfortunately, as the ongoing software crisis attests, software fundamentally cannot meet these challenges. Evolution, adaptation and integration are costly and difficult, if they can be performed at all. As a result, more and more people are using software that fails to do what they want, that does things they do not want, that does not integrate well with their hardware, other software, or into their business processes, and that becomes ever more brittle and unreliable over time. Essentially, software is like a square peg being forced into round, triangular, octagonal, and other kinds of ever-changing (and rapidly changing) holes into which it does not, and cannot readily be made to, fit.
This problem arises because it is impossible to predict, even with the most careful analysis, how business practices, business processes, people, and technology will co-evolve over time. As people use software, and as their businesses, domains, and available technologies change, they find different uses for the software, and they impose new requirements on it. In rapidly changing technology fields, it is little wonder that it is impossible to anticipate all uses to which a piece of software will be put in the future, and in what contexts it will be used.
At present, however, the technology available to aid in software integration and evolution depends fundamentally on the ability to anticipate and pre-plan for change: designers anticipate likely evolution scenarios and variations, and build in "open points" to make them straightforward to accomplish, using techniques such as frameworks and design patterns. When significant changes arise that were not anticipated, reengineering is usually needed. Despite reengineering tools, this remains an exceedingly costly and error-prone process.
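For readers less familiar with the term, the following Java sketch shows what a pre-planned "open point" typically looks like, using the Strategy pattern; the `Report` and `Formatter` names are hypothetical. A variation the designer anticipated (a new output format) needs only a new `Formatter`, while a change nobody planned for, such as filtering rows before rendering, has no hook and would require editing `Report` itself.

```java
import java.util.List;

/** Hypothetical sketch of an anticipated "open point" built with the Strategy pattern. */
public class Report {
    /** The open point: the designer anticipated that output formats would vary. */
    public interface Formatter {
        String format(List<String> rows);
    }

    private final List<String> rows;
    private final Formatter formatter;

    public Report(List<String> rows, Formatter formatter) {
        this.rows = rows;
        this.formatter = formatter;
    }

    /** Rendering always goes through the pluggable formatter; nothing else is pluggable. */
    public String render() {
        return formatter.format(rows);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("alpha", "beta");
        // Anticipated change: a new format is just another Formatter, with no edits to Report.
        Formatter plain = r -> String.join("\n", r);
        Formatter csv = r -> String.join(",", r);
        System.out.println(new Report(rows, plain).render());
        System.out.println(new Report(rows, csv).render());
        // Unanticipated change (e.g., filtering rows before rendering) has no hook here,
        // so it would require modifying Report itself.
    }
}
```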
The reliance on pre-planning for change is problematic for two reasons. First, as noted above, it is impossible to anticipate all future needs, and changes that were not anticipated remain difficult or impossible to accomplish. Second, many current pressures militate against good up-front planning. Traditional models of software development involve careful requirements analysis and design prior to implementation. This may be appropriate in more traditional contexts, such as the development of large, high-quality systems over a significant period of time, or the engineering of safety-critical software, but in many areas of business and research today there is great pressure to be first to market or first to publish, even at the cost of reduced quality. Products that appear first often capture the market, and higher-quality alternatives released later often arrive too late. This puts great pressure on developers to shorten the software lifecycle in favor of quick release of a product that does enough to capture interest and a significant share of the market. Initial development is fast and focused, without spending time on planning for future changes that probably couldn't be anticipated adequately anyway. The changes come later and are often difficult, requiring some amount of reengineering to introduce needed open points. Clearly, the state-of-the-art approach to evolution (careful pre-planning) does not by itself satisfy current and evolving needs.
In a nutshell, software is currently like clay, and it needs to become like gold. Clay is soft and malleable initially, but then it hardens. After that, bumps can be added to it, but it cannot be changed or reshaped without breakage. Attempts to force a hardened clay peg into a hole of a different shape are likely to lead to breakage. Gold, on the other hand, remains malleable for life. It can be reshaped as needed, and will assume the shape of a hole into which it is hammered.
We introduce the term morphogenic software to refer to software that is malleable for life: sufficiently adaptable to allow context mismatch to be overcome with acceptable effort, repeatedly, as new, unanticipated contexts arise. In other words, it can be reshaped to fit into new holes as needed.
Morphogenic software extends the goals of software integration. Traditional integration solutions say, in effect, "integration can be achieved by building software using this particular integration approach (e.g., event mechanism, repository, or process specification approach)." This means that, when writing some software, one does not have to anticipate what other specific software it will have to integrate with, but one does have to commit to a particular context (the choice of integration approach). Adaptation across contexts is still difficult or impossible. Morphogenic software goes a step further by requiring a commitment to integration, but not to any particular approach.
Co-conspirators in this research:
- At IBM Watson: Harold Ossher, Brent Hailpern, and John Field and his group.
- At IBM Haifa: Gabi Zodik and Yael Shaham-Gafni.
Subject-Oriented Software Engineering
Subject-oriented programming is a program-composition technology that supports building object-oriented systems as compositions of subjects. A subject is a collection of classes or class fragments whose hierarchy models its domain in its own, subjective way. A subject may be a complete application in itself, or it may be an incomplete fragment that must be composed with other subjects to produce a complete application. Subject composition combines class hierarchies to produce new subjects that incorporate functionality from existing subjects. Subject-oriented programming thus supports building object-oriented systems as compositions of subjects, extending systems by composing them with new subjects, and integrating systems by composing them with one another (perhaps with "glue" or "adapter" subjects). The technology behind subject-oriented programming is applicable to much of the software lifecycle and has many implications for software engineering.
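A rough Java sketch of subject composition, assuming the simplest possible rule, merge-by-name: classes with the same name in different subjects are unified and their members combined. The names `Subject`, `define`, and `compose` are hypothetical, and the sketch composes sets of member names rather than actual code; real subject composition supports richer composition rules and correspondences between differently named classes.

```java
import java.util.*;

/** Hypothetical sketch: a subject as a map from class name to the members it defines. */
public class Subject {
    private final Map<String, Set<String>> classes = new TreeMap<>();

    /** Declare (part of) a class in this subject's own, subjective view of the domain. */
    public Subject define(String className, String... members) {
        classes.computeIfAbsent(className, c -> new TreeSet<>()).addAll(Arrays.asList(members));
        return this;
    }

    public Map<String, Set<String>> classes() {
        return Collections.unmodifiableMap(classes);
    }

    /** Merge-by-name composition: same-named classes are unified and their members combined. */
    public static Subject compose(Subject... subjects) {
        Subject result = new Subject();
        for (Subject s : subjects) {
            s.classes.forEach((name, members) -> result.define(name, members.toArray(new String[0])));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two subjects that model the same domain in their own ways.
        Subject payroll = new Subject().define("Employee", "name", "salary");
        Subject directory = new Subject().define("Employee", "name", "office")
                                         .define("Department", "manager");
        // A new subject incorporating functionality from both.
        System.out.println(compose(payroll, directory).classes());
        // prints {Department=[manager], Employee=[name, office, salary]}
    }
}
```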
This research has forked off into a few directions:
- The work on Hyperspaces.
- Message Central, a project (led by Bill Harrison) that aims to provide a technology base to help with component integration by automatically creating interface and conversion glue needed when messages exchanged between the components do not match.
- Subject-oriented design, which is the application of the principles behind subject-oriented programming to design artifacts (initially applied to UML). This work is led by Siobhán Clarke, who is now on the faculty of Trinity College Dublin, Ireland.
Co-conspirators in this research:
- Bill Harrison and Harold Ossher, who are the inventors of subject-oriented programming, and Matt Kaplan and Vincent Kruskal, who are long-time collaborators on this project.
- Siobhán Clarke (Dublin City University, Ireland) worked on subject-oriented design for her PhD thesis. She worked with us on the early concepts as well.
Consistency and Inconsistency Management for Complex Applications
Defining and maintaining consistency among objects is a difficult task that arises in many complex applications. One or more objects are consistent if they satisfy some condition(s) for acceptability or correctness. Consistency management is the process of controlling the manipulation of objects to ensure that their consistency definitions are respected, while inconsistency management is the process of controlling the manipulation of objects whose consistency definitions are not satisfied. CIM (consistency and inconsistency management) comprises the definition of consistency conditions, identification of consistency violations, reestablishment of consistency following violations, and ensuring the meaningful manipulation of inconsistent objects. Our work in this area has focused on the development of general frameworks of CIM, on instantiating those frameworks for use in different contexts, and on identifying and managing the tradeoffs involved in performing such an instantiation.
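The four CIM activities listed above (definition, identification, reestablishment, and manipulation of inconsistent objects) can be sketched in a few lines of Java. This is a minimal illustration under our own assumptions, not one of the frameworks from this work; `Cim`, `ConsistencyCondition`, and the repair callback are hypothetical names. Violations are detected after a change rather than forbidden, a condition may carry a repair action, and objects that remain inconsistent are tracked so later operations can treat them accordingly.

```java
import java.util.*;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical sketch of consistency and inconsistency management (CIM) over objects of type T. */
public class Cim<T> {
    /** Definition: a named condition, with an optional repair that tries to reestablish it. */
    public record ConsistencyCondition<U>(String name, Predicate<U> holds, Consumer<U> repair) {}

    private final List<ConsistencyCondition<T>> conditions = new ArrayList<>();
    // Objects currently known to violate some condition, with the names of the violated conditions.
    private final Map<T, Set<String>> inconsistent = new IdentityHashMap<>();

    public void define(ConsistencyCondition<T> condition) {
        conditions.add(condition);
    }

    /** Identification and reestablishment: check an object after it changes, repairing where possible. */
    public Set<String> check(T obj) {
        Set<String> violated = new TreeSet<>();
        for (ConsistencyCondition<T> c : conditions) {
            if (!c.holds().test(obj) && c.repair() != null) {
                c.repair().accept(obj); // attempt to reestablish consistency
            }
            if (!c.holds().test(obj)) {
                violated.add(c.name()); // tolerate the violation instead of rejecting the change
            }
        }
        if (violated.isEmpty()) inconsistent.remove(obj); else inconsistent.put(obj, violated);
        return violated;
    }

    /** Inconsistency management: callers can ask whether an object may be treated as consistent. */
    public boolean isConsistent(T obj) {
        return !inconsistent.containsKey(obj);
    }

    /** A mutable example object. */
    static class Account { int balance; int limit = 100; }

    public static void main(String[] args) {
        Cim<Account> cim = new Cim<>();
        cim.define(new ConsistencyCondition<Account>("nonNegative", a -> a.balance >= 0, null));
        cim.define(new ConsistencyCondition<Account>("withinLimit", a -> a.balance <= a.limit,
                a -> a.balance = a.limit)); // repair by clamping to the limit

        Account acct = new Account();
        acct.balance = 150;
        System.out.println(cim.check(acct) + " " + cim.isConsistent(acct)); // repaired
        acct.balance = -5;
        System.out.println(cim.check(acct) + " " + cim.isConsistent(acct)); // tolerated violation
    }
}
```

Whether to repair automatically, tolerate, or reject a violation is exactly the kind of tradeoff that differs from one instantiation of a CIM framework to another; this sketch simply hard-codes one choice per condition.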
Co-conspirators in this research:
- Lori Clarke, Barbara Lerner, Lee Osterweil, and Krithi Ramamritham (University of Massachusetts), and Stan Sutton (IBM Research).
- Alexander Wise (University of Massachusetts), who co-discovered (and named, I believe) "the container problem," a key issue in achieving consistency management in the presence of first-class objects.
Event-Based Integration Framework
Although event-based software integration is one of the most prevalent approaches to loose integration, no consistent model for describing it existed. As a result, there was no uniform way to discuss event-based integration, compare approaches and implementations, specify new event-based approaches, or match user requirements with the capabilities of event-based integration systems. We attempt to address these shortcomings by specifying a generic framework for event-based integration, the EBI framework, that provides a flexible, object-oriented model for discussing and comparing event-based integration approaches. The EBI framework can model dynamic and static specification, composition and decomposition, and can be instantiated to describe the features of most common event-based integration approaches.
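To give a flavor of the elements such a model must describe, here is a minimal publish/subscribe sketch in Java. It is not the EBI framework's object model; `EventBus`, `Event`, `subscribe`, and `announce` are hypothetical names. Components register interest in event types at run time, announcers know nothing about receivers, and the bus alone decides which listeners see each announced event.

```java
import java.util.*;
import java.util.function.Consumer;

/** Hypothetical sketch of loose, event-based integration between components. */
public class EventBus {
    /** An event: a type name plus a payload. */
    public record Event(String type, Object payload) {}

    private final Map<String, List<Consumer<Event>>> listeners = new HashMap<>();

    /** Dynamic registration: a component declares interest in an event type at run time. */
    public Runnable subscribe(String type, Consumer<Event> listener) {
        listeners.computeIfAbsent(type, t -> new ArrayList<>()).add(listener);
        return () -> listeners.get(type).remove(listener); // run to unsubscribe
    }

    /** An announcer publishes without knowing which components, if any, will react. */
    public int announce(Event event) {
        // Copy first so listeners may unsubscribe while the event is being delivered.
        List<Consumer<Event>> targets = List.copyOf(listeners.getOrDefault(event.type(), List.of()));
        targets.forEach(l -> l.accept(event));
        return targets.size();
    }

    public static void main(String[] args) {
        EventBus bus = new EventBus();
        // A hypothetical compiler tool reacts to saves announced by an editor; neither refers to the other.
        Runnable stop = bus.subscribe("fileSaved", e -> System.out.println("compiling " + e.payload()));
        bus.announce(new Event("fileSaved", "Main.java"));
        stop.run();
        System.out.println(bus.announce(new Event("fileSaved", "Main.java")) + " listeners notified");
    }
}
```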
Co-conspirators in this research:
- Lori Clarke and Alexander Wise (University of Massachusetts)
- Daniel Barrett (D.E. Shaw & Co.)
Software Engineering Environments
I worked for many years as part of the Arcadia Consortium, a project involving collaborative efforts by researchers at several universities on software engineering environments research. The Arcadia project investigates tools and techniques to improve the software engineering process. The goal of the project is to support the creation of software engineering environments intended for the development, analysis, and maintenance of large, complex software systems, particularly those with high reliability requirements. Additionally, Arcadia is committed to a highly distributed, tool-based architecture that supports flexible environment evolution, heterogeneous tools (i.e., developed using a variety of programming languages, object management systems, etc.), and organizationally dispersed software engineering.
As part of this project, I did research and tool development in a number of areas, including language processing and translator design, language-independent program representations (and language-specific specializations for particular languages), object management (including persistent object management, consistency management, concurrency control, object modeling, and object relationships), software interoperability and integration, traceability, software process programming, and various other aspects of support for software engineering throughout the software lifecycle.
The primary Arcadia sites:
- University of California, Irvine
- University of Colorado, Boulder
- University of Massachusetts, Amherst
Object Management for Complex Applications (PhD thesis)
(I'll let the abstract from my thesis speak for itself here...)
The definition and long-term management of data in complex systems requires extensive support, including high-level type and behavior modeling, persistence, query-based and navigational access, consistency management, and concurrency control. Traditionally, some of these capabilities have been provided by programming languages (e.g., semantically rich type and behavior models and navigational access), while others have been provided by database management systems (e.g., persistence, queries, and concurrency control). No language or database has provided the full set of required capabilities, however. This has typically required developers to program in multiple paradigms, translating explicitly between "programming language" and "database" models as necessary to use their respective capabilities. The object-oriented database approach has sought to reduce this impedance mismatch in certain areas, but discrepancies still remain.
We have developed a different approach to addressing the object management needs of complex applications. This approach eliminates the dichotomy between "programming language" and "database" objects, thus allowing the full set of language and database capabilities to be applied equally to all objects. The resulting object management capabilities are provided in a programming language-like manner, sometimes referred to as a "database programming language." This allows software developers to define objects, their interrelationships, and their behavioral semantics in the same programming language in which they build the systems that manipulate these objects. The object management capabilities can be applied to any kinds of objects these systems may need to define, including non-traditional objects like threads and procedures.
The database programming language approach thus reduces the burden on application developers and minimizes application complexity, resulting in more rapid development of more maintainable software. The database programming language approach raises several challenges, arising in part from the historically different goals of programming languages and databases. Languages are general-purpose and flexible, to support a wide variety of application semantics, while databases impose semantic restrictions to improve performance. Thus, fully integrating programming language and database object management capabilities requires expanding language and database semantics to accommodate capabilities from the other domain. It also requires addressing numerous integration problems that arise when these new semantic models are inconsistent with each other. The result must retain the power of the language and database, and still perform acceptably. In this research, we formally define some of these semantic models, and explore, and attempt to address, the interactions among them via a prototype implementation and experimental evaluation.
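The central point, that one set of objects supports both navigational and query-based access in the host language with no translation to a separate data model, can be illustrated with a very small Java sketch. It deliberately omits persistence, concurrency control, and consistency management, and `ObjectStore`, `Module`, `manage`, and `query` are hypothetical names, not the thesis prototype's interface.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical sketch: the same language-level objects support navigation and queries. */
public class ObjectStore<T> {
    private final List<T> extent = new ArrayList<>(); // all managed objects of type T

    /** Make an object managed; it remains an ordinary object of the programming language. */
    public T manage(T obj) {
        extent.add(obj);
        return obj;
    }

    /** Query-based access: a predicate written in the host language, not a separate query model. */
    public List<T> query(Predicate<? super T> condition) {
        List<T> result = new ArrayList<>();
        for (T obj : extent) {
            if (condition.test(obj)) result.add(obj);
        }
        return result;
    }

    /** A program module that references the modules it imports. */
    static class Module {
        final String name;
        final List<Module> imports = new ArrayList<>();
        Module(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    public static void main(String[] args) {
        ObjectStore<Module> store = new ObjectStore<>();
        Module util = store.manage(new Module("util"));
        Module parser = store.manage(new Module("parser"));
        Module app = store.manage(new Module("app"));
        parser.imports.add(util);
        app.imports.add(parser);

        // Navigational access: follow ordinary object references.
        System.out.println(app.imports.get(0).imports.get(0)); // prints util
        // Query-based access over the same objects, with no translation step in between.
        System.out.println(store.query(m -> m.imports.contains(util))); // prints [parser]
    }
}
```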
My thesis work (and a bunch of the other work I did) was used in several different Arcadia-related efforts and by some other academic and industrial sites.
Co-conspirators (well, my thesis committee, anyway):
- Lori Clarke, my thesis chair
- Lee Osterweil
- Eliot Moss
- Krithi Ramamritham
- Alexander Wolf
