|
|
A Market-Based Approach to Software Evolution
|
|
|
Compile-Time Polymorphism on a Diet
|
|
|
Avoiding Unbounded Priority Inversion in Barrier Protocols Using Gang Priority Management
|
|
|
A Computing Origami: Folding Streams in FPGAs
|
|
|
PTIDES on Flexible Task Graph: Real-Time Embedded System Building from Theory to Practice
|
|
|
Low-Latency Time-portable Real-time Programming with Exotasks
Journal version of the work on the Exotask system and its realization in the JAviator helicopter, extended to integrate previous work on Eventrons. |
|
|
Tax-and-Spend: Democratic Scheduling for Real-time Garbage Collection
|
|
|
Optimus: Efficient Realization of Streaming Applications on FPGAs
|
|
|
Liquid Metal: Object-Oriented Programming Across the Hardware/Software Boundary
The goal of the Liquid Metal project is to bring the dynamism and flexibility of JIT-compiled languages to heterogeneous platforms like CPU/FPGA systems. It consists of a language (Lime) and a compilation and run-time system which partitions the program between the virtual machine on the CPU and the FPGA. In this paper we describe the overall system design and present initial results of compilation of object-oriented language features into hardware. |
|
|
Flexible Task Graphs: A Unified Restricted Thread Programming Model for Java
Flexotasks provide a uniform programming model for building real-time and streaming systems out of single-threaded tasks running in isolated memory regions. The model supports Exotasks, Eventrons, Reflexes, and Streamflexes in a single compositional system. |
|
|
Languages and Performance Engineering: Method, Instrumentation, and Pedagogy
|
|
|
Design and Implementation of a Comprehensive Real-time Java Virtual Machine
Describes the design and implementation of a production real-time Java virtual machine that incorporates Metronome garbage collection, ahead-of-time compilation, class pre-loading, and a complete implementation of the RTSJ specification. |
|
|
Real-time Music Synthesis in Java using the Metronome Garbage Collector
Describes the design and evaluation of a real-time MIDI synthesizer written entirely in Java running on top of a real-time JVM with Metronome garbage collection. We achieve latency and jitter characteristics comparable to a Kurzweil K2000R synthesizer. |
|
|
Generational Real-time Garbage Collection: A Three Part Invention for Young Objects
A generational real-time collector, based on Metronome, that uses a tri-partite nursery to enable real-time incremental nursery collection while previous nurseries are reclaimed. |
|
|
CGCExplorer: A Semi-automated Search Procedure for Provably Correct Concurrent Collectors
A semi-automated procedure for generating whole families of concurrent garbage collection algorithms using model checking and a sound heap abstraction. |
|
|
The ExoVM System for Automatic VM and Application Reduction
ExoVM allows a Java application to be shrink-wrapped together with a virtual machine customized for the application, allowing standard Java programs to compile to extremely small sizes suitable for mote-class devices. |
|
|
Java Takes Flight: Time-portable Real-time Programming with Exotasks
Exotasks provide a functional data-flow abstraction combined with logical-execution time scheduling inside of Java. We show how this allows for the creation of time-portable real-time systems by using them to build the control software for a custom-built helicopter called the JAviator. |
|
|
Eventrons: A Safe Programming Construct for High-Frequency Hard Real-Time Applications
Eventrons are a safe subset of Java that is validated before execution using data-sensitive analysis. Eventrons run at frequencies up to 22 kHz (45 microsecond periods), achieving the same latency and jitter performance as programs written in C. |
|
|
Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms
Shows how complex concurrent collectors can be derived from a single, simple, inefficient, but highly precise algorithm using a set of composable, performance-improving, correctness-preserving, and precision-reducing transformations. |
|
|
Braids and Fibers: Language Constructs with Architectural Support for Adaptive Response to Memory Latencies
Braids and Fibers are high-level constructs for user-level programming of threads that are sufficiently lightweight to respond to cache misses. The paper describes hardware support in the form of split-phase loads and stores and a hardware/software handshake for completed split-phase operations, along with the compilation of high-level code to the extended instruction set. |
|
|
On-line Visualization and Analysis of Real-time Systems with TuningFork
TuningFork is a modular, pluggable on-line visualization system designed specifically for real-time systems. This paper describes the architecture, visualizations, and application to the Metronome real-time garbage collector. |
|
|
Derivation and Evaluation of Concurrent Collectors
Presents a new abstract, generalized, and more accurate algorithm for concurrent garbage collection, shows how both pre-existing and new concrete algorithms can be derived, and studies the relative performance of four algorithms. |
|
|
Syncopation: Generational Real-time Garbage Collection in the Metronome
Shows how two new techniques, syncopation and arraylet pre-tenuring, can be combined with an over-clocked scheduler to extend the benefits of generational systems to real-time garbage collection. |
|
|
Efficient On-the-Fly Cycle Collection
A number of improvements to the Recycler algorithm greatly reduce the load on the cycle collector and yield a corresponding increase in performance. |
|
|
A Unified Theory of Garbage Collection
Tracing and reference counting garbage collection are shown to be duals of one another, and all high-performance garbage collectors are shown to be hybrids of tracing and reference counting. A uniform cost model for comparing space and time requirements is provided. |
|
|
The Virtualized Virtual Machine
Position paper describing a vision for future virtual machines, in which all components, from the run-time system down to the hardware instruction set, are virtualized and dynamically generated. |
|
|
Write Barrier Elision for Concurrent Garbage Collectors
A limit study of the opportunities for eliminating write barriers in concurrent garbage collectors, showing that in many cases well over 90% of write barriers are redundant. |
|
|
Dynamic Selection of Application-Specific Garbage Collectors
Shows how a virtual machine can dynamically switch between diverse garbage collectors to optimize performance as the application mix and available resources change. |
|
|
Braids and Fibers: Language Constructs with Architectural Support for Adaptive Response to Memory Latencies
Braids and Fibers are high-level constructs for user-level programming of threads that are sufficiently lightweight to respond to cache misses. The paper describes hardware support in the form of split-phase loads and stores and a hardware/software handshake for completed split-phase operations, along with the compilation of high-level code to the extended instruction set. |
|
|
Garbage Collection for Embedded Systems
Describes two different garbage collector implementations for a J2ME virtual machine (mark-compact and paged mark-sweep), along with some novel techniques for reducing per-object overheads. Several versions of each collector are tested using the EEMBC embedded systems benchmark suite, and most of them perform well with only 10% space overhead. |
|
|
The Metronome: A Simpler Approach to Garbage Collection in Real-time Systems
A position paper that shows how true real-time garbage collection, as we have implemented in the Metronome, can greatly simplify the programmer interface for real-time systems over that provided by environments such as the Real-Time Specification for Java (RTSJ). |
|
|
MJ: A Rational Module System for Java and its Applications
Describes a module system for Java that is compatible with the existing Java language but provides significantly improved support for building large, robust, long-lived systems out of modular components. We implemented MJ and converted the Tomcat web application server from using classloaders to using about 30 MJ modules. The resulting system is much easier to install and maintain, and also achieves a 30% speedup. |
|
|
Controlling Fragmentation and Space Consumption in the Metronome, a Real-time Garbage Collector for Java
Describes the Metronome real-time garbage collector's mechanisms for limiting fragmentation and the resulting wasted space. The application is characterized in terms of a fragmentation factor λ. For real-world applications λ is very small and the collector only needs to copy a very small number of objects to limit fragmentation. |
|
|
A Real-time Garbage Collector with Low Overhead and Consistent Utilization
Describes a real-time garbage collector for uniprocessors, implemented for Java in the Jikes RVM virtual machine. The collector makes use of low-overhead read barriers (4%), and is mostly non-copying. Pause times are low and utilization meets the target within a small range of variation. |
|
|
Space- and Time-Efficient Implementation of the Java Object Model
Most implementations of Java use two- or three-word object headers. We show that there are a variety of ways to represent this information using a single header word without any appreciable run-time performance penalty, while reducing memory consumption by 12%. We also show how this can be implemented in the IBM Jikes RVM as a pluggable module, thereby making the object model better documented, more flexible, and more amenable to experimentation. |
|
|
Kava: A Java Dialect with a Uniform Object Model for Lightweight Classes
Kava is a backward-compatible extension to Java that allows a single object model to be used consistently from the bit level on up, combining the abstraction facilities of high-level object-oriented programming languages with the ability to create highly efficient value types. |
|
|
A Comparative Evaluation of Parallel Garbage Collectors
Describes a suite of garbage collectors we implemented in the IBM Jalapeño Java Virtual Machine, and quantitatively evaluates the relative performance of the different collectors. With large amounts of available memory, a generational semi-space copying collector performs best. But a hybrid collector that uses a copying semi-space for the young generation and a mark-and-sweep collector for the old generation can run at close to the same speed in half the memory of other collectors, thereby doubling the potential transaction throughput. |
|
|
Java without the Coffee Breaks: A Non-intrusive Multiprocessor Garbage Collector
The Recycler is a concurrent multiprocessor garbage collector with extremely low pause times (maximum of 6 milliseconds over eight benchmarks) while remaining competitive with the best throughput-oriented collectors in end-to-end execution times. This paper describes the overall architecture of the Recycler, including its use of reference counting and concurrent cycle collection, and presents extensive measurements of the system comparing it to a parallel, stop-the-world mark-and-sweep collector. |
|
|
Concurrent Cycle Collection in Reference Counted Systems
This paper describes in detail the concurrent cycle collection algorithm employed in the Recycler (see above). It includes both detailed pseudo-code and a proof of correctness. Measurements show that cycle collection can be highly effective for garbage collection, and often exhibits better locality properties than mark-and-sweep collectors. |
|
|
Kava: A Java Dialect with a Uniform Object Model for Lightweight Classes
Kava is a backward-compatible extension to Java that allows a single object model to be used consistently from the bit level on up, combining the abstraction facilities of high-level object-oriented programming languages with the ability to create highly efficient value types. |
|
|
Guava: A Dialect of Java without Data Races
Guava is a dialect of Java that provides true monitors: mutual exclusion of access to shared data is guaranteed at compile-time. This frees the programmer from trying to understand complexities of the underlying memory model, while also allowing efficient compilation to weakly ordered multiprocessors. |
|
|
Thin Locks: Featherweight Synchronization for Java
High-performance synchronization for Java, now incorporated into most of IBM's Java virtual machines. When implemented in the JDK, mean application speedup was 1.22, maximum speedup was 1.7. Multiprocessor scalability also improved dramatically. |
|
|
Fast Static Analysis of C++ Virtual Function Calls
Describes and evaluates Rapid Type Analysis, an algorithm that resolves 71% of the dynamic virtual function calls in a suite of seven C++ benchmark programs of significant size. Rapid Type Analysis also reduces compiled program size by 25%, and can be used to help the programmer understand his or her C++ program more easily. |
|
|
Compiler Transformations for High-Performance Computing
An encyclopedic summary of the state of the art (as of 1993) in optimizing compiler transformations for superscalar, vector, and multiprocessor computers. |
|
|
A Compiler Framework for Restructuring Data Declarations to Enhance Cache and TLB Effectiveness
An algorithm for inter- and intra-array padding that reduces cache and TLB conflicts. The algorithm also minimizes cache jamming, a new performance bottleneck we identified in CPUs capable of issuing multiple outstanding loads. |
|
|
Optimistic Parallelization of Communicating Sequential Processes
A method for optimistically parallelizing any sequential computation in a distributed system when the result of the first part of the computation can be guessed with reasonably high probability. |
|
|
Hardware-Assisted Replay of Multiprocessor Programs
Design and simulation results for a bus-monitoring device that logs memory transactions to allow deterministic replay of parallel programs on a shared-memory multiprocessor. |
|
|
File System Measurements and their Application to the Design of Efficient Operation Logging Algorithms
Demonstrates that deterministic replay can be achieved by logging only 1% of all file system operations, using the volatile logging technique. |
|
|
High-Level Language Support for Programming Distributed Systems
Concert is a system that adds a process model (derived from Hermes) to languages like C, C++, and PL/I. |
|
|
NEST: A Network Simulation and Prototyping Testbed
NEST is a network simulation tool in wide use at many industry and academic sites around the world. Includes material published previously in the Proceedings of the 1989 Winter Simulation Conference and the Winter 1988 Usenix Technical Conference. |
|
|
Transparent Recovery in Distributed Systems
The case for using transparent recovery techniques like optimistic recovery instead of transactions. |
|
|
A Portable Run-Time System for the Hermes Distributed Programming Language
Describes the portable run-time system for the Hermes distributed programming language. |
|
|
Transparent Recovery of Mach Applications
An implementation of optimistic techniques on Mach processes. |
|
|
NEST: A Network Simulation and Prototyping Testbed
The original paper on NEST. |
|
|
Volatile Logging in n-Fault-Tolerant Distributed Systems
Deterministic replay requires logging of non-deterministic events. But many events are actually determined by other inputs, and therefore can be logged "volatilely" by using the replay capability of other processes in the system. |
|
|
A Recoverable Object Store
Application of optimistic recovery techniques to a persistent object store. |
|
|
Toward Self-Recovering Operating Systems
How to integrate recovery as a fundamental operating system primitive. |
