Despite its many advantages, the Java programming language has been faulted for laggardly performance.
But help is on the way.
In Brief:
New compilers, code optimizers, debugging tools and other technologies developed at IBM Research are close to making criticism of Java performance a thing of the past. Some fruits of this research are already making their way into products, all of them Sun compliant.
In the three years since the Java programming language sprang forth, not quite fully formed, it has made extraordinarily rapid progress and has been widely embraced as the long-sought lingua franca of computing. For all its virtues, however, Java has not silenced its critics on two key points: Java programs usually run much slower than programs written in other computer languages, and the promise of "write once, run anywhere" sometimes works better in theory than in practice. IBM researchers on three continents are working closely with IBM product teams and partners in the industry to solve both of these problems.
Not by coincidence, both problems relate to the unusual nature of Java. On one level, Java is simply another programming language, like C, C++, BASIC, FORTRAN or COBOL. Developers can use Java to write programs that run on a specific platform (microprocessor architecture and operating system), just as they always have with other languages.
What makes Java special, and controversial, is that it's not just a language. It's also a platform -- a "virtual platform" -- that isn't chained to a particular microprocessor or operating system, as so-called native platforms are. Developers don't have to rewrite or even recompile their Java programs to achieve compatibility with a wide range of microprocessors and operating systems. That can save developers lots of work, and it gives customers unprecedented freedom to choose the native platforms they want.
This unique quality is why Java is vital to IBM.
Unlike most companies, IBM offers products for many different platforms, including Windows®, OS/2®, AIX®, AS/400® and S/390®. A virtual platform that bridges them all has obvious value for IBM and software vendors. It's great for customers, too: they can freely migrate their applications from one native platform to another as their businesses grow.
But there's a downside: Java requires an extra layer of software, called the Java virtual machine (JVM), to make all native platforms look the same to Java. The overhead of the first JVMs could slow down programs by a factor of 10 or more, while other problems can cause Java programs to run inconsistently on different systems.
Fortunately, Java is maturing fast, thanks in large part to efforts at IBM Research to boost performance.
IMPROVING THE VIRTUAL MACHINE
The most obvious target for improvement is the JVM. It sits between Java programs and the native operating system (see "Java Architecture," above), and it handles four critical functions: garbage collection, multithreading, run-time exceptions, and bytecode interpreting.
"Garbage collection" is a colorful term for Java's automatic memory management. In other languages, such as C++, programmers are responsible for allocating the memory their programs need and for releasing that memory back to the system later. A common bug in C++ programs is a "memory leak": the program neglects to release memory it no longer needs, so the system appears to gradually lose memory as the program runs. Eventually, the system may run out of memory and crash.
Java programmers have it easier. A JVM "garbage collector" regularly makes its rounds, checking the allocated memory. Usually the memory belongs to objects -- blocks of memory that contain data and the program routines that operate on the data. But memory that no longer belongs to any object -- "garbage" -- is automatically released back to the system. At IBM's Haifa Research Laboratory in Israel, a team has designed better garbage collectors for AS/400 and S/390 systems. A version for AIX is under development.
"Many JVM implementations contain what we call a 'stop-the-world' garbage collector," says Haifa's Hillel Kolodner, "because it stops everything while it's freeing memory." In contrast, the concurrent, on-the-fly garbage collector that Kolodner and his team designed for servers runs in the background, while the Java program threads continue uninterrupted in the foreground. "The normal stop-the-world garbage collector is not very scalable for multithreaded programs running on systems with multiple processors," says Kolodner. "With ours, we can use all the processors all the time." When running an internal IBM benchmark test on an S/390 system with eight processors, he notes, a JVM with the Haifa garbage collector ran twice as fast as a regular JVM.
Even with Java's garbage collector, it's still possible to write a program that leaves unused objects lying around. If the collector can't be sure the program won't use a particular object again, it can't dispose of the object. That's one reason a team at IBM's Thomas J. Watson Research Center is developing a tool known as Jinsight.
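To make the point concrete, here is a minimal sketch in Java of how unused objects can linger. The class, the method names and the "request" scenario are invented for illustration and do not come from any IBM product. In the leaky version, every buffer is added to a long-lived list, so the collector must assume the program might still use it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: a request handler that accidentally keeps buffers reachable.
class LingeringReferences {
    // Anything stored here stays reachable, so the collector can never free it.
    static final List<byte[]> history = new ArrayList<>();

    // Leaky version: each request's buffer is remembered after the request ends.
    static void handleRequestLeaky() {
        byte[] buffer = new byte[1024];
        history.add(buffer); // the reference lingers in the list
    }

    // Clean version: the buffer is only a local variable, so it becomes
    // unreachable (garbage) as soon as the method returns.
    static void handleRequestClean() {
        byte[] buffer = new byte[1024];
        buffer[0] = 1; // use the buffer, then let it go
    }

    // Runs n requests and reports how many buffers the program still holds.
    static int retainedAfter(int n, boolean leaky) {
        history.clear();
        for (int i = 0; i < n; i++) {
            if (leaky) {
                handleRequestLeaky();
            } else {
                handleRequestClean();
            }
        }
        return history.size();
    }

    public static void main(String[] args) {
        System.out.println("leaky version retains: " + retainedAfter(1000, true));
        System.out.println("clean version retains: " + retainedAfter(1000, false));
    }
}
```

The garbage collector is working correctly in both cases; the leaky version simply never gives it permission to reclaim anything.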
DEBUGGING APPLICATIONS
Jinsight is the outgrowth of six years' research into object-oriented program visualization that started with C++ and Smalltalk (see "Programming with Objects Is Getting Easier," Research, Number 2, 1995). About two years ago, the team turned its attention to Java. Jinsight allows developers to view and analyze their code while a program is running, so they can identify bottlenecks, track memory usage, weed out redundant code and eliminate other performance-sapping flaws. Similar profiling tools are available, but Jinsight goes further, says team leader Wim De Pauw: "Other tools tell you there's a problem, but not necessarily where it is. So you have to guess, just as you might infer that the roof is leaking when you find water in your basement. Jinsight shows you exactly where the roof is leaking."
De Pauw's team recently used Jinsight to fix a nagging problem in one of IBM's own Java products. The product shipped on time. Jinsight also found some redundant code in a client-server application that manipulates large database tables. Jinsight is available on alphaWorks® (http://www.alphaWorks.ibm.com), an IBM Web site that makes leading technologies available for preview to the Internet community. Jinsight is also included as a technology preview with VisualAge® for Java 2.0, Enterprise Edition, a development tool for Java programmers.
Another debugging tool at Watson is code-named Deja Vu (Deterministic Java Replay Utility). Deja Vu helps programmers find bugs in multithreaded Java programs -- which encourages them to use more multithreading, which leads to better performance.
Multithreading is like multitasking, except it involves a single program instead of multiple programs. For instance, a multithreaded program could use one thread of execution to load a large file while simultaneously using another thread to draw graphics on the screen.
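A minimal sketch of the idea in Java follows. The two tasks are stand-ins: summing an array plays the part of loading a file, and counting frames plays the part of drawing graphics. The class and variable names are illustrative only.

```java
// Hypothetical example: two threads of execution inside one program.
class TwoThreads {
    // Returns {sum computed by the loader, frames counted by the painter}.
    static long[] runBoth() {
        final long[] results = new long[2];

        // Stand-in for "load a large file": add up the numbers 1..1000.
        Thread loader = new Thread(() -> {
            long sum = 0;
            for (int i = 1; i <= 1000; i++) {
                sum += i;
            }
            results[0] = sum;
        });

        // Stand-in for "draw graphics": count 60 frames.
        Thread painter = new Thread(() -> {
            long frames = 0;
            for (int i = 0; i < 60; i++) {
                frames++;
            }
            results[1] = frames;
        });

        loader.start();  // both threads now run independently
        painter.start();
        try {
            loader.join();   // wait for each thread to finish
            painter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        long[] r = runBoth();
        System.out.println("loaded sum = " + r[0] + ", frames drawn = " + r[1]);
    }
}
```

Each thread writes to its own slot of the array, so the two never touch the same data and no locking is needed here.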
While threading makes programs more responsive, it also makes them more difficult to debug, explains Jong-Deok Choi. "In the case of multithreaded Java programs," he says, "bugs can occur in different ways each time the program is executed, depending on the timing of the threads."
Deja Vu guarantees that a program will run the same way every time. In record mode, it captures the timing of a running program. In replay mode, it duplicates that timing, over and over again. Choi points out that conventional debuggers can themselves alter a Java program's timing, whereas Deja Vu has little effect on the running of a program.
Managing threads can be challenging. For example, if two or more threads try to manipulate the same data at the same time, the results can range from odd to disastrous, as when two threads in an accounting program both debit the same account before adjusting the balance, causing an overdraft.
To prevent thread conflicts, Java has a feature called synchronization. While one thread is manipulating critical data, it "locks" that part of the program to prevent other threads from manipulating the same data until it has finished. But synchronization adds considerable overhead to the JVM; it causes programs to waste time checking for competing threads even when there aren't any, according to Watson researcher David F. Bacon.
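The accounting scenario can be sketched with Java's synchronized keyword. The Account class below is a hypothetical illustration, not code from the Watson project, and it shows ordinary Java synchronization rather than the faster locking scheme the researchers developed.

```java
// Hypothetical account whose balance is protected by synchronization.
class Account {
    private long balance;

    Account(long openingBalance) {
        balance = openingBalance;
    }

    // synchronized: only one thread at a time may check and adjust the balance,
    // so two threads can never both debit against the same stale value.
    synchronized boolean debit(long amount) {
        if (balance < amount) {
            return false; // refuse the debit rather than overdraw
        }
        balance -= amount;
        return true;
    }

    synchronized long balance() {
        return balance;
    }
}

class SynchronizedDebit {
    // Starts several threads that each attempt a number of one-unit debits,
    // then returns the final balance.
    static long run(long openingBalance, int threads, int attemptsPerThread) {
        Account account = new Account(openingBalance);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < attemptsPerThread; j++) {
                    account.debit(1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return account.balance();
    }

    public static void main(String[] args) {
        // 4 threads x 500 attempts = 2000 debits against a balance of 1000.
        System.out.println("final balance: " + run(1000, 4, 500));
    }
}
```

Without the synchronized keyword on debit, two threads could both pass the balance check before either subtracts, which is exactly the overdraft described above. The cost is that every call must acquire the lock, even when no other thread is competing for it.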
Bacon and his colleagues have found a faster way to synchronize threads by drastically reducing the number of processor instructions required and by changing the structure of Java objects. Their modified JVMs run 22 percent faster on average and twice as fast under best-case conditions.
Yet another critical function of a JVM is exception handling. Exceptions are errors detected at run time. Java has a safety mechanism for handling those errors before they can cause serious trouble. With C++, a simple arithmetic error can cause a program to overwrite memory that it is using. In contrast, Java's exception-handling mechanism will block a program from accessing out-of-bounds memory and allow the program to recover more gracefully.
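As a small illustration, the sketch below reads an array element with an index that may be out of bounds. The class and method names are made up for the example; the exception type, ArrayIndexOutOfBoundsException, is the standard one the JVM raises.

```java
// Hypothetical example: recovering from an out-of-bounds array access.
class BoundsCheck {
    // Returns data[index], or the fallback value if index is out of bounds.
    static int readOrDefault(int[] data, int index, int fallback) {
        try {
            return data[index]; // the JVM checks the index on every access
        } catch (ArrayIndexOutOfBoundsException e) {
            // The bad access never touched memory; the program simply recovers.
            return fallback;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(readOrDefault(data, 1, -1)); // valid index
        System.out.println(readOrDefault(data, 7, -1)); // past the end
    }
}
```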
Exception handling has a price, however: slower execution. That's one reason critics say Java isn't suitable for scientific and technical computing that demands high performance. To prove the critics wrong, a small team at Watson is working on a project code-named Ninja (Numerically Intensive Java).
One Ninja technique provides a program that creates "safe regions" where no out-of-bounds exceptions can occur. These safe regions can be aggressively optimized by compilers. Another technique exploits the special math operations found in some microprocessors, such as the FMA (fused multiply-add) instruction in IBM's PowerPC®, POWER2 and POWER3 chips.
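The safe-region idea can be sketched by hand in Java. This is an illustration of the principle only, not the transformation the Ninja compiler actually performs, and the class and method names are hypothetical. A single test before the loop proves that every index the loop will use is valid, so no out-of-bounds exception can occur inside it.

```java
// Hypothetical example: one up-front range test creates a "safe region".
class SafeRegion {
    // Dot product of a[from..to) and b[from..to).
    static double dot(double[] a, double[] b, int from, int to) {
        // Test the whole index range once, before any element is touched.
        if (from < 0 || from > to || to > a.length || to > b.length) {
            throw new ArrayIndexOutOfBoundsException("range " + from + ".." + to);
        }

        // Safe region: every i in [from, to) is known to be in bounds for both
        // arrays. A compiler that can prove this is free to drop the per-access
        // checks and reorder or unroll the loop.
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            // A multiply followed by an add: the pattern a fused multiply-add
            // instruction could execute in one step on hardware that has it.
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {4.0, 5.0, 6.0};
        System.out.println(dot(a, b, 0, 3));
    }
}
```

The Java language still specifies a bounds check on each access; the up-front test simply guarantees that none of those checks can fail.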
"By applying these and other techniques, Ninja-powered Java programs achieved 80 to 100 percent of the performance of native FORTRAN programs when running a suite of six scientific benchmarks," says team member Jose Moreira. Some techniques, such as using the FMA instruction, will require changes to Sun's Java standard. Those changes already are under way. Eventually, IBM will incorporate Ninja techniques into JVMs, compilers and development tools.
MAKING BYTECODE FASTER
Bytecode is the intermediate form of Java that runs on multiple platforms. Programmers write their programs in Java source code, then use a compiler to convert source code into bytecode. At run time, the JVM has to execute the bytecode on the native platform. With most other programming languages, a compiler converts the source code directly into binary code, so no further translation is necessary at run time. Depending on the program, bytecode interpreting can be 10 to 100 times slower than binary execution.
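To see where the interpreting overhead comes from, consider a toy interpreter written in Java. The four opcodes below are invented for this sketch and are far simpler than real JVM bytecode, but the loop has the same shape: fetch an instruction, decode it, carry it out, repeat.

```java
// Hypothetical toy stack machine; these opcodes are NOT real JVM bytecode.
class ToyInterpreter {
    static final int PUSH = 0; // push the next code word onto the stack
    static final int ADD = 1;  // pop two values, push their sum
    static final int MUL = 2;  // pop two values, push their product
    static final int HALT = 3; // stop and return the top of the stack

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // stack pointer: index of the next free slot
        int pc = 0; // program counter: index of the next instruction

        while (true) {
            int op = code[pc++]; // fetch
            switch (op) {        // decode, then execute
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left + right;
                    break;
                }
                case MUL: {
                    int right = stack[--sp];
                    int left = stack[--sp];
                    stack[sp++] = left * right;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalArgumentException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program));
    }
}
```

A native compiler would turn the same arithmetic into a couple of machine instructions. The interpreter instead pays for a fetch, a switch and several array accesses on every step, which is the kind of per-instruction cost behind the slowdown described above.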
One approach is to improve the bytecode. That's the aim of a project code-named JAX at Watson, which, like Jinsight, is available on the alphaWorks Web site. JAX is a tool that starts with compiled bytecode, applies numerous optimizations, and outputs better bytecode. The end product is still compatible with all JVMs. "Among other things," says the group's manager, John Field, "JAX compresses text strings, eliminates methods [Java program routines] that a program never uses, and combines some classes into single classes." By merging some smaller classes into larger classes, for example, JAX makes a program load and run faster because it's more compact and efficient.
Optimized bytecode still must be interpreted at run time -- that is, translated into actual machine instructions. Bypassing the interpreter altogether, therefore, can yield even greater performance. Researchers are pursuing three approaches to achieve that aim: just-in-time (JIT) compilers, dynamic compilers and static native compilers.
As the name implies, a JIT compiler jumps in and translates all or part of the bytecode when a program first launches, rather than interpreting the bytecode one instruction at a time while the program runs. Often, the program takes longer to start, but after that it runs much faster.
At IBM's Tokyo Research Laboratory, a team led by Toshio Nakatani keeps developing better and better JIT compilers. JIT 1.0 was developed in six months and released shortly after IBM issued the first JVM. JIT 2.0 was twice as fast as 1.0, and 3.0 is three times as fast as 1.0. JIT 3.0 is now 10 to 100 times faster than a typical interpreter, depending on the type of code executed. In October, it achieved the highest score in the JVM category from SPEC, the performance evaluation company. "Our JIT compiler is already faster than static Java compilers for many programs," Nakatani says.
The latest version ships with most IBM platforms, including NetStation, JavaOS, W32, OS/2®, AIX® and OS/390®. To accommodate the memory constraints of NCs, the latest JIT compiler has a sophisticated buffer control system that limits the amount of memory it consumes. "To help JIT generate efficient code, we even made substantial changes to the Sun JVM," Nakatani adds.
It's not always desirable for a JIT compiler to translate a whole program at once, especially if the program is large. IBM's JIT compiler quickly decides which code would benefit the most from compilation. "This depends on letting the compiler know which methods will be frequently used, and which should just be interpreted," explains Nakatani.
An extension of the JIT concept is dynamic compilation: compiling and sometimes recompiling parts of a program while it's running, based on actual usage patterns. A Watson project called Jalapeño, led by Mark Mergen and Vivek Sarkar, is experimenting with dynamic compilation for Java in the context of a research JVM that is itself being implemented in Java.
In a JVM with dynamic compilation, a basic compiler translates Java methods the first time they are called. Later, a real-time profiler known as the controller decides whether to dynamically recompile the method with optimization, based on how the program is being used. For example, an accounting program might frequently use a method that computes account balances. By estimating how much time it would take to optimize the method, the controller figures out if it's worth dynamically recompiling the method with optimization. If the optimization would pay off, the controller invokes the optimizing compiler and gives it a "time budget" in which to do the job. The optimizing compiler caches the recompiled method in memory for later use.
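The controller's decision can be sketched as a simple cost-benefit test. Everything in the block below is an assumption made for illustration: the class name, the fixed speedup factor, the per-call timing figures, and the rule of thumb that a method will be called about as often in the future as it has been so far. It is not Jalapeño's actual policy.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical controller that decides when a method is worth recompiling.
class Controller {
    private final Map<String, Long> callCounts = new HashMap<>();
    private final Set<String> optimized = new HashSet<>();
    private final double assumedSpeedup; // e.g. 4.0 = optimized code runs 4x faster

    Controller(double assumedSpeedup) {
        this.assumedSpeedup = assumedSpeedup;
    }

    // Records one call. Returns true if this call triggered recompilation.
    boolean recordCall(String method, long nanosPerCall, long compileCostNanos) {
        if (optimized.contains(method)) {
            return false; // already optimized; nothing to decide
        }
        long calls = callCounts.merge(method, 1L, Long::sum);

        // Assumption: future calls will roughly equal past calls, so the time
        // optimization would save is (calls so far) x (time saved per call).
        double expectedSaving = calls * nanosPerCall * (1.0 - 1.0 / assumedSpeedup);

        if (expectedSaving > compileCostNanos) {
            optimized.add(method); // stands in for invoking the optimizing compiler
            return true;
        }
        return false;
    }

    boolean isOptimized(String method) {
        return optimized.contains(method);
    }

    // Calls the method repeatedly and reports which call triggered optimization,
    // or -1 if none did within maxCalls.
    static int callsUntilOptimized(Controller c, String method, long nanosPerCall,
                                   long compileCostNanos, int maxCalls) {
        for (int call = 1; call <= maxCalls; call++) {
            if (c.recordCall(method, nanosPerCall, compileCostNanos)) {
                return call;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Controller controller = new Controller(4.0);
        int call = callsUntilOptimized(controller, "computeBalance", 1000, 30000, 100);
        System.out.println("optimized on call " + call);
    }
}
```

With these made-up numbers, each optimized call would save 750 nanoseconds, so the method pays back a 30,000-nanosecond compile only after more than 40 calls. A rarely used method never crosses the threshold and stays in its basic compiled form.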
The Jalapeño project has been under way less than a year, but Sarkar says that the optimizing dynamic compiler already generates code that runs 10 to 100 times faster than bytecode interpreters. Past research has shown the potential for gaining significant performance improvements through optimized dynamic compilation by generating code that is specialized to the program's run-time behavior. The goal of the Jalapeño dynamic optimizing compiler is to tap this potential in server applications that run on multiple layers of Java code.
The third, more conventional approach to Java compilation is a static native compiler, which directly translates bytecode into native binary code before the program runs. One obvious drawback is that statically compiled Java programs are no longer compatible with different platforms -- they're specific to the native platform the compiler targets. But it doesn't necessarily break the promise of "write once, run anywhere." The program still exists as bytecode. For some applications, such as large programs installed on servers, which are not likely to be migrated, static compilation makes sense.
A team at Watson led by Mark Mergen started working on IBM's High Performance Compiler for Java (HPCJ) in 1995, building on previous work with other languages. Over time, the team began working in parallel with an IBM product group in Toronto, and last year the researchers turned over the static compiler to the Toronto development team. HPCJ is now shipping with VisualAge for Java.
HPCJ accelerates Java faster than anything else, says Mergen. "Depending on the program, we can already run Java as fast as equivalent programs written in C." An example of a program especially apt to benefit from static compilation would be a server-based business program without a graphical user interface. However, Mergen thinks it's possible that JIT and dynamic compilers may someday eclipse static compilers. Compilers that work at run time can adapt to actual usage patterns, while static compilers cannot.
Another team at Watson found a way to provide high-performance access to record-oriented files on OS/390 systems. Many OS/390 customers have a significant amount of data in record-oriented file systems such as VSAM (virtual storage access method), and they need to access that data from new Java applications. In addition to making heritage data accessible via Java interfaces, enabling application portability was desirable since the paradigm of record-oriented access has performance potential on other platforms.
A team led by Michael Wright wrote a set of Java classes known internally as Java Record I/O (JRIO). The JRIO object model provides portable access to record-oriented files on any Java platform. In addition, the object model provides high performance access to OS/390 record-oriented files by exploiting access methods which are only accessible via Java native methods; in addition to faster access, the OS/390 version supports files larger than virtual memory.
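For readers unfamiliar with record-oriented access, the sketch below shows the basic pattern using only the standard java.io.RandomAccessFile class. It is not the JRIO API, whose classes are not described here; the record length, class name and sample data are all assumptions for the example. The key idea is that every record has the same length, so record number n can be located directly by seeking to n times the record length.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical fixed-length record file built on java.io.RandomAccessFile.
class RecordFile {
    static final int RECORD_LENGTH = 16; // bytes per record (assumed for the demo)

    // Writes text into record number `index`, padded with spaces to full length.
    static void writeRecord(RandomAccessFile file, int index, String text) throws IOException {
        byte[] record = new byte[RECORD_LENGTH];
        Arrays.fill(record, (byte) ' ');
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, record, 0, Math.min(bytes.length, RECORD_LENGTH));
        file.seek((long) index * RECORD_LENGTH); // jump straight to the record
        file.write(record);
    }

    // Reads record number `index` without scanning the records before it.
    static String readRecord(RandomAccessFile file, int index) throws IOException {
        byte[] record = new byte[RECORD_LENGTH];
        file.seek((long) index * RECORD_LENGTH);
        file.readFully(record);
        return new String(record, StandardCharsets.US_ASCII).trim();
    }

    // Writes three sample records to a temporary file and reads one back.
    static String demo(int index) {
        try {
            File tmp = File.createTempFile("records", ".dat");
            tmp.deleteOnExit();
            try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
                writeRecord(file, 0, "ALPHA");
                writeRecord(file, 1, "BRAVO");
                writeRecord(file, 2, "CHARLIE");
                return readRecord(file, index);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(1));
    }
}
```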
Without a reliable way to measure performance, these projects may produce less than optimal results. That's why yet another Watson team is developing a benchmark suite. The Java Server Performance (JASPER) team is part of a corporation-wide effort to improve Java performance across multiple platforms. "Most of the Java benchmarking work that's been done so far is on the client side, because that's where Java started," says senior manager Bill Tetzlaff. "But IBM is primarily a server company."
The JASPER team focuses on the JVM and related software -- such as the Java Database Connectivity (JDBC) layer that interacts with databases. "The benchmarks evaluate garbage collection, thread synchronization, memory bandwidth and many other aspects of performance," says manager Sandra Baylor. IBM is working with other companies and with industry committees to help develop a formal Java benchmark standard. Baylor's group has already identified bottlenecks in various JVMs, paving the way for future optimizations. For instance, they helped the Ninja project (described above) find ways to speed up matrix multiplication, a critical operation in scientific and technical computing.
FAST ENOUGH
A vast amount of research all over the world is aggressively pushing the boundaries of Java performance. For most business applications, Java will soon come so close to native code that the gap will no longer be an issue. "Thanks to the work of our Java technologists worldwide," says Jim Russell, senior manager of Java technology and applications at Watson, "Research is playing a key role in IBM's effort to make Java real for business." When you consider Java's other advantages -- cross-platform compatibility, higher programmer productivity, superior error handling, automatic memory management, easy networking, and broad industry support -- the result looks like a winning combination.
Tom R. Halfhill, a former senior editor of Byte, lives near San Francisco.