IBM Research

Think Research


 


Featured Concept
Compressed memory

By Gary Dagastine

A Computer's RAM Capacity Can Now Be Doubled, Practically for Free


Raw processing speed, often measured in millions of instructions per second (MIPS), is not always the best indicator of how fast a computer can accomplish specific tasks. Just as a Ferrari can go no faster in a traffic jam than a Mack truck can, a processor that lacks a steady stream of data from the computer's main memory cannot perform at top speed. As software becomes more complex and more data-intensive, keeping the processor supplied with data becomes paramount, which means that the demand for memory is mounting rapidly.

The benefits of added memory can be dramatic: the performance of most commercial database and computer-aided-design software can improve by as much as 100 percent when the system memory, which consists of dynamic random-access memory (DRAM) chips, is doubled. By the same token, inadequate memory can not only impair performance but also jeopardize reliability, leading to system crashes and lost business. Unfortunately, adding memory is costly. For large data centers, server farms and e-businesses, which may employ hundreds or thousands of machines, beefing up the memory in every server can cost millions of dollars.

In late June, researchers at IBM's Thomas J. Watson Research Center announced a fundamental breakthrough in computer design that does the seemingly impossible: it allows twice as much information to be stored in a given amount of memory, with no compromise in computer performance or data integrity. For a negligible added cost, users can double the effective capacity of their installed memory to increase performance. Alternatively, they can obtain a given amount of effective memory for half of what it would previously have cost. The heart of the technology is a novel, hardware-based approach to compressing and storing data.

Although the new IBM Memory eXpansion Technology (MXT) is applicable to many computers, it will be implemented first in IBM's Netfinity® line of servers that run Windows NT® or Linux® and that have one to four processors and up to 16 gigabytes (GB) of main memory. These popular servers form the backbone of the Internet and of many departmental and enterprise computing systems. The large amounts of main memory they use represent a significant portion of their manufacturing cost: in certain Netfinity models, DRAM chips can account for more than 80 percent of the manufacturing cost.

Customers might have been stuck with a choice between spending more on memory and sacrificing performance had not Peter Franaszek, manager of systems theory and analysis at Watson, decided to revisit the question of memory compression. In the mid-1990s, he began to wonder whether it might finally be possible to apply compression techniques to squeeze more data into main memory in a way that didn't slow down the computer or damage the data.

Data compression itself is a well-known process that uses special coding techniques, or algorithms, to represent data so that it can be stored more compactly. This is usually possible because real data is rarely random: some letters or phrases occur more frequently than others. A scheme that assigns shorter encodings to the more frequent elements of the data and longer encodings to the less frequent ones produces encoded data that takes up less space than the original.
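One classic way to assign shorter codes to more frequent symbols is Huffman coding. The sketch below is a generic textbook illustration, not part of MXT (which, as described below, relies on Lempel-Ziv techniques); the function name `huffman_code` is hypothetical. It repeatedly merges the two least frequent groups of symbols, so rare symbols end up deeper in the code tree and receive longer bit strings.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict[str, str]:
    """Build a prefix-free code that gives frequent symbols shorter bit strings."""
    freq = Counter(text)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups; their codes gain one leading bit.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


text = "abracadabra"
code = huffman_code(text)
encoded = "".join(code[ch] for ch in text)
print(code)
print(len(encoded), "bits vs", 8 * len(text), "bits as 8-bit characters")
```

For "abracadabra", the frequent letter "a" gets a 1-bit code and the whole string encodes in 23 bits instead of 88.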

But while compression techniques have long been used to store data on disks and to transmit large amounts of information, such as images or video, efficiently, all previous attempts to build workable compression systems for main memory had failed. One reason was the extra time required to locate the compressed data and decompress it. That is not a problem for data on a disk, because decompression is faster than the disk access time. But main memory is much faster than a disk, so conventional approaches cannot keep up.

A key requirement for main memory compression, and one that does not apply to compressing speech or images, is that it be lossless: the original data must be recovered exactly when it is decompressed. A widely used family of lossless techniques is named after the computer scientists Abraham Lempel and Jacob Ziv, who published important algorithms of this kind in the 1970s. These techniques, of which there are actually some earlier examples, work by replacing strings or phrases in the data with pointers to entries in a dictionary. The dictionary can be constructed either ahead of time or during compression. In the latter case, part of the data currently being compressed may form the dictionary. For example, the word "dictionary" in the previous sentence could be replaced by a pointer to its earlier occurrence.
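The "pointer to an earlier occurrence" idea can be sketched as a minimal LZ77-style compressor, in which the already-processed data serves as the dictionary. This is a generic illustration, not IBM's hardware algorithm; the function names, the 4,096-symbol window and the 3-symbol minimum match are assumptions chosen for readability, and the brute-force search is far slower than any real implementation.

```python
def lz77_compress(data: str, window: int = 4096, min_match: int = 3):
    """Replace repeated strings with (distance, length) pointers to an earlier occurrence."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for start in range(max(0, i - window), i):
            length = 0
            # A match may run past position i; LZ77 allows overlapping copies.
            while i + length < len(data) and data[start + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # pointer into earlier data
            i += best_len
        else:
            tokens.append(data[i])  # literal symbol
            i += 1
    return tokens


def lz77_decompress(tokens) -> str:
    """Rebuild the exact original text (lossless) from literals and pointers."""
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # copy one symbol at a time so overlaps work
        else:
            out.append(tok)
    return "".join(out)


text = ("In the latter case, part of the data currently being compressed "
        "may form the dictionary; the dictionary is the data itself.")
tokens = lz77_compress(text)
print(len(tokens), "tokens for", len(text), "characters")
```

Because every pointer refers only to text the decompressor has already rebuilt, decompression recovers the input exactly, which is the lossless property main memory requires.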

Lempel-Ziv compression is efficient and very fast. "Unfortunately," says Franaszek, "it simply was not fast enough." He conjectured, however, that by generalizing the algorithms to permit the use of multiple hardware engines operating in parallel, the speed problem could be overcome. As a result, he says, "We developed algorithms in which each engine operates on a fraction of the data block to be compressed, but the engines share information on their respective subblocks. This yields efficient compression, with a parallel speedup. Using four engines, for example, speeds up the compression by a factor of four." Franaszek and his team also developed techniques that permit highly efficient storage and very fast retrieval of compressed data.
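The article does not spell out how the engines share information, so the following is only a hypothetical toy in the spirit of Franaszek's description. A block is split into four subblocks, one per "engine." The engines advance in lockstep cycles, and in each cycle an engine may copy from bytes that *any* engine produced in earlier cycles. That visibility rule is an assumption made so that a decompressor replaying the same cycles can always resolve every pointer. The engines run sequentially here; hardware would run them concurrently.

```python
def parallel_compress(block: bytes, engines: int = 4, min_match: int = 3):
    """Toy cooperative-dictionary LZ: each engine compresses one subblock, but may
    copy from bytes that any engine produced in earlier cycles."""
    size = -(-len(block) // engines)  # ceiling division
    subs = [block[k * size:(k + 1) * size] for k in range(engines)]
    pos = [0] * engines
    streams = [[] for _ in range(engines)]
    while any(pos[k] < len(subs[k]) for k in range(engines)):
        visible = pos[:]  # shared dictionary: bytes finished before this cycle
        for k in range(engines):
            if pos[k] >= len(subs[k]):
                continue  # this engine's subblock is done
            best = (0, 0, 0)  # (length, source subblock, source offset)
            for j in range(engines):
                for start in range(visible[j]):
                    n = 0
                    while (start + n < visible[j] and pos[k] + n < len(subs[k])
                           and subs[j][start + n] == subs[k][pos[k] + n]):
                        n += 1
                    if n > best[0]:
                        best = (n, j, start)
            if best[0] >= min_match:
                streams[k].append(("ref", best[1], best[2], best[0]))
                pos[k] += best[0]
            else:
                streams[k].append(("lit", subs[k][pos[k]]))
                pos[k] += 1
    return [len(s) for s in subs], streams


def parallel_decompress(lengths, streams) -> bytes:
    """Replay the same cycles, so every pointer targets already-rebuilt bytes."""
    out = [bytearray() for _ in lengths]
    cursors = [0] * len(lengths)
    while any(len(out[k]) < lengths[k] for k in range(len(lengths))):
        visible = [len(o) for o in out]  # same cycle boundary the encoder used
        for k in range(len(lengths)):
            if len(out[k]) >= lengths[k]:
                continue
            tok = streams[k][cursors[k]]
            cursors[k] += 1
            if tok[0] == "lit":
                out[k].append(tok[1])
            else:
                _, j, start, n = tok
                assert start + n <= visible[j]  # only data from earlier cycles
                out[k] += out[j][start:start + n]
    return b"".join(bytes(o) for o in out)


data = b"abcd" * 64  # 256 bytes, 64 per engine
lengths, streams = parallel_compress(data)
print(sum(len(s) for s in streams), "tokens for", len(data), "bytes")
```

Each engine handles only a quarter of the block, which is where a speedup of up to four times comes from. The shared history lets an engine find matches outside its own subblock, so splitting the block need not cost as much compression as four fully independent compressors would.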

COMBINING FORCES

Independently, Basil Smith, manager of advanced commercial platforms at Watson, and Brett Tremaine, a senior technical staff member, had been developing various hardware approaches to improved memory systems since the mid-1980s. "In 1996, we learned about Peter Franaszek's ideas for compression," says Tremaine. "He had a great algorithm, but needed a suitable hardware architecture; we had a great architecture, but were looking for an innovative algorithm. Soon afterward, Peter and his team joined ours, so that we could build a complete system."

It was a fortunate step. "What could have been a long and fruitless interdisciplinary research problem turned into a very productive collaboration," says Smith, who had overall responsibility for the project. Working together, the group created a memory-compression solution centered on a novel, single-chip memory controller. A memory controller is a standard piece of computer hardware whose role is to regulate the flow of data between main memory and both the processors and the input-output devices, but the new controller is unique in several respects.

First, it uses a relatively large, 32-megabyte cache, a form of very fast memory that can be read from and written to quickly, allowing the controller to keep up with the demands of the processors. The cache, which is shared by all the processors, is attached to the controller chip. It is the only memory the processors can access directly. Because the cache is smaller than the main memory, data must be continually shuttled between the two, being compressed and decompressed on the fly. That process requires keeping track of what information is in the cache and what is in main memory, a task handled by a high-speed directory memory. The compression algorithm itself is implemented by dedicated circuitry, yet another innovative feature.
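The cache-in-front-of-compressed-memory arrangement can be modeled in a few lines. This is a hypothetical toy, not the MXT design: Python's `zlib` (DEFLATE, which combines LZ77 with Huffman coding) stands in for the dedicated compression circuitry, a dictionary stands in for the directory, the 1,024-byte line size is an assumption, and the cache is tiny so that evictions happen quickly. The class and method names are illustrative only.

```python
import zlib
from collections import OrderedDict

LINE = 1024  # bytes per uncompressed line (assumed to match the 1 KB compression unit)


class CompressedMemory:
    """Toy model: main memory holds compressed lines; a small LRU cache holds
    uncompressed copies, and reads and writes go only through the cache."""

    def __init__(self, cache_lines: int):
        assert cache_lines >= 1
        self.backing = {}           # directory: line number -> compressed bytes
        self.cache = OrderedDict()  # line number -> bytearray, oldest first
        self.cache_lines = cache_lines
        self.hits = self.misses = 0

    def _line(self, n: int) -> bytearray:
        if n in self.cache:
            self.hits += 1
            self.cache.move_to_end(n)  # mark as most recently used
        else:
            self.misses += 1
            raw = self.backing.get(n)
            # Decompress on a miss; untouched lines start out as zeros.
            self.cache[n] = bytearray(zlib.decompress(raw) if raw is not None else bytes(LINE))
            if len(self.cache) > self.cache_lines:
                victim, data = self.cache.popitem(last=False)
                # Compress on eviction (this toy recompresses even unmodified lines).
                self.backing[victim] = zlib.compress(bytes(data))
        return self.cache[n]

    def read(self, addr: int) -> int:
        return self._line(addr // LINE)[addr % LINE]

    def write(self, addr: int, value: int) -> None:
        self._line(addr // LINE)[addr % LINE] = value


mem = CompressedMemory(cache_lines=2)
mem.write(0, 7)
mem.write(5000, 9)
mem.write(10_000, 1)  # a third line evicts line 0, which is compressed
print(mem.read(0))    # line 0 is decompressed back into the cache: 7
```

The real controller does this bookkeeping in hardware at memory speed. The sketch only shows the data flow: processors see uncompressed data in the cache, while main memory holds compressed lines.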

Finally, the chip contains a novel virtual memory manager to help ensure that the compressed data is stored as efficiently as possible. Unlike most computer systems, which store each byte of data at a fixed memory address, the new memory-compression technology treats main memory as a collection of physical sectors. "It compresses 1,024 bytes of information at a time and then stores it in up to four contiguous 256-byte sectors," says Franaszek. "The number of sectors per block varies because some data compresses easily and some doesn't. Any unused sectors are immediately assigned to a central pool of available sectors."
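The sector scheme Franaszek describes can be sketched as a small allocator. Each 1,024-byte block is compressed and stored in one to four 256-byte sectors taken from a shared free pool, and sectors are returned to the pool as soon as a block no longer needs them. This is a hypothetical illustration: `zlib` stands in for the hardware compressor, the class and field names are invented, and unlike the quoted description this toy does not try to keep a block's sectors contiguous.

```python
import random
import zlib

BLOCK, SECTOR = 1024, 256


class SectorStore:
    """Toy allocator: each 1,024-byte block is compressed and stored in
    1 to 4 sectors of 256 bytes drawn from a central free pool."""

    def __init__(self, total_sectors: int):
        self.free = list(range(total_sectors))  # central pool of available sectors
        self.table = {}    # block number -> (sector ids, stored length, compressed?)
        self.sectors = {}  # sector id -> stored bytes

    def store(self, block_no: int, data: bytes) -> int:
        assert len(data) == BLOCK
        packed = zlib.compress(data)
        if len(packed) >= BLOCK:  # incompressible: keep it raw in four sectors
            payload, compressed = data, False
        else:
            payload, compressed = packed, True
        needed = -(-len(payload) // SECTOR)  # ceiling division
        old = len(self.table[block_no][0]) if block_no in self.table else 0
        if needed > len(self.free) + old:
            raise MemoryError("sector pool exhausted")
        self.release(block_no)  # rewriting a block frees its old sectors first
        ids = [self.free.pop() for _ in range(needed)]
        for k, sid in enumerate(ids):
            self.sectors[sid] = payload[k * SECTOR:(k + 1) * SECTOR]
        self.table[block_no] = (ids, len(payload), compressed)
        return needed

    def load(self, block_no: int) -> bytes:
        ids, length, compressed = self.table[block_no]
        payload = b"".join(self.sectors[sid] for sid in ids)[:length]
        return zlib.decompress(payload) if compressed else payload

    def release(self, block_no: int) -> None:
        if block_no in self.table:
            ids, _, _ = self.table.pop(block_no)
            for sid in ids:
                del self.sectors[sid]
            self.free.extend(ids)  # unused sectors go straight back to the pool


store = SectorStore(total_sectors=16)
noise = random.Random(0).randbytes(BLOCK)
print(store.store(0, bytes(BLOCK)))  # highly compressible block: 1 sector
print(store.store(1, noise))         # incompressible block: 4 sectors
```

Because the sector count per block varies, a block of zeros costs a quarter of the space of a block of random bytes, and the savings show up as a larger free pool.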

The memory controller can compress and decompress data at a rate of 2 gigabits per second. That is twice as fast as the processors can absorb the data, and the compression scheme is scalable, so that it will be able to maintain that ratio as processors get faster. However, there is one source of unavoidable delay in the system: the lag time between requesting and receiving compressed data. "That delay can extend to hundreds of processor cycles, which is unacceptably long," says Tremaine, who served as the project's lead architect. "But roughly 98 percent of the time, the needed data has already been decompressed and transferred to the cache, eliminating the delay. Although in the remaining cases it takes an average 70 cycles to retrieve the data, the impact on system performance is minimal because it happens only 2 percent of the time."
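Tremaine's figures imply a simple expected-value calculation. If 98 percent of requests find their data already in the cache and the other 2 percent wait an average of 70 cycles, the added delay averages 0.02 × 70 = 1.4 processor cycles per request. The sketch below assumes those quoted figures apply uniformly to every request that reaches the shared cache and ignores all other sources of latency. The function name is hypothetical.

```python
def avg_added_cycles(hit_rate: float, miss_penalty_cycles: float) -> float:
    """Expected extra processor cycles per request = miss rate x average miss penalty."""
    return (1.0 - hit_rate) * miss_penalty_cycles


# Figures quoted in the article: 98 percent hits, 70-cycle average penalty on a miss.
print(avg_added_cycles(0.98, 70))  # roughly 1.4 cycles per request
```

Averaged this way, even a penalty of tens of cycles contributes little because it is paid so rarely.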

Squeezing all the functions of the controller onto one chip was a key requirement. "Without the ability to fabricate such a densely packed chip, we would have had to use multiple chips, and the system would probably not have been economically feasible," says Tremaine. But the needed technology was available in the form of a new fabrication method that produces silicon chips with quarter-micron features. "It is one of the most advanced processes available commercially," says Tremaine, "and it enabled us to fabricate a single memory controller chip with more than 5 million transistors." The shared cache also benefited from a new kind of chip, the "double data rate" synchronous DRAM. "Because this memory chip was so new," Tremaine adds, "we had to work with industry standard groups to ensure the devices would work in our application."

The development of the chip was not solely an IBM affair. A partnership with ServerWorks, a startup in Santa Clara, California, greatly accelerated the chip's development. The chip is now marketed as the Champion North Bridge chip, ServerWorks' flagship product.


Gary Dagastine is a freelance writer who lives in Niskayuna, New York.


