IBM Journal of Research and Development
Volume 34, Number 1, Page 59 (1990)
IBM RISC System/6000 processor

Design of the IBM RISC System/6000 floating-point execution unit

by R. K. Montoye, E. Hokenek, S. L. Runyon
The IBM RISC System/6000® (RS/6000) floating-point unit (FPU) exemplifies a second-generation RISC CPU architecture and an implementation which greatly increases floating-point performance and accuracy. The key feature of the FPU is a unified floating-point multiply-add-fused unit (MAF) which performs the accumulate operation (A times B) + C as an indivisible operation. This single functional unit reduces the latency for chained floating-point operations, as well as rounding errors and chip busing. It also reduces the number of adders/normalizers by combining the addition required for fast multiplication with accumulation. The MAF unit is made practical by a unique fast-shifter, which eases the overlap of multiplication and addition, and a leading-zero/one anticipator, which eases overlap of normalization and addition. The accumulate instruction required by this architecture reduces the instruction path length by combining two instructions into one. Additionally, the RS/6000 FPU is tightly coupled to the rest of the CPU, unlike typical floating-point coprocessor chips. As a result, floating-point and fixed-point instructions can be executed simultaneously. Load/store operations are performed using register renaming and store buffering to allow completely independent operation of load/store with arithmetic operations. Thus, data-cache accesses can occur in parallel with independent arithmetic operations. This unit attains a peak execution rate of 50 MFLOPS with a 25-MHz clock frequency and is capable of sustaining nearly that rate in complex programs such as graphics and Livermore loops.
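The accuracy benefit of performing (A times B) + C as one indivisible operation can be illustrated with a minimal sketch. It is not the RS/6000 hardware algorithm: it emulates a fused multiply-add in software with exact rational arithmetic and a single final rounding, and compares it with a separate multiply followed by an add, which rounds twice. The function names, the use of Python's IEEE 754 double-precision `float`, and the chosen operands are assumptions made for illustration. The `peak_mflops` helper likewise assumes one multiply-add issued per cycle, counted as two floating-point operations, which is consistent with the 50 MFLOPS at 25 MHz quoted above.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate (a * b) + c with one rounding (finite inputs only)."""
    # Fractions hold a, b and c exactly, so the product and sum are exact.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    # Converting back to float is the only rounding step.
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (two roundings)."""
    product = a * b      # first rounding
    return product + c   # second rounding


def peak_mflops(clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Peak rate, assuming one multiply-add (two operations) per cycle."""
    return clock_mhz * flops_per_cycle


# Illustrative operands: a * b is exactly 1 - 2**-54, which is not
# representable in double precision and rounds to 1.0 when stored.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

fused = fused_multiply_add(a, b, c)        # keeps the low-order term
unfused = separate_multiply_add(a, b, c)   # loses it in the first rounding

print("fused  :", fused)
print("unfused:", unfused)
print("peak MFLOPS at 25 MHz:", peak_mflops(25))
```

With these operands the separate multiply rounds the product to 1.0, so the subsequent add cancels to zero, while the single-rounding version retains the small residual. The same effect is why a fused operation reduces rounding error in chained computations such as dot products.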
Related Subjects: Computer organization and design; Reduced-instruction-set computers (RISC)