R.M. Tomasulo
IBM Journal of Research and Development
January 1967, pp. 25-33
© IBM 1967
Abstract
This paper describes the methods employed in the floating point area of the System/360 Model 91 to exploit the existence of multiple execution units. Basic to these techniques is a simple common data busing and register tagging scheme which permits simultaneous execution of independent instructions while preserving the essential dependences inherent in the instruction stream. The common data bus improves performance by efficiently utilizing the execution units without requiring specially optimized code. Instead, the hardware, by "looking ahead" about eight instructions, automatically optimizes the program on a local basis.
The application of these techniques is not limited to floating point arithmetic or System/360 architecture. It may be used in almost any computer having multiple execution units and one or more "accumulators." Both of the execution units, as well as the associated storage buffers, multiple accumulators and input/output buffers, are extensively checked.
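The register tagging and common data bus idea in this abstract can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the Model 91 design: the names `Machine`, `Station`, `issue`, `step` and the `RSn` tags are hypothetical, every operation takes one cycle, and all ready results are broadcast in the same cycle, whereas a single real bus would carry one result at a time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Hypothetical operation table; real hardware has dedicated adder/multiplier units.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


@dataclass
class Station:
    tag: str                   # identifier broadcast together with the result
    op: str
    src1: Union[float, str]    # a value, or the tag of the station that will produce it
    src2: Union[float, str]

    def ready(self) -> bool:
        # An operand that is still a string is an unresolved tag.
        return not isinstance(self.src1, str) and not isinstance(self.src2, str)


class Machine:
    def __init__(self, regs: Dict[str, float]) -> None:
        self.values = dict(regs)
        # For each register: tag of the pending producer, or None if the value is current.
        self.tags: Dict[str, Optional[str]] = {r: None for r in regs}
        self.stations: List[Station] = []
        self.issued = 0

    def issue(self, op: str, dest: str, a: str, b: str) -> str:
        """Issue in program order; capture operand values or the tags standing in for them."""
        self.issued += 1
        tag = f"RS{self.issued}"

        def read(reg: str) -> Union[float, str]:
            pending = self.tags[reg]
            return pending if pending is not None else self.values[reg]

        self.stations.append(Station(tag, op, read(a), read(b)))
        self.tags[dest] = tag  # later readers of dest wait on this tag
        return tag

    def broadcast(self, tag: str, value: float) -> None:
        """Common data bus: every waiting station and register matching the tag takes the value."""
        for st in self.stations:
            if st.src1 == tag:
                st.src1 = value
            if st.src2 == tag:
                st.src2 = value
        for reg, pending in self.tags.items():
            if pending == tag:
                self.values[reg] = value
                self.tags[reg] = None

    def step(self) -> List[str]:
        """One cycle: execute every station that was ready at the start of the cycle."""
        ready = [st for st in self.stations if st.ready()]
        for st in ready:
            self.stations.remove(st)
            self.broadcast(st.tag, OPS[st.op](st.src1, st.src2))
        return [st.tag for st in ready]


m = Machine({"F0": 1.0, "F2": 2.0, "F4": 3.0, "F6": 4.0})
m.issue("mul", "F0", "F2", "F4")   # RS1: F0 = F2 * F4
m.issue("add", "F6", "F0", "F2")   # RS2: depends on RS1 through F0
m.issue("add", "F2", "F4", "F4")   # RS3: independent of RS1 and RS2
cycle1 = m.step()
cycle2 = m.step()
```

In this sketch the independent third instruction executes in the first cycle, ahead of the second instruction, which waits for the tagged result of the first. The second instruction still sees the old value of F2 because it captured that value at issue time.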
The GF11 Parallel Computer
J. Beetem, M. Denneau, and D. Weingarten
Experimental Parallel Computing Architectures
(edited by J.J. Dongarra)
North Holland, Amsterdam, 1987.
Abstract
GF11 is a parallel computer currently nearing completion at the IBM Yorktown Research Center. The machine will have a peak arithmetic rate of 11.4 Gflops and a total memory of 1.14 GBytes. The computational power and memory are uniformly distributed among 566 floating point processors which communicate through a switching network. At each machine cycle, any of the 1024 preselected permutations of data can be realized among the processors. The main intended application of GF11 is a class of computations arising from quantum chromodynamics, a proposed theory of particles which participate in nuclear interactions.
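The notion of selecting one of a fixed table of permutations at each cycle can be sketched as follows. This is an illustration only: it uses 8 processors and a three-entry table instead of 566 processors and 1024 entries, and the names `route`, `cyclic_shift` and `PRESELECTED` are hypothetical, not taken from the GF11 design.

```python
from typing import List, Sequence


def cyclic_shift(n: int, k: int) -> List[int]:
    """Permutation sending processor i's data to processor (i + k) mod n."""
    return [(i + k) % n for i in range(n)]


def route(data: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Deliver data[src] to processor perm[src]; perm must be a permutation."""
    assert sorted(perm) == list(range(len(data))), "not a permutation"
    out = [0.0] * len(data)
    for src, dst in enumerate(perm):
        out[dst] = data[src]
    return out


N = 8  # toy machine size (assumption for the example)

# The table is fixed before the run; each cycle only chooses an index into it.
PRESELECTED = [
    cyclic_shift(N, 0),             # 0: identity
    cyclic_shift(N, 1),             # 1: pass to the right-hand neighbour
    [i ^ 1 for i in range(N)],      # 2: exchange within adjacent pairs
]

data = [float(i) for i in range(N)]
shifted = route(data, PRESELECTED[1])
swapped = route(data, PRESELECTED[2])
```

Because the table is chosen ahead of time, the per-cycle control information is just a small index rather than a full description of the routing.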
The Research Parallel Processor Prototype (RP3): Introduction and Architecture
G.F. Pfister, W.C. Brantley, D.A. George, S.L. Harvey, W.J. Kleinfelder, K.P. McAuliffe, E.A. Melton, V.A. Norton, and J. Weiss
Proceedings of the 1985 International Conference on Parallel Processing
August 1985, pp. 764-771.
© IEEE 1985
Abstract
As a research effort to investigate both hardware and software aspects of highly parallel computation, the Research Parallel Processor Prototype Project (RP3) has been initiated in the IBM Research Division, in cooperation with the Ultracomputer Project of NYU. The RP3 machine being designed is a highly parallel MIMD design with a uniquely flexible organization encompassing both shared memory paradigms and local memory message-passing paradigms, as well as mixtures of the two chosen at run time. It is being designed to accommodate 512 state-of-the-art microprocessors. A full configuration will provide up to 1.3 GIPS, 800 MFLOPS, 1-2 Gbytes of main storage, 192 Mbytes/second I/O rate, and 13 Gbytes/second inter-processor communication. Performance evaluations indicate that approximately 1 GIPS performance should be sustainable. This paper is intended to be the first of a set of papers describing the RP3 and the performance analysis on which its design is based.
The 801 Minicomputer
G. Radin
Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems
pp. 39-47, March 1-3, 1982, Palo Alto, California. ACM Press, 1982, SIGARCH Computer Architecture News 10(2), SIGPLAN Notices 17(4)
© ACM 1982
Abstract
This paper provides an overview of an experimental system developed at the IBM T.J. Watson Research Center. It consists of a running hardware prototype, a control program and an optimizing compiler. The basic concepts underlying the system are discussed as are the performance characteristics of the prototype. In particular, three principles are examined:
- system orientation towards the pervasive use of high level language programming and a sophisticated compiler.
- a primitive instruction set which can be completely hard-wired.
- storage hierarchy and I/O organization to enable the CPU to execute an instruction at almost every cycle.
The Yorktown Simulation Engine
Monty M. Denneau
Proceedings of the 19th Design Automation Conference
June 1982, pp. 55-59.
© IEEE 1982
Abstract
The Yorktown Simulation Engine (YSE) is a high speed special purpose parallel processor designed and built at the IBM Thomas J. Watson Research Center to simulate the logical operation of large digital networks. A full YSE configuration simulates networks of up to 2,000,000 gates at a rate exceeding 3 billion gate computations per second, doing more simulations in just eight hours than an IBM 370/168 does in an entire year.
This paper reviews gate-level logic simulation and describes the architecture and hardware implementation of the YSE. A companion paper by G. Pfister and E. Kronstadt discusses the YSE software.
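For readers unfamiliar with gate-level logic simulation, a minimal software sketch follows. It is not the YSE algorithm or architecture: it assumes a unit-delay model with two-input NAND gates only, and the netlist (a half adder) and the names `NETLIST`, `nand` and `simulate` are hypothetical. Each pass of the inner loop over the netlist performs one "gate computation" per gate, the unit in which the simulation rate above is quoted.

```python
from typing import Dict, Tuple


def nand(x: int, y: int) -> int:
    return 1 - (x & y)


# Half adder built from NAND gates: each entry maps a gate output to its two inputs.
NETLIST: Dict[str, Tuple[str, str]] = {
    "n1": ("a", "b"),
    "n2": ("a", "n1"),
    "n3": ("b", "n1"),
    "sum": ("n2", "n3"),
    "carry": ("n1", "n1"),   # NAND with both inputs tied acts as an inverter
}


def simulate(netlist: Dict[str, Tuple[str, str]],
             inputs: Dict[str, int],
             max_steps: int = 16) -> Dict[str, int]:
    """Unit-delay simulation: every gate is evaluated from the previous step's values."""
    state = {gate: 0 for gate in netlist}
    state.update(inputs)
    for _ in range(max_steps):
        nxt = dict(state)
        for gate, (x, y) in netlist.items():
            nxt[gate] = nand(state[x], state[y])
        if nxt == state:      # no signal changed: the network has settled
            return state
        state = nxt
    raise RuntimeError("network did not settle within max_steps")


result_11 = simulate(NETLIST, {"a": 1, "b": 1})
result_10 = simulate(NETLIST, {"a": 1, "b": 0})
```

Since every gate in a step reads only values from the previous step, the evaluations within a step are independent of one another, which is what makes this kind of workload a natural fit for a parallel machine.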
