|
Target: SPE (Cell)
In addition to the traditional compiler optimizations, we provide the
following optimizations.
Scalar on SIMD Units. Most SPE instructions, including all memory
instructions, are SIMD instructions operating on 128 bits of data at a
time. As a result, all scalar code in a program must be adapted in order
to run correctly on the SPE's SIMD units. Most notably, every scalar
store must be turned into a read-modify-write sequence, because a store
always writes 16 bytes of data at once. By performing aggressive register
allocation and by allocating temporary scalars in distinct 16-byte memory
locations, we can avoid most of this overhead, at the cost of some extra
storage.
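
The read-modify-write sequence described above can be sketched in portable C. This is only an illustration that models local memory as a byte array; the names local_store and store_scalar_u32 are hypothetical, and a real SPE compiler would emit quadword load, insert, and store instructions rather than memcpy calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of local memory; only 16-byte accesses are "native". */
static uint8_t local_store[256];

/* Store a 32-bit scalar at a 4-byte-aligned address using only
 * full 16-byte (quadword) reads and writes. */
void store_scalar_u32(uint32_t addr, uint32_t value)
{
    uint8_t quad[16];
    uint32_t base = addr & ~15u;   /* start of the enclosing quadword */
    uint32_t offset = addr & 15u;  /* position of the scalar inside it */

    assert((addr & 3u) == 0);                /* scalar must not straddle quadwords */
    assert(base + 16 <= sizeof local_store);

    memcpy(quad, &local_store[base], 16);    /* read:   load the whole quadword */
    memcpy(&quad[offset], &value, 4);        /* modify: insert the scalar */
    memcpy(&local_store[base], quad, 16);    /* write:  store the whole quadword */
}
```

If each temporary scalar instead occupied its own 16-byte slot, the read and modify steps could be skipped, since no neighboring data would need to be preserved.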
Branch Optimization. The SPE's hardware has no dynamic branch
prediction but provides a special branch hint instruction, which marks
branches that are likely to be taken. The compiler inserts such hints
where suitable.
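
As an illustration of how likely-taken information can be made available to a compiler, the sketch below uses the __builtin_expect builtin of GCC-compatible compilers. This is an assumption made for illustration only: the text above does not say how the compiler decides where to place hints, and the macro and function names are hypothetical.

```c
#include <assert.h>

/* Assumes a GCC-compatible compiler; __builtin_expect only conveys
 * the expected outcome and does not change program semantics. */
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)

/* Sum the non-negative entries, marking the negative case as rare so
 * that the common path can be laid out (and hinted) as the likely one. */
long sum_nonnegative(const int *values, int count)
{
    long sum = 0;

    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        if (UNLIKELY(values[i] < 0))
            continue;               /* rare path */
        sum += values[i];           /* common path */
    }
    return sum;
}
```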
Instruction Scheduling, Bundling, and Instruction Fetch Handling.
The SPE's hardware supports dual issue of independent instructions but
imposes some code layout restrictions. Namely, to dual-issue, the even pipe
instruction must be at an even-word PC address and the odd pipe instruction
must be at an odd-word PC address. The compiler adjusts code layout by
inserting nops where required. The compiler also performs explicit
instruction fetch from the local memory, which improves memory-intensive
code because loads/stores and instruction fetch share the same local memory
port, with load/store instructions having priority over instruction fetch.
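
The layout restriction can be illustrated with a deliberately naive padding pass, sketched below. It places every instruction in a slot whose word-address parity matches its pipe and inserts a nop whenever the parities disagree. The types and the function pad_for_dual_issue are hypothetical; a real scheduler would presumably pad only where doing so enables a dual issue and would also check that the paired instructions are independent.

```c
#include <stddef.h>

enum pipe { PIPE_EVEN = 0, PIPE_ODD = 1 };

struct slot {
    enum pipe pipe;  /* pipe of the instruction occupying this word */
    int is_nop;      /* 1 if this word is inserted padding */
};

/* Lay out n instructions starting at word address start_word.
 * Returns the number of slots written, or 0 if out is too small. */
size_t pad_for_dual_issue(const enum pipe *insns, size_t n,
                          size_t start_word, struct slot *out, size_t cap)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        size_t addr = start_word + used;
        enum pipe slot_pipe = (addr % 2 == 0) ? PIPE_EVEN : PIPE_ODD;

        if (insns[i] != slot_pipe) {
            /* Wrong parity: fill this word with a nop for the slot's pipe. */
            if (used == cap)
                return 0;
            out[used].pipe = slot_pipe;
            out[used].is_nop = 1;
            used++;
        }
        if (used == cap)
            return 0;
        out[used].pipe = insns[i];
        out[used].is_nop = 0;
        used++;
    }
    return used;
}
```

For example, the sequence odd, even, odd, odd starting at word 0 needs one nop before the first instruction and one before the last, giving six words in total.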

|