Memory Abstraction
Software Caching. An SPE can directly access only its local store, so it requires a DMA transfer whenever it reads from or writes to a location in shared system memory. Issuing an explicit DMA for each such access is likely to degrade performance severely. Software caching allows data transferred by a single DMA to be reused across multiple accesses that exhibit spatial or temporal locality. We implement a runtime library for software caching that temporarily stores and manages copies of system memory data in the SPE local store. The compiler identifies system memory accesses in SPE code regions and replaces them with calls to the software cache library.
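The idea can be illustrated with a minimal sketch of a direct-mapped, write-back software cache. This is not the runtime library described above: the names (`cache_read_u32`, `cache_write_u32`, `cache_flush`), the line size, and the number of lines are all hypothetical, and DMA is simulated with `memcpy` against an ordinary array so that the example runs on any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 128u  /* bytes per cache line (assumed value) */
#define NUM_LINES  64u   /* lines in the direct-mapped cache (assumed value) */

/* Stand-in for shared system memory; an SPE would reach it only via DMA. */
static uint8_t system_memory[1u << 16];
static unsigned dma_count;  /* number of simulated DMA transfers */

typedef struct {
    int      valid;
    int      dirty;
    uint32_t tag;               /* line-aligned system memory offset */
    uint8_t  data[LINE_BYTES];  /* copy held in the (simulated) local store */
} cache_line_t;

static cache_line_t cache[NUM_LINES];

/* Simulated DMA: system memory -> local store. */
static void dma_get(void *ls, uint32_t ea, uint32_t n)
{
    memcpy(ls, &system_memory[ea], n);
    dma_count++;
}

/* Simulated DMA: local store -> system memory. */
static void dma_put(const void *ls, uint32_t ea, uint32_t n)
{
    memcpy(&system_memory[ea], ls, n);
    dma_count++;
}

/* Return the cache line holding address ea, fetching it on a miss. */
static cache_line_t *cache_lookup(uint32_t ea)
{
    uint32_t base = ea & ~(LINE_BYTES - 1u);
    cache_line_t *line = &cache[(base / LINE_BYTES) % NUM_LINES];

    assert(ea % 4u == 0u && ea + 4u <= sizeof system_memory);
    if (!line->valid || line->tag != base) {
        if (line->valid && line->dirty)
            dma_put(line->data, line->tag, LINE_BYTES);  /* write back victim */
        dma_get(line->data, base, LINE_BYTES);
        line->valid = 1;
        line->dirty = 0;
        line->tag = base;
    }
    return line;
}

/* What a compiler-inserted call for a 32-bit load might look like. */
uint32_t cache_read_u32(uint32_t ea)
{
    uint32_t value;
    cache_line_t *line = cache_lookup(ea);
    memcpy(&value, &line->data[ea & (LINE_BYTES - 1u)], sizeof value);
    return value;
}

/* What a compiler-inserted call for a 32-bit store might look like. */
void cache_write_u32(uint32_t ea, uint32_t value)
{
    cache_line_t *line = cache_lookup(ea);
    memcpy(&line->data[ea & (LINE_BYTES - 1u)], &value, sizeof value);
    line->dirty = 1;
}

/* Write all dirty lines back to system memory. */
void cache_flush(void)
{
    for (uint32_t i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].dirty) {
            dma_put(cache[i].data, cache[i].tag, LINE_BYTES);
            cache[i].dirty = 0;
        }
    }
}
```

In this sketch, two accesses that fall in the same 128-byte line cost one simulated DMA rather than two, which is the reuse that spatial locality makes possible.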
Static Buffering. When an SPE loop accesses a large array, we attempt to tile the loop. Tiling allows us to pre-fetch a significant portion of the array into a static buffer using a single DMA transfer, instead of calling the software cache library for each array element. This is one of several optimizations that can reduce the performance impact of DMA transfers between system memory and SPE local stores. We are currently exploring other techniques as well.
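A tiled loop of this kind might look like the following sketch, which sums a large array one tile at a time. The tile size, the function names, and the use of `memcpy` as a stand-in for a DMA transfer are assumptions made for illustration; the code the compiler actually generates is not shown in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_ELEMS 1024u  /* elements per tile (assumed value) */

/* Static buffer in the (simulated) SPE local store. */
static float tile_buf[TILE_ELEMS];
static unsigned tile_dma_count;  /* number of simulated DMA transfers */

/* Simulated bulk DMA: copy n floats from system memory into the buffer. */
static void dma_get_block(float *ls, const float *sys, size_t n)
{
    memcpy(ls, sys, n * sizeof *ls);
    tile_dma_count++;
}

/* Sum an array that lives in system memory, one tile at a time. */
float tiled_sum(const float *sys_array, size_t n)
{
    float sum = 0.0f;

    assert(sys_array != NULL || n == 0);
    for (size_t start = 0; start < n; start += TILE_ELEMS) {
        /* The last tile may be shorter than TILE_ELEMS. */
        size_t len = (n - start < TILE_ELEMS) ? n - start : TILE_ELEMS;

        /* One transfer per tile instead of one cache call per element. */
        dma_get_block(tile_buf, sys_array + start, len);

        /* The inner loop touches only the local buffer. */
        for (size_t i = 0; i < len; i++)
            sum += tile_buf[i];
    }
    return sum;
}
```

With these assumed sizes, a 2500-element array needs three transfers (1024, 1024, and 452 elements) rather than 2500 individual cache lookups.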
Code Partitioning. An SPE code section may be too large to fit in the limited space available in the SPE local store. We have developed a facility to partition code into multiple overlays, and to enable the automatic transfer of an overlay into SPE local store when required. However, we have yet to integrate this code partitioning with our support for OpenMP.
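The control flow of on-demand overlay loading can be sketched as a call stub that checks which overlay is resident before transferring control. The sketch below only simulates this: all names are hypothetical, both functions are ordinary host functions, and "loading" merely records which overlay is resident, whereas a real facility would transfer the overlay's code into a shared region of the local store.

```c
#include <assert.h>

#define NUM_OVERLAYS 2

typedef int (*overlay_fn)(int);

/* Two functions that we pretend live in different overlays. */
static int square(int x) { return x * x; }  /* overlay 0 */
static int negate(int x) { return -x; }     /* overlay 1 */

/* Entry point of each overlay (hypothetical table). */
static const overlay_fn overlay_entry[NUM_OVERLAYS] = { square, negate };

static int resident_overlay = -1;  /* -1: nothing loaded yet */
static unsigned overlay_loads;     /* number of simulated overlay transfers */

/* Simulated load; a real one would DMA code into the overlay region. */
static void load_overlay(int id)
{
    resident_overlay = id;
    overlay_loads++;
}

/* Call stub: load the target overlay only if it is not already resident. */
int overlay_call(int id, int arg)
{
    assert(id >= 0 && id < NUM_OVERLAYS);
    if (resident_overlay != id)
        load_overlay(id);
    return overlay_entry[id](arg);
}
```

Repeated calls into the resident overlay incur no transfer in this sketch, while alternating between overlays reloads on every switch, which is why the choice of partition matters for performance.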
