Parallelization using OpenMP
Our current compiler implementation supports parallelization through user-inserted OpenMP pragmas that mark the parallel sections of code. We generate a PPE binary that, when executed, spawns threads on the SPEs, initiates DMAs to transfer the appropriate SPE code to the SPE local stores, and coordinates execution of all threads in the program. Figure 1 illustrates the steps in compiling OpenMP for the Cell architecture.

Figure 1. Parallelization using OpenMP.
Outlining. We outline each parallel code section, i.e., we replace it with a call to a newly created function, and move the code corresponding to the parallel code section into the newly created function.
Cloning. We clone outlined functions corresponding to parallel code regions, so that we have two copies of these functions, one to be compiled to execute on the PPE and the other to execute on the SPEs. When cloning outlined functions, we also clone any other functions that may be invoked when these functions execute.
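The outlining and cloning steps can be sketched in plain C. This is only an illustration of the shape of the transformation, not the compiler's actual output: the names `region_args`, `region_ppe`, `region_spe`, `square_ppe`, `square_spe`, and `target_t` are hypothetical, and both clones here are ordinary C functions, whereas in the real toolchain the two copies would be compiled for the PPE and the SPEs respectively. The sketch also runs the region once over the full range rather than splitting it across threads.

```c
#include <assert.h>
#include <stddef.h>

/* Helper invoked from inside the parallel region. */
int square(int x) { return x * x; }

/* Original source: the parallel region is inline in its caller. */
void square_all_original(int *data, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        data[i] = square(data[i]);
}

/* Hypothetical argument record: the data and iteration bounds that the
   outlined function needs from the context it was lifted out of. */
typedef struct {
    int *data;
    int lo;   /* first iteration, inclusive */
    int hi;   /* last iteration, exclusive */
} region_args;

/* Cloning also covers functions the region may invoke, so the helper
   gets one copy per target (assumed naming scheme). */
int square_ppe(int x) { return x * x; }
int square_spe(int x) { return x * x; }

/* Outlined region, PPE clone: calls the PPE clone of the helper. */
void region_ppe(void *p)
{
    region_args *args = p;
    assert(args != NULL);
    for (int i = args->lo; i < args->hi; i++)
        args->data[i] = square_ppe(args->data[i]);
}

/* Outlined region, SPE clone: same body, calls the SPE clone of the helper. */
void region_spe(void *p)
{
    region_args *args = p;
    assert(args != NULL);
    for (int i = args->lo; i < args->hi; i++)
        args->data[i] = square_spe(args->data[i]);
}

typedef enum { TARGET_PPE, TARGET_SPE } target_t;

/* Transformed caller: the region has been replaced by a call to the
   outlined function. The target argument stands in for the decision
   of which clone runs. */
void square_all_transformed(int *data, int n, target_t target)
{
    region_args args = { data, 0, n };
    if (target == TARGET_SPE)
        region_spe(&args);
    else
        region_ppe(&args);
}
```

Calling `square_all_transformed` with either target should leave the array in the same state as `square_all_original`, which is the property the two steps are meant to preserve: outlining changes where the code lives, and cloning changes which processor's copy runs, but neither changes what the region computes.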
