Alignment Handling

Alignment Issue, Our Contribution

The key idea for our contribution is to realize that when alignment is done in software, there is no reason to remain as constrained as a hardware approach, which has little knowledge about the alignment of the entire computation. Indeed, prior work corresponds to an "align-to-slot-zero" approach where the data of the first loop iteration is artificially aligned to the first slot in their respective vector registers.

To illustrate our approach, consider the same a[i+2] = b[i+1] + c[i+3] as before, in Figure 3. But this time, the compiler is using global alignment knowledge to directly shift the misaligned input memory stream to the alignment of the output memory stream.

simd eager
Figure 3. Optimized SIMD code for a[i+2] = b[i+1] + c[i+3].

In doing so, we eliminate the need for misalignment handling before the store and thus reduce the number of shifts by 50%, from 3 to 2 per simdized iteration, in this example.

We can generalize the insertion of shifts by inserting shift operation in the original dependence graph whenever there is a boundary between two alignments. One constraint is that the alignment of memory operations is dictated by the alignment property of arrays. The second constraint is that whenever an operation is to proceed, all its inputs and its output must have an identical alignment. Finally, the property of the shifts (referred to as "vshiftstreams" here) is that it can convert the alignment property of its single input data streams to an arbitrary alignment for its output data stream.

Using the above definition, a "shift-to-zero" policy is shown below in Figure 4a, where each of the memory inputs is shifted to the zero alignment, and then the outcome of the computation is shifted back to the alignment of the final store alignment.

The optimized shift policy used to derive Figure 3 is shown in Figure 4b. With this policy, refered to as "eager shift," each of the misaligned inputs is eagerly shifted to the alignment of the final store.

simd shift policies
Figure 4. SIMD shift placement policy.

A "lazy shift" policy is illustrated in Figure 4c for a slightly different example, a[i+3] = b[i+1] + c[i+1]. In this example, both inputs are respectively aligned with each other. Thus, we may lazily shift the result of the addition instead eagerly aligning of each of the addition's inputs.

Finally, Figure 4d illustrates a "dominant shift" policy where the inputs are aligned to the most frequent alignment seen in the computation.