Code Examples

  Examples of Code Handled by the Simdization Framework

Example 1: A vanilla loop computing vector add.

for (j=0; j<n; j++) c[j] = a[j]+b[j];
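As a rough illustration of what simdizing this loop means, the sketch below strip-mines it into blocks of four elements followed by a scalar epilogue. The vector length VL = 4, the float element type, and the function names are assumptions made for this sketch; the inner k loop only stands in for a single SIMD add and is not the framework's actual output.

```c
#include <stddef.h>

/* Hypothetical vector length: 4 elements per SIMD register
   (for example, four 32-bit floats in a 16-byte register). */
#define VL 4

/* Scalar reference: the loop from Example 1. */
void vadd_scalar(float *c, const float *a, const float *b, size_t n) {
    for (size_t j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}

/* Strip-mined form: each pass of the inner k loop models one SIMD add. */
void vadd_blocked(float *c, const float *a, const float *b, size_t n) {
    size_t j = 0;
    for (; j + VL <= n; j += VL)
        for (size_t k = 0; k < VL; k++)   /* stands in for one vector op */
            c[j + k] = a[j + k] + b[j + k];
    for (; j < n; j++)                    /* scalar epilogue for leftovers */
        c[j] = a[j] + b[j];
}
```

Both functions produce the same result for any n; the blocked form simply exposes the groups of VL independent additions that a SIMD unit can execute at once.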

Example 2: The loop body involves an aggregate copy and computation on adjacent members of aggregates. SIMD parallelism can be extracted at the basic-block level.

for (i=0; i<n; i++) {
    q = quads[i];
    vertex_results[i].x = W0 * q.p[0].x + W1 * q.p[1].x + W2 * q.p[2].x + W3 * q.p[3].x;
    vertex_results[i].y = W0 * q.p[0].y + W1 * q.p[1].y + W2 * q.p[2].y + W3 * q.p[3].y;
    vertex_results[i].z = W0 * q.p[0].z + W1 * q.p[1].z + W2 * q.p[2].z + W3 * q.p[3].z;
    vertex_results[i].w = W0 * q.p[0].w + W1 * q.p[1].w + W2 * q.p[2].w + W3 * q.p[3].w;
}
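The loop in Example 2 omits its declarations. The sketch below supplies one plausible set so the loop compiles on its own; the type names vertex_t and quad_t, the float element type, the weight values, and the function name blend_quads are all hypothetical and do not come from the source. With this layout the members x, y, z, w of a vertex are adjacent in memory, so the four statements in the loop body perform the same operation on four neighboring values, which is the basic-block-level parallelism the example refers to.

```c
/* Hypothetical declarations; the source does not show them. */
typedef struct { float x, y, z, w; } vertex_t;  /* four adjacent floats */
typedef struct { vertex_t p[4]; } quad_t;       /* four vertices per quad */

/* Hypothetical blend weights. */
#define W0 0.25f
#define W1 0.25f
#define W2 0.25f
#define W3 0.25f

/* The loop of Example 2, wrapped in a function. */
void blend_quads(vertex_t *vertex_results, const quad_t *quads, int n) {
    for (int i = 0; i < n; i++) {
        quad_t q = quads[i];  /* aggregate copy */
        /* Four isomorphic statements on adjacent members x, y, z, w. */
        vertex_results[i].x = W0 * q.p[0].x + W1 * q.p[1].x + W2 * q.p[2].x + W3 * q.p[3].x;
        vertex_results[i].y = W0 * q.p[0].y + W1 * q.p[1].y + W2 * q.p[2].y + W3 * q.p[3].y;
        vertex_results[i].z = W0 * q.p[0].z + W1 * q.p[1].z + W2 * q.p[2].z + W3 * q.p[3].z;
        vertex_results[i].w = W0 * q.p[0].w + W1 * q.p[1].w + W2 * q.p[2].w + W3 * q.p[3].w;
    }
}
```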

Example 3: This loop contains an arbitrary combination of misalignments and unknown loop bounds.

for (i=lowBound; i<highBound; i++) {
    vout0[i+sindex3] = in0[i+sindex0] + in1[i+sindex2] + in2[i+sindex2] + in3[i+sindex3];
    vout1[i+sindex2] = in5[i+sindex0] + in7[i+sindex0] + in4[i+sindex1] + in6[i+sindex3];
    vout2[i+sindex2] = in10[i+sindex1] + in11[i+sindex2] + in8[i+sindex3] + in9[i+sindex3];
    vout3[i+sindex1] = in13[i+sindex0] + in14[i+sindex0] + in12[i+sindex2] + in15[i+sindex3];
}
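To make the term "misalignment" concrete, the sketch below computes the byte offset of a memory reference within its enclosing 16-byte block. The 16-byte register width and the helper names misalignment and same_alignment are assumptions for illustration. In Example 3 the offsets depend on the runtime values sindex0 through sindex3, so the input and output streams of one statement can sit at different offsets and cannot be combined lane for lane without realigning the data first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SIMD register width in bytes. */
#define VBYTES 16

/* Byte offset of an address within its enclosing VBYTES-sized block. */
size_t misalignment(const void *p) {
    return (size_t)((uintptr_t)p % VBYTES);
}

/* Nonzero if two references share the same offset, i.e. their
   elements fall into the same lanes of a vector register. */
int same_alignment(const void *p, const void *q) {
    return misalignment(p) == misalignment(q);
}
```

For 4-byte elements, two references such as in0[i+sindex0] and vout0[i+sindex3] have the same offset only when the difference between their addresses is a multiple of 16 bytes.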

Example 4: This loop involves a reduction, short-to-int data conversion, and runtime alignment on b[i] and b[i+j] (because the loop nest is triangular: the bound of the inner i loop depends on j).

short b[M];
int a;
...
for (j=0; j<n; j++) {
    a = 0;
    for (i=0; i<M-j; i++)
        a += ((int)b[i]*(int)b[i+j])>>n;
}
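One common way to simdize a sum reduction like the inner loop of Example 4 is to keep one partial sum per vector lane and combine the lanes after the loop. The sketch below shows that idea in scalar C; it is not necessarily how the framework handles the loop. The value M = 16, the four-lane width, the parameter name shift (the source shifts by n), and the function names are assumptions. The two versions agree as long as the integer sums do not overflow.

```c
#define M 16  /* hypothetical array length */

/* Scalar reference: the inner loop of Example 4 for one value of j. */
int corr_scalar(const short *b, int j, int shift) {
    int a = 0;
    for (int i = 0; i < M - j; i++)
        a += ((int)b[i] * (int)b[i + j]) >> shift;
    return a;
}

/* Same reduction with four partial sums, one per (modeled) vector lane. */
int corr_partial(const short *b, int j, int shift) {
    int acc[4] = {0, 0, 0, 0};
    int i = 0;
    for (; i + 4 <= M - j; i += 4)
        for (int k = 0; k < 4; k++)      /* stands in for one vector op */
            acc[k] += ((int)b[i + k] * (int)b[i + j + k]) >> shift;
    int a = acc[0] + acc[1] + acc[2] + acc[3];  /* combine the lanes */
    for (; i < M - j; i++)               /* scalar epilogue */
        a += ((int)b[i] * (int)b[i + j]) >> shift;
    return a;
}
```

Because the trip count M - j changes with j, the epilogue length and the alignment of b[i+j] vary from one outer iteration to the next.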

Example 5: Unrolled loops.

for (i = m; i < n; i = i + 4) {
    dy[i] = dy[i] + da*dx[i];
    dy[i+1] = dy[i+1] + da*dx[i+1];
    dy[i+2] = dy[i+2] + da*dx[i+2];
    dy[i+3] = dy[i+3] + da*dx[i+3];
}
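For comparison, the sketch below places the unrolled loop of Example 5 next to the rolled loop it corresponds to. The double element type and the function names are assumptions. The two forms match only when n - m is a multiple of 4; otherwise the unrolled loop updates elements at or beyond index n.

```c
/* The loop of Example 5, wrapped in a function.
   Assumes (n - m) is a multiple of 4. */
void daxpy_unrolled(double *dy, const double *dx, double da, int m, int n) {
    for (int i = m; i < n; i = i + 4) {
        dy[i]     = dy[i]     + da * dx[i];
        dy[i + 1] = dy[i + 1] + da * dx[i + 1];
        dy[i + 2] = dy[i + 2] + da * dx[i + 2];
        dy[i + 3] = dy[i + 3] + da * dx[i + 3];
    }
}

/* Rolled form: one statement per iteration. */
void daxpy_rolled(double *dy, const double *dx, double da, int m, int n) {
    for (int i = m; i < n; i++)
        dy[i] = dy[i] + da * dx[i];
}
```

The four statements in the unrolled body are the same operation applied to consecutive elements, so they can be packed into a single vector operation when the vector holds four elements.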