My argument is that unroll-and-jam (a.k.a. outer unroll in LNO) is a
better choice here than interchange:
for (i = 1; i + 3 < 1000; i = i + 4) {
    for (j = 1; j < 1000; j++) {
        a[i][j]   = a[i][j]   + a[i-1][j];
        a[i+1][j] = a[i+1][j] + a[i][j];
        a[i+2][j] = a[i+2][j] + a[i+1][j];
        a[i+3][j] = a[i+3][j] + a[i+2][j];
    }
}
/* the leftover rows (i = 997..999) go through the original loop */
This way, you get the benefit of reuse across the i-loop (the value
just computed for one row feeds the next row without a reload) without
losing the stride-1 access on the j-loop.
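
To make that concrete, here is a minimal sketch, not taken from the
actual program. The int element type, the 1000x1000 size, the function
names, and the scalar temporary t are all assumptions for illustration.
It does the unroll-and-jam by hand, keeps the running value in t so each
row costs one load and one store, adds a cleanup loop for the rows left
over when the trip count is not a multiple of 4, and compares the result
against the original loop nest.

```c
#include <string.h>

#define N 1000                      /* assumed array extent */

static int ref[N][N], opt[N][N];    /* assumed element type: int */

/* Original loop nest: row i accumulates the (already updated) row i-1. */
void sweep_reference(int a[N][N]) {
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[i][j] + a[i-1][j];
}

/* Unroll-and-jam by 4 on the i-loop; j stays innermost and stride-1. */
void sweep_unroll_jam(int a[N][N]) {
    int i;
    for (i = 1; i + 3 < N; i += 4) {
        for (int j = 1; j < N; j++) {
            int t = a[i-1][j];              /* loaded once, reused below */
            t += a[i][j];    a[i][j]   = t;
            t += a[i+1][j];  a[i+1][j] = t;
            t += a[i+2][j];  a[i+2][j] = t;
            t += a[i+3][j];  a[i+3][j] = t;
        }
    }
    /* Cleanup: rows left over when (N - 1) is not a multiple of 4. */
    for (; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[i][j] + a[i-1][j];
}

/* Returns 1 if both versions leave identical arrays behind. */
int unroll_jam_matches_reference(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            ref[i][j] = opt[i][j] = (i * 31 + j * 17) % 7;
    sweep_reference(ref);
    sweep_unroll_jam(opt);
    return memcmp(ref, opt, sizeof ref) == 0;
}
```

Within one unrolled block the four rows of a given column are touched
back to back, so three of the four a[i-1][j]-style loads go away while
each row is still walked with stride 1.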
Anyway, I don't see the whole program. Ross is probably right about
the frontend issue.
Peng.
On Wednesday, September 13, 2000 8:09 AM, David L Stephenson
[SMTP:dlstephe@xxxxxxx] wrote:
> Peng Tu wrote:
>
> > For this C program, I don't see why it is beneficial to interchange
> > the loop because the inner loop is already stride-1 (C is row-major).
>
> Interchanging the loops allows one of the array loads to be removed:
>
> for (i=1; i<1000; i++)
> for (j=1; j<1000; j++)
> a[i][j] = a[i][j] + a[i-1][j];
>
> becomes
>
> for (j=1; j<1000; j++) {
>   t = a[0][j];
>   for (i=1; i<1000; i++) {
>     a[i][j] = t = a[i][j] + t;
>   }
> }
>
> But see Ross Towle's response.
>
> --
> David Stephenson http://reality.sgi.com/dlstephe_engr/