pro64-support

RE: about LNO

To: "'David L Stephenson'" <dlstephe@xxxxxxx>, "pro64-support@xxxxxxxxxxx" <pro64-support@xxxxxxxxxxx>
Subject: RE: about LNO
From: Peng Tu <tu@xxxxxxxxxxxxx>
Date: Wed, 13 Sep 2000 09:47:49 -0700
Organization: Tensilica Inc
Reply-to: "tu@xxxxxxxxxxxxx" <tu@xxxxxxxxxxxxx>
Sender: owner-pro64-support@xxxxxxxxxxx
My argument is that unroll-and-jam (a.k.a. outer unroll in LNO) is a better
choice here than interchange:

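for (i = 1; i + 3 < 1000; i = i + 4) {
    for (j = 1; j < 1000; j++) {
         a[i][j] = a[i][j] + a[i-1][j];
         a[i+1][j] = a[i+1][j] + a[i][j];
         a[i+2][j] = a[i+2][j] + a[i+1][j];
         a[i+3][j] = a[i+3][j] + a[i+2][j];
    }
}
/* remainder: the last three rows (i = 997..999) still need the original loop */
for (; i < 1000; i++)
    for (j = 1; j < 1000; j++)
         a[i][j] = a[i][j] + a[i-1][j];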

This way, you get the benefit of reuse on the i-loop without losing
the stride-1 access on the j-loop.
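
To make the comparison concrete, here is a rough, self-contained sketch (not
taken from the original program, which I have not seen) that runs the original
loop nest, the interchanged version with the load kept in a scalar, and an
unroll-and-jam by 4, and checks that all three produce the same array. The
names fill, original, interchanged, unroll_and_jam and variants_agree, the
element type double, and the test data are all assumptions made up for
illustration.

```c
#include <assert.h>
#include <string.h>

#define N 1000

/* Three copies of the array, one per loop variant (hypothetical test data). */
static double ref[N][N], ic[N][N], uj[N][N];

/* Fill with small integer values so every sum is exact in double. */
static void fill(double a[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (double)((i * 31 + j * 17) % 7);
}

/* Original nest: stride-1 inner loop, two loads per iteration. */
static void original(double a[N][N]) {
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            a[i][j] = a[i][j] + a[i-1][j];
}

/* Interchange: a[i-1][j] lives in t, but the inner loop walks a column. */
static void interchanged(double a[N][N]) {
    for (int j = 1; j < N; j++) {
        double t = a[0][j];
        for (int i = 1; i < N; i++)
            a[i][j] = t = a[i][j] + t;
    }
}

/* Unroll-and-jam by 4: t carries the value between the four rows,
   and the inner loop is still stride-1 in j. */
static void unroll_and_jam(double a[N][N]) {
    int i;
    for (i = 1; i + 3 < N; i += 4) {
        for (int j = 1; j < N; j++) {
            double t = a[i-1][j];
            a[i][j]   = t = a[i][j]   + t;
            a[i+1][j] = t = a[i+1][j] + t;
            a[i+2][j] = t = a[i+2][j] + t;
            a[i+3][j] = a[i+3][j] + t;
        }
    }
    assert(N - i < 4);          /* at most 3 leftover rows */
    for (; i < N; i++)          /* remainder rows use the original body */
        for (int j = 1; j < N; j++)
            a[i][j] = a[i][j] + a[i-1][j];
}

/* Returns 1 if all three variants leave identical arrays. */
int variants_agree(void) {
    fill(ref); fill(ic); fill(uj);
    original(ref);
    interchanged(ic);
    unroll_and_jam(uj);
    return memcmp(ref, ic, sizeof ref) == 0 &&
           memcmp(ref, uj, sizeof ref) == 0;
}
```

Calling variants_agree() from a small main should return 1. In the unrolled
body only one of the four a[i-1][j]-style loads remains, which is the reuse I
mean, while the j-loop keeps its stride-1 access.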

Anyway, I don't see the whole program. Ross is probably right about
the frontend issue.

Peng.

On Wednesday, September 13, 2000 8:09 AM, David L Stephenson 
[SMTP:dlstephe@xxxxxxx] wrote:
> Peng Tu wrote:
> 
> > For this C program, I don't see why it is beneficial to interchange
> > the loop because the inner loop is already stride-1 (C is row-major).
> 
> Interchanging the loops allows one of the array loads to be removed:
> 
>       for (i=1; i<1000; i++)
>         for (j=1; j<1000; j++)
>           a[i][j] = a[i][j] + a[i-1][j];
> 
> becomes
> 
>       for (j=1; j<1000; j++) {
>         t = a[0][j];
>         for (i=1; i<1000; i++)
>           a[i][j] = t = a[i][j] + t;
>       }
> 
> But see Ross Towle's response.
> 
> -- 
> David Stephenson        http://reality.sgi.com/dlstephe_engr/
