Stephen Pickles <zzcgusp@xxxxxxxxxxxxxxxx> writes on Tue, 8 May 2001
18:37:02 +0100 (BST) about a performance difference in Fortran 90
matrix multiplication seen with the SGI Pro64 compilers on IA-64.
I took the sample code and ran it on a number of other architectures,
with the results shown below. Although the program labels its output
speed(), the reported values behave like run times: larger numbers mean
slower execution. Notice that method 2 (0-based indexing) is notably
slower (by about 12%) on the Intel Pentium III system, but slightly
faster on six other systems; on the AlphaServer ES40 Sierra/667 it is
about 2% slower. I repeated several of these runs without seeing any
significant difference in the reported timings.

Stephen's message did not make clear whether his results were for
native IA-64 hardware or for the NUE IA-64 emulator environment. Mine
are for the latter, and they show a 2.7x slowdown for 0-based
indexing. Since both methods require the same number of loads and
stores, and the same number of floating-point operations, these
differences are puzzling; resolving the question will likely require
examining the generated assembly code. However, the .s file is over
10K lines long, and I was not able to quickly isolate the relevant
code for the two methods in a text editor.
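Since matmul_test.f is not reproduced here, the following Python sketch assumes a straightforward triple loop over column-major n-by-n arrays, with lower bound 1 for method 1 and 0 for method 2 (both the loop order and the helper names address and matmul_counting are hypothetical). It illustrates why the timings are surprising: the two versions touch the same elements in the same order and do the same arithmetic. Only the constant subtracted in the address formula changes, and compilers can typically fold such a constant into the array base address.

```python
# Hypothetical sketch: the real matmul_test.f is not shown, so we assume a
# plain triple loop over n-by-n column-major arrays whose lower bound is
# either 1 (method 1) or 0 (method 2), and count memory accesses and flops.

def address(i, j, lower, n):
    """Column-major offset of element (i, j) for bounds lower..lower+n-1."""
    return (i - lower) + (j - lower) * n


def matmul_counting(a, b, n, lower):
    """Compute c = a*b on flat column-major lists; return (c, counts)."""
    c = [0.0] * (n * n)
    counts = {"loads": 0, "stores": 0, "flops": 0}
    for j in range(lower, lower + n):
        for i in range(lower, lower + n):
            s = 0.0
            for k in range(lower, lower + n):
                s += a[address(i, k, lower, n)] * b[address(k, j, lower, n)]
                counts["loads"] += 2
                counts["flops"] += 2  # one multiply, one add
            c[address(i, j, lower, n)] = s
            counts["stores"] += 1
    return c, counts


n = 4
a = [float(x) for x in range(n * n)]
b = [float(2 * x + 1) for x in range(n * n)]
c1, counts1 = matmul_counting(a, b, n, lower=1)  # 1-based (method 1)
c0, counts0 = matmul_counting(a, b, n, lower=0)  # 0-based (method 2)
print(c1 == c0, counts1 == counts0, counts1)
```

Running it prints True True followed by the counts: 128 loads, 16 stores, and 128 flops for n = 4, identical for both lower bounds.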

Compaq/DEC Alpha 4100-5/466 OSF/1 4.0F
    f95 -O3 matmul_test.f && ./a.out
    speed(1) = 128.8102
    speed(2) = 126.6855
    discrepancy = 0.000000000000000E+000

Compaq AlphaServer ES40 Sierra/667 (32 EV6.7 21264A CPUs, 667 MHz, 8GB RAM); OSF/1 5.0
    f95 -O5 matmul_test.f && ./a.out
    speed(1) = 84.92637
    speed(2) = 86.62321
    discrepancy = 0.000000000000000E+000

Compaq AlphaServer ES40 DEC6600/500 (8 EV6 21264 CPUs, 500 MHz, 8GB RAM); OSF/1 4.0F
    f95 -O5 matmul_test.f && ./a.out
    speed(1) = 155.6647
    speed(2) = 150.3770
    discrepancy = 0.000000000000000E+000

IBM SP/2 AIX 4.3
    xlf95 -O1 matmul_test.f && ./a.out
    speed(1) = 7.607727051
    speed(2) = 7.227553844
    discrepancy = 0.000000000000000000E+00

Intel Pentium III GNU/Linux 2.2.17-14smp (Red Hat 6.2)
    lf95 -O3 matmul_test.f -o foo.exe && ./foo.exe
    speed(1) = 87.0748291
    speed(2) = 97.8032532
    discrepancy = 0.000000000000000E+00

Intel Pentium III GNU/Linux 2.2.17-14smp (Red Hat 6.2)
HP NUE IA-64 emulator
(reduced nruns from 100000 to 1000)
    sgif90 -O3 matmul_test.f && ./a.out
    speed(1) = 0.207665801
    speed(2) = 0.565433443
    discrepancy = 0.E+0

SGI R5000-PC IRIX 6.5
    f90 -O3 matmul_test.f && ./a.out
    speed(1) = 36.2537994
    speed(2) = 34.8822517
    discrepancy = 0.E+0

SGI Origin 200 IRIX 6.5
    f90 -O3 matmul_test.f && ./a.out
    speed(1) = 75.4860535
    speed(2) = 68.8059082
    discrepancy = 0.E+0

Sun SPARC Solaris 2.7
    f95 -O3 matmul_test.f && ./a.out
    speed(1) = 53.20448
    speed(2) = 51.201023
    discrepancy = 0.0E+0

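To double-check the summary given earlier, a small Python script can tabulate speed(2)/speed(1) for each run. The numbers are copied from the output above; the short system labels are abbreviations introduced here. Because the NUE run used 1000 rather than 100000 iterations, only its ratio, not its absolute values, is comparable with the other systems.

```python
# Timings copied from the runs above: (speed(1), speed(2)).
# As the NUE figures show, larger values mean slower runs.
results = {
    "Alpha 4100-5/466": (128.8102, 126.6855),
    "ES40 Sierra/667": (84.92637, 86.62321),
    "ES40 DEC6600/500": (155.6647, 150.3770),
    "IBM SP/2": (7.607727051, 7.227553844),
    "Pentium III (lf95)": (87.0748291, 97.8032532),
    "Pentium III (NUE IA-64)": (0.207665801, 0.565433443),
    "SGI R5000-PC": (36.2537994, 34.8822517),
    "SGI Origin 200": (75.4860535, 68.8059082),
    "Sun SPARC": (53.20448, 51.201023),
}

for system, (t1, t2) in results.items():
    ratio = t2 / t1  # > 1 means method 2 (0-based) is slower
    verdict = "slower" if ratio > 1 else "faster"
    print(f"{system:26s} {ratio:6.3f}  method 2 {verdict}")

# Systems on which 0-based indexing ran faster than 1-based indexing.
faster = [s for s, (t1, t2) in results.items() if t2 < t1]
print(len(faster), "systems where method 2 is faster")
```

The last line reports 6 systems, and the NUE ratio comes out at about 2.72.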
As an aside, see
http://www.math.utah.edu/pub/benchmarks/usirep.pdf
http://www.math.utah.edu/pub/benchmarks/usirep.ps
for ways to sometimes dramatically speed up matrix multiplication on
modern RISC systems.
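The report itself is not summarized here. As a hedged illustration of one widely used technique in this family, cache blocking (loop tiling), and not necessarily what the report recommends, the following Python sketch computes the product tile by tile. The block size bs and the function names are illustrative assumptions; Python will not show the speedup, only the loop structure that a Fortran or C version would use.

```python
# Sketch of cache blocking (loop tiling) for C = A*B on n-by-n matrices
# stored as lists of rows. The block size bs is an arbitrary illustrative
# choice; on real hardware it would be tuned so the active tiles fit in cache.

def matmul_naive(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_blocked(a, b, bs=2):
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Multiply one tile of A by one tile of B, accumulating
                # into the matching tile of C; min() handles edge tiles.
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            c[i][j] += aik * b[k][j]
    return c


n = 5
a = [[float(i + 2 * j) for j in range(n)] for i in range(n)]
b = [[float(3 * i - j) for j in range(n)] for i in range(n)]
print(matmul_blocked(a, b, bs=2) == matmul_naive(a, b))
```

With small integer-valued entries the blocked and naive products agree exactly, so this prints True; n = 5 with bs = 2 also exercises the partial edge tiles.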
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- Center for Scientific Computing FAX: +1 801 585 1640, +1 801 581 4148 -
- University of Utah Internet e-mail: beebe@xxxxxxxxxxxxx -
- Department of Mathematics, 322 INSCC beebe@xxxxxxx beebe@xxxxxxxxxxxx -
- 155 S 1400 E RM 233 beebe@xxxxxxxx -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe -
-------------------------------------------------------------------------------