"Nelson H. F. Beebe" wrote:
> Compaq AlphaServer ES40 Sierra/667 (32 EV6.7 21264A CPUs, 667 MHz,
> 8GB RAM); OSF/1 5.0
> f95 -O5 matmul_test.f && ./a.out
> speed(1) = 84.92637
> speed(2) = 86.62321
> discrepancy = 0.000000000000000E+000
Hmm, I get different figures from you here.
On the same hardware
(ES40 = 4x 667 MHz Alpha EV6.7 with 4 GB RAM) I get:
f90 -O5 matmul_test.f -o matmul_test.kelvin && ./matmul_test.kelvin
speed(1) = 172.9608
speed(2) = 103.7560
discrepancy = 0.000000000000000E+000
but:
f90 -O3 matmul_test.f -o matmul_test.kelvin && ./matmul_test.kelvin
speed(1) = 235.6120
speed(2) = 234.7684
discrepancy = 0.000000000000000E+000
So here is certainly one case where more aggressive optimisation makes things
worse: -O5 is markedly slower than -O3.
Also, to add to your figures, on the Cray T3E
(816x 600 MHz Alpha EV6 with 256 MB memory):
f90 -O3 matmul_test.f -o matmul_test.turing && ./matmul_test.turing
speed(1) = 59.99571143155223
speed(2) = 61.396552312259679
discrepancy = 0.E+0
and on the SGI Origin 3000
(256x 400 MHz MIPS R12000, IRIX 6.5.11f, MIPSpro Compilers: Version 7.3.1.1m):
f90 -O3 matmul_test.f -o matmul_test.green && ./matmul_test.green
speed(1) = 200.609299
speed(2) = 220.491638
discrepancy = 0.E+0
and on the SGI Troons (2x 667 MHz IA-64 with 2 GB RAM):
<not sure whether the Intel NDA stops me publishing these figures here?>
Finally, is it worth checking the memory alignment of arrays in cases where the
first element is *not* A(1:1)?
Yours,
Daniel
-----------------------------------------------------------------------
Dr. Daniel Kidger | E: d.kidger@xxxxxxxxx
High Performance Computing Group | W: www.csar.cfs.ac.uk
Manchester Computing, University of Manchester, | T: +44 161 275 7038
Oxford Road, Manchester, M13 9PL, UK | F: +44 161 275 6800
--------------------Q: what's up ? A: X cross Z ---------------------