Dear SGI compile folks:
I wrote and maintain the major RISC/Unix client, Mlucas, for the Great Internet
Mersenne Prime Search.
Mlucas is a much-improved version of lucas, an early prototype version of the
code which is now part
of the SPEC2000FP suite. While Mlucas is much bigger and better than lucas, it
was designed (based on
my experience with SPEC2000) to be similarly portable. I've successfully
compiled it on all the major
RISC platforms that have a decent f90 compiler: Alpha (TruUnix and various
flavors of Linux), MIPS/Irix,
Sparc and IBM Power1,2,3/AIX. I recently tried it out on a Linux Itanium system
avaialable via the compaq
TestDrive program. Here are the platform specifics:
spe190.testdrive.compaq.com> uname -a
Linux spe190.testdrive.compaq.com 2.4.0-test9 #25 SMP Fri Oct 20 09:14:53 PDT
2000 ia64 unknown
spe190.testdrive.compaq.com> sgif90 -version
SGIcc Compilers: Version 0.01.0-12
The source code I was compiling is described and linked at
ftp://209.133.33.182/pub/mayer/README2.html
The code basically performs a bunch of large-integer multiplies using an
efficient FFT implementation I wrote.
For Mersenne testing, high accuracy is very important, so one of the tricks the
code uses to try to minimize
roundoff error is to query the compiler on the platform where it is built for
an extended-precision floating-point
data type (via the f90 selected_real_kind function), and to do initializations
of small FFT sincos data tables using
the highest-precision supported floating data type. All subsequent computations
are done in strictly 64-bit real.
I do the high-prec. inits simply as
integer, parameter :: r8 =selected_real_kind(15,30) !* This gives the default
8-byte floating data type.
integer, parameter :: r16=max(r8,selected_real_kind(18,50)) !* If r16 >= 0, an
extended-precision (i.e. > 8-byte) float is
available.
i.e. if an ext.-prec. floating type is not available, everything reverts to
real*8. The above is from the Mdata.f90 module
of the source code - simply attempting to compile this small module will show
you what error messages were generated.
Here is the complete module:
!********************
!
!...Module for stashing global parameters used by various Mlucas routines.
!
MODULE Mdata
implicit none
integer, parameter :: r8 =selected_real_kind(15,30) !* This gives the default
8-byte floating data type.
integer, parameter :: r16=max(r8,selected_real_kind(18,50)) !* If r16 >= 0, an
extended-precision (i.e. > 8-byte) float is
available.
!...Quasi-generic cache-and-bank-conflict-avoiding array padding parameters are
here. These MUST be a power of two to be
! compatible with the simple padded-index calculation scheme we use.
! Unctuous TV announcer: "Are you suffering from cache flow problems? Bank
foreclosures? Then try my new array padding!"
integer, parameter :: dat_bits = 10 !* Number of 8-byte array data in each
contiguous-data block = 2^dat_bits.
!* This should be chosen so a complete data block, along with a roughly
equal
!* number of FFT sincos or DWT weights data, fits in the L1 cache. 512
8-byte
!* floats seems a good choice for an 8KB L1 cache.
!* NOTES: - the blocklength must divide the (unpadded) vector length.
!* - dat_bits must be at least 5 to be compatible with
radix16_wrapper_square.
!* - dat_bits must be at least 6 to be compatible with
radix32_wrapper_square.
!* - dat_bits must be at least 7 to be compatible with
radix64_wrapper_square.
integer, parameter :: pad_bits = 1 !* Number of 8-byte padding slots in each
contiguous-data block = 2^pad_bits.
real*8 , parameter :: rnd_const = .75d0*2d0**53 !* Constant used to effect
speedy NINT(x) = (x+rnd_const)-rnd_const.
!* General form is .75*2^{number of mantissa bits in FP representation},
!* e.g. on the x86 family with its 80-bit register float we'd use
.75*2d0**64.
!* For this NINT to work, must prevent overaggressive compiler
optimizations
!* which eliminate sequences like (x+y)-y, e.g. under DEC Unix
!* we specify <-assume accuracy_sensitive>.
!...Parameters used frequently in the complex floating-point transform...
real(kind=r16), parameter ::
one=1,two=2,twopi=6.28318530717958647692528676655900559_r16
real*8, parameter :: isrt2=0.7071067811865476_r8 !* 1/sqrt(2)
real*8, save :: one_half(0:2)=(/1d0,.5d0,.25d0/) !* Needed for
small-weights-tables scheme
end MODULE Mdata
Anyway, the above should compile OK even if there is only real*8 support, but
sgif90 doesn't agree.
Interestingly, when I modified the code so it only defined r16 (but didn't use
it in a (kind=) statement
or as an _r16 literal extension), it generated a value of 16, which should mean
that such a data type is
available, but apparently it isn't. So that's probably a simple compiler bug
for you to look into.
An additional question: when will the sgif90 compiler support
extended-precision floats? And,
will it support both the x86 real*10 type and an emulated real*16 type? It
would be nice to have
both available, since there is actual hardware support for real*10, hence such
operations should
run only about 2x as slow as real*8 (the factor of 2 comes from the load/store
penalty for moving
80-bit-wide operands between floating registers and memory). If people actually
need even more
precision, they could then resort to real*16, but of course that typically
involves a 10x performance
penalty, and for many applications (e.g. the aforementioned high-precision
floating data tables inits),
simply having a few bits or bytes more than real*8 is sufficient.
-Ernst Mayer
p.s.: Before writing this I looked for the Pro64 support mail archive mentioned
on your website,
but found no link to it - only a way to subscribe to the mailing list. Would
subscribing allow me
to access the list archive?
|