What's the best way (for Itanium, at least) to multiply large integers? For
example, 128-bit integers, 1024-bit integers, or indeed arbitrarily large
integers?
I had a look through the Pro64 source and noticed that it sets a
threshold of '14', below which successive shifts+adds are used and above
which you outright multiply. But when is it best to use the xma instruction
(and pay the associated cost of converting to/from an FP representation)? When
is it better to use the integer packed multiply instructions? For 'streaming
multiplication', as it were, what do you think is the best way to proceed?
Using the integer/MM units? The FP units? Both at once?
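
To illustrate what I mean by shifts+adds: the sketch below multiplies by the
constant 10 using one shift-and-add and one shift. It assumes the shift+add
path is for multiplying by a compile-time constant, which is my reading and
not something confirmed from the Pro64 source. The function name
mul10_shift_add is made up for this example.

```c
#include <stdint.h>

/* Hypothetical example: x * 10 without a multiply instruction.
   10 = (4 + 1) * 2, so one shift-and-add followed by one shift. */
static inline uint64_t mul10_shift_add(uint64_t x)
{
    uint64_t x5 = (x << 2) + x;  /* x * 5 */
    return x5 << 1;              /* x * 10, wrapping modulo 2^64 */
}
```

The more set bits (or awkward factors) the constant has, the longer this
chain gets, which is presumably why some cutoff exists at all.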
I am concerned with 'normal' multiplication of ~kilobit numbers, not
massive numbers where it's better to use transform-based multiplication.
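
For concreteness, by 'normal' multiplication I mean the schoolbook method,
roughly as in the portable C sketch below. The name mul_schoolbook and the
choice of 32-bit limbs are just for illustration; a tuned Itanium version
would presumably use 64-bit limbs and a different inner loop, which is
exactly what I'm asking about.

```c
#include <stddef.h>
#include <stdint.h>

/* Schoolbook multiply: r[0..2n-1] = a[0..n-1] * b[0..n-1].
   Limbs are 32 bits, least significant limb first.
   r must not overlap a or b. */
void mul_schoolbook(uint32_t *r, const uint32_t *a,
                    const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < 2 * n; i++)
        r[i] = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            /* A 32x32 product plus two 32-bit addends is at most
               2^64 - 1, so this sum cannot overflow 64 bits. */
            uint64_t t = (uint64_t)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint32_t)t;   /* low half stays in place */
            carry = t >> 32;          /* high half moves up a limb */
        }
        r[i + n] = (uint32_t)carry;
    }
}
```

The inner step is a multiply followed by two additions and a carry, done
n*n times, so whichever unit does that step fastest should win overall.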
Any ideas?
Many thanks for your time,
Duraid