Some arm cortex-a8 improvements

Richard Henderson rth at twiddle.net
Tue Apr 24 00:59:25 CEST 2012


On 04/23/12 15:32, Torbjorn Granlund wrote:
> Richard Henderson <rth at twiddle.net> writes:
> 
>   On 04/23/12 07:49, Torbjorn Granlund wrote:
>   > Do you know the repeat rate of umull, umlal, umaal, assuming no reg
>   > dependencies?
>   
>   For a8: 3 cycles.
>   
> For a9 it seems to be 2 cycles, so 3.25 c/l for the current addmul_1 is
> not very good.
> I have found no timing docs, so I measured it myself:

arm.com has them, free registration required.

Table B.5. Multiplication instruction cycle timings

Instruction                             Cycles  Result latency
MUL(S), MLA(S)                          2       4
SMULL(S), UMULL(S), SMLAL(S), UMLAL(S)  3       4 for the first written register,
                                                5 for the second written register
UMAAL                                   3       4 for the first written register,
                                                5 for the second written register
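For reference, UMAAL computes RdHi:RdLo = Rn * Rm + RdHi + RdLo. That is exactly the per-limb step of addmul_1, so a loop needs only one multiply instruction per limb. Below is a minimal portable C model of that step and the loop built on it, assuming 32-bit limbs. The names umaal and ref_addmul_1 are hypothetical, for illustration only; this is not GMP's mpn_addmul_1 itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of ARM UMAAL: RdHi:RdLo = Rn * Rm + RdHi + RdLo.
   The 64-bit sum can never overflow, because
   (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
static void umaal(uint32_t *lo, uint32_t *hi, uint32_t n, uint32_t m)
{
    uint64_t t = (uint64_t)n * m + *lo + *hi;
    *lo = (uint32_t)t;
    *hi = (uint32_t)(t >> 32);
}

/* Hypothetical reference addmul_1: {rp,n} += {up,n} * v, returning the
   carry-out limb.  One UMAAL per limb: the low input is rp[i] (updated
   in place), the high input is the running carry. */
static uint32_t ref_addmul_1(uint32_t *rp, const uint32_t *up, size_t n,
                             uint32_t v)
{
    assert(n == 0 || (rp != NULL && up != NULL));
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = rp[i];
        umaal(&lo, &carry, up[i], v);
        rp[i] = lo;
    }
    return carry;
}
```

Because each limb's carry feeds the next UMAAL, a real loop also has to hide the 4-5 cycle result latency, not just the issue rate.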

There doesn't seem to be any clear indication of the repeat rate.
One is led to believe that the multiplier is fully pipelined, but
your experiment suggests that it isn't.
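Whatever the true repeat rate turns out to be, it caps addmul_1 directly: with one multiply per limb, and a new multiply accepted only every R cycles, the loop cannot run faster than R c/l. A back-of-the-envelope sketch of that bound follows, assuming one UMAAL per limb and ignoring latency, loads, stores and loop overhead. The function names are hypothetical.

```c
#include <assert.h>

/* Multiplier-only lower bound, in cycles per limb (c/l): each limb needs
   muls_per_limb multiplies, and a new multiply can start only every
   repeat_cycles cycles.  Latency, loads, stores and loop overhead are
   ignored, so real code can only be slower than this. */
static double mul_bound_cpl(double repeat_cycles, double muls_per_limb)
{
    assert(repeat_cycles > 0.0 && muls_per_limb > 0.0);
    return repeat_cycles * muls_per_limb;
}

/* Fraction of peak multiplier throughput achieved by a loop measured at
   measured_cpl (1.0 means the loop is purely multiplier-bound). */
static double mul_utilization(double repeat_cycles, double muls_per_limb,
                              double measured_cpl)
{
    assert(measured_cpl > 0.0);
    return mul_bound_cpl(repeat_cycles, muls_per_limb) / measured_cpl;
}
```

With the 2-cycle A9 figure quoted above, mul_bound_cpl(2.0, 1.0) gives 2 c/l, and mul_utilization(2.0, 1.0, 3.25) is about 0.62, i.e. the current 3.25 c/l addmul_1 uses roughly 62% of the multiplier's throughput. The 3-cycle A8 figure gives a 3 c/l floor.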


r~


More information about the gmp-devel mailing list