Thomas Graf wrote:
> * Eric Dumazet <42CA390C.9000801@xxxxxxxxxxxxx> 2005-07-05 09:38
>> [NET]: unroll a small loop in pfifo_fast_dequeue(). Compiler generates
>> better code.
>> (Using skb_queue_empty() to test the queue is faster than trying to
>> __skb_dequeue())
>> oprofile says this function now uses 0.29% instead of 1.22%, on an
>> x86_64 target.
>
> I think this patch is pretty much pointless. __skb_dequeue() and
> !skb_queue_empty() should produce almost the same code and as soon
> as you disable profiling and debugging you'll see that the compiler
> unrolls the loop itself if possible.
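
For readers without the patch at hand, the two shapes being compared can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel source and not the actual patch: struct qdisc_model, queue_empty(), queue_dequeue(), dequeue_loop(), dequeue_unrolled() and variants_agree() are hypothetical names standing in for the three pfifo_fast bands, skb_queue_empty() and __skb_dequeue().

```c
#include <stddef.h>

#define NBANDS 3                  /* pfifo_fast keeps three priority bands */

/* Simplified stand-ins, NOT the kernel definitions (assumption). */
struct sk_buff {
    struct sk_buff *next, *prev;
    int id;                       /* hypothetical payload for the demo */
};

struct sk_buff_head {
    struct sk_buff head;          /* sentinel node of a circular list */
    unsigned int qlen;
};

struct qdisc_model {
    struct sk_buff_head band[NBANDS];
    unsigned int qlen;            /* total buffers across all bands */
};

static void queue_init(struct sk_buff_head *q)
{
    q->head.next = q->head.prev = &q->head;
    q->head.id = -1;
    q->qlen = 0;
}

/* Models skb_queue_empty(): a single pointer comparison. */
static int queue_empty(const struct sk_buff_head *q)
{
    return q->head.next == &q->head;
}

static void queue_tail(struct sk_buff_head *q, struct sk_buff *skb)
{
    skb->next = &q->head;
    skb->prev = q->head.prev;
    q->head.prev->next = skb;
    q->head.prev = skb;
    q->qlen++;
}

/* Models __skb_dequeue(): unlink and return the first buffer, or NULL. */
static struct sk_buff *queue_dequeue(struct sk_buff_head *q)
{
    struct sk_buff *skb = q->head.next;

    if (skb == &q->head)
        return NULL;
    q->head.next = skb->next;
    skb->next->prev = &q->head;
    skb->next = skb->prev = NULL;
    q->qlen--;
    return skb;
}

/* Loop shape: try a full dequeue on each band in priority order. */
static struct sk_buff *dequeue_loop(struct qdisc_model *qd)
{
    for (int prio = 0; prio < NBANDS; prio++) {
        struct sk_buff *skb = queue_dequeue(&qd->band[prio]);
        if (skb) {
            qd->qlen--;
            return skb;
        }
    }
    return NULL;
}

/* Unrolled shape: cheap emptiness tests first, then one dequeue. */
static struct sk_buff *dequeue_unrolled(struct qdisc_model *qd)
{
    struct sk_buff_head *list = &qd->band[0];

    if (queue_empty(list)) {
        list = &qd->band[1];
        if (queue_empty(list)) {
            list = &qd->band[2];
            if (queue_empty(list))
                return NULL;
        }
    }
    qd->qlen--;
    return queue_dequeue(list);
}

/* Returns 1 if both shapes hand out the same buffers in the same order. */
int variants_agree(void)
{
    static const int band_of[4] = { 2, 0, 1, 0 };
    struct qdisc_model a, b;
    struct sk_buff pa[4], pb[4];

    a.qlen = b.qlen = 0;
    for (int i = 0; i < NBANDS; i++) {
        queue_init(&a.band[i]);
        queue_init(&b.band[i]);
    }
    for (int i = 0; i < 4; i++) {
        pa[i].id = pb[i].id = i;
        queue_tail(&a.band[band_of[i]], &pa[i]);
        queue_tail(&b.band[band_of[i]], &pb[i]);
        a.qlen++;
        b.qlen++;
    }
    for (int i = 0; i < 5; i++) { /* fifth call exercises the empty case */
        struct sk_buff *x = dequeue_loop(&a);
        struct sk_buff *y = dequeue_unrolled(&b);
        if ((x == NULL) != (y == NULL) || (x && x->id != y->id))
            return 0;
    }
    return a.qlen == 0 && b.qlen == 0;
}
```

Both functions are meant to behave identically; the only question in this thread is what code the compiler emits for each. Building the two shapes with the same compiler and flags and comparing the disassembly is one way to check whether a given toolchain unrolls the loop on its own.
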
OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop:
Original 2.6.12 gives:
ffffffff802a9790 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 2904054 1.9531 */
258371 0.1738 :ffffffff802a9790: lea 0xc0(%rdi),%rcx
273669 0.1841 :ffffffff802a9797: xor %esi,%esi
12533 0.0084 :ffffffff802a9799: mov (%rcx),%rdx
292315 0.1966 :ffffffff802a979c: cmp %rcx,%rdx
11717 0.0079 :ffffffff802a979f: je ffffffff802a97d1 <pfifo_fast_dequeue+0x41>
4474 0.0030 :ffffffff802a97a1: mov %rdx,%rax
6238 0.0042 :ffffffff802a97a4: mov (%rdx),%rdx
41 2.8e-05 :ffffffff802a97a7: decl 0x10(%rcx)
6089 0.0041 :ffffffff802a97aa: test %rax,%rax
126 8.5e-05 :ffffffff802a97ad: movq $0x0,0x10(%rax)
39 2.6e-05 :ffffffff802a97b5: mov %rcx,0x8(%rdx)
6974 0.0047 :ffffffff802a97b9: mov %rdx,(%rcx)
2841 0.0019 :ffffffff802a97bc: movq $0x0,0x8(%rax)
366 2.5e-04 :ffffffff802a97c4: movq $0x0,(%rax)
14757 0.0099 :ffffffff802a97cb: je ffffffff802a97d1 <pfifo_fast_dequeue+0x41>
288 1.9e-04 :ffffffff802a97cd: decl 0x40(%rdi)
94 6.3e-05 :ffffffff802a97d0: retq
970400 0.6526 :ffffffff802a97d1: inc %esi
982402 0.6607 :ffffffff802a97d3: add $0x18,%rcx
4 2.7e-06 :ffffffff802a97d7: cmp $0x2,%esi
1 6.7e-07 :ffffffff802a97da: jle ffffffff802a9799 <pfifo_fast_dequeue+0x9>
59754 0.0402 :ffffffff802a97dc: xor %eax,%eax
561 3.8e-04 :ffffffff802a97de: data16
:ffffffff802a97df: nop
:ffffffff802a97e0: retq
And the new code (2.6.12-ed):
ffffffff802b1020 <pfifo_fast_dequeue>: /* pfifo_fast_dequeue total: 153139 0.2934 */
27388 0.0525 :ffffffff802b1020: lea 0xc0(%rdi),%rdx
42091 0.0806 :ffffffff802b1027: cmp %rdx,0xc0(%rdi)
:ffffffff802b102e: jne ffffffff802b1052 <pfifo_fast_dequeue+0x32>
474 9.1e-04 :ffffffff802b1030: lea 0xd8(%rdi),%rdx
5571 0.0107 :ffffffff802b1037: cmp %rdx,0xd8(%rdi)
2 3.8e-06 :ffffffff802b103e: jne ffffffff802b1052 <pfifo_fast_dequeue+0x32>
1 1.9e-06 :ffffffff802b1040: lea 0xf0(%rdi),%rdx
20030 0.0384 :ffffffff802b1047: xor %eax,%eax
6 1.1e-05 :ffffffff802b1049: cmp %rdx,0xf0(%rdi)
6 1.1e-05 :ffffffff802b1050: je ffffffff802b1086 <pfifo_fast_dequeue+0x66>
:ffffffff802b1052: mov (%rdx),%rcx
11796 0.0226 :ffffffff802b1055: xor %eax,%eax
:ffffffff802b1057: cmp %rdx,%rcx
8 1.5e-05 :ffffffff802b105a: je ffffffff802b1083 <pfifo_fast_dequeue+0x63>
3146 0.0060 :ffffffff802b105c: mov %rcx,%rax
12 2.3e-05 :ffffffff802b105f: mov (%rcx),%rcx
118 2.3e-04 :ffffffff802b1062: decl 0x10(%rdx)
4924 0.0094 :ffffffff802b1065: movq $0x0,0x10(%rax)
65 1.2e-04 :ffffffff802b106d: mov %rdx,0x8(%rcx)
725 0.0014 :ffffffff802b1071: mov %rcx,(%rdx)
11493 0.0220 :ffffffff802b1074: movq $0x0,0x8(%rax)
194 3.7e-04 :ffffffff802b107c: movq $0x0,(%rax)
2995 0.0057 :ffffffff802b1083: decl 0x40(%rdi)
19607 0.0376 :ffffffff802b1086: nop
2487 0.0048 :ffffffff802b1087: retq
Please give us the code your compiler produces, and explain to me how disabling
oprofile can change the generated assembly. :)
Debugging has no impact on this code either.
Thank you
Eric