Eric Dumazet wrote:
Maybe we can rewrite the whole thing without branches, examining prio
from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with
conditional move (cmov):
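struct sk_buff_head *best = NULL;
/* qdisc_priv() returns void *, so cast before doing element arithmetic */
struct sk_buff_head *list = (struct sk_buff_head *)qdisc_priv(qdisc) + PFIFO_FAST_BANDS - 1;
/* a later (lower-numbered, higher-priority) non-empty band overrides best */
if (!skb_queue_empty(list)) best = list;
list--;
if (!skb_queue_empty(list)) best = list;
list--;
if (!skb_queue_empty(list)) best = list;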
if (best != NULL) {
qdisc->q.qlen--;
return __qdisc_dequeue_head(qdisc, best);
}
This version should have one branch.
I will test this after some sleep :)
See you
Eric
(Sorry, still using 2.6.12, but the idea remains)
static struct sk_buff *
pfifo_fast_dequeue(struct Qdisc* qdisc)
{
struct sk_buff_head *list = qdisc_priv(qdisc);
struct sk_buff_head *best = NULL;
list += 2;
if (!skb_queue_empty(list))
best = list;
list--;
if (!skb_queue_empty(list))
best = list;
list--;
if (!skb_queue_empty(list))
best = list;
if (best) {
qdisc->q.qlen--;
return __skb_dequeue(best);
}
return NULL;
}
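For readers without a kernel tree at hand, here is a minimal userspace sketch of the same selection logic. Everything in it (struct node, struct model_qdisc, model_init, model_enqueue, model_dequeue, BANDS) is a hypothetical stand-in and not kernel API; it only assumes that a band is a circular doubly linked list whose head points to itself when empty, which is the test the real skb_queue_empty() performs.

```c
#include <assert.h>
#include <stddef.h>

#define BANDS 3 /* stands in for PFIFO_FAST_BANDS (assumption: 3 bands) */

/* Hypothetical stand-in for struct sk_buff_head / struct sk_buff:
 * a circular doubly linked list, empty when head->next == head. */
struct node {
    struct node *next;
    struct node *prev;
};

struct model_qdisc {
    struct node band[BANDS]; /* band 0 has the highest priority */
    unsigned int qlen;       /* total number of queued nodes */
};

void model_init(struct model_qdisc *q)
{
    for (int i = 0; i < BANDS; i++)
        q->band[i].next = q->band[i].prev = &q->band[i];
    q->qlen = 0;
}

int band_empty(const struct node *head)
{
    return head->next == head;
}

/* Append n at the tail of the given band. */
void model_enqueue(struct model_qdisc *q, int prio, struct node *n)
{
    struct node *head;

    assert(prio >= 0 && prio < BANDS);
    head = &q->band[prio];
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
    q->qlen++;
}

/* Scan from the last band down to band 0. Each test is a plain
 * conditional assignment, which a compiler may turn into cmov;
 * the last non-empty band seen (the lowest index) wins. */
struct node *model_dequeue(struct model_qdisc *q)
{
    struct node *list = &q->band[BANDS - 1];
    struct node *best = NULL;
    struct node *n;

    if (!band_empty(list))
        best = list;
    list--;
    if (!band_empty(list))
        best = list;
    list--;
    if (!band_empty(list))
        best = list;

    if (!best)
        return NULL; /* the only branch that depends on queue state */

    /* Unlink the head element of the chosen band. */
    n = best->next;
    best->next = n->next;
    n->next->prev = best;
    n->next = n->prev = NULL;
    q->qlen--;
    return n;
}
```

Enqueueing one node on band 2 and then two nodes on band 0 should hand back the two band 0 nodes in FIFO order before the band 2 node. Whether the compiler really emits cmov for the three tests depends on the target and optimization level, so the assembly still needs checking.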
At least the compiler output seems promising:
0000000000000550 <pfifo_fast_dequeue>:
550: 48 8d 97 f0 00 00 00 lea 0xf0(%rdi),%rdx
557: 31 c9 xor %ecx,%ecx
559: 48 8d 87 c0 00 00 00 lea 0xc0(%rdi),%rax
560: 48 39 97 f0 00 00 00 cmp %rdx,0xf0(%rdi)
567: 48 0f 45 ca cmovne %rdx,%rcx
56b: 48 8d 97 d8 00 00 00 lea 0xd8(%rdi),%rdx
572: 48 39 97 d8 00 00 00 cmp %rdx,0xd8(%rdi)
579: 48 0f 45 ca cmovne %rdx,%rcx
57d: 48 39 87 c0 00 00 00 cmp %rax,0xc0(%rdi)
584: 48 0f 45 c8 cmovne %rax,%rcx
588: 31 c0 xor %eax,%eax
58a: 48 85 c9 test %rcx,%rcx
58d: 74 32 je 5c1 <pfifo_fast_dequeue+0x71> // one conditional branch
58f: ff 4f 40 decl 0x40(%rdi)
592: 48 8b 11 mov (%rcx),%rdx
595: 48 39 ca cmp %rcx,%rdx
598: 74 27 je 5c1 <pfifo_fast_dequeue+0x71> // never taken branch: always predicted OK
59a: 48 89 d0 mov %rdx,%rax
59d: 48 8b 12 mov (%rdx),%rdx
5a0: ff 49 10 decl 0x10(%rcx)
5a3: 48 c7 40 10 00 00 00 movq $0x0,0x10(%rax)
5aa: 00
5ab: 48 89 4a 08 mov %rcx,0x8(%rdx)
5af: 48 89 11 mov %rdx,(%rcx)
5b2: 48 c7 40 08 00 00 00 movq $0x0,0x8(%rax)
5b9: 00
5ba: 48 c7 00 00 00 00 00 movq $0x0,(%rax)
5c1: 90 nop
5c2: c3 retq
I will post some profiling results tomorrow.
Eric