[Top] [All Lists]

Re: [PATCH] loop unrolling in net/sched/sch_generic.c

To: Eric Dumazet <dada1@xxxxxxxxxxxxx>
Subject: Re: [PATCH] loop unrolling in net/sched/sch_generic.c
From: Thomas Graf <tgraf@xxxxxxx>
Date: Wed, 6 Jul 2005 14:42:06 +0200
Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <>
References: <> <> <> <> <> <> <> <>
Sender: netdev-bounce@xxxxxxxxxxx
* Eric Dumazet <42CB2B84.50702@xxxxxxxxxxxxx> 2005-07-06 02:53

A short recap after some coffee and sleep:

The inital issue you brought up which you backed up with numbers
is probably the cause of multiple wrong branch predictions due
to the fact that I wrote skb = dequeue(); if (skb) which was
assumed to be likely by the compiler. In your patch you fixed
this with !skb_queue_empty() which fixed this wrong prediction
and also acts as a little optimization due to skb_queue_empty()
being really simple to implement for the compiler. The patch
I posted should result in almost the same result, despite of
the additional wrong branch prediction for the loop it always
has one wrong prediction which is when we hit a non-empty queue.
In your unrolled version you could optimize it even more by
taking advantage that prio=2 is the most likely non-empty queue
so you could change the check to likely and save a wrong branch
prediction for the common case at the cost of a branch misprediction
if all queues are empty.

The patch I posted results in something like this:

        pushl   %ebx
        xorl    %ecx, %ecx
        movl    8(%esp), %ebx
        leal    128(%ebx), %edx
        movl    (%edx), %eax
        cmpl    %edx, %eax
        jne     .L132           ; if (!skb_queue_empty())
        incl    %ecx
        addl    $20, %edx
        cmpl    $2, %ecx
        jle     .L129           ; end of loop
        xorl    %eax, %eax      ; all queues empty
        popl    %ebx

I regard the miss here as acceptable for the increased flexibility
we get. It can be optimized with a loop unrolling but my opinion
is to try and avoid it if possible.

Now your second thought is quite interesting, although it heavly
depends on the fact that prio=2 is the most often used band. It
will be interesting to see some numbers.

> static struct sk_buff *
> pfifo_fast_dequeue(struct Qdisc* qdisc)
> {
>         struct sk_buff_head *list = qdisc_priv(qdisc);
>         struct sk_buff_head *best = NULL;
>       list += 2;
>         if (!skb_queue_empty(list))
>                 best = list;
>         list--;
>         if (!skb_queue_empty(list))
>                 best = list;
>         list--;
>         if (!skb_queue_empty(list))
>                 best = list;

Here is what I mean, a likely() should be even better.

>         if (best) {
>                 qdisc->q.qlen--;
>                 return __skb_dequeue(best);
>                 }
>         return NULL;
> }

<Prev in Thread] Current Thread [Next in Thread>