[NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates better code. (Using skb_queue_empty() to test the queue is faster than trying to __skb_dequeue()) oprofile says this function uses no
* Eric Dumazet <42CA390C.9000801@xxxxxxxxxxxxx> 2005-07-05 09:38 I think this patch is pretty much pointless. __skb_dequeue() and !skb_queue_empty() should produce almost the same code and as soon as
Thomas Graf wrote: * Eric Dumazet <42CA390C.9000801@xxxxxxxxxxxxx> 2005-07-05 09:38 [NET] : unroll a small loop in pfifo_fast_dequeue(). Compiler generates better code. (Using skb_queue_empty()
Thomas Graf wrote: OK. At least my compiler (gcc-3.3.1) does NOT unroll the loop : Because you don't specify -funroll-loops I'm using vanilla 2.6.12 : no -funroll-loops in it. Maybe in your tree,
I bet the performance gain really comes from the mispredicted branches in the loop. For loops of fixed duration, say, 5 or 6 iterations or less, it totally defeats the branch prediction logic in most
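To make the loop-versus-unrolled comparison concrete, here is a minimal user-space sketch in C. It is not the kernel code or the patch under discussion: `toy_queue`, `TOY_BANDS` and every `toy_*` function are hypothetical stand-ins, and the band count of 3 is an assumption made for illustration. Both variants test each band for emptiness before dequeuing and differ only in whether the fixed-count loop is written out by hand.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BANDS 3   /* assumed band count, for illustration only */
#define TOY_CAP   8   /* capacity of each toy ring buffer */

/* Toy stand-in for a packet queue; NOT the kernel's struct sk_buff_head. */
struct toy_queue {
    int items[TOY_CAP];
    size_t head;
    size_t len;
};

static int toy_queue_empty(const struct toy_queue *q)
{
    return q->len == 0;
}

static void toy_enqueue(struct toy_queue *q, int pkt)
{
    assert(q->len < TOY_CAP);
    q->items[(q->head + q->len) % TOY_CAP] = pkt;
    q->len++;
}

static int toy_dequeue(struct toy_queue *q)
{
    int pkt;

    assert(q->len > 0);
    pkt = q->items[q->head];
    q->head = (q->head + 1) % TOY_CAP;
    q->len--;
    return pkt;
}

/* Loop form: scan bands from highest priority (0) down; -1 if all are empty. */
static int toy_dequeue_loop(struct toy_queue *bands)
{
    int prio;

    for (prio = 0; prio < TOY_BANDS; prio++)
        if (!toy_queue_empty(&bands[prio]))
            return toy_dequeue(&bands[prio]);
    return -1;
}

/* Hand-unrolled form: same band order, one explicit test per band. */
static int toy_dequeue_unrolled(struct toy_queue *bands)
{
    if (!toy_queue_empty(&bands[0]))
        return toy_dequeue(&bands[0]);
    if (!toy_queue_empty(&bands[1]))
        return toy_dequeue(&bands[1]);
    if (!toy_queue_empty(&bands[2]))
        return toy_dequeue(&bands[2]);
    return -1;
}
```

The two functions return the same packets in the same order; any speed difference would come from the code the compiler generates for each form, which depends on the compiler version, flags and target CPU.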
Eric, I've told you this before many times. Please do something so that your email client does not corrupt the patches. Once again, your email client turned all the tab characters into spaces and thu
* David S. Miller <20050705.142210.14973612.davem@xxxxxxxxxxxxx> 2005-07-05 The patch must be changed to use __qdisc_dequeue_head() instead of __skb_dequeue() or we screw up the backlog.
David S. Miller wrote: From: Thomas Graf <tgraf@xxxxxxx> Date: Tue, 5 Jul 2005 23:33:55 +0200 * David S. Miller <20050705.142210.14973612.davem@xxxxxxxxxxxxx> 2005-07-05 14:22 So I'll apply the
* Eric Dumazet <42CB14B2.5090601@xxxxxxxxxxxxx> 2005-07-06 01:16 Ok, this clarifies a lot for me, I was under the impression you knew about these changes. It is very unlikely to change within mainlin
But the branch prediction is where I personally think a lot of the lossage is coming from. These can cost upwards of 20 or 30 processor cycles, easily. That's getting close to the cost of an L2 cache
* David S. Miller <20050705.164503.104035718.davem@xxxxxxxxxxxxx> 2005-07-05 Absolutely. I think what happens is that we produce prediction failures due to the logic within qdisc_dequeue_head(), I can
Thomas Graf wrote: I still think we can fix this performance issue without manually unrolling the loop or we should at least try to. In the end gcc should notice the constant part of the loop an
* Eric Dumazet <42CB2698.2080904@xxxxxxxxxxxxx> 2005-07-06 02:32 Correct. The !expr implies an unlikely so the prediction should be right and equal to your unrolling version. This would break the who
Eric Dumazet wrote: Maybe we can rewrite the whole thing without branches, examining prio from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with conditional move (cmov) struct sk_buff_he
* Eric Dumazet <42CB2B84.50702@xxxxxxxxxxxxx> 2005-07-06 02:53 I think you got me wrong, the whole point of this qdisc is to prioritize which means that we cannot dequeue from prio 1 as long as the q
Thomas Graf wrote: Maybe we can rewrite the whole thing without branches, examining prio from PFIFO_FAST_BANDS-1 down to 0, at least for modern CPUs with conditional move (cmov) This would break t
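As a rough illustration of the branch-free idea debated in the last few messages, the following C sketch selects a band by scanning from the last band down to band 0 and overwriting the choice whenever a band is non-empty, so the lowest-numbered (highest-priority) non-empty band is the one left at the end. It does not reproduce the code posted in the thread: `SKETCH_BANDS`, the `qlen` array and both function names are hypothetical, and the band count of 3 is assumed. Whether a compiler turns the conditional expression into `cmov` depends on the compiler, flags and target.

```c
#include <assert.h>

#define SKETCH_BANDS 3   /* assumed band count, for illustration only */

/*
 * Conditional-select form. Scanning from the last band down to band 0
 * means a non-empty lower-numbered band always overwrites the choice,
 * so priority order is preserved. Returns -1 if every band is empty.
 */
static int sketch_pick_band(const unsigned int qlen[SKETCH_BANDS])
{
    int pick = -1;
    int prio;

    for (prio = SKETCH_BANDS - 1; prio >= 0; prio--)
        pick = (qlen[prio] != 0) ? prio : pick;
    return pick;
}

/*
 * Mask form: the same selection written with bitwise arithmetic instead
 * of a conditional expression. Assumes two's-complement integers.
 */
static int sketch_pick_band_mask(const unsigned int qlen[SKETCH_BANDS])
{
    int pick = -1;
    int prio;

    for (prio = SKETCH_BANDS - 1; prio >= 0; prio--) {
        int mask = -(int)(qlen[prio] != 0);   /* all ones if non-empty, else 0 */

        pick = (prio & mask) | (pick & ~mask);
    }
    assert(pick >= -1 && pick < SKETCH_BANDS);
    return pick;
}
```

For queue lengths `{0, 2, 5}` both functions select band 1, and for `{1, 0, 4}` both select band 0. The sketch only picks a band; the dequeue itself, and any backlog accounting, would still have to be done on the selected queue.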