| To: | tgraf@xxxxxxx |
|---|---|
| Subject: | Re: [PATCH] loop unrolling in net/sched/sch_generic.c |
| From: | "David S. Miller" <davem@xxxxxxxxxxxxx> |
| Date: | Tue, 05 Jul 2005 14:22:10 -0700 (PDT) |
| Cc: | dada1@xxxxxxxxxxxxx, netdev@xxxxxxxxxxx |
| In-reply-to: | <20050705173411.GK16076@postel.suug.ch> |
| References: | <20050705134805.GH16076@postel.suug.ch> <42CAAE2F.5070807@cosmosbay.com> <20050705173411.GK16076@postel.suug.ch> |
| Sender: | netdev-bounce@xxxxxxxxxxx |

From: Thomas Graf <tgraf@xxxxxxx>
Date: Tue, 5 Jul 2005 19:34:11 +0200

> Do as you wish, I don't feel like arguing about micro optimizations.

I bet the performance gain really comes from the mispredicted branches in the loop.

For loops of fixed duration, say, 5 or 6 iterations or less, it totally defeats the branch prediction logic in most processors. By the time the chip moves the I-cache branch state to "likely" the loop has ended and we eat a mispredict.

I think the original patch is OK, hand unrolling the loop in the C code. Adding -funroll-loops to the CFLAGS has lots of implications, and in particular the embedded folks might not be happy with some things that result from that.

So I'll apply the original unrolling patch for now.

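
For readers without the patch at hand, here is a minimal sketch of what hand unrolling a short, fixed-count loop can look like in C. It is not the patch under discussion: NBANDS, struct pkt, struct band, band_pop(), dequeue_loop() and dequeue_unrolled() are hypothetical names, and the three-entry array is assumed purely for illustration.

```c
#include <stddef.h>

#define NBANDS 3	/* hypothetical: a small, fixed number of queues */

struct pkt {
	int id;
	struct pkt *next;
};

struct band {
	struct pkt *head;
};

/* Remove and return the first packet of a band, or NULL if it is empty. */
static struct pkt *band_pop(struct band *b)
{
	struct pkt *p = b->head;

	if (p)
		b->head = p->next;
	return p;
}

/* Loop form: one loop-back branch per iteration, at most NBANDS iterations. */
static struct pkt *dequeue_loop(struct band bands[NBANDS])
{
	int i;

	for (i = 0; i < NBANDS; i++) {
		struct pkt *p = band_pop(&bands[i]);

		if (p)
			return p;
	}
	return NULL;
}

/* Hand-unrolled form: same result, no loop counter and no loop-back branch. */
static struct pkt *dequeue_unrolled(struct band bands[NBANDS])
{
	struct pkt *p;

	if ((p = band_pop(&bands[0])) != NULL)
		return p;
	if ((p = band_pop(&bands[1])) != NULL)
		return p;
	return band_pop(&bands[2]);
}
```

Both functions return the same packet for the same input. The unrolled one only drops the loop's own branch, while the per-queue emptiness checks remain. Whether that is measurably faster depends on the CPU and the compiler, so it is worth measuring on the target rather than assuming.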