On Wed, 11 Jun 2003, Andi Kleen wrote:
> eth_type_trans checks the ethernet protocol ID and sets the
> unicast L2 type.
> Some NICs have bits in the RX descriptor for most of them. They have a
> "packet is TCP or UDP or IP" bit and also a bit for unicast or sometimes
> even multicast/broadcast. So when you have the RX descriptor you
> can just derive these values from there and put them into the skb
> without calling eth_type_trans or looking at the cache cold header.
> Then you do a prefetch on the header. When the packet reaches the
> network stack later the header has already reached cache and it can be
> processed without a memory round trip latency.
I have done prefetching experiments with a NAPI-ized sb1250.c driver on
MIPS. I never got rid of eth_type_trans(); I just prefetched skb->data
a few lines before calling it. I did see eth_type_trans() almost
disappear from the profile (it was way too low to be important).
Andi's idea is even more interesting.
I think I saw about 10Kpps more in throughput.
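
For illustration only, a minimal userspace sketch of that ordering
(prefetch the header, do other work, then classify). This is not the
sb1250.c change: struct fake_skb, fake_eth_type_trans() and rx_one()
are hypothetical stand-ins, and __builtin_prefetch() (a GCC/Clang
builtin) is assumed in place of the kernel's prefetch().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an sk_buff: only what this sketch needs. */
struct fake_skb {
    const uint8_t *data;   /* start of the Ethernet header */
    size_t len;            /* bytes available at data */
    uint16_t protocol;     /* EtherType, host byte order in this sketch */
};

/*
 * Simplified stand-in for eth_type_trans(): it only reads the EtherType
 * at bytes 12..13, which is the first touch of the cache-cold header.
 */
static uint16_t fake_eth_type_trans(const struct fake_skb *skb)
{
    if (skb->len < 14)
        return 0;
    return (uint16_t)((skb->data[12] << 8) | skb->data[13]);
}

/* Receive path for one packet: issue the prefetch early, classify late. */
static void rx_one(struct fake_skb *skb)
{
    assert(skb != NULL && skb->data != NULL);

    /* Start pulling the header into cache (read access, keep it around). */
    __builtin_prefetch(skb->data, 0, 3);

    /* ... descriptor bookkeeping, ring refill etc. would go here ... */

    /* By now the header load has had time to complete. */
    skb->protocol = fake_eth_type_trans(skb);
}
```

The prefetch is only a hint; whether it helps depends on how much
independent work sits between it and the first header access.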
Robert, this means our biggest bottleneck right now is cache misses.
The MIPS processor I am playing with is SMP and has a large shared L2
cache. What I am observing is that this is quite useful for SMP.
I am limited by how much traffic I can generate right now to test it
more. I can do 295Kpps L3 easily. This board is an excuse for you to
come down to Ottawa in July ;->
> On some cards it doesn't work for all packets or can be only done
> if you don't have any multicast addresses hashed (that's the case
> for the e1000 if I read the header bits correctly). The lxt1001
> (old EOLed card) can do it for all packet types.
So can the sb1250. I'll try this out.
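
A rough sketch of what that could look like, in userspace C. All the
names here are made up for illustration: the RXD_* bits are a
hypothetical descriptor layout (not the sb1250, e1000 or lxt1001 one),
struct sketch_skb is not an sk_buff, and the unicast bit is assumed to
mean "addressed to this station". __builtin_prefetch() is the GCC/Clang
builtin, standing in for the kernel's prefetch().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX descriptor status bits; every real NIC differs. */
#define RXD_IS_IPV4    (1u << 0)
#define RXD_UNICAST    (1u << 1)
#define RXD_BROADCAST  (1u << 2)
#define RXD_MULTICAST  (1u << 3)

#define SKETCH_ETH_P_IP 0x0800u  /* EtherType for IPv4 */

enum sketch_pkt_type { SKETCH_HOST, SKETCH_BROADCAST, SKETCH_MULTICAST };

/* Hypothetical stand-in for the skb fields eth_type_trans() fills in. */
struct sketch_skb {
    const uint8_t *data;
    uint16_t protocol;
    enum sketch_pkt_type pkt_type;
};

/*
 * Fill in protocol and packet type from descriptor bits alone, without
 * reading the header. Returns false when the bits are not enough, in
 * which case the caller falls back to parsing the header as before.
 */
static bool classify_from_desc(uint32_t status, struct sketch_skb *skb)
{
    enum sketch_pkt_type type;

    assert(skb != NULL && skb->data != NULL);

    if (!(status & RXD_IS_IPV4))
        return false;

    if (status & RXD_UNICAST)
        type = SKETCH_HOST;
    else if (status & RXD_BROADCAST)
        type = SKETCH_BROADCAST;
    else if (status & RXD_MULTICAST)
        type = SKETCH_MULTICAST;
    else
        return false;

    skb->protocol = SKETCH_ETH_P_IP;
    skb->pkt_type = type;

    /* Header is untouched so far; start fetching it for the stack. */
    __builtin_prefetch(skb->data, 0, 3);
    return true;
}
```

The fallback path matters: on cards where the bits are only valid for
some packets, the driver still needs the normal header parse.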
> Often prefetch size is limited so you should not prefetch more
> than what you can store until the packet reaches the stack.
Good point. So is there a systematic way to find out the effects
of the prefetch size, or do you just have to keep trying until you get