It would be nice if the NIC could asynchronously trigger prefetches in
the CPU. Currently a lot of the packet processing cost goes
to waiting for read cache misses.
- NIC receives packet.
- Tells target CPU to prefetch RX descriptor and headers.
- CPU later looks at them and doesn't have to wait for a cache miss.
The drawback is that you would need to tell the NIC in advance
which CPU will process the packet, but with Linux
IRQ affinity that's easy to figure out.
With all the interrupt avoidance that is going on these days, would
prefetching in the driver be sufficient? Presumably the driver is
going to be processing multiple packets at a time on an interrupt/etc
so having it issue prefetches in SW would seem to help with all but the
very first packet.
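
To make the question concrete, here is a rough sketch of what I mean by
prefetching in SW. It is not taken from any real driver: struct rx_desc,
handle_packet() and process_rx_burst() are made-up names, and the
descriptor layout is an assumption. It relies on the GCC/Clang builtin
__builtin_prefetch(). While packet i is being processed, the loop
prefetches the descriptor two slots ahead and the headers one slot ahead,
so only the first packet of a burst pays the full miss.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX descriptor; real NICs define their own layout. */
struct rx_desc {
    const uint8_t *buf;   /* start of packet data (headers first) */
    uint16_t len;         /* number of valid bytes in buf */
};

/* Stand-in for real header processing: just sums the packet bytes. */
static uint32_t handle_packet(const struct rx_desc *d)
{
    uint32_t sum = 0;
    for (uint16_t i = 0; i < d->len; i++)
        sum += d->buf[i];
    return sum;
}

/* Process a burst of n descriptors, issuing prefetches ahead of use. */
uint32_t process_rx_burst(const struct rx_desc *ring, size_t n)
{
    uint32_t total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Descriptor two slots ahead: read access, keep in cache. */
        if (i + 2 < n)
            __builtin_prefetch(&ring[i + 2], 0, 3);
        /* Headers of the next packet; its descriptor was prefetched
         * on the previous iteration (except for the second packet). */
        if (i + 1 < n)
            __builtin_prefetch(ring[i + 1].buf, 0, 3);

        total += handle_packet(&ring[i]);
    }
    return total;
}
```

The prefetch distance (one and two slots here) is a guess; the right
value would depend on how long the per-packet work takes relative to
memory latency.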
Wisdom teeth are impacted, people are affected by the effects of events