It would be nice if the NIC could asynchronously trigger prefetches in
the CPU. Currently a large share of the packet processing cost is
time spent stalled on read cache misses.
- NIC receives packet.
- Tells target CPU to prefetch RX descriptor and headers.
- CPU later looks at them and doesn't have to wait for a cache miss.
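
For comparison, the closest thing available today is for the CPU itself to
issue prefetch hints while it walks the RX ring. The sketch below is only a
software approximation of the idea, not the NIC-triggered prefetch described
above: the CPU still has to be running the poll loop before anything is
fetched. The descriptor layout `struct rx_desc`, the function `rx_poll` and
the `hdr_sum` stand-in for header parsing are all hypothetical, and
`__builtin_prefetch` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RX descriptor; real NIC layouts differ. */
struct rx_desc {
    const uint8_t *hdr; /* start of the packet headers */
    uint16_t len;       /* packet length in bytes */
};

/*
 * Process `count` descriptors from a ring of `ring_size` entries
 * (a power of two), starting at index `head`. Returns the total byte
 * count and adds the first header byte of each packet to *hdr_sum,
 * as a stand-in for real header parsing.
 */
uint64_t rx_poll(const struct rx_desc *ring, size_t ring_size,
                 size_t head, size_t count, unsigned *hdr_sum)
{
    const size_t mask = ring_size - 1;
    uint64_t bytes = 0;

    assert(ring_size != 0 && (ring_size & mask) == 0);

    for (size_t i = 0; i < count; i++) {
        const struct rx_desc *d = &ring[(head + i) & mask];

        /* Hint the descriptor two slots ahead ... */
        if (i + 2 < count)
            __builtin_prefetch(&ring[(head + i + 2) & mask], 0, 3);
        /* ... and the headers one slot ahead; for i > 0 that
         * descriptor was already hinted on the previous iteration. */
        if (i + 1 < count)
            __builtin_prefetch(ring[(head + i + 1) & mask].hdr, 0, 3);

        bytes += d->len;          /* touches the descriptor */
        *hdr_sum += d->hdr[0];    /* touches the headers */
    }
    return bytes;
}
```

This only hides misses for the second and later packets of a batch. The
first descriptor and its headers still miss, which is exactly the case a
NIC-initiated prefetch would cover.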
The drawback is that you would need to tell the NIC in advance
which CPU will process the packet, but with Linux
IRQ affinity that's easy to figure out.
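
As a rough illustration of the "easy to figure out" part, the affinity of
an interrupt can be read from `/proc/irq/<irq>/smp_affinity`, which holds a
hex CPU mask (comma-separated 32-bit groups on large machines). The helper
names below are hypothetical, and picking the lowest CPU in the mask is an
arbitrary simplification: a mask may allow several CPUs, and which one
actually takes the interrupt is up to the kernel and the hardware.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the lowest CPU number set in a hex affinity mask such as
 * "00000004" or "00000001,00000000", or -1 if the mask is empty or
 * malformed. The string is scanned right to left, 4 CPUs per digit.
 */
int lowest_cpu_in_mask(const char *mask)
{
    size_t n;
    int base = 0;

    assert(mask != NULL);
    n = strlen(mask);

    while (n > 0) {
        unsigned char c = (unsigned char)mask[--n];
        int v;

        if (c == ',' || isspace(c))
            continue;              /* group separator or newline */
        if (!isxdigit(c))
            return -1;
        v = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        for (int bit = 0; bit < 4; bit++)
            if (v & (1 << bit))
                return base + bit;
        base += 4;
    }
    return -1;
}

/* Read the mask for `irq` from procfs; -1 on any error. */
int irq_lowest_cpu(int irq)
{
    char path[64], buf[256];
    FILE *f;
    int cpu = -1;

    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        cpu = lowest_cpu_in_mask(buf);
    fclose(f);
    return cpu;
}
```

A driver would then have to pass that CPU number to the NIC through some
device-specific interface, which is the part that does not exist yet.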