> > > Here is another alternative that seems better than the earlier posting.
> > > It uses a per-device receive queue for non-NAPI devices. The only issue
> > > is that we then lose the per-CPU queues, and that could impact loopback
> > > device performance. If that is really an issue, then the per-CPU magic
> > > should be moved to the loopback device.
> > >
> > The repercussions of going from a per-CPU-for-all-devices queue
> > (introduced by softnet) to per-device-for-all-CPUs may be huge in my
> > opinion, especially on SMP. A closer match to what's there now may be a
> > per-device-per-CPU backlog queue.
> Any real hardware only has a single receive packet source (the interrupt),
> and the only collision would be in the case of interrupt migration. So having
> per-device-per-CPU queues would be overkill and more complex because
> the NAPI scheduling is per-netdevice rather than per-queue (though that
> could be fixed).
> > I think performance will be impacted on all devices. IMO, whatever needs
> > to go in needs to have some experimental data to back it up.
> Experiment with what? Proving an absolute negative is impossible.
> I will test loopback and non-NAPI versions of a couple of gigabit drivers
> to see.
Just a naive question: why try to accelerate netif_rx at all?
Isn't NAPI the best choice for high-performance rx anyway?
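
To make sure I am reading the two layouts in this thread correctly, here is a toy user-space model of them. It is only a sketch, not kernel code: every name in it (pkt_queue, rx_percpu, rx_perdev, NR_DEVS, QLEN) is hypothetical and does not correspond to the actual softnet structures or to the posted patch.

```c
/* Toy user-space model of the two backlog layouts discussed above.
 * All names are hypothetical; this is not the kernel's implementation. */
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4
#define NR_DEVS 2
#define QLEN    8

/* Fixed-size ring buffer standing in for a packet backlog. */
struct pkt_queue {
    int    pkts[QLEN];
    size_t head, tail, len;
};

/* Append a packet; returns -1 (drop) when the queue is full. */
int queue_push(struct pkt_queue *q, int pkt)
{
    if (q->len == QLEN)
        return -1;
    q->pkts[q->tail] = pkt;
    q->tail = (q->tail + 1) % QLEN;
    q->len++;
    return 0;
}

/* Remove the oldest packet; returns -1 when the queue is empty. */
int queue_pop(struct pkt_queue *q, int *pkt)
{
    if (q->len == 0)
        return -1;
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % QLEN;
    q->len--;
    return 0;
}

/* Layout A: one backlog per CPU, shared by all devices. */
struct pkt_queue percpu_backlog[NR_CPUS];

/* Layout B: one backlog per device, shared by all CPUs. A real version
 * would need locking if more than one CPU can enqueue to it. */
struct pkt_queue perdev_backlog[NR_DEVS];

/* Layout A: the queue is chosen by the CPU that took the interrupt. */
int rx_percpu(int cpu, int dev, int pkt)
{
    (void)dev; /* the device does not influence queue selection */
    assert(cpu >= 0 && cpu < NR_CPUS);
    return queue_push(&percpu_backlog[cpu], pkt);
}

/* Layout B: the queue is chosen by the receiving device. */
int rx_perdev(int cpu, int dev, int pkt)
{
    (void)cpu; /* the CPU does not influence queue selection */
    assert(dev >= 0 && dev < NR_DEVS);
    return queue_push(&perdev_backlog[dev], pkt);
}
```

In this model, two packets from the same device that arrive on different CPUs land in different queues under layout A and in the same queue under layout B. That shared queue is where the contention concern comes from, and per the argument above it would only be hit on interrupt migration.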