
Re: Tx queueing

To: Andrew Morton <andrewm@xxxxxxxxxx>
Subject: Re: Tx queueing
From: jamal <hadi@xxxxxxxxxx>
Date: Thu, 18 May 2000 22:00:29 -0400 (EDT)
Cc: "netdev@xxxxxxxxxxx" <netdev@xxxxxxxxxxx>
In-reply-to: <392407D4.BE586507@xxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx

OK, Andrew, I know I am not as entertaining as some people, but I can
try ... Damn Aussie!

On Fri, 19 May 2000, Andrew Morton wrote:

> A number of drivers do this:
> start_xmit()
> {
>       netif_stop_queue()
>       ...
>       if (room for another packet)
>               netif_wake_queue()
>       ...
> }
> I suspect this is a simple port from the dev->tbusy days.
> It would seem to be more sensible to do
> start_xmit()
> {
>       ...
>       if (!room for another packet)
>               netif_stop_queue()
> }
> but the functional difference here is that we are no longer scheduling
> another BH run, so if there are additional packets queued "up there"
> then their presentation to the driver will be delayed until **this CPU**
> makes another BH run.  

This seems fine to me. In 2.3, the device is already serialized by the
xmit lock at this point, so your proposal should be fine.

> For devices which have a Tx packet ring or decent FIFO I don't expect
> this to be a problem, because the Tx ISR will call netif_wake_queue()

To be correct: it is the invocation of the interrupt handler (which
could be triggered by quite a few sources other than tx completion) that
forces the reclamation of the TX DMA ring descriptors.

> and the subsequent BH run will keep stuffing packets into the Tx ring
> until it's full.  But for devices which have very limited Tx buffering
> there may be a lost opportunity to refill the Tx buffer earlier.  Seems
> unlikely to me.

Tx descriptor harvesting might be the key here.
Donald's drivers typically do the reclamation in the interrupt
path. I have tried to do it in both the tx and rx paths on the tulip
(with locking, of course, which Donald dislikes so much ;->), setting
thresholds so that the transmit path only reclaims when the number of
available descriptors is < 1/2 of the total. This greatly reduces the
number of tx-no-buffer events. Of course this is impossible to do
without locks ;->
Perhaps Donald has some words of wisdom about his choices.

> Also, I'm still attracted to the idea of dequeueing packets within the
> driver (the 'pull' model) rather than stuffing them in via
> qdisc_restart() and the BH callback. 

You will have to rewrite a _lot_ of the upper layers' code,
and you will really have to prove the benefit of going down this path.
What events will activate the pull? QoS will probably totally break.

> A while back Don said:
> > The BSD stack uses the scheme of dequeuing packets in the ISR.  This was a
> > good design in the VAX days, and with primative hardware that handled only
> > single packets.  But it has horrible cache behavior, needs an extra lock,
> > and can result the interrupt service routine running a very long time,
> > blocking interrupts.
> I never understood the point about cache behaviour.  Perhaps he was
> referring to the benefit which a sequence of short loops has over a
> single, long loop?  And nowadays we only block interrupts for this
> device (or things on this device's IRQ?).

Interrupt means context switch?

> One advantage which the 'pull' model has is with CPU/NIC bonding. 
> AFAIK, the only way at present of bonding a NIC to a CPU is via the
> IRQ.  This is fine for the ISR and the BH callback, but at present the
> direct userland->socket->qdisc->driver path will be executed on a random
> CPU. Moving some of this into the ISR will make bonding more effective.
> Or teach qdisc_restart() to simply queue packets and rely on the
> CPU-specific softnet callback to do the transmit.  Probably doesn't make
> much diff.
> Of course, all this is simply noise without benchmarks...
> Has anyone done any serious work with NIC/CPU bonding?

You can do NIC/CPU bonding today in 2.3 using IRQ affinity.
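For example (a config fragment, assuming the /proc/irq smp_affinity interface of late 2.3/2.4 development kernels; the IRQ number 10 is hypothetical):

```shell
# Bind IRQ 10 to CPU 1 by writing a CPU bitmask to its smp_affinity file.
echo 2 > /proc/irq/10/smp_affinity     # bitmask 0x2 = CPU 1 only
cat /proc/irq/10/smp_affinity          # verify the new mask
```

This pins the ISR and the subsequent softirq/BH work to that CPU, which is the bonding Andrew refers to; the userland->socket->qdisc->driver path still runs on whatever CPU the application happens to be on.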

