
Re: [PATCH] Prevent netpoll hanging when link is down

To: "David S. Miller" <davem@xxxxxxxxxxxxx>
Subject: Re: [PATCH] Prevent netpoll hanging when link is down
From: Matt Mackall <mpm@xxxxxxxxxxx>
Date: Thu, 7 Oct 2004 18:43:23 -0500
Cc: ak@xxxxxxx, colin@xxxxxxxxxx, akpm@xxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <20041007150756.2373719f.davem@xxxxxxxxxxxxx>
References: <20041006232544.53615761@xxxxxxxxxxxxxxx> <20041006214322.GG31237@xxxxxxxxx> <20041007075319.6b31430d@xxxxxxxxxxxxxxx> <20041006234912.66bfbdcc.davem@xxxxxxxxxxxxx> <20041007160532.60c3f26b@pirandello> <20041007112846.5c85b2d9.davem@xxxxxxxxxxxxx> <20041007224422.1c1bea95@xxxxxxxxxxxxxxx> <20041007214505.GB31558@xxxxxxxxxxxxx> <20041007215025.GT31237@xxxxxxxxx> <20041007150756.2373719f.davem@xxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mutt/1.3.28i
On Thu, Oct 07, 2004 at 03:07:56PM -0700, David S. Miller wrote:
> On Thu, 7 Oct 2004 16:50:26 -0500
> Matt Mackall <mpm@xxxxxxxxxxx> wrote:
> 
> > > The only drawback is that there won't be a reply when the driver try
> > > lock fails, but netpoll doesn't have a queue for that anyways. You could
> > > probably poll then, but I'm not sure it's a good idea.
> > 
> > But your meaning here is not entirely clear.
> 
> If another thread on another cpu is in the dev->hard_start_xmit() routine,
> then it will have its tx device lock held, and netpoll will simply get an
> immediate return from ->hard_start_xmit() with error NETDEV_TX_LOCKED.
> 
> The packet will thus not be sent, and because netpoll does not have a
> backlog queue for tx packets of any kind, the packet is lost forever.
> 
> NETDEV_TX_LOCKED is a transient condition.  It works for the rest of the
> kernel because whoever holds the tx lock on the device, will recheck the
> device packet transmit queue when it drops that lock and returns from
> ->hard_start_xmit().
> 
> Andi is merely noting how netpoll's design does not have such a model,
> which is why the NETIF_F_LLTX semantics don't mesh very well.
> 
> It is unclear if it is wise that netpoll_send_skb() currently spins
> on ->hard_start_xmit() returning NETDEV_TX_LOCKED.  That could
> result in some kind of deadlocks.

Deadlocks from recursion, presumably? We could probably throw in a max
retry count, as ugly as that is..
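
A rough userspace sketch of what a capped retry might look like, not
the actual netpoll code. mock_dev, mock_hard_start_xmit(),
send_with_retry_cap() and MAX_XMIT_RETRIES are all hypothetical names
made up for illustration, and the return values are just stand-ins for
the kernel's constants. The point is only that the loop gives up and
drops the packet after a bounded number of NETDEV_TX_LOCKED returns
instead of spinning forever:

```c
#include <assert.h>

/* Stand-in values for this mock only; not taken from kernel headers. */
#define NETDEV_TX_OK      0
#define NETDEV_TX_LOCKED (-1)

/* Hypothetical cap on retries; the thread does not suggest a number. */
#define MAX_XMIT_RETRIES 5

/* Hypothetical mock of a device whose tx lock is held by someone else. */
struct mock_dev {
	int locked_calls;	/* how many more calls will find the lock held */
	int xmit_calls;		/* total calls made to mock_hard_start_xmit() */
};

/*
 * Stand-in for dev->hard_start_xmit() on an LLTX driver: the try lock
 * fails while locked_calls > 0, after that the packet is accepted.
 */
static int mock_hard_start_xmit(struct mock_dev *dev, const char *pkt)
{
	(void)pkt;
	dev->xmit_calls++;
	if (dev->locked_calls > 0) {
		dev->locked_calls--;
		return NETDEV_TX_LOCKED;
	}
	return NETDEV_TX_OK;
}

/*
 * Make one attempt plus at most max_retries retries.
 * Returns 0 if the driver took the packet, -1 if we gave up and the
 * packet is dropped (there is no backlog queue to park it on).
 */
static int send_with_retry_cap(struct mock_dev *dev, const char *pkt,
			       int max_retries)
{
	int tries;

	assert(max_retries >= 0);
	for (tries = 0; tries <= max_retries; tries++) {
		if (mock_hard_start_xmit(dev, pkt) != NETDEV_TX_LOCKED)
			return 0;
	}
	return -1;	/* still locked after the cap: drop, don't spin */
}
```

With the lock held for two calls this sends on the third attempt; with
the lock held indefinitely it stops after MAX_XMIT_RETRIES + 1 attempts
and the packet is lost, which is the tradeoff of having no tx backlog.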

-- 
Mathematics is the supreme nostalgia of our time.
