[cc'ing netdev]
On Fri, 2005-01-07 at 01:20 +0100, Francois Romieu wrote:
> Mark Broadbent <markb@xxxxxxxxxxxxxx> :
> [...]
> > I had a think about the netpoll deadlock following the initial detective
> > work that you and Francois did. I came up with a sample fix, given below,
> > that defers the packet transmission within netpoll until the xmit_lock
> > can be grabbed. An advantage of this is that it maintains the packet
> > ordering. I have attached the patch.
>
> Nice.
>
> Comments below. Off to bed now.
>
> > It is possible for netpoll to deadlock by attempting to take the network
> > device xmit_lock twice on different processors. This patch resolves this
> > by deferring the transmission of the second and subsequent skbs.
> >
> > Signed-Off-By: Mark Broadbent <markb@xxxxxxxxxxxxxx>
> >
> > --- linux-2.6.10-org/net/core/netpoll.c 2005-01-05 22:30:53.000000000 +0000
> > +++ linux-2.6.10/net/core/netpoll.c 2005-01-06 21:00:24.000000000 +0000
> [...]
> > @@ -178,17 +201,14 @@ repeat:
> > return skb;
> > }
> >
> > -void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> > +/* This must be called with np->dev->xmit_lock spinlock held - this
> > + * function will drop the lock before returning, indicating if it
> > + * needs to be retried
> > + */
> > +static int transmit_skb(struct netpoll *np, struct sk_buff *skb)
>
> It may seem cosmetic but I do not like functions which unlock a lock
> they have not acquired themselves (and I am probably not alone). Would you
> consider pushing the trylock() into transmit_skb() itself?
>
> All the callers of this function have to issue the trylock anyway, so it
> should not matter from a performance POV.
Good point, I was having doubts about that.
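For illustration, a minimal sketch of that shape against the 2.6.10
netpoll internals (hard_start_xmit through the device, xmit_lock_owner
bookkeeping as in the old netpoll_send_skb); the return convention -
0 on success, nonzero meaning "defer and retry" - is my assumption
rather than final code:

/* Hedged sketch only: the trylock folded into transmit_skb() itself.
 * Returns 0 when the skb was handed to the driver, nonzero when the
 * caller should queue the skb for a later retry. */
static int transmit_skb(struct netpoll *np, struct sk_buff *skb)
{
	int status = -1;

	if (!spin_trylock(&np->dev->xmit_lock))
		return -1;	/* lock busy: let the caller defer */

	np->dev->xmit_lock_owner = smp_processor_id();
	if (!netif_queue_stopped(np->dev))
		status = np->dev->hard_start_xmit(skb, np->dev);
	np->dev->xmit_lock_owner = -1;

	spin_unlock(&np->dev->xmit_lock);
	return status;
}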
> [...]
> > +
> > + spin_lock(&tx_list_lock);
> ^^^^^^^^^
> kernel/workqueue.c::run_workqueue() could reenable interrupts before
> it calls tx_retry_wq(). An irq safe version of the underlined lock
> seems required to avoid deadlocking with printks potentially issued
> from interrupt context.
Fixed.
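For reference, the irq-safe form Francois is asking for (tx_list_lock
and tx_queue as in the patch above):

	unsigned long flags;

	/* Disable local interrupts while holding tx_list_lock so a
	 * printk arriving from irq context cannot deadlock on it. */
	spin_lock_irqsave(&tx_list_lock, flags);
	/* ... walk/modify tx_queue.link ... */
	spin_unlock_irqrestore(&tx_list_lock, flags);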
> > +
> > + list_for_each_safe(tx_ctr, tx_tmp, &tx_queue.link) {
> > + tx_el = list_entry(tx_ctr, struct tx_queue_list, link);
> > +
> > + while (tx_el->np && tx_el->np->dev &&
> > + netif_running(tx_el->np->dev)) {
>
> netif_running() without xmit_lock held ? Uh oh...
This is done in the original code (and still is in other parts of the code).
> Imho you should try to keep transmit_skb() closer to the former
> netpoll_send_skb() (modulo the try_lock change of course).
> Btw I do not see what will prevent cleanup_netconsole() from succeeding
> when tx_retry_wq() is scheduled for later execution. Add some module
> refcounting ?
No need: if netpoll_cleanup is called whilst a retry is pending, the tx
list lock is taken, and all references to the netpoll pointer are
deleted from the pending list before the lock is released.
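A hedged sketch of that cleanup path, reusing the tx_queue and
tx_queue_list names from the patch; netpoll_flush_tx_queue() is a
hypothetical helper name, not something in the patch itself:

/* Purge every pending entry for this netpoll under tx_list_lock so a
 * later tx_retry_wq() run can never see a stale np. Hypothetical
 * helper illustrating the behaviour described above. */
static void netpoll_flush_tx_queue(struct netpoll *np)
{
	struct list_head *cur, *tmp;
	unsigned long flags;

	spin_lock_irqsave(&tx_list_lock, flags);
	list_for_each_safe(cur, tmp, &tx_queue.link) {
		struct tx_queue_list *el =
			list_entry(cur, struct tx_queue_list, link);

		if (el->np == np) {
			list_del(&el->link);
			kfree_skb(el->skb);
			kfree(el);
		}
	}
	spin_unlock_irqrestore(&tx_list_lock, flags);
}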
> > +void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
> > +{
> > +repeat:
> > + if(!np || !np->dev || !netif_running(np->dev))
> > + goto out_free;
>
> I am not sure that the first two tests above really made sense in
> the former version of the code:
> -> kernel/printk.c::vprintk() (downs console_sem)
> -> kernel/printk.c::release_console_sem
> -> kernel/printk.c::call_console_drivers
> [blah blah]
> -> drivers/net/netconsole.c::netconsole.write_msg
> -> drivers/net/netconsole.c::netpoll_send_udp
> -> drivers/net/netconsole.c::netpoll_send_skb
>
> -> drivers/net/netconsole.c::cleanup_netconsole
> -> kernel/printk.c::unregister_console
> -> kernel/printk.c::acquire_console_sem
> -> try to down(&console_sem)
>
> dev_put() has not been issued, np->dev is still among us and != NULL.
> So far, so good.
>
> However netpoll_send_skb can be called through:
> netpoll_rx -> arp_reply -> netpoll_send_skb
> Here I fail to see why netpoll_cleanup could not happen at the same time
> and issue np->dev = NULL;
Could this be solved by having a per-netpoll-structure rwlock to protect
against changes to elements in the structure?
e.g.

	struct netpoll {
		...
		rwlock_t netpoll_lock;
	};

	unsigned long flags;

	write_lock_irqsave(&np->netpoll_lock, flags);
	np->dev = NULL;
	write_unlock_irqrestore(&np->netpoll_lock, flags);
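On the reader side, netpoll_send_skb() (and arp_reply()) would then pin
np->dev under the read lock before testing it; a sketch of the idea,
not code from the patch:

	unsigned long flags;

	/* Check np->dev under the read lock so a concurrent
	 * netpoll_cleanup() cannot NULL it between test and use. */
	read_lock_irqsave(&np->netpoll_lock, flags);
	if (!np->dev || !netif_running(np->dev)) {
		read_unlock_irqrestore(&np->netpoll_lock, flags);
		goto out_free;
	}
	/* ... transmit while the read lock pins np->dev ... */
	read_unlock_irqrestore(&np->netpoll_lock, flags);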
> > +
> > + /* If the list is non-empty then pkts are pending tx,
> > + * append to the end of list to stop reordering of the pkts
> > + */
> > + if (!list_empty(&tx_queue.link) || !spin_trylock(&np->dev->xmit_lock)) {
> > + struct tx_queue_list *txel;
> > +
> > + /* If we failed to get the xmit_lock then netpoll probably
> > + * already holds this lock but will deadlock here if we relock,
>
> Ok ("probably" ?).
:) Fixed
> > + * queue the skb for transmission later
> > + */
> > + spin_lock(&tx_list_lock);
>
> No spin_lock_irqsave ? Hmmm...
> The only use of netpoll_send_skb() goes through netpoll_send_udp() which is
> issued in drivers/net/netconsole.c::write_msg() with local_irq disabled.
> You are right but an extra comment would not hurt imho.
Is it safe to issue spin_lock_irqsave/spin_unlock_irqrestore in this
context? Reason: what if it wasn't netconsole calling netpoll_send_udp,
and local interrupts weren't already disabled?
> > +
> > + txel = kmalloc(sizeof(struct tx_queue_list), GFP_ATOMIC);
> > + if (!txel)
> > + goto out_free;
> > +
> > + txel->skb = skb_get(skb);
> > + txel->np = np;
> > +
> > + list_add_tail(&txel->link, &tx_queue.link);
> > +
> > + queue_delayed_work(tx_wq, &tx_wq_obj, HZ/100);
> > +
> > + spin_unlock(&tx_list_lock);
> > + } else if (transmit_skb(np, skb))
> > + goto repeat;
>
> As I read it, the previous code could loop as long as netif_queue_stopped()
> is true and it can acquire the lock. Any reason not to defer the skb for
> transmission in tx_retry_wq() ?
Nope, no good reason. If transmit_skb fails it'll now defer.
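A hedged sketch of the revised flow; queue_tx_retry() is a hypothetical
name wrapping the kmalloc/list_add_tail/delayed-work steps from the
patch above:

	/* Defer on any failure: either packets are already queued
	 * (keep ordering) or transmit_skb() could not send right now. */
	if (!list_empty(&tx_queue.link) || transmit_skb(np, skb) != 0) {
		if (queue_tx_retry(np, skb) < 0)
			goto out_free;	/* allocation failed: drop */
	}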
>
> > +
> > + return;
> > +
> > +out_free:
> > + __kfree_skb(skb);
> > }
> >
> > void netpoll_send_udp(struct netpoll *np, const char *msg, int len)
> > @@ -636,6 +729,15 @@ int netpoll_setup(struct netpoll *np)
> > spin_unlock_irqrestore(&rx_list_lock, flags);
> > }
> >
> > + /* Workqueue to schedule the tx retries */
> > + if (!tx_wq)
> > + tx_wq = create_workqueue("netpoll");
> > +
> > + if (!tx_wq) {
> > + printk(KERN_ERR "Failed to create netpoll workqueue\n");
> > + goto release;
> > + }
> > +
>
> This queue is created but never removed. May be it could/should go in a
> separate function and be anchored to net_dev_init(). The network maintainers
> will tell if they care or not.
>
> Pure curiosity: why a specific queue instead of schedule_delayed_work() ?
Pure ignorance I'm afraid ;). That's more what I was looking for
originally, so I've changed it.
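For the record, with the shared kernel workqueue the private tx_wq (and
its never-destroyed create_workqueue) disappears; a sketch, assuming
the 2.6.10 workqueue API:

/* tx_retry_wq() is the existing worker; the retry now runs off the
 * shared kernel workqueue instead of a private one. */
static DECLARE_WORK(tx_retry_work, tx_retry_wq, NULL);

	/* at the deferral site, instead of queue_delayed_work(tx_wq, ...): */
	schedule_delayed_work(&tx_retry_work, HZ / 100);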
I've attached a revised version.
Thanks for your comments (and Matt's)
Mark
--
Mark Broadbent <markb@xxxxxxxxxxxxxx>
netpoll-fix-tx-deadlock-2.patch