Hi,
I'm just looking into a problem that may be related to the
new neighbor discovery code in core/neighbour.c.
The problem was originally reported against our SLES kernel, but the
code in 2.6.11-rc2 is almost identical; code and patches shown below
are relative to 2.6.11-rc2
The problem is that IBM testing was hitting the assertion in kfree_skb
that checks that the skb has been removed from any list it was on
("kfree_skb passed an skb still on a list").
It seems the problem is a bad interaction between neigh_event_send and
neigh_timer_handler:
In neigh_event_send:
if (skb_queue_len(&neigh->arp_queue) >=
neigh->parms->queue_len) {
struct sk_buff *buff;
buff = neigh->arp_queue.next;
__skb_unlink(buff, &neigh->arp_queue);
kfree_skb(buff);
}
If the ARP queue overflows, we're trying to remove the first skb
from the list. While we do this, the neighbor is locked.
The corresponding piece of code in neigh_timer_handler is this:
struct sk_buff *skb = skb_peek(&neigh->arp_queue);
/* keep skb alive even if arp_queue overflows */
if (skb)
skb_get(skb);
write_unlock(&neigh->lock);
neigh->ops->solicit(neigh, skb);
atomic_inc(&neigh->probes);
if (skb)
kfree_skb(skb); /* <== this is where we BUG() */
The use of skb_get/kfree_skb makes sure the skb remains live while
we're calling solicit(). Note that the neighbor isn't locked when we
call kfree_skb.
Now what happens in the bug that was reported to us (this was on a
PPC SMP machine) is this, I think:
- On CPU 0, neigh_timer_handler grabs the first skb on the queue,
incrementing is reference count to 2,
unlocks the neighbor, calls solicit()
- On CPU 1, we enter neigh_event_send, find the ARP
queue is full and remove the first skb. The call to kfree_skb
will decrement the refcount to 1. This change is visible
on all CPUs immediately, because it's an atomic_dec.
The effects of __skb_unlink are not immediately visible
on the other CPUs however.
- On the first CPU, we return from solicit, call kfree_skb.
Refcount goes to zero, but __kfree_skb still sees the
skb->list pointer => barf.
One possible fix here would be to add an smp_wmb after the __skb_unlink
and a smp_rmb before the assertion in __kfree_skb, as in the attached
patch.
Does this make sense?
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@xxxxxxx | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
arp-skb-unlink-serialize
Description: Text document
|