On Fri, Jan 14, 2005 at 05:16:12PM -0500, Jeff Garzik wrote:
> Blah. Any other suggestions for debugging this thing?
Yes, I have a better theory now :)
All your "badness" messages start with a call to udpv6_sendmsg().
That function calls ip6_dst_lookup() to get its dst entry. Note
that udpv6_sendmsg() does not hold a lock on the sk at all. However,
ip6_dst_lookup() uses __sk_dst_check() which is only safe if you can
either guarantee single-threadedness or if you hold sk_dst_lock.
Neither is true here and therefore we may have a situation where
the cached dst is released twice. In fact I tracked down the
address closest to the "badness" messages and it belongs to
one of your domain's name servers. That means the requests were
probably made by named, which is multi-threaded.
So please give this patch a spin and see if it makes things any
better. I've verified that no caller of ip6_dst_lookup() holds
sk_dst_lock, so it's safe to use sk_dst_check() there (though
possibly redundant where the caller already holds a lock on the
sk itself).
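
To make the theory concrete, here is a rough user-space sketch of
the race and of the locked check. It is not kernel code: dst_model,
sock_model, load_cache(), validate(), check_locked() and the two
replay functions are hypothetical stand-ins, and a pthread mutex
plays the role of sk_dst_lock. Rather than racing real threads, it
replays one bad interleaving by hand so the result is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel's structures. */
struct dst_model {
    int refcnt;    /* references held, including the socket cache's one */
    int obsolete;  /* nonzero once the cached route has gone stale */
};

struct sock_model {
    struct dst_model *dst_cache;
    pthread_mutex_t dst_lock;   /* plays the role of sk_dst_lock */
};

/* Step 1 of an unlocked check: read the cached pointer. */
static struct dst_model *load_cache(struct sock_model *sk)
{
    return sk->dst_cache;
}

/* Step 2: if the entry is stale, drop the cache's reference to it. */
static struct dst_model *validate(struct sock_model *sk, struct dst_model *dst)
{
    if (dst && dst->obsolete) {
        dst->refcnt--;          /* release the reference the cache held */
        sk->dst_cache = NULL;
        return NULL;
    }
    return dst;
}

/* Locked variant: load and validate happen as one step under the lock. */
static struct dst_model *check_locked(struct sock_model *sk)
{
    struct dst_model *dst;

    pthread_mutex_lock(&sk->dst_lock);
    dst = validate(sk, load_cache(sk));
    pthread_mutex_unlock(&sk->dst_lock);
    return dst;
}

/* Threads A and B both load the pointer before either validates it. */
int replay_unlocked(void)
{
    struct dst_model dst = { .refcnt = 1, .obsolete = 1 };
    struct sock_model sk = { .dst_cache = &dst };
    struct dst_model *a, *b;

    a = load_cache(&sk);        /* thread A */
    b = load_cache(&sk);        /* thread B, same stale pointer */
    validate(&sk, a);           /* A releases the cache's reference */
    validate(&sk, b);           /* B releases it a second time */
    return dst.refcnt;
}

/* With the lock, B cannot load until A has cleared the cache. */
int replay_locked(void)
{
    struct dst_model dst = { .refcnt = 1, .obsolete = 1 };
    struct sock_model sk = { .dst_cache = &dst };

    pthread_mutex_init(&sk.dst_lock, NULL);
    check_locked(&sk);          /* thread A: releases and clears */
    assert(sk.dst_cache == NULL);
    check_locked(&sk);          /* thread B: sees NULL, releases nothing */
    pthread_mutex_destroy(&sk.dst_lock);
    return dst.refcnt;
}
```

In this model the count starts at 1 for the cache's own reference.
replay_unlocked() ends with the count below zero, which is the
double release, while replay_locked() ends at zero.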