> I just spent today performing a pseudo binary search for a buffer head
> corruption I have been experiencing with XFS and RAID5. I have no idea
> why it only happens in this instance, as you'll see.
> In page_buf.c, around line 1424, a call is made to
> kmem_cache_alloc(). The short story is: at least one pointer is
> returned that is already in use!
> I wrote a function that steps through the buffer_head lists, and checks
> for b_next_free == NULL. Since it's a circular list, that should never
> be true.
> However, after the call to kmem_cache_alloc, and the subsequent
> 'memset(bh, 0,...)', I have my NULL. This also is the source of most
> of my Oopses from within buffer.c. Those functions are not expecting a
> NULL in b_next_free at all ;-).
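For anyone following along, the kind of check described above can be sketched in user space. This is only an illustration, not the code from the report: struct bh_sketch, ring_insert_after() and ring_find_null() are made-up names, and only the b_next_free link is modelled. It assumes the list is a simple singly linked ring.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct buffer_head: only the free-list
 * link discussed in this thread is modelled. */
struct bh_sketch {
    struct bh_sketch *b_next_free;
    int b_blocknr;
};

/* Link 'node' into the circular list directly after 'pos'. */
static void ring_insert_after(struct bh_sketch *pos, struct bh_sketch *node)
{
    assert(pos != NULL && node != NULL);
    node->b_next_free = pos->b_next_free;
    pos->b_next_free = node;
}

/*
 * Walk the ring starting at 'head' (which must not be NULL).
 * Returns 0 if the walk gets back to 'head' without meeting a NULL link,
 * the 1-based position of the first node whose b_next_free is NULL,
 * or -1 if more than 'max_nodes' nodes were visited without closing
 * the ring.
 */
static int ring_find_null(const struct bh_sketch *head, int max_nodes)
{
    const struct bh_sketch *bh = head;

    for (int i = 1; i <= max_nodes; i++) {
        if (bh->b_next_free == NULL)
            return i;
        bh = bh->b_next_free;
        if (bh == head)
            return 0;
    }
    return -1;
}
```

If a node that is still linked into the ring gets handed out again and zeroed with memset(), the walk stops at that node, which matches the symptom described: the NULL appears right after the allocation, whichever side actually made the mistake.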
> So: I've found it, but I have no idea why kmem_cache_alloc would return
> a previously used bh, nor what to do about it.
Hmm, I am not sure how kmem_cache_alloc can do that either. Isn't it more
likely that a buffer is being freed but not removed from the list, i.e. the
needle is in that other haystack over there? Turning on memory poisoning
might make things fall over faster. In mm/slab.c there are three defines:
#define DEBUG 0
#define STATS 0
#define FORCED_DEBUG 0
I think you want to set the DEBUG flag to 1.
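To show why poisoning helps with this kind of bug, here is a rough user-space sketch of the idea. It is not the mm/slab.c implementation: pool_sketch, pool_alloc(), pool_free() and the POISON_SKETCH value are all made up for illustration. Freed objects are filled with a known byte pattern, so a stale pointer reads obvious garbage instead of plausible data, and a write through a stale pointer is noticed the next time the object is handed out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_SKETCH 0x5a   /* illustrative fill byte for free objects */
#define OBJ_SIZE      32
#define POOL_OBJS     4

/* Hypothetical fixed-size object pool, standing in for a slab cache. */
struct pool_sketch {
    unsigned char objs[POOL_OBJS][OBJ_SIZE];
    int in_use[POOL_OBJS];
};

/* Start with every object free and poisoned. */
static void pool_init(struct pool_sketch *p)
{
    memset(p->objs, POISON_SKETCH, sizeof(p->objs));
    memset(p->in_use, 0, sizeof(p->in_use));
}

/* Return 1 if object 'idx' still holds nothing but the poison pattern. */
static int pool_poison_intact(const struct pool_sketch *p, int idx)
{
    for (size_t i = 0; i < OBJ_SIZE; i++)
        if (p->objs[idx][i] != POISON_SKETCH)
            return 0;
    return 1;
}

/*
 * Hand out the first free object, or NULL if the pool is full.
 * '*corrupt' is set to 1 if the object was modified while it was free,
 * i.e. somebody kept using it after freeing it.
 */
static void *pool_alloc(struct pool_sketch *p, int *corrupt)
{
    for (int i = 0; i < POOL_OBJS; i++) {
        if (!p->in_use[i]) {
            *corrupt = !pool_poison_intact(p, i);
            p->in_use[i] = 1;
            return p->objs[i];
        }
    }
    *corrupt = 0;
    return NULL;
}

/* Return an object to the pool and poison its contents. */
static void pool_free(struct pool_sketch *p, void *obj)
{
    int idx = (int)(((unsigned char *)obj - &p->objs[0][0]) / OBJ_SIZE);

    assert(idx >= 0 && idx < POOL_OBJS && p->in_use[idx]);
    memset(p->objs[idx], POISON_SKETCH, OBJ_SIZE);
    p->in_use[idx] = 0;
}
```

With something like this in place, a buffer that was freed while still on a list would have its link field overwritten with the poison pattern, so the next list walk should trip over it close to the real culprit rather than much later in buffer.c.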
> "Men occasionally stumble over the truth, but most of them pick
> themselves up and hurry off as if nothing had happened."
> -- Winston Churchill