xfs
[Top] [All Lists]

Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)

To: "Dan Williams" <dan.j.williams@xxxxxxxxx>
Subject: Re: Kernel 2.6.19.2 New RAID 5 Bug (oops when writing Samba -> RAID5)
From: Neil Brown <neilb@xxxxxxx>
Date: Tue, 23 Jan 2007 13:06:40 +1100
Cc: "Chuck Ebbert" <cebbert@xxxxxxxxxx>, "Justin Piszcz" <jpiszcz@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: message from Dan Williams on Monday January 22
References: <Pine.LNX.4.64.0701200718290.29223@xxxxxxxxxxxxxxxx> <45B5261B.1050104@xxxxxxxxxx> <17845.13256.284461.992275@xxxxxxxxxxxxxx> <e9c3a7c20701221744i7f0de470r4c8183dd3bcd20b5@xxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
On Monday January 22, dan.j.williams@xxxxxxxxx wrote:
> On 1/22/07, Neil Brown <neilb@xxxxxxx> wrote:
> > On Monday January 22, cebbert@xxxxxxxxxx wrote:
> > > Justin Piszcz wrote:
> > > > My .config is attached, please let me know if any other information is
> > > > needed and please CC (lkml) as I am not on the list, thanks!
> > > >
> > > > Running Kernel 2.6.19.2 on a MD RAID5 volume.  Copying files over Samba 
> > > > to
> > > > the RAID5 running XFS.
> > > >
> > > > Any idea what happened here?
> > ....
> > > >
> > > Without digging too deeply, I'd say you've hit the same bug Sami Farin
> > > and others
> > > have reported starting with 2.6.19: pages mapped with kmap_atomic()
> > > become unmapped
> > > during memcpy() or similar operations.  Try disabling preempt -- that
> > > seems to be the
> > > common factor.
> >
> > That is exactly the conclusion I had just come to (a kmap_atomic page
> > must be being unmapped during memcpy).  I wasn't aware that others had
> > reported it - thanks for that.
> >
> > Turning off CONFIG_PREEMPT certainly seems like a good idea.
> >
> Coming from an ARM background I am not yet versed in the inner
> workings of kmap_atomic, but if you have time for a question I am
> curious as to why spin_lock(&sh->lock)  is not sufficient pre-emption
> protection for copy_data() in this case?
> 

Presumably there is a bug somewhere.
kmap_atomic itself calls inc_preempt_count so that preemption should
be disabled at least until the kunmap_atomic is called.

But apparently not.  The symptoms point exactly to the page getting
unmapped when it shouldn't.  Until that bug is found and fixed, the
work around of turning of CONFIG_PREEMPT seems to make sense.

Of course it would be great if someone who can easily reproduce this
bug could do the 'git bisect' thing to find out where the bug crept
in.....

NeilBrown


<Prev in Thread] Current Thread [Next in Thread>