
Re: file corruption during emacs build on XFS logical volume

To: Sean Neakums <sneakums@xxxxxxxx>
Subject: Re: file corruption during emacs build on XFS logical volume
From: Stephen Lord <lord@xxxxxxx>
Date: Sat, 05 Jan 2002 13:16:47 -0600
Cc: Linux XFS <linux-xfs@xxxxxxxxxxx>
References: <1010174871.30053.6.camel@xxxxxxxxxxxxxxxxxxxx> <1010176193.2938.14.camel@UberGeek> <1010176393.30037.9.camel@xxxxxxxxxxxxxxxxxxxx> <1010179700.30053.13.camel@xxxxxxxxxxxxxxxxxxxx> <1010187371.30053.32.camel@xxxxxxxxxxxxxxxxxxxx> <6uy9jdhdsr.fsf@xxxxxxxxxxxxx> <6upu4ph8c5.fsf@xxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.6) Gecko/20011120
Sean Neakums wrote:

begin  Sean Neakums quotation:

begin  Steve Lord quotation:

Sean, if you could apply the xfs patches for 2.4.16 (on oss.sgi.com
in the projects/xfs/download/patches/2.4.16 directory) and see if
you could make this fail I would appreciate it. As I said, for me
this worked just fine - as did adding all the I/O path related
changes I made before Christmas.

Sure thing, I'll get that built and tested.  The 2.4.16 that's
failing for me was built on the 18th December.  The cvs update
probably didn't happen very long before that.  I should try the
io-test/emacs dump procedure on that kernel too, for completeness.


I built a Linus 2.4.16, patched for XFS with the patch from the above
location, and I'm certain that the bug was not present in XFS when
that patch was generated.  By contrast, I successfully recreated the
problem on the CVS 2.4.16 of the 18th using the io-test procedure.  (I
hadn't run io-test on that kernel before; up to now I had only tried
the Debian Emacs build there.)

I've found the best way to do this reliably is to use vmstat (I used
"vmstat 3") to watch the amount of used cache.  For example, on this
machine, which has 256M, used cache stabilised at about 198M with four
copies of io-test running to four separate files, and ~1100 in the bo
column.
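
The vmstat watching step could be automated along these lines.  This is
only a rough sketch, not part of the original procedure: the function
names, the five-sample window and the 2% tolerance are all assumptions
made for illustration, and the parser relies on vmstat printing a header
row that names the "cache" and "bo" columns.

```python
import subprocess


def parse_vmstat(lines):
    """Yield one dict per vmstat sample, keyed by the names in the header row."""
    columns = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if "cache" in fields and "bo" in fields:
            columns = fields  # header row; vmstat repeats it periodically
            continue
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            yield dict(zip(columns, map(int, fields)))


def cache_is_stable(samples, window=5, tolerance=0.02):
    """True when the last `window` cache readings differ by at most `tolerance`.

    Both defaults are illustrative guesses, not values from the thread.
    """
    recent = [s["cache"] for s in samples[-window:]]
    if len(recent) < window:
        return False
    lo, hi = min(recent), max(recent)
    return hi > 0 and (hi - lo) / hi <= tolerance


def watch(interval=3):
    """Run `vmstat <interval>` and report once the cache column stops growing."""
    proc = subprocess.Popen(["vmstat", str(interval)],
                            stdout=subprocess.PIPE, text=True)
    samples = []
    try:
        for sample in parse_vmstat(proc.stdout):
            samples.append(sample)
            print(f"cache={sample['cache']} bo={sample['bo']}")
            if cache_is_stable(samples):
                print("cache has stabilised; start the dump now")
                break
    finally:
        proc.terminate()
```

Calling watch(3) mirrors the "vmstat 3" invocation above.  The io-test
instances would still have to be started by hand beforehand.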

At that point, I kicked off the dump, which took over a minute to
complete, and started another io-test.  The machine load got to the
point where a VT switch took many seconds to be acknowledged.  I then
killed one of the io-test instances and attempted to start the dumped
Emacs.  The dumped Emacsen started perfectly (though very slowly) on
vanilla 2.4.16-plus-patch, but the ones dumped on the CVS-pulled
2.4.16 failed with various nasty dynamic linker errors due to the
corruption.

OK, thanks. This narrows it down further than I had managed: I was
running a kernel from Dec 22nd and recreating the problem there. I had
tried the individual patches to the I/O path and failed to recreate it,
but maybe I should try that again.

Steve



