
Re: corruption

To: cattelan@xxxxxxxxxxx
Subject: Re: corruption
From: Rajagopal Ananthanarayanan <ananth@xxxxxxx>
Date: Wed, 26 Apr 2000 10:19:01 -0700
Cc: linux-xfs@xxxxxxxxxxx
References: <200004260743.AAA42400@xxxxxxxxxxxxxxxxxxxx> <14599.2543.798830.22528W@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx

cattelan@xxxxxxxxxxx wrote:
> 
> At Wed, 26 Apr 2000 00:43:16 -0700 (PDT),
> Ananth Ananthanarayanan <ananth@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > Here are my findings in an effort to track down the corruption.
> >
> > I backed up all the way to a tree as of 3/30/2000.6:00.
> > You can do that sort of thing with the '-t' option to p_tupdate.
> > This is the time I know kernel compilation was working fine
> > when PAGEBUF_META was off.
> >
> > However, today I tried 3/30/2000 with PAGEBUF_META turned on,
> > and the corruption showed up.  With the config turned off as
> > I tried about 4 weeks back, the corruption went away.
> >
> > Switching to my tot workarea with page-cleaner stuff compiled in,
> > but run-time turned off, and with PAGEBUF_META off,
> > I can now compile the kernel again.
> >
> > So there is a strong correlation between PAGEBUF_META and corruption.
> > My own hunch is that either the 'block no' calculation in
> > pagebuf is wrong, or the 'rele/hold problem' is doing an I/O
> > when it shouldn't be.
> >
> > Suggestion #1 is to ensure that the kernel compiles cleanly
> > in several tries with a tot kernel and PAGEBUF_META off.
> >
> > I'm likely going back to working on delalloc tomorrow,
> >
> > ananth.
> 
> Hmm very odd.
> So if the block number was wrong on a meta data write, thus
> possibly dropping meta data into file data, the data in
> the file should be meta data... which it doesn't seem to be.

You're right on that aspect.

> 
> A hold/rele problem shouldn't be dropping data into a file either;
> maybe it corrupts the meta data side of the file system instead?

Not sure about that. If I/O is initiated on a freed pagebuf,
then the contents of the I/O are unknown. So the corruption
doesn't necessarily have to show meta-data in it.
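
To make the hold/rele concern concrete, here is a minimal user-space sketch of a reference-counted buffer. Everything in it (struct toy_buf, toy_hold, toy_rele, toy_write) is a hypothetical stand-in invented for illustration, not the actual pagebuf code or API. It assumes a plain hold counter where dropping the last hold frees the buffer and lets its memory be recycled, which is why an I/O issued after that point would write contents unrelated to the original meta-data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a pagebuf; NOT the real XFS structure. */
struct toy_buf {
    int  hold;      /* number of outstanding holds */
    int  freed;     /* set once the last hold has been dropped */
    char data[16];  /* payload that an I/O would write out */
};

/* Create a buffer with one initial hold and the given payload. */
static void toy_init(struct toy_buf *b, const char *payload)
{
    assert(b != NULL && payload != NULL);
    b->hold = 1;
    b->freed = 0;
    strncpy(b->data, payload, sizeof b->data - 1);
    b->data[sizeof b->data - 1] = '\0';
}

/* Take an extra hold on the buffer. */
static void toy_hold(struct toy_buf *b)
{
    b->hold++;
}

/*
 * Drop one hold. Returns 0 on success, -1 if there was no hold left
 * to drop (an unbalanced rele). When the count reaches zero the buffer
 * is marked freed and its payload is overwritten, modelling the memory
 * being recycled for some other use.
 */
static int toy_rele(struct toy_buf *b)
{
    if (b->hold <= 0) {
        fprintf(stderr, "toy_rele: hold count too low\n");
        return -1;
    }
    if (--b->hold == 0) {
        b->freed = 1;
        strncpy(b->data, "recycled", sizeof b->data - 1);
        b->data[sizeof b->data - 1] = '\0';
    }
    return 0;
}

/*
 * Guarded write: refuses to start I/O from a buffer that is no longer
 * held. Without this check the "disk" would receive whatever the
 * recycled memory happens to contain. Assumes n > 0.
 */
static int toy_write(const struct toy_buf *b, char *disk, size_t n)
{
    if (b->freed || b->hold <= 0)
        return -1;
    strncpy(disk, b->data, n - 1);
    disk[n - 1] = '\0';
    return 0;
}
```

In this toy model, a write issued while a hold is outstanding copies the expected payload, while a write attempted after the final rele is rejected. If the guard were absent, the recycled contents would be written instead, and they would not need to look like meta-data at all.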

> 
> Hmm very strange...
> 
> BTW a 2-thread version of doio ran all night!
> I also ran 5 compiles at the same time on a different file system
> (same system) they all ran to completion.
> 
> I also haven't seen any "pb_hold count too low" messages.

Note that as I said originally, it is lmbench which consistently
reproduces the hold problem. I've seen it on kernel compiles
only occasionally.

ananth.
