cattelan@xxxxxxxxxxx wrote:
>
> At Wed, 26 Apr 2000 00:43:16 -0700 (PDT),
> Ananth Ananthanarayanan <ananth@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > Here are my findings in an effort to track down the corruption.
> >
> > I backed up all the way to a tree as of 3/30/2000.6:00.
> > You can do that sort of thing with the '-t' option to p_tupdate.
> > This is the time I know kernel compilation was working fine
> > when PAGEBUF_META was off.
> >
> > However, today I tried 3/30/2000 with PAGEBUF_META turned on,
> > and the corruption showed up. With the config turned off as
> > I tried about 4 weeks back, the corruption went away.
> >
> > Switching to my tot workarea with page-cleaner stuff compiled in,
> > but run-time turned off, and with PAGEBUF_META off,
> > I can now compile the kernel again.
> >
> > So there is a strong correlation between PAGEBUF_META and corruption.
> > My own hunch is that either the 'block no' calculation in
> > pagebuf is wrong, or the 'rele/hold problem' is doing an I/O
> > when it shouldn't be.
> >
> > Suggestion #1 is to ensure that the kernel compiles cleanly
> > in several tries with a tot kernel and PAGEBUF_META off.
> >
> > I'm likely going back to working on delalloc tomorrow,
> >
> > ananth.
>
> Hmm very odd.
> So if the block number was wrong on a meta data write, thus
> possibly dropping meta data into file data, the data in
> the file should be meta data... which it doesn't seem to be.
You're right on that aspect.
>
> A hold/rele problem shouldn't be dropping data into a file either;
> maybe it's corrupting the meta data side of the file system instead?
I don't know about that. If I/O is initiated on a freed pagebuf,
then the contents of the I/O are unknown. So the corruption
doesn't necessarily have to show meta-data in it.
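
To make that concrete, here is a toy sketch of the failure mode. It
is not the pagebuf code: toy_buf, toy_hold, toy_rele and toy_write
are hypothetical names, and "freeing" is simulated by poisoning the
data so the example stays well defined. It only shows that an
unbalanced rele can leave a later write sending out whatever happens
to be in the buffer, which need not look like meta-data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_BUF_SIZE 16

/* Hypothetical stand-in for a reference-counted buffer. */
struct toy_buf {
    int  hold;                /* number of outstanding holds */
    int  freed;               /* set once the last hold is dropped */
    char data[TOY_BUF_SIZE];  /* payload that would go to disk */
};

/* Start with one hold and a zero-padded copy of the payload. */
static void toy_init(struct toy_buf *b, const char *payload)
{
    assert(payload != NULL);
    b->hold = 1;
    b->freed = 0;
    memset(b->data, 0, sizeof(b->data));
    strncpy(b->data, payload, sizeof(b->data) - 1);
}

static void toy_hold(struct toy_buf *b)
{
    b->hold++;
}

/* Drop one hold. Returns 0 on success, -1 if there was no hold left. */
static int toy_rele(struct toy_buf *b)
{
    if (b->hold <= 0) {
        fprintf(stderr, "toy_buf: hold count too low\n");
        return -1;
    }
    if (--b->hold == 0) {
        /* Simulate the memory being reused by someone else. */
        b->freed = 1;
        memset(b->data, 0xAA, sizeof(b->data));
    }
    return 0;
}

/*
 * Copy the buffer to "disk" unconditionally, as a buggy I/O path
 * would. Returns 1 if the buffer was still held, 0 if the write
 * came from a freed buffer and its contents are arbitrary.
 */
static int toy_write(const struct toy_buf *b, char *disk)
{
    memcpy(disk, b->data, TOY_BUF_SIZE);
    return b->freed ? 0 : 1;
}
```

With balanced hold/rele calls the write carries the original payload;
after one rele too many it carries the poison pattern instead.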
>
> Hmm very strange...
>
> BTW, a 2-thread version of doio ran all night!
> I also ran 5 compiles at the same time on a different file system
> (same system) they all ran to completion.
>
> I also haven't seen any "pb_hold count too low" messages.
Note that as I said originally, it is lmbench which consistently
reproduces the hold problem. I've seen it on kernel compiles
only occasionally.
ananth.