
Re: [PATCH] Re: Corrupted XFS log replay oops.

To: Eric Sesterhenn <snakebyte@xxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Nick Piggin <nickpiggin@xxxxxxxxxxxx>, Pavel Machek <pavel@xxxxxxx>, Chris Mason <chris.mason@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, npiggin@xxxxxxxxxxxx, xfs@xxxxxxxxxxx
Subject: Re: [PATCH] Re: Corrupted XFS log replay oops.
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 23 Jan 2009 12:10:42 +1100
In-reply-to: <20090122233717.GB32390@disturbed>
Mail-followup-to: Eric Sesterhenn <snakebyte@xxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Nick Piggin <nickpiggin@xxxxxxxxxxxx>, Pavel Machek <pavel@xxxxxxx>, Chris Mason <chris.mason@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, npiggin@xxxxxxxxxxxx, xfs@xxxxxxxxxxx
References: <20090113142147.GE16333@alice> <20090120173455.GC21339@alice> <20090121035703.GH10158@disturbed> <200901211503.07308.nickpiggin@xxxxxxxxxxxx> <20090122043747.GU10158@disturbed> <20090122061158.GA31104@xxxxxxxxxxxxx> <20090122100648.GA16660@alice> <20090122233717.GB32390@disturbed>
User-agent: Mutt/1.5.18 (2008-05-17)
On Fri, Jan 23, 2009 at 10:37:17AM +1100, Dave Chinner wrote:
> On Thu, Jan 22, 2009 at 11:06:48AM +0100, Eric Sesterhenn wrote:
> > * Christoph Hellwig (hch@xxxxxxxxxxxxx) wrote:
> > > On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote:
> > > >  xfs_buf_t *
> > > >  xlog_get_bp(
> > > >         xlog_t          *log,
> > > > -       int             num_bblks)
> > > > +       int             nbblks)
> > > 
> > > Any reason for renaming this variable?  That causes quite a bit of
> > > churn.
> > > 
> > > >  {
> > > > -       ASSERT(num_bblks > 0);
> > > > +       if (nbblks <= 0 || nbblks > log->l_logBBsize) {
> > > > +               xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks);
> > > 
> > > And doesn't prevent this line from needing a linebreak to stay under 80
> > > characters :)
> > > 
> > > Except for these nitpicks it looks fine to me.
> > 
> > Using the image at http://www.cccmz.de/~snakebyte/xfs.254.img.bz2
> > I was able to produce a pretty similar error with the patch applied
> 
> Different problem, obviously. ;)
> 
> I'll have a look at this later today....
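For reference, the quoted patch replaces ASSERT(num_bblks > 0) with a check that runs in every build: a block count outside the range 1..l_logBBsize is rejected up front instead of being used to size a buffer. Below is a minimal userspace sketch of that check, assuming a plain int for the log size; bblks_valid() is a hypothetical name, not an XFS function, and fprintf() stands in for xlog_warn().

```c
#include <stdio.h>

/*
 * Hypothetical sketch of the quoted patch's range check.
 * log_size_bblks stands in for the log's size in basic blocks
 * (l_logBBsize in the patch). Returns 1 if nbblks is usable, 0 otherwise.
 */
static int bblks_valid(int nbblks, int log_size_bblks)
{
	if (nbblks <= 0 || nbblks > log_size_bblks) {
		/* Warn and refuse rather than allocate a bogus buffer. */
		fprintf(stderr,
			"XFS: Invalid block length (0x%x) given for buffer\n",
			(unsigned int)nbblks);
		return 0;
	}
	return 1;
}
```

The caller would then fail log recovery cleanly when the check returns 0, rather than relying on a debug-only assertion.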

One word: Ouch.

Basically, the corruption sets random feature bits in the superblock
for features that aren't actually in use. Hence, instead of holding
valid values, the superblock fields that belong to those features are
zero, and so strange stuff happens.
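One way to catch this class of corruption is to check, when the superblock is read, that every feature bit that is set has a plausible value in the field it depends on, and that no unknown bits are set. The sketch below assumes a made-up superblock layout: struct sb_sketch, FEAT_A, FEAT_B and the fields a_blocks and b_size are all hypothetical and do not correspond to real XFS on-disk fields.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_A     (1u << 0)   /* requires a nonzero a_blocks */
#define FEAT_B     (1u << 1)   /* requires a nonzero b_size */
#define FEAT_KNOWN (FEAT_A | FEAT_B)

/* Hypothetical superblock: fields are meaningful only if their bit is set. */
struct sb_sketch {
	uint32_t features;
	uint64_t a_blocks;
	uint32_t b_size;
};

static bool sb_features_consistent(const struct sb_sketch *sb)
{
	/* Reject bits this code does not know about. */
	if (sb->features & ~FEAT_KNOWN)
		return false;
	/* A feature claimed by the flags must have its field filled in. */
	if ((sb->features & FEAT_A) && sb->a_blocks == 0)
		return false;
	if ((sb->features & FEAT_B) && sb->b_size == 0)
		return false;
	return true;
}
```

A superblock that fails such a check would be refused at mount time, so later code never sees a feature flag paired with a zeroed field.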

What is really stupid is that these fields are often checked, but by
ASSERT() rather than by production code. So a debug kernel will pick up
the problem and panic, while a production kernel will continue onwards
until it eventually panics. This is not going to be a small patch.....
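The difference between a debug-only assertion and a production check can be shown with a small sketch. DEBUG_ASSERT below is a hypothetical macro, modelled loosely on the pattern described above: it fires only when DEBUG is defined and compiles to nothing otherwise. The two helper functions are likewise hypothetical; the checked variant returns -EINVAL in every build instead of trusting the value.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical debug-only assertion: active only when DEBUG is defined. */
#ifdef DEBUG
#define DEBUG_ASSERT(expr) \
	do { \
		if (!(expr)) { \
			fprintf(stderr, "assert failed: %s\n", #expr); \
			abort(); \
		} \
	} while (0)
#else
#define DEBUG_ASSERT(expr) do { } while (0)
#endif

/* Relies on the assertion alone. */
static int blocks_per_unit_unchecked(int total, int unit)
{
	DEBUG_ASSERT(unit != 0);
	/* If unit is 0, a non-DEBUG build divides by zero here. */
	return total / unit;
}

/* Checks in every build and reports the bad value to the caller. */
static int blocks_per_unit_checked(int total, int unit, int *out)
{
	if (unit == 0)
		return -EINVAL;
	*out = total / unit;
	return 0;
}
```

With the checked form, a zeroed superblock field turns into an error return that the caller can act on, in both debug and production kernels.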

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
