[Top] [All Lists]

Re: [PATCH v2] xfs: Check the return value of xfs_trans_get_buf()

To: Alex Elder <aelder@xxxxxxx>
Subject: Re: [PATCH v2] xfs: Check the return value of xfs_trans_get_buf()
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 22 Sep 2011 08:43:22 +1000
Cc: Chandra Seetharaman <sekharan@xxxxxxxxxx>, XFS Mailing List <xfs@xxxxxxxxxxx>
In-reply-to: <1316637518.5872.39.camel@doink>
References: <1316527015.9298.60.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20110921005612.GL15688@dastard> <1316637518.5872.39.camel@doink>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Sep 21, 2011 at 03:38:38PM -0500, Alex Elder wrote:
> On Wed, 2011-09-21 at 10:56 +1000, Dave Chinner wrote:
> > On Tue, Sep 20, 2011 at 08:56:55AM -0500, Chandra Seetharaman wrote:
> > > Ran the xfstests (auto) overnight and didn't see any new issues.
> > 
> > Sure, but xfstests won't be triggering the new failure paths.
> > 
> > It looks to me like any failure to get a buffer will now result in a
> > cancelled transaction and a filesystem shutdown - the new failure
> > paths really need to be tested to ensure that failures are handled
> > gracefully and don't result in filesystem corruption.
> This is true.
> > As it is, I'm not sure we want to do this. The only reason we can
> > fail to get a buffer is allocation failures in extremely low memory
> > conditions. However, the last thing we want is for filesystem
> > shutdowns to be triggered by transient low memory conditions.
> But a failure to get a buffer, not checked, can't be good
> can it?  In other words, the patch now adds handling for
> a xfs_buf_get() failure, which avoids a kernel-mode null
> pointer dereference in the off chance that would happen.
> That's worse than a filesystem shutdown.

See my previous post about how the behaviour of the FS after an oops
is very different to a shutdown.

> Now I grant you your earlier statement, namely that it's
> conceivable that the new error return could lead to a
> previously un-exercised path that leads to file system
> corruption.
> But I do believe that all these places handle *an* error
> (not the specific ENOMEM error), so those spots already
> are generally prepared for something to go wrong.

Yes, but that's making the assumption that those error handling
routines handle the new failure cases correctly. The inode
allocation case is one I'm particularly concerned about...

> I didn't anticipate this, and the patch has already been
> committed.  I need to know whether people think this is
> critical enough for me to revert the patch.

In the long run, it doesn't matter. All I'm concerned about is that
everyone is aware of the potential downsides of this change, and why
not handling the error might be preferrable in the short term until
we get transaction rollbacks sorted out....


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>