On Tue, Sep 20, 2011 at 08:56:55AM -0500, Chandra Seetharaman wrote:
> Ran the xfstests (auto) overnight and didn't see any new issues.
Sure, but xfstests won't be triggering the new failure paths.
It looks to me like any failure to get a buffer will now result in a
cancelled transaction and a filesystem shutdown - the new failure
paths really need to be tested to ensure that failures are handled
gracefully and don't result in filesystem corruption.
As it is, I'm not sure we want to do this. The only reason we can
fail to get a buffer is allocation failures in extremely low memory
conditions. However, the last thing we want is for filesystem
shutdowns to be triggered by transient low memory conditions.
The current state of the code is that the xfs_buf_get() code tries
really, really hard to allocate memory, and we don't have any
evidence to point to the fact that is it failing to allocate memory.
We'd be seeing asserts firing and/or NULL pointer deref panics if
xfs_buf_get() was failing, and neither of these are happening.
As it is, before we can gracefully handle memory allocation failures
in the xfs_buf layer, we need to be able to roll back dirty
transactions so that memory allocation failure does not result
filesystem shutdowns. That's actually possible to do now with the
delayed logging infrastructure (because the CIL keeps a copy of the
previous in memory modifications prior to the bad transaction), so
we should look towards implementing transaction rollback first
before allowing memory allocation to fail inside transaction