On Fri, Jul 04, 2014 at 07:15:09AM -0700, Christoph Hellwig wrote:
> On Fri, Jul 04, 2014 at 03:57:13PM +1000, Dave Chinner wrote:
> > @@ -632,6 +632,12 @@ libxfs_putbuf(xfs_buf_t *bp)
> > pthread_mutex_unlock(&bp->b_lock);
> > }
> > }
> > + /*
> > + * ensure that any errors on this use of the buffer don't carry
> > + * over to the next user.
> > + */
> > + bp->b_error = 0;
> > +
> > cache_node_put(libxfs_bcache, (struct cache_node *)bp);
> This seems a bit fishy to me. For one I'm pretty sure it needs to be
> done before unlocking b_lock,
> second it's different behavior from the
> kernel where we explicitly clear it in the caller for the rare case
> we want to reuse a buffer that had an error (xfs_buf_iodone_callbacks
> seems to be the only one). Any reason to do this differently in
The userspace code that uses the buffer cache is much less
constrained than the kernel code. The userspace code is pretty nasty
in places, especially when it comes to buffer error handling.
We can't clear errors or zero buffer contents in libxfs_getbuf-*
like we do in the kernel, because those functions are used by the
libxfs_readbuf_* functions and hence need to leave the buffers
unchanged on cache hits. This is actually the only way to gather a
write error from a libxfs_writebuf() call - you need to
get the buffer again so you can check bp->b_error field - assuming
that the buffer is still in the cache when you check, that is.
This is very different to the kernel code which idoes not release
buffers on a write so we can wait on IO and check errors. The kernel
buffer cache also guarantees a buffer of a known initial state from
xfs_buf_get() even on a cache hit.
Hence the userspace buffer cache is behaving quite differently to
the kernel buffer cache and as a result it's leaking errors from
reads, invalidations and writes through
xfs_da_get_buf/libxfs_getbuf. Current no userspace outside libxfs
code clears bp->b_error - very little code even checks it - so th
elibxfs code is tripping on stale errors left by the usrspace code.
libxfs_writebuf() already zeros bp->b_error to prevent propagation
of stale errors into future reads, so this patch is really just
closing the hole in the other buffer release path that the code
Doing a full audit and addition of error handling of all the
userspace code is a little beyond my resources right now. The only
thing I can really do quickly about the problem is clear the error
when we've finished with the buffer.
I'm open to other ways of fixing this, but right now we've got to
fix xfs_repair because it's currently breaking filesystems worse
than before xfs_repair was run...