[PATCH 06/12] repair: use recursive buffer locking
Dave Chinner
david at fromorbit.com
Mon Dec 12 20:22:08 CST 2011
On Fri, Dec 02, 2011 at 12:46:25PM -0500, Christoph Hellwig wrote:
> On a sufficiently corrupt filesystem walking the btree nodes might hit the
> same node again, which currently will deadlock. Use a recursion
> counter to avoid the direct deadlock and let the normal loop detection
> (two bad nodes and out) do its work. This is how repair behaved before
> we added the lock when implementing buffer prefetching.
>
> Reported-by: Arkadiusz Miśkiewicz <arekm at maven.pl>
> Tested-by: Arkadiusz Miśkiewicz <arekm at maven.pl>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
>
> Index: xfsprogs-dev/include/libxfs.h
> ===================================================================
> --- xfsprogs-dev.orig/include/libxfs.h 2011-11-22 22:28:23.000000000 +0000
> +++ xfsprogs-dev/include/libxfs.h 2011-11-22 22:34:27.000000000 +0000
> @@ -226,6 +226,8 @@ typedef struct xfs_buf {
> unsigned b_bcount;
> dev_t b_dev;
> pthread_mutex_t b_lock;
> + pthread_t b_holder;
> + unsigned int b_recur;
> void *b_fsprivate;
> void *b_fsprivate2;
> void *b_fsprivate3;
> Index: xfsprogs-dev/libxfs/rdwr.c
> ===================================================================
> --- xfsprogs-dev.orig/libxfs/rdwr.c 2011-11-22 22:28:23.000000000 +0000
> +++ xfsprogs-dev/libxfs/rdwr.c 2011-11-22 22:40:01.000000000 +0000
> @@ -342,6 +342,8 @@ libxfs_initbuf(xfs_buf_t *bp, dev_t devi
> list_head_init(&bp->b_lock_list);
> #endif
> pthread_mutex_init(&bp->b_lock, NULL);
> + bp->b_holder = 0;
> + bp->b_recur = 0;
> }
>
> xfs_buf_t *
> @@ -410,18 +412,24 @@ libxfs_getbuf_flags(dev_t device, xfs_da
> return NULL;
>
> if (use_xfs_buf_lock) {
> - if (flags & LIBXFS_GETBUF_TRYLOCK) {
> - int ret;
> + int ret;
>
> - ret = pthread_mutex_trylock(&bp->b_lock);
> - if (ret) {
> - ASSERT(ret == EAGAIN);
> - cache_node_put(libxfs_bcache, (struct cache_node *)bp);
> - return NULL;
> + ret = pthread_mutex_trylock(&bp->b_lock);
> + if (ret) {
> + ASSERT(ret == EAGAIN);
> + if (flags & LIBXFS_GETBUF_TRYLOCK)
> + goto out_put;
> +
> + if (pthread_equal(bp->b_holder, pthread_self())) {
> + fprintf(stderr,
> + _("recursive buffer locking detected\n"));

"Warning: recursive buffer locking @ bno %lld detected"
might be more informative, particularly about the severity of the
issue.

> + bp->b_recur++;
> + } else {
> + pthread_mutex_lock(&bp->b_lock);
> }
> - } else {
> - pthread_mutex_lock(&bp->b_lock);
> }
> +
> + bp->b_holder = pthread_self();

That should probably only be written in the branch where the lock is
taken, not every time through here.

Also, it might be worth adding a comment explaining why checking
bp->b_holder without holding the lock is not racy: the holder is
initialised to zero and cleared before the buffer lock is dropped, so
when a concurrent lookup fails the trylock, b_holder can never match
the failing thread's ID.
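
To make the ordering concrete, here is a minimal standalone sketch of the
lock and unlock paths with both suggestions applied. It is not the xfsprogs
code: struct demo_buf and the demo_buf_*() helpers are hypothetical
stand-ins, the unlock side (including the b_recur decrement) is my
assumption since it is not in the quoted hunks, and assigning 0 to b_holder
assumes an arithmetic pthread_t as the patch itself does. The sketch checks
for EBUSY, which is what POSIX specifies for pthread_mutex_trylock() on an
already locked mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the locking-related fields of xfs_buf. */
struct demo_buf {
	pthread_mutex_t	b_lock;
	pthread_t	b_holder;	/* 0 when nobody holds b_lock */
	unsigned int	b_recur;	/* recursion depth of the holder */
	long long	b_blkno;
};

static void demo_buf_init(struct demo_buf *bp, long long blkno)
{
	pthread_mutex_init(&bp->b_lock, NULL);
	bp->b_holder = 0;	/* assumes arithmetic pthread_t, as in the patch */
	bp->b_recur = 0;
	bp->b_blkno = blkno;
}

/* Returns true if the caller holds the buffer on return. */
static bool demo_buf_lock(struct demo_buf *bp, bool trylock)
{
	int ret = pthread_mutex_trylock(&bp->b_lock);

	if (ret == 0) {
		/* Lock taken: record the holder only here. */
		bp->b_holder = pthread_self();
		return true;
	}
	assert(ret == EBUSY);
	if (trylock)
		return false;

	/*
	 * Unlocked read of b_holder. It is only ever set to a thread's
	 * own ID while that thread holds b_lock, and is cleared before
	 * the lock is dropped, so it can match pthread_self() only if
	 * this thread is the current holder.
	 */
	if (pthread_equal(bp->b_holder, pthread_self())) {
		fprintf(stderr,
	"Warning: recursive buffer locking @ bno %lld detected\n",
			bp->b_blkno);
		bp->b_recur++;
		return true;
	}

	pthread_mutex_lock(&bp->b_lock);
	bp->b_holder = pthread_self();	/* lock taken: record the holder */
	return true;
}

static void demo_buf_unlock(struct demo_buf *bp)
{
	if (bp->b_recur) {
		/* Undo one level of recursion, keep the mutex held. */
		bp->b_recur--;
		return;
	}
	bp->b_holder = 0;	/* clear before dropping the lock */
	pthread_mutex_unlock(&bp->b_lock);
}
```

With this shape the recursive path never touches b_holder at all, and the
only writers of the field are threads that own the mutex.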
Otherwise, looks good.
Reviewed-by: Dave Chinner <dchinner at redhat.com>
--
Dave Chinner
david at fromorbit.com