
Re: [PATCH] repair: avoid ABBA deadlocks on prefetched buffers

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH] repair: avoid ABBA deadlocks on prefetched buffers
From: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
Date: Wed, 23 Nov 2011 18:27:24 +0100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20111122224620.GA20107@xxxxxxxxxxxxx>
References: <20111115210953.GA6670@xxxxxxxxxxxxx> <201111180944.10048.arekm@xxxxxxxx> <20111122224620.GA20107@xxxxxxxxxxxxx>
User-agent: KMail/1.13.7 (Linux/3.2.0-rc2-00400-g866d43c; KDE/4.7.3; x86_64; ; )
On Tuesday 22 of November 2011, Christoph Hellwig wrote:
> On Fri, Nov 18, 2011 at 09:44:09AM +0100, Arkadiusz Miśkiewicz wrote:
> > On Tuesday 15 of November 2011, Christoph Hellwig wrote:
> > > Both the prefetch threads and actual repair processing threads can have
> > > multiple buffers locked at a time, but they do not use a common lock
> > > order, which can lead to ABBA deadlocks while trying to lock the
> > > buffers.
> > 
> > There is still some issue with deadlocking.
> > 
> > The last printed messages:
> > błędna liczba magiczna 0x41425443 w bloku inobt 2/1438099
> > błędna liczba magiczna 0x41425443 w bloku inobt 2/1438196
> > błędna liczba magiczna 0x41425443 w bloku inobt 2/1438732
> > (invalid magic number ... in block inobt ...)
> 
> It looks like you have a circular loop in the inobt tree, and repair
> deadlocks trying to read the same node again.  Below is a patch working
> around that by allowing recursive locking for the buffer lock and then
> letting the normal two strikes and out policy apply.  I'm not overly
> proud of the patch, but in the short term I can't think of anything
> better.
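
For readers following along, here is a minimal standalone sketch of the
holder-tracking idea described above. It is not the xfsprogs code: struct
demo_buf and the demo_buf_* helpers are hypothetical names, and a separate
"held" flag is an assumption of the sketch so that pthread_t is never compared
against a made-up "no owner" value. It checks for EBUSY, which is what POSIX
specifies for a contended pthread_mutex_trylock().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer with lock, holder and recursion count. */
struct demo_buf {
	pthread_mutex_t	lock;
	pthread_t	holder;	/* only meaningful while held is true */
	bool		held;
	unsigned int	recur;	/* extra acquisitions by the holder */
};

static void demo_buf_init(struct demo_buf *bp)
{
	pthread_mutex_init(&bp->lock, NULL);
	bp->held = false;
	bp->recur = 0;
}

/*
 * Lock the buffer. Returns 0 for a first acquisition, or the new recursion
 * depth if the calling thread already held the lock.
 * Note: holder/held are read without synchronization here, as in the quoted
 * patch; a production version would need to address that.
 */
static unsigned int demo_buf_lock(struct demo_buf *bp)
{
	int ret = pthread_mutex_trylock(&bp->lock);

	if (ret) {
		assert(ret == EBUSY);
		if (bp->held && pthread_equal(bp->holder, pthread_self()))
			return ++bp->recur;	/* recursive: do not block */
		pthread_mutex_lock(&bp->lock);	/* held by someone else */
	}
	bp->holder = pthread_self();
	bp->held = true;
	return 0;
}

/* Undo one demo_buf_lock(); the mutex is released on the last unlock. */
static void demo_buf_unlock(struct demo_buf *bp)
{
	if (bp->recur) {
		bp->recur--;
	} else {
		bp->held = false;
		pthread_mutex_unlock(&bp->lock);
	}
}
```

A caller that locks the same demo_buf twice gets depth 1 back from the second
call instead of blocking, and must then call demo_buf_unlock() twice.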

It still seems to deadlock.

Last lines on console:
bad hash table for directory inode 13655493544 (no data entry): rebuilding
rebuilding directory inode 13655493544
bad hash table for directory inode 13655509455 (no data entry): rebuilding
rebuilding directory inode 13655509455



[root@berta ~]# gdb ./xfs_repair_tcmalloc 23701
GNU gdb (GDB) 7.3.1-1 (PLD Linux)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pld-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/xfs_repair_tcmalloc...done.
Attaching to program: /root/xfs_repair_tcmalloc, process 23701
Reading symbols from /lib64/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libtcmalloc_minimal.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib64/libtcmalloc_minimal.so.0
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x7fab01798700 (LWP 5134)]
[New Thread 0x7fab00f97700 (LWP 5133)]
[New Thread 0x7fab0279a700 (LWP 5132)]
[New Thread 0x7fab01f99700 (LWP 5131)]
[New Thread 0x7fab02f9b700 (LWP 5130)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
0x00007fab0a7ed8e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fab0a7ed8e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fab0a7e91b5 in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007fab0a7e900a in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004334f8 in libxfs_getbuf_flags (device=<optimized out>, blkno=<optimized out>, len=<optimized out>, flags=<optimized out>) at rdwr.c:428
#4  0x00000000004337ce in libxfs_readbuf (dev=65024, blkno=6827773504, len=8, flags=0) at rdwr.c:547
#5  0x0000000000434369 in libxfs_trans_read_buf (mp=<optimized out>, tp=0x0, dev=65024, blkno=6827773504, len=8, flags=0, bpp=0x7fff7510edd8) at trans.c:485
#6  0x0000000000443147 in xfs_da_do_buf (trans=0x0, dp=<optimized out>, bno=<optimized out>, mappedbnop=0x7fff7510ee48, bpp=0x12922faa8, whichfork=<optimized out>, caller=2, ra=0x422354) at xfs_da_btree.c:2016
#7  0x00000000004354f4 in libxfs_da_read_bufr (trans=<optimized out>, dp=<optimized out>, bno=<optimized out>, mappedbno=6827773504, bpp=<optimized out>, whichfork=<optimized out>) at util.c:635
#8  0x0000000000422354 in longform_dir2_entry_check (mp=0x7fff7510f300, ino=13655547166, ip=0x9a1a7a20, num_illegal=0x7fff7510f258, need_dot=0x7fff7510f24c, irec=0xacfefc0, ino_offset=30, hashtab=0xa0028900) at phase6.c:2517
#9  0x0000000000424358 in process_dir_inode (mp=0x7fff7510f300, agno=<optimized out>, irec=0xacfefc0, ino_offset=30) at phase6.c:3307
#10 0x0000000000426f64 in traverse_function (arg=0x12a06c360, agno=3, wq=<optimized out>) at phase6.c:3622
#11 traverse_ags (mp=0x7fff7510f300) at phase6.c:3664
#12 phase6 (mp=0x7fff7510f300) at phase6.c:3756
#13 0x0000000000402c69 in main (argc=<optimized out>, argv=<optimized out>) at xfs_repair.c:772
(gdb) info threads
  Id   Target Id         Frame 
  6    Thread 0x7fab02f9b700 (LWP 5130) "xfs_repair_tcma" 0x00007fab0a7ed010 in sem_wait () from /lib64/libpthread.so.0
  5    Thread 0x7fab01f99700 (LWP 5131) "xfs_repair_tcma" 0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4    Thread 0x7fab0279a700 (LWP 5132) "xfs_repair_tcma" 0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7fab00f97700 (LWP 5133) "xfs_repair_tcma" 0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7fab01798700 (LWP 5134) "xfs_repair_tcma" 0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7fab0b0e0760 (LWP 23701) "xfs_repair_tcma" 0x00007fab0a7ed8e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fab01798700 (LWP 5134))]
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004297b3 in pf_io_worker (param=0x12a06c360) at prefetch.c:565
#2  0x00007fab0a7e6ed5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fab09fa3e5d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fab00f97700 (LWP 5133))]
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004297b3 in pf_io_worker (param=0x12a06c360) at prefetch.c:565
#2  0x00007fab0a7e6ed5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fab09fa3e5d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb) thread 4
[Switching to thread 4 (Thread 0x7fab0279a700 (LWP 5132))]
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004297b3 in pf_io_worker (param=0x12a06c360) at prefetch.c:565
#2  0x00007fab0a7e6ed5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fab09fa3e5d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fab01f99700 (LWP 5131))]
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fab0a7eae6c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000004297b3 in pf_io_worker (param=0x12a06c360) at prefetch.c:565
#2  0x00007fab0a7e6ed5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fab09fa3e5d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fab02f9b700 (LWP 5130))]
#0  0x00007fab0a7ed010 in sem_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007fab0a7ed010 in sem_wait () from /lib64/libpthread.so.0
#1  0x0000000000429c72 in pf_queuing_worker (param=0x12a06c360) at prefetch.c:644
#2  0x00007fab0a7e6ed5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fab09fa3e5d in clone () from /lib64/libc.so.6
#4  0x0000000000000000 in ?? ()
(gdb)
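
In the session above, thread 1 is blocked in pthread_mutex_lock() called from
libxfs_getbuf_flags(), while the prefetch threads sit in pthread_cond_wait()
and sem_wait(). One way to tell whether a thread is re-locking a mutex it
already holds (an assumption here, the trace alone does not prove it) is an
error-checking mutex, which fails the second lock with EDEADLK instead of
hanging. A minimal sketch, with errorcheck_mutex_init() as a hypothetical
helper name:

```c
#define _XOPEN_SOURCE 700	/* expose PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise *m as an error-checking mutex; returns 0 on success. */
static int errorcheck_mutex_init(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/*
 * Lock m and report whether the caller already owned it.
 * Returns 0 if the lock was taken, 1 if this thread already held it,
 * or -1 on any other error.
 */
static int lock_or_report_relock(pthread_mutex_t *m)
{
	int ret = pthread_mutex_lock(m);

	if (ret == 0)
		return 0;
	return ret == EDEADLK ? 1 : -1;
}
```

With a mutex set up this way, a second lock attempt by the owning thread
returns immediately, so the offending call site can be logged rather than
found with a debugger.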


> 
> 
> Index: xfsprogs-dev/include/libxfs.h
> ===================================================================
> --- xfsprogs-dev.orig/include/libxfs.h        2011-11-22 22:28:23.000000000 +0000
> +++ xfsprogs-dev/include/libxfs.h     2011-11-22 22:34:27.000000000 +0000
> @@ -226,6 +226,8 @@ typedef struct xfs_buf {
>       unsigned                b_bcount;
>       dev_t                   b_dev;
>       pthread_mutex_t         b_lock;
> +     pthread_t               b_holder;
> +     unsigned int            b_recur;
>       void                    *b_fsprivate;
>       void                    *b_fsprivate2;
>       void                    *b_fsprivate3;
> Index: xfsprogs-dev/libxfs/rdwr.c
> ===================================================================
> --- xfsprogs-dev.orig/libxfs/rdwr.c   2011-11-22 22:28:23.000000000 +0000
> +++ xfsprogs-dev/libxfs/rdwr.c        2011-11-22 22:40:01.000000000 +0000
> @@ -342,6 +342,8 @@ libxfs_initbuf(xfs_buf_t *bp, dev_t devi
>       list_head_init(&bp->b_lock_list);
>  #endif
>       pthread_mutex_init(&bp->b_lock, NULL);
> +     bp->b_holder = 0;
> +     bp->b_recur = 0;
>  }
> 
>  xfs_buf_t *
> @@ -410,18 +412,24 @@ libxfs_getbuf_flags(dev_t device, xfs_da
>               return NULL;
> 
>       if (use_xfs_buf_lock) {
> -             if (flags & LIBXFS_GETBUF_TRYLOCK) {
> -                     int ret;
> +             int ret;
> 
> -                     ret = pthread_mutex_trylock(&bp->b_lock);
> -                     if (ret) {
> -                             ASSERT(ret == EAGAIN);
> -                             cache_node_put(libxfs_bcache, (struct cache_node *)bp);
> -                             return NULL;
> +             ret = pthread_mutex_trylock(&bp->b_lock);
> +             if (ret) {
> +                     ASSERT(ret == EAGAIN);
> +                     if (flags & LIBXFS_GETBUF_TRYLOCK)
> +                             goto out_put;
> +
> +                     if (pthread_equal(bp->b_holder, pthread_self())) {
> +                             fprintf(stderr,
> +     _("recursive buffer locking detected\n"));
> +                             bp->b_recur++;
> +                     } else {
> +                             pthread_mutex_lock(&bp->b_lock);
>                       }
> -             } else {
> -                     pthread_mutex_lock(&bp->b_lock);
>               }
> +
> +             bp->b_holder = pthread_self();
>       }
> 
>       cache_node_set_priority(libxfs_bcache, (struct cache_node *)bp,
> @@ -440,6 +448,9 @@ libxfs_getbuf_flags(dev_t device, xfs_da
>  #endif
> 
>       return bp;
> +out_put:
> +     cache_node_put(libxfs_bcache, (struct cache_node *)bp);
> +     return NULL;
>  }
> 
>  struct xfs_buf *
> @@ -458,8 +469,14 @@ libxfs_putbuf(xfs_buf_t *bp)
>       list_del_init(&bp->b_lock_list);
>       pthread_mutex_unlock(&libxfs_bcache->c_mutex);
>  #endif
> -     if (use_xfs_buf_lock)
> -             pthread_mutex_unlock(&bp->b_lock);
> +     if (use_xfs_buf_lock) {
> +             if (bp->b_recur) {
> +                     bp->b_recur--;
> +             } else {
> +                     bp->b_holder = 0;
> +                     pthread_mutex_unlock(&bp->b_lock);
> +             }
> +     }
>       cache_node_put(libxfs_bcache, (struct cache_node *)bp);
>  }
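
The LIBXFS_GETBUF_TRYLOCK path in the patch gives up instead of blocking when
the buffer is busy. The general pattern behind that, for a thread that needs
two locks and cannot rely on a common lock order, is to block on the first and
only try the second, backing off completely if it is taken. The sketch below
illustrates that pattern only; demo_lock_pair() is a hypothetical helper and
is not how xfs_repair structures its callers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Take both locks without ever blocking while holding one of them.
 * Returns true with both held, or false with neither held, in which
 * case the caller can retry later or skip the work.
 */
static bool demo_lock_pair(pthread_mutex_t *first, pthread_mutex_t *second)
{
	assert(first != second);

	pthread_mutex_lock(first);
	if (pthread_mutex_trylock(second) != 0) {
		/* second is busy: release first so its holder can make progress */
		pthread_mutex_unlock(first);
		return false;
	}
	return true;
}
```

Because a thread never sleeps on the second lock while holding the first, two
threads taking the same pair in opposite order cannot wait on each other
forever; the cost is that the caller has to cope with a false return.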


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
