pcp
[Top] [All Lists]

Re: Possible libpcp threading deadlock via pmlogger

To: Nathan Scott <nathans@xxxxxxxxxx>
Subject: Re: Possible libpcp threading deadlock via pmlogger
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Mon, 15 Oct 2012 05:48:46 +1100
Cc: pcp@xxxxxxxxxxx
In-reply-to: <784881246.11759577.1350105729295.JavaMail.root@xxxxxxxxxx>
References: <784881246.11759577.1350105729295.JavaMail.root@xxxxxxxxxx>
On Sat, 2012-10-13 at 01:22 -0400, Nathan Scott wrote:
> Hi Ken,
> 
> I came across what looks like a threading-related deadlock
> while running QA earlier, keen to hear your thoughts. ...

Both pmlogger and pmlc are single-threaded (unless there is some under
the covers multi-threading in pmlogger as a consequence of the timer
handling), and __pmLock_libpcp is a recursive lock so the only way that
I know this could block is if we had unbounded lock recursion and ran
into some implementation maximum locking depth in the pthread_mutex
implementation.

We had a case like this before in pmlogger but that related to the per
context locks (which are also recursive) and this was fixed in commit
29d91a.

Commit de185c6 added code to detect lock recursion, but it needs to be
manually compiled into libpcp.

I'll take a look when I get a chance.

> ... It
> happened during test 134 which exercises some pmlogger/pmlc
> scenarios around multi-volume logs.  It had been stuck for
> many minutes (tens I think) when I intervened.  Here's some
> gdb state I captured that tells some of the story...
> 
> nathans@smash:/source/build/debug-info-path/git/pcp$ ps -ef | grep pcpqa
> root      8179  8154  0 14:43 pts/45   00:00:00 sudo su - pcpqa
> root      8180  8179  0 14:43 pts/45   00:00:00 su - pcpqa
> pcpqa     8181  8180  0 14:43 pts/45   00:00:00 -bash
> pcpqa     9304 10574  0 15:13 pts/45   00:00:00 sh 134
> pcpqa     9319  9304  0 15:13 pts/45   00:00:00 pmlogger -L -c /dev/null -l 
> /tmp/9304.log /tmp/9304
> pcpqa     9481  9304  0 15:14 pts/45   00:00:00 pmlc 9319
> pcpqa    10574  8181  0 14:44 pts/45   00:00:00 /bin/sh ./check -l
> nathans  12864 14971  0 15:59 pts/31   00:00:00 grep pcpqa
> nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9319
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Attaching to process 9319
> Reading symbols from /usr/libexec/pcp/bin/pmlogger...Reading symbols from 
> /usr/lib/debug/usr/libexec/pcp/bin/pmlogger.debug...done.
> done.
> Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from 
> /usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
> done.
> Loaded symbols for /usr/lib64/libpcp.so.3
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.12-1.80.el6_3.5.x86_64
> (gdb) bt
> #0  0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f38daee63a3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2  0x00007f38daee6287 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00007f38db105f7a in pmWhichContext () at context.c:172
> #4  0x00007f38db114bd5 in pmAddProfile (indom=4294967295, instlist_len=0, 
> instlist=0x0) at profile.c:205
> #5  0x00007f38db576110 in log_callback (afid=<value optimized out>, 
> data=0x7f38dd71fca0) at callback.c:439
> #6  0x00007f38db12bac7 in onalarm (dummy=<value optimized out>) at AF.c:299
> #7  <signal handler called>
> #8  0x00007f38daee6281 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #9  0x00007f38db1390fc in __pmMultiThreaded (scope=2) at lock.c:128
> #10 0x00007f38db12b7ae in __pmAFblock () at AF.c:491
> #11 0x00007f38db5748a1 in main (argc=<value optimized out>, argv=<value 
> optimized out>) at pmlogger.c:894
> (gdb) #3  0x00007f38db105f7a in pmWhichContext () at context.c:172
> 172       PM_LOCK(__pmLock_libpcp);
> (gdb) l
> 167        * return curcontext, provided it is defined
> 168        */
> 169       int         sts;
> 170   
> 171       PM_INIT_LOCKS();
> 172       PM_LOCK(__pmLock_libpcp);
> 173       if (PM_TPD(curcontext) > PM_CONTEXT_UNDEF)
> 174           sts = PM_TPD(curcontext);
> 175       else
> 176           sts = PM_ERR_NOCONTEXT;
> (gdb) p __pmLock_libpcp
> $1 = {__data = {__lock = 2, __count = 0, __owner = 0, __nusers = 0, __kind = 
> 1, __spins = 0, __list = {__prev = 0x0, 
>       __next = 0x0}}, __size = "\002", '\000' <repeats 15 times>, "\001", 
> '\000' <repeats 22 times>, __align = 2}
> (gdb) 
> 
> nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9481
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Attaching to process 9481
> Reading symbols from /usr/bin/pmlc...Reading symbols from 
> /usr/lib/debug/usr/bin/pmlc.debug...done.
> done.
> Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from 
> /usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
> done.
> Loaded symbols for /usr/lib64/libpcp.so.3
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x00000038edee8e82 in recv () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.12-1.80.el6_3.5.x86_64
> (gdb) bt
> #0  0x00000038edee8e82 in recv () from /lib64/libc.so.6
> #1  0x00007ffd333d6700 in pduread (fd=3, buf=<value optimized out>, len=12, 
> part=-1, timeout=0) at pdu.c:238
> #2  0x00007ffd333d69a1 in __pmGetPDU (fd=3, mode=0, timeout=0, 
> result=0x7fff7e92a368) at pdu.c:408
> #3  0x00007ffd333f936e in __pmConnectLogger (hostname=<value optimized out>, 
> pid=0x60d5a0, port=0x60d5a4)
>     at logconnect.c:172
> #4  0x0000000000405b7e in ConnectLogger (hostname=0x7fff7e92a4e0 "smash", 
> pid=0x60d5a0, port=0x60d5a4) at actions.c:129
> #5  0x0000000000402527 in main (argc=<value optimized out>, argv=<value 
> optimized out>) at pmlc.c:218
> 
> 
> [ from the interrupted test log after killing pmlogger, has the
>   line number from the test sh process: ]
> > 134: line 175:  9319 Killed                  pmlogger -L -c /dev/null -l 
> > $tmp.log $tmp
> 
> 
> cheers.
> 
> --
> Nathan


<Prev in Thread] Current Thread [Next in Thread>