On Sat, 2012-10-13 at 01:22 -0400, Nathan Scott wrote:
> Hi Ken,
>
> I came across what looks like a threading-related deadlock
> while running QA earlier, keen to hear your thoughts. ...
Both pmlogger and pmlc are single-threaded (unless there is some under
the covers multi-threading in pmlogger as a consequence of the timer
handling), and __pmLock_libpcp is a recursive lock so the only way that
I know this could block is if we had unbounded lock recursion and ran
into some implementation maximum locking depth in the pthread_mutex
implementation.
We had a case like this before in pmlogger but that related to the per
context locks (which are also recursive) and this was fixed in commit
29d91a.
Commit de185c6 added code to detect lock recursion, but it needs to be
manually compiled into libpcp.
I'll take a look when I get a chance.
> ... It
> happened during test 134 which exercises some pmlogger/pmlc
> scenarios around multi-volume logs. It had been stuck for
> many minutes (tens I think) when I intervened. Here's some
> gdb state I captured that tells some of the story...
>
> nathans@smash:/source/build/debug-info-path/git/pcp$ ps -ef | grep pcpqa
> root 8179 8154 0 14:43 pts/45 00:00:00 sudo su - pcpqa
> root 8180 8179 0 14:43 pts/45 00:00:00 su - pcpqa
> pcpqa 8181 8180 0 14:43 pts/45 00:00:00 -bash
> pcpqa 9304 10574 0 15:13 pts/45 00:00:00 sh 134
> pcpqa 9319 9304 0 15:13 pts/45 00:00:00 pmlogger -L -c /dev/null -l
> /tmp/9304.log /tmp/9304
> pcpqa 9481 9304 0 15:14 pts/45 00:00:00 pmlc 9319
> pcpqa 10574 8181 0 14:44 pts/45 00:00:00 /bin/sh ./check -l
> nathans 12864 14971 0 15:59 pts/31 00:00:00 grep pcpqa
> nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9319
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Attaching to process 9319
> Reading symbols from /usr/libexec/pcp/bin/pmlogger...Reading symbols from
> /usr/lib/debug/usr/libexec/pcp/bin/pmlogger.debug...done.
> done.
> Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from
> /usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
> done.
> Loaded symbols for /usr/lib64/libpcp.so.3
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.80.el6_3.5.x86_64
> (gdb) bt
> #0 0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x00007f38daee63a3 in _L_lock_892 () from /lib64/libpthread.so.0
> #2 0x00007f38daee6287 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00007f38db105f7a in pmWhichContext () at context.c:172
> #4 0x00007f38db114bd5 in pmAddProfile (indom=4294967295, instlist_len=0,
> instlist=0x0) at profile.c:205
> #5 0x00007f38db576110 in log_callback (afid=<value optimized out>,
> data=0x7f38dd71fca0) at callback.c:439
> #6 0x00007f38db12bac7 in onalarm (dummy=<value optimized out>) at AF.c:299
> #7 <signal handler called>
> #8 0x00007f38daee6281 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #9 0x00007f38db1390fc in __pmMultiThreaded (scope=2) at lock.c:128
> #10 0x00007f38db12b7ae in __pmAFblock () at AF.c:491
> #11 0x00007f38db5748a1 in main (argc=<value optimized out>, argv=<value
> optimized out>) at pmlogger.c:894
> (gdb) #3 0x00007f38db105f7a in pmWhichContext () at context.c:172
> 172 PM_LOCK(__pmLock_libpcp);
> (gdb) l
> 167 * return curcontext, provided it is defined
> 168 */
> 169 int sts;
> 170
> 171 PM_INIT_LOCKS();
> 172 PM_LOCK(__pmLock_libpcp);
> 173 if (PM_TPD(curcontext) > PM_CONTEXT_UNDEF)
> 174 sts = PM_TPD(curcontext);
> 175 else
> 176 sts = PM_ERR_NOCONTEXT;
> (gdb) p __pmLock_libpcp
> $1 = {__data = {__lock = 2, __count = 0, __owner = 0, __nusers = 0, __kind =
> 1, __spins = 0, __list = {__prev = 0x0,
> __next = 0x0}}, __size = "\002", '\000' <repeats 15 times>, "\001",
> '\000' <repeats 22 times>, __align = 2}
> (gdb)
>
> nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9481
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> Attaching to process 9481
> Reading symbols from /usr/bin/pmlc...Reading symbols from
> /usr/lib/debug/usr/bin/pmlc.debug...done.
> done.
> Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from
> /usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
> done.
> Loaded symbols for /usr/lib64/libpcp.so.3
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
> found)...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x00000038edee8e82 in recv () from /lib64/libc.so.6
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.80.el6_3.5.x86_64
> (gdb) bt
> #0 0x00000038edee8e82 in recv () from /lib64/libc.so.6
> #1 0x00007ffd333d6700 in pduread (fd=3, buf=<value optimized out>, len=12,
> part=-1, timeout=0) at pdu.c:238
> #2 0x00007ffd333d69a1 in __pmGetPDU (fd=3, mode=0, timeout=0,
> result=0x7fff7e92a368) at pdu.c:408
> #3 0x00007ffd333f936e in __pmConnectLogger (hostname=<value optimized out>,
> pid=0x60d5a0, port=0x60d5a4)
> at logconnect.c:172
> #4 0x0000000000405b7e in ConnectLogger (hostname=0x7fff7e92a4e0 "smash",
> pid=0x60d5a0, port=0x60d5a4) at actions.c:129
> #5 0x0000000000402527 in main (argc=<value optimized out>, argv=<value
> optimized out>) at pmlc.c:218
>
>
> [ from the interrupted test log after killing pmlogger, has the
> line number from the test sh process: ]
> > 134: line 175: 9319 Killed pmlogger -L -c /dev/null -l
> > $tmp.log $tmp
>
>
> cheers.
>
> --
> Nathan
|