pcp
[Top] [All Lists]

Possible libpcp threading deadlock via pmlogger

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Possible libpcp threading deadlock via pmlogger
From: Nathan Scott <nathans@xxxxxxxxxx>
Date: Sat, 13 Oct 2012 01:22:09 -0400 (EDT)
Cc: pcp@xxxxxxxxxxx
In-reply-to: <1502895691.11759081.1350104461446.JavaMail.root@xxxxxxxxxx>
Reply-to: Nathan Scott <nathans@xxxxxxxxxx>
Hi Ken,

I came across what looks like a threading-related deadlock
while running QA earlier, keen to hear your thoughts.  It
happened during test 134 which exercises some pmlogger/pmlc
scenarios around multi-volume logs.  It had been stuck for
many minutes (tens I think) when I intervened.  Here's some
gdb state I captured that tells some of the story...

nathans@smash:/source/build/debug-info-path/git/pcp$ ps -ef | grep pcpqa
root      8179  8154  0 14:43 pts/45   00:00:00 sudo su - pcpqa
root      8180  8179  0 14:43 pts/45   00:00:00 su - pcpqa
pcpqa     8181  8180  0 14:43 pts/45   00:00:00 -bash
pcpqa     9304 10574  0 15:13 pts/45   00:00:00 sh 134
pcpqa     9319  9304  0 15:13 pts/45   00:00:00 pmlogger -L -c /dev/null -l 
/tmp/9304.log /tmp/9304
pcpqa     9481  9304  0 15:14 pts/45   00:00:00 pmlc 9319
pcpqa    10574  8181  0 14:44 pts/45   00:00:00 /bin/sh ./check -l
nathans  12864 14971  0 15:59 pts/31   00:00:00 grep pcpqa
nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9319
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9319
Reading symbols from /usr/libexec/pcp/bin/pmlogger...Reading symbols from 
/usr/lib/debug/usr/libexec/pcp/bin/pmlogger.debug...done.
done.
Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from 
/usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
done.
Loaded symbols for /usr/lib64/libpcp.so.3
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.80.el6_3.5.x86_64
(gdb) bt
#0  0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f38daee63a3 in _L_lock_892 () from /lib64/libpthread.so.0
#2  0x00007f38daee6287 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f38db105f7a in pmWhichContext () at context.c:172
#4  0x00007f38db114bd5 in pmAddProfile (indom=4294967295, instlist_len=0, 
instlist=0x0) at profile.c:205
#5  0x00007f38db576110 in log_callback (afid=<value optimized out>, 
data=0x7f38dd71fca0) at callback.c:439
#6  0x00007f38db12bac7 in onalarm (dummy=<value optimized out>) at AF.c:299
#7  <signal handler called>
#8  0x00007f38daee6281 in pthread_mutex_lock () from /lib64/libpthread.so.0
#9  0x00007f38db1390fc in __pmMultiThreaded (scope=2) at lock.c:128
#10 0x00007f38db12b7ae in __pmAFblock () at AF.c:491
#11 0x00007f38db5748a1 in main (argc=<value optimized out>, argv=<value 
optimized out>) at pmlogger.c:894
(gdb) #3  0x00007f38db105f7a in pmWhichContext () at context.c:172
172         PM_LOCK(__pmLock_libpcp);
(gdb) l
167          * return curcontext, provided it is defined
168          */
169         int         sts;
170     
171         PM_INIT_LOCKS();
172         PM_LOCK(__pmLock_libpcp);
173         if (PM_TPD(curcontext) > PM_CONTEXT_UNDEF)
174             sts = PM_TPD(curcontext);
175         else
176             sts = PM_ERR_NOCONTEXT;
(gdb) p __pmLock_libpcp
$1 = {__data = {__lock = 2, __count = 0, __owner = 0, __nusers = 0, __kind = 1, 
__spins = 0, __list = {__prev = 0x0, 
      __next = 0x0}}, __size = "\002", '\000' <repeats 15 times>, "\001", 
'\000' <repeats 22 times>, __align = 2}
(gdb) 

nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9481
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9481
Reading symbols from /usr/bin/pmlc...Reading symbols from 
/usr/lib/debug/usr/bin/pmlc.debug...done.
done.
Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from 
/usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
done.
Loaded symbols for /usr/lib64/libpcp.so.3
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols 
found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00000038edee8e82 in recv () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.80.el6_3.5.x86_64
(gdb) bt
#0  0x00000038edee8e82 in recv () from /lib64/libc.so.6
#1  0x00007ffd333d6700 in pduread (fd=3, buf=<value optimized out>, len=12, 
part=-1, timeout=0) at pdu.c:238
#2  0x00007ffd333d69a1 in __pmGetPDU (fd=3, mode=0, timeout=0, 
result=0x7fff7e92a368) at pdu.c:408
#3  0x00007ffd333f936e in __pmConnectLogger (hostname=<value optimized out>, 
pid=0x60d5a0, port=0x60d5a4)
    at logconnect.c:172
#4  0x0000000000405b7e in ConnectLogger (hostname=0x7fff7e92a4e0 "smash", 
pid=0x60d5a0, port=0x60d5a4) at actions.c:129
#5  0x0000000000402527 in main (argc=<value optimized out>, argv=<value 
optimized out>) at pmlc.c:218


[ from the interrupted test log after killing pmlogger, has the
  line number from the test sh process: ]
> 134: line 175:  9319 Killed                  pmlogger -L -c /dev/null -l 
> $tmp.log $tmp


cheers.

--
Nathan

<Prev in Thread] Current Thread [Next in Thread>