Hi Ken,
I came across what looks like a threading-related deadlock
while running QA earlier, keen to hear your thoughts. It
happened during test 134 which exercises some pmlogger/pmlc
scenarios around multi-volume logs. It had been stuck for
many minutes (tens I think) when I intervened. Here's some
gdb state I captured that tells some of the story...
nathans@smash:/source/build/debug-info-path/git/pcp$ ps -ef | grep pcpqa
root 8179 8154 0 14:43 pts/45 00:00:00 sudo su - pcpqa
root 8180 8179 0 14:43 pts/45 00:00:00 su - pcpqa
pcpqa 8181 8180 0 14:43 pts/45 00:00:00 -bash
pcpqa 9304 10574 0 15:13 pts/45 00:00:00 sh 134
pcpqa 9319 9304 0 15:13 pts/45 00:00:00 pmlogger -L -c /dev/null -l
/tmp/9304.log /tmp/9304
pcpqa 9481 9304 0 15:14 pts/45 00:00:00 pmlc 9319
pcpqa 10574 8181 0 14:44 pts/45 00:00:00 /bin/sh ./check -l
nathans 12864 14971 0 15:59 pts/31 00:00:00 grep pcpqa
nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9319
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9319
Reading symbols from /usr/libexec/pcp/bin/pmlogger...Reading symbols from
/usr/lib/debug/usr/libexec/pcp/bin/pmlogger.debug...done.
done.
Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from
/usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
done.
Loaded symbols for /usr/lib64/libpcp.so.3
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.80.el6_3.5.x86_64
(gdb) bt
#0 0x00007f38daeeb054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f38daee63a3 in _L_lock_892 () from /lib64/libpthread.so.0
#2 0x00007f38daee6287 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f38db105f7a in pmWhichContext () at context.c:172
#4 0x00007f38db114bd5 in pmAddProfile (indom=4294967295, instlist_len=0,
instlist=0x0) at profile.c:205
#5 0x00007f38db576110 in log_callback (afid=<value optimized out>,
data=0x7f38dd71fca0) at callback.c:439
#6 0x00007f38db12bac7 in onalarm (dummy=<value optimized out>) at AF.c:299
#7 <signal handler called>
#8 0x00007f38daee6281 in pthread_mutex_lock () from /lib64/libpthread.so.0
#9 0x00007f38db1390fc in __pmMultiThreaded (scope=2) at lock.c:128
#10 0x00007f38db12b7ae in __pmAFblock () at AF.c:491
#11 0x00007f38db5748a1 in main (argc=<value optimized out>, argv=<value
optimized out>) at pmlogger.c:894
(gdb) #3 0x00007f38db105f7a in pmWhichContext () at context.c:172
172 PM_LOCK(__pmLock_libpcp);
(gdb) l
167 * return curcontext, provided it is defined
168 */
169 int sts;
170
171 PM_INIT_LOCKS();
172 PM_LOCK(__pmLock_libpcp);
173 if (PM_TPD(curcontext) > PM_CONTEXT_UNDEF)
174 sts = PM_TPD(curcontext);
175 else
176 sts = PM_ERR_NOCONTEXT;
(gdb) p __pmLock_libpcp
$1 = {__data = {__lock = 2, __count = 0, __owner = 0, __nusers = 0, __kind = 1,
__spins = 0, __list = {__prev = 0x0,
__next = 0x0}}, __size = "\002", '\000' <repeats 15 times>, "\001",
'\000' <repeats 22 times>, __align = 2}
(gdb)
nathans@smash:/source/build/debug-info-path/git/pcp$ sudo gdb -p 9481
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 9481
Reading symbols from /usr/bin/pmlc...Reading symbols from
/usr/lib/debug/usr/bin/pmlc.debug...done.
done.
Reading symbols from /usr/lib64/libpcp.so.3...Reading symbols from
/usr/lib/debug/usr/lib64/libpcp.so.3.debug...done.
done.
Loaded symbols for /usr/lib64/libpcp.so.3
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x00000038edee8e82 in recv () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.80.el6_3.5.x86_64
(gdb) bt
#0 0x00000038edee8e82 in recv () from /lib64/libc.so.6
#1 0x00007ffd333d6700 in pduread (fd=3, buf=<value optimized out>, len=12,
part=-1, timeout=0) at pdu.c:238
#2 0x00007ffd333d69a1 in __pmGetPDU (fd=3, mode=0, timeout=0,
result=0x7fff7e92a368) at pdu.c:408
#3 0x00007ffd333f936e in __pmConnectLogger (hostname=<value optimized out>,
pid=0x60d5a0, port=0x60d5a4)
at logconnect.c:172
#4 0x0000000000405b7e in ConnectLogger (hostname=0x7fff7e92a4e0 "smash",
pid=0x60d5a0, port=0x60d5a4) at actions.c:129
#5 0x0000000000402527 in main (argc=<value optimized out>, argv=<value
optimized out>) at pmlc.c:218
[ from the interrupted test log after killing pmlogger, has the
line number from the test sh process: ]
> 134: line 175: 9319 Killed pmlogger -L -c /dev/null -l
> $tmp.log $tmp
cheers.
--
Nathan
|