
[Bug 1104] New: signal delivery may lead to deadlock

To: pcp@xxxxxxxxxxx
Subject: [Bug 1104] New: signal delivery may lead to deadlock
From: bugzilla-daemon@xxxxxxxxxxx
Date: Sat, 14 Feb 2015 20:22:30 +0000
Auto-submitted: auto-generated
Delivered-to: pcp@xxxxxxxxxxx
Bug ID:         1104
Summary:        signal delivery may lead to deadlock
Product:        pcp
Version:        unspecified
Hardware:       All
OS:             Linux
Status:         NEW
Severity:       normal
Priority:       P5
Component:      pcp
Assignee:       pcp@kenj.com.au
Reporter:       kenj@internode.on.net
CC:             pcp@oss.sgi.com
Classification: Unclassified

Observed when QA 134 hung with pmlc never exiting ... because pmlogger is
blocked and not responding ...

(gdb) where
#0  0x004d8416 in __kernel_vsyscall ()
#1  0x00956e02 in __lll_lock_wait () from /lib/libpthread.so.0
#2  0x00952933 in _L_lock_654 () from /lib/libpthread.so.0
#3  0x00952814 in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0x0047f8dc in __pmInitLocks () at lock.c:59
#5  0x00442594 in pmWhichContext () at context.c:236
#6  0x00451f90 in pmAddProfile (indom=indom@entry=4294967295, 
    instlist_len=instlist_len@entry=0, instlist=instlist@entry=0x0)
    at profile.c:205
#7  0x009a63ba in log_callback (afid=32772, data="" at callback.c:471
#8  0x0046d504 in onalarm (dummy=14) at AF.c:272
#9  <signal handler called>
#10 0x00953521 in __pthread_mutex_unlock_usercnt () from /lib/libpthread.so.0
#11 0x0047f8f2 in __pmInitLocks () at lock.c:98
#12 0x0046dc10 in __pmAFblock () at AF.c:468
#13 0x009a3498 in main (argc=7, argv=0xbfd0f664) at pmlogger.c:904

So we're in __pmInitLocks() releasing the local mutex that protects the "one
trip" guard (done) when pmlogger's timer goes off. The handler notices an indom
change and tries to adjust the fetch profile via pmAddProfile(), which re-enters
__pmInitLocks() and blocks on the mutex the interrupted code still holds (it
does not really matter how we enter libpcp at this point; in this case it is
pmAddProfile(), but __pmInitLocks() is called from all over the place).
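
To make the re-entry concrete, here is a minimal sketch that is not libpcp
code. It assumes the guard in __pmInitLocks() is an ordinary non-recursive
mutex, and it swaps in an error-checking mutex so that the second lock attempt
from the same thread returns EDEADLK instead of hanging the way pmlogger did.
The helper name relock_same_thread() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper, not part of libpcp: lock a mutex, then try to lock
 * it again from the same thread, which is what the signal handler
 * effectively did.  Returns the result of the second lock attempt.
 */
int relock_same_thread(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int sts;

    pthread_mutexattr_init(&attr);
    /* error-checking type: reports self-deadlock rather than blocking */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&lock);        /* interrupted code holds the mutex */
    sts = pthread_mutex_lock(&lock);  /* "handler" re-enters on same thread */

    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    return sts;
}
```

With a normal (non-error-checking) mutex the second pthread_mutex_lock() would
simply never return, which matches frames #1 to #4 of the backtrace above.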

I think the impact of this is relatively low because outside pmlogger and pmie
we don't use asynchronous signals in contexts like the failing one.

The fix is probably in pmlogger where our log_callback() is called in a signal
handler context and clearly violates the guidance here
http://www.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_04.html#tag_02_04_04

Frank has alluded to this in the past, see
http://oss.sgi.com/bugzilla/show_bug.cgi?id=1069, so I guess it is time to go
fix it, at least in this case.
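
One common way to follow that guidance, shown here only as a sketch and not as
the actual pmlogger change, is to have the handler do nothing except set a
volatile sig_atomic_t flag, and to run the real callback work later from the
normal (non-signal) flow of control. All names below (on_alarm,
install_alarm_handler, run_pending_callbacks, callbacks_run) are hypothetical,
and the counter increment stands in for the work log_callback() does today.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t alarm_pending = 0;

/* Stand-in for the real work; counts how many times it ran. */
static int callbacks_run = 0;

/* Handler only records that the timer fired: no locks, no library calls. */
static void on_alarm(int sig)
{
    (void)sig;
    alarm_pending = 1;
}

/* Install the handler for SIGALRM; returns 0 on success, -1 on error. */
int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Called from the main loop, outside signal context, where taking
 * mutexes is safe.  Returns 1 if deferred work was run, else 0.
 * Alarms that arrive before the flag is cleared are coalesced into one run.
 */
int run_pending_callbacks(void)
{
    if (!alarm_pending)
        return 0;
    alarm_pending = 0;
    callbacks_run++;    /* the log_callback()-style work would go here */
    return 1;
}
```

The main loop would call run_pending_callbacks() each time it wakes up, for
example after a blocking call returns with EINTR.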

pmlc is NOT the problem, but for completeness here is the pmlc traceback when
it is hung:
#0  0x00f3b416 in __kernel_vsyscall ()
#1  0x4f44ea81 in recv () from /lib/libc.so.6
#2  0x00cb4d60 in recv (__flags=0, __n=12, __buf=0x8e77000, __fd=3)
    at /usr/include/bits/socket2.h:45
#3  __pmRecv (fd=fd@entry=3, buffer=buffer@entry=0x8e77000, 
    length=length@entry=12, flags=flags@entry=0) at secureconnect.c:1555
#4  0x00c7265c in pduread (fd=fd@entry=3, buf=buf@entry=0x8e77000 "", 
    len=<optimized out>, len@entry=12, part=part@entry=-1, 
    timeout=timeout@entry=0) at pdu.c:198
#5  0x00c73022 in __pmGetPDU (fd=fd@entry=3, mode=mode@entry=0, timeout=0, 
    result=result@entry=0xbfd1fd00) at pdu.c:379
#6  0x00c9d6c7 in __pmConnectLogger (connectionSpec=<optimized out>, 
    pid=0x80542e4, port=0x80542e0) at logconnect.c:391
#7  0x0804bca8 in ConnectLogger ()
#8  0x08049a9b in main ()

All of this was on vm11 (i686 Debian 6.0.9) running PCP 3.10.3, although the
problem is not platform specific.

