
[Bug 1069] New: libpcp AF functionality has posix-signal-unsafe elements

To: pcp@xxxxxxxxxxx
Subject: [Bug 1069] New: libpcp AF functionality has posix-signal-unsafe elements
From: bugzilla-daemon@xxxxxxxxxxx
Date: Mon, 13 Oct 2014 14:18:50 +0000
Auto-submitted: auto-generated
Delivered-to: pcp@xxxxxxxxxxx
Bug ID:         1069
Summary:        libpcp AF functionality has posix-signal-unsafe elements
Product:        pcp
Version:        unspecified
Hardware:       All
OS:             Linux
Status:         NEW
Severity:       major
Priority:       P5
Component:      pcp
Assignee:       pcp@kenj.com.au
Reporter:       fche@redhat.com
CC:             pcp@oss.sgi.com
Classification: Unclassified

An inspection of the libpcp/src/AF.c code indicates that it relies on
unsafe mechanisms that can cause heisencrashes.  The gist of the
problem is that from within a SIGALRM signal handler, it is not safe
to invoke general libc/application functions.  A poorly timed callback
can corrupt e.g. libc malloc/stdio or libpcp internals, or result in
hangs.  (These have been observed in the wild, just not necessarily
in the context of pcp.)

Some general references on async-signal safety, which applies to the
whole transitive callchain of signal handlers:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html
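
For illustration, one common way to stay within the async-signal-safety
rules is to have the handler do nothing but set a flag, and to run the
real work later from ordinary context.  This is only a minimal sketch,
not code from libpcp: alarm_pending, on_alarm_safe,
install_alarm_handler, run_pending_alarm and sample_work are
hypothetical names invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is libpcp API. */
volatile sig_atomic_t alarm_pending = 0;
int work_count = 0;

/* Handler restricted to an async-signal-safe operation: one flag store. */
void on_alarm_safe(int sig)
{
    (void)sig;
    alarm_pending = 1;
}

/* Install the handler for SIGALRM; returns 0 on success, -1 on error. */
int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm_safe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGALRM, &sa, NULL);
}

/*
 * Called from normal (non-signal) context, e.g. an application main loop.
 * Runs the deferred work if an alarm fired since the last call and
 * returns 1, otherwise returns 0.  Alarms that arrive close together
 * are coalesced into a single call of work().
 */
int run_pending_alarm(void (*work)(void))
{
    assert(work != NULL);
    if (!alarm_pending)
        return 0;
    alarm_pending = 0;
    work();     /* malloc/free, stdio etc. are acceptable here */
    return 1;
}

/* Stand-in for a client callback; a real one could use stdio or the heap. */
void sample_work(void)
{
    work_count++;
}
```

The cost of this pattern is latency: the work runs only when the
program next polls the flag, not at the instant the timer expires.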

The specific problems include:

AF.c:onalarm()
- calling free (heap operations)
- calling stdio printf/putc (e.g. in pmDebug case)
- pmprintf() and pmflush()

AF.c:enqueue() (called both from onalarm and __pmAFregister):
- doing list manipulation with limited (AFhold) reentrancy controls,
  which could be broken if e.g. __pmAFregister is interrupted by
  some other signal wherein __pmAF* functions are called.
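
As a rough sketch of what stronger protection around the list
manipulation could look like, the insert below blocks all signals with
sigprocmask() for the duration of a sorted insert and then restores the
previous mask.  It assumes a single-threaded process (a threaded one
would need pthread_sigmask()), and the af_event type, af_queue and
enqueue_blocked are hypothetical stand-ins, not the actual AF.c data
structures.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical event record; not the real AF.c queue entry. */
typedef struct af_event {
    struct af_event *next;
    long due;               /* expiry time, arbitrary units */
} af_event;

af_event *af_queue = NULL;  /* singly linked, sorted by ascending due */

/*
 * Insert ev in due-time order with every signal blocked, so no handler
 * can observe or modify the list while it is half-updated.
 */
void enqueue_blocked(af_event *ev)
{
    sigset_t all, saved;
    af_event **pp;

    assert(ev != NULL);
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &saved);

    /* Walk to the first entry that is due later than ev. */
    for (pp = &af_queue; *pp != NULL && (*pp)->due <= ev->due; pp = &(*pp)->next)
        ;
    ev->next = *pp;
    *pp = ev;

    /* Restore whatever mask the caller had, blocked or not. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
}
```

Restoring the saved mask rather than unconditionally unblocking means
the function behaves the same whether or not the caller already had
signals blocked.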

The lack of documented constraints on __pmAFregister callback
functions in pmaf.3 leads clients to do risky things:

pmlogger.c:run_done_callback
- more stdio
pmlogger.c:vol_switch_callback
- more stdio, including fopen/fclose
- more racy AF manipulation
callback.c:log_callback
- heap operations
- many general LIBPCP ops

perl/PMDA/PMDA.xs:timer_callback
- call into general perl interpreter


It may be possible to trigger some example problems with
numerous/rapid/unsafe-content __pmAFregister callbacks.  I got a toy
program to show some malloc corruption, but some of the race windows
are short enough that auditing rather than simple tests may be
necessary.  (Lengthening some of the race windows by inserting
usleep() here and there might help.)


The longevity of this code testifies that these races & corruption are
infrequent, so fixing the problems is not urgent.  One possible
thorough approach for an eventual fix would be to move away from
timers/signal handlers, and manage events/timing at (say) the exit of
PMAPI functions, or with a more formal application-main-loop
mechanism.  Some of the races may be shrunk with more aggressive
__pmAFblock, and/or nestedness counting for AF.c:block.
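
To make the nestedness-counting idea concrete, here is a minimal
sketch of a block/unblock pair that keeps a depth counter, so that an
inner unblock does not re-enable SIGALRM while an outer caller still
expects it to be held.  It assumes a single-threaded process, and
af_block, af_unblock, alarm_is_blocked, block_depth and was_blocked are
hypothetical names, not the existing AF.c implementation.  It narrows
the windows but would still need the kind of audit described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical state; not the real AF.c variables. */
static volatile sig_atomic_t block_depth = 0;   /* current nesting level */
static volatile sig_atomic_t was_blocked = 0;   /* SIGALRM state at depth 0 */

/* Block SIGALRM; calls may nest. */
void af_block(void)
{
    sigset_t set, old;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_BLOCK, &set, &old);

    block_depth++;
    if (block_depth == 1)   /* outermost call: remember the prior state */
        was_blocked = sigismember(&old, SIGALRM);
}

/* Undo one af_block(); SIGALRM is re-enabled only at the outermost level. */
void af_unblock(void)
{
    sigset_t set;

    assert(block_depth > 0);
    if (block_depth == 1 && !was_blocked) {
        sigemptyset(&set);
        sigaddset(&set, SIGALRM);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
    }
    block_depth--;
}

/* Helper for inspection: 1 if SIGALRM is currently blocked, else 0. */
int alarm_is_blocked(void)
{
    sigset_t cur;

    sigprocmask(SIG_BLOCK, NULL, &cur);
    return sigismember(&cur, SIGALRM);
}
```

Only SIGALRM is unblocked at the outermost level, rather than
restoring a whole saved mask, so mask changes made by the caller for
other signals in between are left alone.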

