Hi,
On 2016-02-11 02:22, Mark Goodwin wrote:
> On 02/10/2016 08:56 PM, Marko Myllynen wrote:>>
>> On 2016-02-09 20:46, Dave Brolley wrote:
>>> Changes committed to git://git.pcp.io/pcp.git master
>>>
>>> Mark Goodwin (2):
>>> pmrep: cleanup stdio on exit to avoid Exception Ignored errors
>>> qa: add test 880 to check pmrep for broken pipe exceptions and
>>> Exception Ignored errors
>>
>> I'm still seeing issues after this patch, in fact now I have issues
>> also with Python 2:
>>
>> $ python3 /tmp/pcp/bin/pmrep --archive $here/archives/20130706 -o csv
>> -u -S @10:00 -x kernel.all.sysfork | head -n 1
>> #
>> zsh: broken pipe python3 /tmp/pcp/bin/pmrep --archive
>> $here/archives/20130706 -o csv -u -S -x |
>> zsh: done head -n 1
>> $ python2 /tmp/pcp/bin/pmrep --archive $here/archives/20130706 -o csv
>> -u -S @10:00 -x kernel.all.sysfork | head -n 1
>> #
>> zsh: broken pipe python2 /tmp/pcp/bin/pmrep --archive
>> $here/archives/20130706 -o csv -u -S -x |
>> zsh: done head -n 1
>> $
>>
>> This is on RHEL 7 (Python 2.7 and Python 3.3 in use). The reason why
>> I'm so inclined to have this fixed shows above, with PRINT_EXIT_VALUE
>> Zsh prints some ugly diagnostics.
>
> Isn't zsh's PRINT_EXIT_VALUE doing just that here - printing that there was
> a broken pipe for pmrep (for which the signal was ignored via SIG_DFL),
> and that 'head' exited normally?
Zsh is doing the right thing of course but I see this as a QoI issue:
$ echo foo > foo.txt
$ ls -1 / | head -n 1
bin
$ ps -ef | head -n 1
UID PID PPID C STIME TTY TIME CMD
$ cat foo.txt | head -n 1
foo
$ grep foo foo.txt | head -n 1
foo
$ pmval -s 1 kernel.all.sysfork | head -n 1
$ pmprobe -v kernel.all.sysfork | head -n 0
$ pminfo -dfmtT kernel.all.sysfork | head -n 1
$ pmdumptext -s 1 kernel.all.sysfork | head -n 1
Tue Feb 16 12:34:56 ?
$ pmrep -s 1 kernel.all.sysfork | head -n 1
k.a.sysfork
$ python3 /tmp/pcp/bin/pmrep --archive $here/archives/20130706 -o csv -u -S
@10:00 -x kernel.all.sysfork | head -n 1
#
zsh: broken pipe python3 /tmp/pcp/bin/pmrep --archive $here/archives/20130706
-o csv -u -S -x |
zsh: done head -n 1
$ python3 /tmp/pcp/bin/pmiostat -s 3 | head -n 1
# Device rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz
await r_await w_await %util
$ python2 /tmp/pcp/bin/pmrep --archive $here/archives/20130706 -o csv -u -S
@10:00 -x kernel.all.sysfork | head -n 1
#
zsh: broken pipe python2 /tmp/pcp/bin/pmrep --archive $here/archives/20130706
-o csv -u -S -x |
zsh: done head -n 1
$ python2 /tmp/pcp/bin/pmiostat -s 3 | head -n 1
# Device rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz
await r_await w_await %util
$
(pmiostat also fails with -n 0 so I think -n 1 is more reasonable test.)
>> Does this work for everyone else or can this be a local hickup? I
>> already tried using SIG_IGN instead of SIG_DFL etc to no avail.
>
> I'll check on rhel7 - I've only been testing on f23 with py2 and py3.
> But basically - python installs a sigpipe handler to raise an exception
> on broken pipe. We trapped this in pmiostat to ignore the socket.error
> exception, but I think the better solution is to restore the SIGPIPE
> SIG_DFL
> handler from glibc, which is what pmrep is now doing - this avoids the
> python exception and ignores SIGPIPE by default. Not sure why SIG_IGN
> didn't work though. Anyway - so I'll change pmiostat to SIG_DFL SIGPIPE
> too unless anyone has a better idea?
The below patch cures all the cases for me, makes the code a bit easier
to read (IMHO), and still passes QA/880. The key difference is that we
don't touch stderr in finalize() at all, in a standalone test program
it works ok but errors out in pmrep although we don't do anything with
stderr in pmrep (so perhaps we're hitting some corner case of the Python
wrapper or libpcp itself). But since we now always flush stderr when
we write something into it I don't see a need to flush it again in
finalize(). Also flushes the output file a bit harder (as per os.html).
---
src/pmrep/pmrep.py | 39 ++++++++++++++++-----------------------
1 file changed, 16 insertions(+), 23 deletions(-)
diff --git a/src/pmrep/pmrep.py b/src/pmrep/pmrep.py
index 25d4812..9e2c233 100755
--- a/src/pmrep/pmrep.py
+++ b/src/pmrep/pmrep.py
@@ -53,8 +53,8 @@ try:
except:
import simplejson as json
import socket
-import signal
import struct
+import errno
import time
import copy
import sys
@@ -66,8 +66,6 @@ from cpmapi import PM_CONTEXT_ARCHIVE, PM_CONTEXT_HOST,
PM_CONTEXT_LOCAL, PM_MOD
from cpmapi import PM_TYPE_32, PM_TYPE_U32, PM_TYPE_64, PM_TYPE_U64,
PM_TYPE_FLOAT, PM_TYPE_DOUBLE, PM_TYPE_STRING
from cpmi import PMI_ERR_DUPINSTNAME
-signal.signal(signal.SIGPIPE, signal.SIG_DFL)
-
if sys.version_info[0] >= 3:
long = int
@@ -151,10 +149,12 @@ def send_to_zabbix(metrics, zabbix_host, zabbix_port,
timeout=15):
# debug: write('Got response from Zabbix: %s' % resp)
if resp.get('response') != 'success':
sys.stderr.write('Error response from Zabbix: %s', resp)
+ sys.stderr.flush()
return False
return True
except socket.timeout as err:
sys.stderr.write("Zabbix connection timed out: " + str(err))
+ sys.stderr.flush()
return False
finally:
zabbix.close()
@@ -1345,26 +1345,18 @@ class PMReporter(object):
def finalize(self):
""" Finalize and clean up """
- try:
- if self.writer:
+ if self.writer:
+ try:
self.writer.flush()
- if self.writer != sys.stdout:
- self.writer.close()
- self.writer = None
- if self.pmi:
- self.pmi.pmiEnd()
- self.pmi = None
- finally:
- try:
- sys.stdout.flush()
- finally:
- try:
- sys.stdout.close()
- finally:
- try:
- sys.stderr.flush()
- finally:
- sys.stderr.close()
+ except BrokenPipeError:
+ pass
+ if self.writer != sys.stdout:
+ os.fsync(self.writer.fileno())
+ self.writer.close()
+ self.writer = None
+ if self.pmi:
+ self.pmi.pmiEnd()
+ self.pmi = None
if __name__ == '__main__':
try:
@@ -1382,7 +1374,8 @@ if __name__ == '__main__':
except pmapi.pmUsageErr as usage:
usage.message()
except IOError as error:
- sys.stderr.write("%s\n" % str(error))
+ if error.errno != errno.EPIPE:
+ sys.stderr.write("%s\n" % str(error))
except KeyboardInterrupt:
sys.stdout.write("\n")
P.finalize()
Thanks,
--
Marko Myllynen
|