
Fwd: Re: [pcp] proc pmda oddness - qa 022

To: PCP Mailing List <pcp@xxxxxxxxxxx>
Subject: Fwd: Re: [pcp] proc pmda oddness - qa 022
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Fri, 08 Nov 2013 09:19:24 +1100
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <527C0D86.4080107@xxxxxxxxxxxxxxxx>
References: <527C0D86.4080107@xxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.1.0
oops ... meant this to go to the list.


-------- Original Message --------
Subject: Re: [pcp] proc pmda oddness - qa 022
Date: Fri, 08 Nov 2013 09:00:38 +1100
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
To: Nathan Scott <nathans@xxxxxxxxxx>

I've tracked this one down (I think).

There appears to be a logic error in fetch_proc_pid_stat() when handling
a zero sized "wchan" file.

When the read returns zero, following the "eh?" comment, we set sts to -1
... this just seems wrong ... if the wchan is not available, the rest of
the proc stat info should still be OK.  This is especially so because the
code already treats a missing wchan as non-fatal when the wchan file
cannot be opened (see the check earlier in the code after the proc_open()
call for the wchan case).
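To make the proposed semantics concrete, here is a minimal standalone sketch of the wchan handling.  It is NOT the actual fetch_proc_pid_stat() code or the attached patch.  The helper name read_wchan() is hypothetical, and it assumes that a zero-length wchan read should be treated exactly like a failure to open the file: wchan is reported as empty and the stat fetch carries on.  Whether a genuine read() error should be fatal is a separate choice; this sketch simply passes it back as -errno.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the pmdaproc code): read the wchan file at
 * path into buf.  Returns 0 when the caller should carry on, with buf
 * possibly empty if wchan is unavailable.  Returns -errno only if
 * read() itself fails.
 */
static int read_wchan(const char *path, char *buf, size_t buflen)
{
    int fd;
    int saved_errno;
    ssize_t n;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if ((fd = open(path, O_RDONLY)) < 0)
        return 0;               /* cannot open: wchan unavailable, not fatal */

    n = read(fd, buf, buflen - 1);
    saved_errno = errno;        /* close() may clobber errno */
    close(fd);

    if (n < 0)
        return -saved_errno;    /* genuine read error: let the caller decide */

    /*
     * n == 0 (empty wchan) leaves buf as "", the same outcome as the
     * open() failure above, instead of failing the whole stat fetch.
     */
    buf[n] = '\0';
    return 0;
}
```

With this shape the caller only needs one rule: any non-negative result means the rest of the proc stat values are still valid, whether or not a wchan name was found.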

The attached patch (which includes a lot of new DESPERATE debugging code
to help identify the problem) works for me, and QA 022 passes on the
hosts it was previously failing on.  And check -g pmda.proc runs on
these same hosts with no new failures, so no obvious regressions that I
can see.

Before committing this change, I'd appreciate some feedback.

The bit I _really_ don't understand is why this has not bitten before,
why it now appears to be a hard fail on some systems and a hard pass on
others, and what has changed.  This may be related to the relatively
recent change to use /proc/PID/task/NNN; maybe wchan there has different
semantics and state from the wchan below /proc/PID that we would have
been using previously.




Attachment: patch.pcp
Description: Text document
