On 07/17/2013 04:12 PM, Ken McDonell wrote:
> On 17/07/13 23:15, Frank Ch. Eigler wrote:
>> ..
>> OK, as long as we observe the requirement that we do not accidentally
>> regenerate / modify any files that a sysadmin has created (whether
>> that was by hand or by a prior interactive pm*conf run).
>
> Just to reinforce the mail exchange Nathan and I had yesterday, I now believe
> there to be no evidence of regeneration or modification of files in the case
> I reported that triggered this whole discussion. I did not follow a
> sanctioned upgrade path, which led to version-mismatched pieces being
> installed ... the cause was idiot user error, not PCP error.
>
>>> *nod* - this problem I have seen in real production environments, and it
>>> is sorta-handled in a non-intuitive way - as soon as the remote host goes
>>> away, pmlogger loses the connection and it exits [...]
>>
>> This sounds like an unfortunate policy, if for example there are
>> temporary network glitches or a quick reboot. A 30-minute re-poll is
>> IMO too slow.
>
> Let me outline the constraints here, and maybe we can brainstorm a better
> approach.
>
> 1. When a host is indeed rebooted, we have to start a new PCP archive ... the
> hw config may be different (even without human intervention in some HA
> worlds), instance domains may be different, etc. so all of the PCP metadata
> needs to be written afresh and the "log once" metrics written to the new
> archive.
>
> 2. pmlogger cannot tell the difference between a network outage and remote
> host reboot, so if the connection to pmcd is closed, or a PDU get/put
> times out, then pmlogger must finish the current PCP archive. But we
> could/should consider setting $PMCD_REQUEST_TIMEOUT to be something larger
> than the default 10 seconds as pmlogger in particular is tolerant of
> delayed PDUs coming back from pmcd, so that pmlogger is less exposed to
> short-term network glitches.
>
Just a note that in Ron Kerry's how-to for setting up PCP (in a wiki at SGI),
he suggests 120 seconds for the timeout value.
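To make the trade-off in point 2 concrete, here is a deliberately simplified model in Python. It is my own sketch, not pmlogger's actual logic: it assumes an outage ends the current archive only when it lasts longer than the request timeout, and it ignores outages that reset the TCP connection outright (those would end the archive no matter what the timeout is). The function name and the outage lengths are hypothetical; 10 seconds is the default Ken mentions and 120 seconds is Ron's suggestion.

```python
def archive_breaks(outage_seconds, timeout_seconds):
    """Count outages that would force pmlogger to finish its current archive.

    Simplified, hypothetical model: an outage no longer than the PDU timeout
    is absorbed as a delayed response; a longer one looks the same as a
    remote reboot, so the archive is closed and a new one must be started.
    """
    return sum(1 for outage in outage_seconds if outage > timeout_seconds)


# Hypothetical outage lengths (seconds) seen over some period.
outages = [3, 15, 45, 90, 300]

for timeout in (10, 120):  # the default vs. the value from Ron's wiki page
    breaks = archive_breaks(outages, timeout)
    print(f"PMCD_REQUEST_TIMEOUT={timeout}: {breaks} archive break(s)")
```

With these made-up numbers the larger timeout turns four premature archive breaks into one. Under this model the cost is that pmlogger waits longer before giving up on a host that really has gone away.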
> 3. pmlogger knows nothing of the date+timestamp[+sequence#] naming convention
> that the scripts around pmlogger use to name the PCP archives.
>
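Point 3 is why any rollover after a reconnect has to be driven from outside pmlogger: the wrapper scripts, not pmlogger, pick the archive name. The Python sketch below shows the kind of logic involved. The exact format (a YYYYMMDD.HH.MM stem plus a two-digit -NN sequence suffix on collision) and the helper name are assumptions for illustration, not copied from the PCP scripts.

```python
import os
from datetime import datetime


def next_archive_basename(directory, now=None):
    """Hypothetical helper: choose a date+timestamp[+sequence#] archive name.

    Assumes a YYYYMMDD.HH.MM stem, with a -00, -01, ... suffix appended when
    an archive with that stem already exists in `directory`.
    """
    now = now or datetime.now()
    stem = now.strftime("%Y%m%d.%H.%M")
    existing = set(os.listdir(directory))

    def taken(name):
        # A PCP archive is a family of files: name.index, name.meta, name.0, ...
        return any(f == name or f.startswith(name + ".") for f in existing)

    if not taken(stem):
        return stem
    seq = 0
    while taken(f"{stem}-{seq:02d}"):
        seq += 1
    return f"{stem}-{seq:02d}"
```

If pmlogger exits and is restarted within the same minute, a helper like this yields a distinct name such as 20130717.16.12-00 rather than clobbering the archive just finished.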
> _______________________________________________
> pcp mailing list
> pcp@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/pcp
--
-----------------------------------------------------------------------
Jeff Hanson - jhanson@xxxxxxx - Field Technical Analyst
You can choose a ready guide in some celestial voice.
If you choose not to decide, you still have made a choice.
You can choose from phantom fears and kindness that can kill;
I will choose a path that's clear
I will choose freewill. - Lee/Lifeson/Peart