The configuration used by pmlogger to create the original test archive
is below. Apart from the option to use this config file, the only other
arguments given to that pmlogger instance were an option to specify the
remote PMCD's host, an option to terminate the logger after 5 minutes,
and the output archive's name.
log mandatory on 30 seconds {
proc.psinfo.pid
}
On 12/2/2013 4:59 PM, Ken McDonell wrote:
Tom,
I'm travelling at the moment ... I should be able to investigate in a day or
so, if no one else beats me to it.
It would help clarify the semantics of the original pmlogger run to know what
sort of configuration file pmlogger was given initially.
-----Original Message-----
From: pcp-bounces@xxxxxxxxxxx [mailto:pcp-bounces@xxxxxxxxxxx] On Behalf Of
Tom Yearke
Sent: Tuesday, 3 December 2013 6:06 AM
To: pcp@xxxxxxxxxxx
Subject: [pcp] pmlogextract Indom Corruption
Hello,
We are currently experiencing a problem with pmlogextract producing archives
with corrupted instance domain definitions. Comparing the output of
pmdumplog -i run on some original logs versus the output when run on
portions of those logs created with pmlogextract, where there are periods of
no change to the instance domain definitions in the original logs, there can
be many changes in the partial logs. These extra definitions can both omit
certain instances and add other instances from portions of the original log
that were not extracted. This can make automatic summarization and
analysis of these partial logs difficult to perform accurately when dealing
with metrics with frequently-changing instance domains, such as the proc.*
metrics.
I've attached a small example that demonstrates these problems.
node_archive is the original log run through pmlogrewrite to make the issue
clearer to see (the corruption occurs in the same way when pmlogextract is
run on the unfiltered original logs), and extract_archive is the result of
running this command:
TZ="UTC" /usr/libexec/pcp/bin/pmlogextract -S "@ Nov 27 15:52:30" -T "@ Nov 27 15:54:30" node_archive extract_archive
At 10:53:44 log time, there is no change to the instance domains in the node
archive. However, in the extracted archive at that time, process
1533 has been removed, despite having a value at that time, and process
1503 has been added, even though 1503 was removed before the time window
given to pmlogextract. In this example, these two problems occur at the same
time, but we have seen other extracted logs where one of the problems
affects instance domains at some timestamps and the other problem affects
instance domains at other timestamps.
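For anyone wanting to reproduce the comparison, here is a minimal sketch of
how the two instance domain dumps can be lined up. compare_indoms is a
hypothetical helper name (it is not part of PCP), and the sketch assumes
pmdumplog is on the PATH and both archives are in the current directory.

```shell
# Sketch only: dump the instance domain records of two archives and diff them.
# compare_indoms is a hypothetical helper name, not part of PCP.
compare_indoms() {
    orig_dump=$(mktemp) || return 2
    part_dump=$(mktemp) || return 2
    # -i dumps the instance domain records; -z reports times in the
    # timezone of the host the archive was collected from
    pmdumplog -z -i "$1" > "$orig_dump"
    pmdumplog -z -i "$2" > "$part_dump"
    diff -u "$orig_dump" "$part_dump"
    status=$?
    rm -f "$orig_dump" "$part_dump"
    # 0 means the dumps are identical, 1 means they differ
    return $status
}

# Usage (needs PCP installed and both archives present):
# compare_indoms node_archive extract_archive
```

Records in the original archive that fall outside the extracted time window
will also show up as differences, so only the hunks inside the window given
to pmlogextract are relevant.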
If someone could take a look into what might be causing these issues and
provide any advice, we would be very appreciative.
Thank you!
Tom Yearke