Hello,
We are currently experiencing a problem with pmlogextract producing
archives with corrupted instance domain definitions. When we compare the
output of pmdumplog -i run on some original logs against its output on
portions of those logs created with pmlogextract, periods with no change
to the instance domain definitions in the original logs can show many
changes in the partial logs. These extra definitions can both omit
certain instances and add instances from portions of the original log
that were not included. This makes automatic summarization and analysis
of these partial logs difficult to perform accurately for metrics with
frequently changing instance domains, such as the proc.* metrics.
I've attached a small example that demonstrates these problems.
node_archive is the original log run through pmlogrewrite to make the
issue clearer to see (the corruption occurs in the same way when
pmlogextract is run on the unfiltered original logs), and
extract_archive is the result of running this command:
TZ="UTC" /usr/libexec/pcp/bin/pmlogextract -S "@ Nov 27 15:52:30" -T "@ Nov 27 15:54:30" node_archive extract_archive
At 10:53:44 log time, there is no change to the instance domains in
node_archive. However, in extract_archive at that time, process
1533 has been removed, despite having a value at that time, and process
1503 has been added, even though 1503 was removed before the time window
given to pmlogextract. In this example, these two problems occur at the
same time, but we have seen other extracted logs where one of the
problems affects instance domains at some timestamps and the other
problem affects instance domains at other timestamps.
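For anyone wanting to reproduce the comparison, here is a minimal sketch
of how the two pmdumplog -i dumps could be diffed. It assumes pmdumplog
is on the PATH; the function name compare_indoms and the PMDUMPLOG
override variable are made up for illustration and are not part of PCP.
Because the original archive covers more time than the extract, some
differences outside the extracted window are expected, so only the
records inside the window are of interest.

```shell
# Hypothetical helper: dump the instance domain records of two archives
# with "pmdumplog -i" and show a unified diff of the two dumps.
# PMDUMPLOG is an illustrative override for the pmdumplog command name.
compare_indoms() {
    orig_archive=$1
    extracted_archive=$2
    dump_cmd=${PMDUMPLOG:-pmdumplog}

    tmpdir=$(mktemp -d) || return 2

    # -i asks pmdumplog to report the instance domain definitions.
    if ! "$dump_cmd" -i "$orig_archive" > "$tmpdir/orig.txt" ||
       ! "$dump_cmd" -i "$extracted_archive" > "$tmpdir/extract.txt"; then
        rm -rf "$tmpdir"
        return 2
    fi

    # diff exits 0 when the dumps match and 1 when they differ.
    status=0
    diff -u "$tmpdir/orig.txt" "$tmpdir/extract.txt" || status=$?
    rm -rf "$tmpdir"
    return $status
}
```

With the attached archives this would be called as
compare_indoms node_archive extract_archive.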
If someone could take a look into what might be causing these issues and
provide any advice, we would be very appreciative.
Thank you!
Tom Yearke
pcp_archives.zip
Description: Zip archive