On Mon, 30 Oct 2000, The Lemming wrote:
> > Could you send me a copy of your archive for me to test?
>
> I am attaching it to this letter.
>
> > > Debug outputs with explanation are below. What do you think? Can I do
> > > something to make it work? (I have used pcp 2.1.10)
> >
> > I don't really see what's wrong with the debug stuff below. Can you please
> > explain in more detail?
>
> When I fetch all three metrics, the values for apache.total_accesses and
> apache.total_kbytes are some high numbers that don't make any sense. When I
> try to fetch just those two (using exactly the same time, as you can see from
> the 'Gathering for' timestamps), I get values that are OK and you can see that
> they have no connection to the values I got in the previous case.
It looks like apache.busy_servers is reporting bad values too. In the
archive you sent me, the highest value for apache.busy_servers was 7,
yet you are reporting values as high as 64. Are we looking at the same
archive?
In any case, there is a very fundamental difference between PM_MODE_FORW
and PM_MODE_INTERP. You need to read the pmSetMode(3) man page very carefully,
especially this bit:
# If the mode is PM_MODE_FORW then, in the case of pmFetch(3), the collection
# of recorded metric values will be scanned in a forwards direction in
# time, until values for at least one of the requested metrics is located
# after the time origin, and then all requested metrics stored in the log or
# archive at that time will be returned with the corresponding timestamp. A
# mode of PM_MODE_FORW may only be used with an archive context.
The "hidden meaning" here is that when using PM_MODE_FORW, you only get
values back for metrics which are actually available for the time you asked
for them. You can see this with pmdumplog, e.g.:
pmdumplog -a 20001024.08.09 apache.total_accesses \
apache.total_kbytes apache.busy_servers
# ... lots deleted
# 17:19:52.999 68.0.0 (apache.total_accesses): No values returned!
# 68.0.1 (apache.total_kbytes): No values returned!
# 68.0.6 (apache.busy_servers): value 1
#
# [68 bytes]
# 17:19:53.003 68.0.0 (apache.total_accesses): value 791
# 68.0.1 (apache.total_kbytes): value 1040
# 68.0.6 (apache.busy_servers): No values returned!
#
# [56 bytes]
# 17:20:53.001 68.0.0 (apache.total_accesses): No values returned!
# 68.0.1 (apache.total_kbytes): No values returned!
# 68.0.6 (apache.busy_servers): value 1
# ... lots more deleted
See how the availability of values is interleaved? I can't be sure, but
does your code assume a value is returned for every metric in every fetch?
When using PM_MODE_FORW, that assumption is not true for the case where
different metrics are logged with different sampling intervals. However,
when using PM_MODE_INTERP it is true. Hence you should always use
PM_MODE_INTERP in your code. This will also allow you to replay archives
at any update/sampling interval, which is not possible with PM_MODE_FORW.
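As a rough illustration of the forward-scan behaviour, here is a toy model in
plain Python. It is not the libpcp API: the record list just transcribes the
three pmdumplog entries shown earlier, and fetch_forward is a made-up helper
name. It shows why code must check for missing values before using them:

```python
# Toy model only; none of these names come from libpcp.
# Each record is (timestamp, {metric: value}), copied from the pmdumplog excerpt.
RECORDS = [
    ("17:19:52.999", {"apache.busy_servers": 1}),
    ("17:19:53.003", {"apache.total_accesses": 791,
                      "apache.total_kbytes": 1040}),
    ("17:20:53.001", {"apache.busy_servers": 1}),
]

METRICS = ["apache.total_accesses", "apache.total_kbytes",
           "apache.busy_servers"]


def fetch_forward(records, metrics, origin):
    """Scan forwards from `origin` and return (timestamp, values) for the
    first record holding at least one requested metric. Metrics that are
    not in that record are reported as None ("No values returned!")."""
    for stamp, logged in records:
        if stamp > origin and any(m in logged for m in metrics):
            return stamp, {m: logged.get(m) for m in metrics}
    return None


stamp, values = fetch_forward(RECORDS, METRICS, "17:19:00.000")

# A caller must not assume every metric has a value in every fetch.
missing = [m for m, v in values.items() if v is None]
print(stamp, values)
print("missing:", missing)
```

In this model the first fetch returns only apache.busy_servers, so any code
that reads the other two slots without checking would pick up garbage.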
In addition, if you have PM_SEM_INSTANT metrics that should not be
interpolated during archive replay (e.g. perhaps apache.busy_servers, because
it does not make sense to have 0.6 servers), then you should change their
semantics to PM_SEM_DISCRETE instead. This tells pmFetch not to interpolate
values between samples. See the man page for pmLookupDesc(3) for details.
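To make the difference concrete, here is another toy model in plain Python
(again not the libpcp API). It assumes simple linear interpolation between the
two neighbouring samples, which may not match what libpcp does in every case,
and the sample values and the replay_value name are hypothetical:

```python
# Toy model only; the samples and the helper name are hypothetical.
def replay_value(samples, when, discrete):
    """samples: list of (time_in_seconds, value), sorted by time.
    `when` must lie within the sampled time range.
    discrete=True  -> return the most recent sample at or before `when`.
    discrete=False -> linearly interpolate between neighbouring samples."""
    before = [s for s in samples if s[0] <= when][-1]
    if discrete:
        return before[1]
    after = [s for s in samples if s[0] >= when][0]
    if after[0] == before[0]:
        return before[1]  # `when` falls exactly on a sample
    frac = (when - before[0]) / (after[0] - before[0])
    return before[1] + frac * (after[1] - before[1])


# Hypothetical busy-server counts logged 60 seconds apart.
busy_servers = [(0, 0), (60, 1)]

interpolated = replay_value(busy_servers, 36, discrete=False)
held = replay_value(busy_servers, 36, discrete=True)
print(interpolated, held)
```

With these samples the interpolated replay reports a fractional server count
at 36 seconds, while the discrete replay keeps reporting the last logged
value.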
I hope all this is not too confusing!!
-- Mark