I suspect we can solve this problem with a large hammer that will work
on all platforms (Greg's suggestion won't apparently work on Mac OS X).
If the time ever goes backwards (not a small ntp adjustment, I mean
now - last_delay is < now at the last sample, so time went back more
than 2*last_delay), there is no point keeping any history and any notion
of a "rate" over the last interval is pretty wobbly.
If the time ever goes forward by a lot (e.g. now is > now at last sample
+ 2*last_delay), there is no point keeping any history and any notion of
a "rate" over the last interval is pretty wobbly.
Fortunately "no point keeping any history" looks a lot like reconnect to
pmcd after lost the connection to pmcd ... so we may be able to leverage
that logic.
This would cause a deep reset of pmie's notion of the time when it and
the system time get to be "more than a reasonable amount" out of whack,
and avoid any attempt to hurry up or slow down to drive pmie's notion of
time to be the same as the time of day the system is returning in these
circumstances.
I haven't tried coding this, but it seems more robust than trying to
accommodate large changes in the system time, assuming these events
don't happen too regularly (which would be a reason to make the window
of reasonableness relatively large).
On Wed, 2010-05-05 at 15:11 -0500, Martin Hicks wrote:
> On the next big SGI machines we ran into a problem where PMCD runs at
> 100%, all of that time spent collecting metrics from the linux pmda.
> Eventually we tracked the problem to PMIE failing to nanosleep().
>
> 1272757517.709705 select(5, [4], NULL, NULL, {9, 997965}) = 1 (in [4], left
> {9, 661665})
> 1272757518.046516 recvfrom(4, "\0\0\0H\0\0p\1\0\0015\365", 12, 0, NULL, NULL)
> = 12
> 1272757518.046600 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9,
> 999997})
> 1272757518.046644 recvfrom(4,
> "K\334\275\16\0\0\264\326\0\0\0\2\17\0\10\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\20"...,
> 60, 0, NULL, NULL) = 60
> 1272757518.046714 wait4(-1, 0x7fff22cbafbc, WNOHANG, NULL) = -1 ECHILD (No
> child processes)
> 1272757518.046808 gettimeofday({1272757518, 46817}, NULL) = 0
> 1272757518.046844 nanosleep({18446744073709013837, 18446744073410341545},
> 0x7fff22cbaf90) = -1 EINVAL (Invalid argument)
> 1272757518.046886 sendto(4,
> "\0\0\0$\0\0p\3\0\0017\253\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\17\0\10\0"..., 36,
> 0, NULL, 0) = 36
> 1272757518.047014 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9,
> 660755})
> 1272757518.386349 recvfrom(4, "\0\0\0H\0\0p\1\0\0015\365", 12, 0, NULL, NULL)
> = 12
> 1272757518.386398 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9,
> 999998})
> 1272757518.386442 recvfrom(4,
> "K\334\275\16\0\5\344s\0\0\0\2\17\0\10\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\20"...,
> 60, 0, NULL, NULL) = 60
> 1272757518.386496 wait4(-1, 0x7fff22cbafbc, WNOHANG, NULL) = -1 ECHILD (No
> child processes)
> 1272757518.386530 gettimeofday({1272757518, 386539}, NULL) = 0
> 1272757518.386562 nanosleep({18446744073709013957, 18446744073070619627},
> 0x7fff22cbaf90) = -1 EINVAL (Invalid argument)
> 1272757518.386599 sendto(4,
> "\0\0\0$\0\0p\3\0\0017\253\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\17\0\10\0"..., 36,
> 0, NULL, 0) = 36
> 1272757518.386724 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9,
> 660103})
>
>
> I believe what has happened is:
>
> - Machine's clock is set far in the past
> - pmie starts and notes the epoch for polling the pmcd. It then polls at even
> intervals where: next poll == epoch + poll_count * poll_delay
> - ntp starts, and warps time forward by a couple years.
> - pmie spins, failing nanosleep() because the "next poll" is still a long way
> in
> the past and advancing by poll_delay at each subsequent call until it gets
> to
> the present time.
>
>
> Attached is my first crack at fixing it, but I don't love it because:
>
> 1) It doesn't handle warping into the past. I think PMIE fails in
> similar ways, in that the nanosleep() is a few years in length.
>
> 2) small nit: we still do the waitpid() call with the way I have it
> setup now. I'd rather it not do any unnecessary work if its going to
> spin/fast-forward itself.
>
> I suspect if we fix #1 properly we may not need to worry about #2.
> Any suggestions about how we proceed to fix this? I am mostly worried
> about archives where the delta between samples may be very large. Are
> we concerned about this in practice?
>
> mh
>
>
> _______________________________________________
> pcp mailing list
> pcp@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/pcp
>
|