pcp
[Top] [All Lists]

pmie and system clock changes

To: pcp@xxxxxxxxxxx
Subject: pmie and system clock changes
From: Martin Hicks <mort@xxxxxxx>
Date: Wed, 5 May 2010 15:11:54 -0500
User-agent: Mutt/1.5.17 (2007-11-01)
On the next big SGI machines we ran into a problem where PMCD runs at
100%, all of that time spent collecting metrics from the linux pmda.
Eventually we tracked the problem to PMIE failing to nanosleep().

1272757517.709705 select(5, [4], NULL, NULL, {9, 997965}) = 1 (in [4], left {9, 
661665})
1272757518.046516 recvfrom(4, "\0\0\0H\0\0p\1\0\0015\365", 12, 0, NULL, NULL) = 
12
1272757518.046600 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9, 
999997})
1272757518.046644 recvfrom(4, 
"K\334\275\16\0\0\264\326\0\0\0\2\17\0\10\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\20"...,
 60, 0, NULL, NULL) = 60
1272757518.046714 wait4(-1, 0x7fff22cbafbc, WNOHANG, NULL) = -1 ECHILD (No 
child processes)
1272757518.046808 gettimeofday({1272757518, 46817}, NULL) = 0
1272757518.046844 nanosleep({18446744073709013837, 18446744073410341545}, 
0x7fff22cbaf90) = -1 EINVAL (Invalid argument)
1272757518.046886 sendto(4, 
"\0\0\0$\0\0p\3\0\0017\253\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\17\0\10\0"..., 36, 
0, NULL, 0) = 36
1272757518.047014 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9, 
660755})
1272757518.386349 recvfrom(4, "\0\0\0H\0\0p\1\0\0015\365", 12, 0, NULL, NULL) = 
12
1272757518.386398 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9, 
999998})
1272757518.386442 recvfrom(4, 
"K\334\275\16\0\5\344s\0\0\0\2\17\0\10\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\20"..., 
60, 0, NULL, NULL) = 60
1272757518.386496 wait4(-1, 0x7fff22cbafbc, WNOHANG, NULL) = -1 ECHILD (No 
child processes)
1272757518.386530 gettimeofday({1272757518, 386539}, NULL) = 0
1272757518.386562 nanosleep({18446744073709013957, 18446744073070619627}, 
0x7fff22cbaf90) = -1 EINVAL (Invalid argument)
1272757518.386599 sendto(4, 
"\0\0\0$\0\0p\3\0\0017\253\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\17\0\10\0"..., 36, 
0, NULL, 0) = 36
1272757518.386724 select(5, [4], NULL, NULL, {10, 0}) = 1 (in [4], left {9, 
660103})


I believe what has happened is:

- Machine's clock is set far in the past
- pmie starts and notes the epoch for polling the pmcd.  It then polls at even
  intervals where:  next poll == epoch + poll_count * poll_delay
- ntp starts, and warps time forward by a couple years.
- pmie spins, failing nanosleep() because the "next poll" is still a long way in
  the past and advancing by poll_delay at each subsequent call until it gets to
  the present time.


Attached is my first crack at fixing it, but I don't love it because:

1) It doesn't handle warping into the past.  I think PMIE fails in
  similar ways, in that the nanosleep() is a few years in length.

2) small nit:  we still do the waitpid() call with the way I have it
   setup now. I'd rather it not do any unnecessary work if its going to
   spin/fast-forward itself.

I suspect if we fix #1 properly we may not need to worry about #2.
Any suggestions about how we proceed to fix this?  I am mostly worried
about archives where the delta between samples may be very large.  Are
we concerned about this in practice?

mh

Attachment: pmie.patch
Description: Text Data

<Prev in Thread] Current Thread [Next in Thread>