On 14/01/14 16:58, Nathan Scott wrote:
Hi Greg,
----- Original Message -----
On 13/01/14 16:07, Max Matveev wrote:
On Mon, 13 Jan 2014 14:19:26 -0800, Greg Banks wrote:
gnb> While I designed and wrote the thing, I was never happy with any
gnb> of the iterations of the architecture and I wouldn't recommend
gnb> to anyone that they copy it. Some of the problems were:
gnb> * it was both a client of pmcd and a PMDA, which led to interesting
gnb> deadlocks with the single-threaded pmcd
That was the "second" pass with the nasavg pmda. I thought there was a
first version which only used archives but it had to be abandoned
because tailing of an archive being written wasn't working reliably.
Yes, the first design iteration tailed archives and was horribly
unreliable. Pmarchive was writing to the various files of an archive in
such a way that there was a race window where the archive reading code
in libpcp would see an inconsistent archive and barf. Plus, there was an
inconvenient amount of lag, up to 30 seconds, in pmarchive and in the
tailer.
(pmlogger)
Yep. That squeaky sound you hear is mental rusty hinges.
OOC, what approaches were tried to address these reliability issues?
Given that the original libpcp design wasn't trying to service this
kind of log access, it's not really surprising it didn't work first
go. Max's ordered log label update mechanism sounded interesting
- was that implemented and if so, did it improve reliability?
Ken did something to the guts of pmlogger which made it write the files
in the correct order, with an fsync(). It worked and I think it was
checked in, but I'm not sure. We couldn't use it because our design
relied on a stock PCP and there was no way to ship an update.
Plus it only solved half the problem, the lag being the other half.
Plus again, it meant that any metric we wanted to present an average for
had to be logged at sufficient frequency to make the average reasonably
responsive; that frequency is a lot more than you really want in
historical records and it chewed up a lot of archive disk space. Disk
space was always a problem; we spent a lot of time wrestling with
pmlogger metrics and frequencies.
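FWIW the shape of that ordering fix is easy to sketch. Something like
the following - made-up descriptors and record formats, not Ken's
actual pmlogger change - where the data record is made durable before
the index entry that points at it:

#include <sys/types.h>
#include <unistd.h>

/*
 * The general shape of ordered archive writes: make the data record
 * durable with fsync() before writing the index entry that refers to
 * it, so a tailing reader can never follow an index entry to data
 * that isn't on disk yet.
 */
static int
log_record(int datafd, int indexfd,
           const void *data, size_t datalen,
           const void *idxent, size_t idxlen)
{
    if (write(datafd, data, datalen) != (ssize_t)datalen)
        return -1;
    if (fsync(datafd) < 0)              /* data durable first ... */
        return -1;
    if (write(indexfd, idxent, idxlen) != (ssize_t)idxlen)
        return -1;
    return fsync(indexfd);              /* ... then the index */
}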
The 30 second lag was possibly down to pmlogger not fflush'ing its
buffered writes, I guess - although I see the code is sprinkled with
them nowadays.
That, plus the polling time in the tailing process and the polling time
in pmlogger itself.
Was that on IRIX or Linux, OOC?
Both, IIRC.
Some mechanism (like the pmlc flush command) for coordinating access
may help - was anything attempted there? If so, did anything work or
not work well that you recall?
So using pmlc flush reduces one (the largest) of the three sources of
lag, but not the other two. And see my comments on disk space.
I took a look at the comments in the historic source code (nostalgia,
heh). Some of the problems mentioned are:
* libpcp reads the latest metadata of an archive at open time only, so
if we are reading an archive before it's been finished and new metrics
appear, libpcp won't notice. We worked around this by closing and
re-opening the archive if pmNameID() failed (there's a sketch of this
below)
* the function pmGetArchiveEnd() seems to have been broken at some time
* pmLookupDesc() is one of the failure cases when you lose the update
race with pmlogger
* when you have historical archives, it's really really important to
have a stable mapping of instance names to numbers - hence pmdaCacheOp()
et al (see the instance cache sketch below)
* sometimes this can't be helped, e.g. disk minor numbers across
reboots, and the only way out is to trash the entire remembered instance
domain
* a PMDA has to begin responding to packets from PMCD quite quickly -
within 5 seconds. But it can take a lot longer than that to scan
historical archives to fill the average buffer, so the PMDA has to
multiplex reading archives with responding to PMCD, and until it has
finished reading data it has to return a well-formed but empty result
to FETCHes (sketched below).
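Since I was digging through the old code anyway: the close-and-reopen
workaround looked roughly like this. A from-memory sketch against the
libpcp archive API, with the context handling simplified and the retry
policy made up:

#include <pcp/pmapi.h>

/*
 * libpcp only reads archive metadata at open time, so when a name
 * lookup fails against a live archive, throw the context away and
 * open a fresh one to pick up any metrics logged since.
 */
static int
lookup_with_reopen(int *ctxp, const char *archive, pmID pmid, char **name)
{
    int sts = pmNameID(pmid, name);

    if (sts < 0) {
        /* possibly a metric logged after we opened; re-open and retry */
        pmDestroyContext(*ctxp);
        if ((*ctxp = pmNewContext(PM_CONTEXT_ARCHIVE, archive)) < 0)
            return *ctxp;
        sts = pmNameID(pmid, name);
    }
    return sts;
}

The same retry wrapper works for pmLookupDesc() and the other calls
that can lose the update race.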
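The stable name-to-number mapping was the usual pmdaCache pattern,
roughly like this (the indom and instance name are placeholders, and
error handling is pared down):

#include <stdio.h>
#include <pcp/pmapi.h>
#include <pcp/pmda.h>

/*
 * Keep instance numbers stable across restarts: load the persisted
 * name->number map, add (or re-activate) the instances we can see
 * right now, then save the map back for next time.
 */
static void
update_instance(pmInDom indom, const char *name)
{
    int inst;

    pmdaCacheOp(indom, PMDA_CACHE_LOAD);    /* restore persisted map */
    inst = pmdaCacheStore(indom, PMDA_CACHE_ADD, name, NULL);
    if (inst < 0)
        fprintf(stderr, "pmdaCacheStore: %s\n", pmErrStr(inst));
    pmdaCacheOp(indom, PMDA_CACHE_SAVE);    /* persist for next time */
}

The "trash the whole indom" escape hatch for things like renumbered
disk minors is then just a matter of culling the cache and rebuilding
it from scratch.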
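And the well-formed-but-empty FETCH response is essentially this - a
hand-rolled sketch, not lifted from the real PMDA, with error-path
cleanup omitted:

#include <errno.h>
#include <stdlib.h>
#include <sys/time.h>
#include <pcp/pmapi.h>
#include <pcp/pmda.h>

/*
 * Answer a FETCH before the archive scan has finished: build a
 * syntactically valid pmResult in which every requested metric
 * reports zero values, so pmcd and clients see "no values available"
 * rather than hanging waiting on us.
 */
static int
empty_fetch(int numpmid, pmID pmidlist[], pmResult **resp, pmdaExt *pmda)
{
    pmResult *res;
    int i;

    (void)pmda;
    res = (pmResult *)malloc(sizeof(pmResult) +
                             (numpmid - 1) * sizeof(pmValueSet *));
    if (res == NULL)
        return -ENOMEM;
    gettimeofday(&res->timestamp, NULL);
    res->numpmid = numpmid;
    for (i = 0; i < numpmid; i++) {
        pmValueSet *vsp = (pmValueSet *)malloc(sizeof(pmValueSet));
        if (vsp == NULL)
            return -ENOMEM;         /* leaks res; it's only a sketch */
        vsp->pmid = pmidlist[i];
        vsp->numval = 0;            /* "no values available (yet)" */
        vsp->valfmt = PM_VAL_INSITU;
        res->vset[i] = vsp;
    }
    *resp = res;
    return 0;
}

Once the scan completes you swap in the real fetch path.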
Hope this helps.
--
Greg.