Hi, Dave -
> [...] The following is an outline of a high level design which is
> intended to address all of the above. [...]
Thanks, excellent writeup, as is Nathan's feedback. Just a couple
of additional (or re-emphasized) ideas for your consideration:
> [...] I propose that this PMNS be built up as each individual
> archive is accessed. [...]
Echoing Nathan here, I expect this will need to be done during the
initial scan phase, not on-demand, because approximately every PCP
client application will want to do a PMNS/metadata lookup such as
pmLookupDesc at startup.
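For illustration, here's roughly what that startup sequence looks
like in nearly every client (the archive path and metric name are
placeholders; signatures per the current pmapi headers):

    #include <stdio.h>
    #include <pcp/pmapi.h>

    int
    main(void)
    {
        /* placeholder archive-directory path */
        int sts = pmNewContext(PM_CONTEXT_ARCHIVE,
                               "/var/log/pcp/pmlogger/somehost");
        if (sts < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(sts));
            return 1;
        }

        /* the usual startup pattern: PMNS lookup, then metadata */
        const char *name = "kernel.all.load";  /* placeholder */
        pmID pmid;
        pmDesc desc;
        if ((sts = pmLookupName(1, &name, &pmid)) >= 0)
            sts = pmLookupDesc(pmid, &desc);
        if (sts < 0) {
            fprintf(stderr, "lookup: %s\n", pmErrStr(sts));
            return 1;
        }
        return 0;
    }

If the PMNS were built up only as each archive is accessed, that
first pmLookupName would end up forcing the full scan anyway.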
> [...]
> Scaling
> The primary issue is resource management when scaling to PCP
> installations for which individual directories may contain extremely
> large numbers of related archives. In particular, we don't want to
> keep large numbers of file descriptors open simultaneously.
File descriptors are somewhat scarce, yes, but one can have
thousands open. And don't forget about another general POSIX
facility: mmap(2). You can map a great many files into memory
without keeping the fds open. For example, mmap'ing all the .meta
files would give a full in-memory PMNS & log-label view.
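A minimal sketch of that pattern; the mapping outlives the
close(2), so the fd budget stays flat no matter how many .meta
files we map:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map one .meta file read-only, releasing its fd right away. */
    void *
    map_meta(const char *path, size_t *lenp)
    {
        int fd = open(path, O_RDONLY);
        struct stat sb;
        void *p = MAP_FAILED;

        if (fd >= 0 && fstat(fd, &sb) == 0 && sb.st_size > 0)
            p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (fd >= 0)
            close(fd);          /* mapping persists without the fd */
        if (p == MAP_FAILED)
            return NULL;
        *lenp = sb.st_size;
        return p;
    }

And since pages are faulted in lazily, mapping many .meta files up
front stays cheap until they are actually read.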
> [...] We must also keep in mind that, in the case of directories of
> archives, new archives could be dynamically appearing via an active
> pmlogger or via some other means. They could also be dynamically
> disappearing, however this is just as easy to detect and should
> probably be treated as an error situation. [...]
Handling the fully general dynamic case might not be practical.
For example, if new archives pop up in the middle of a time
interval we've already passed, we'd have an inconsistency: the
pmapi client would get different results depending on the exact
timing of the operations.
I would posit that we should support less than the general case, and
describe it as simply as possible.
For example, a new multi-archive context could be defined to cover
"all referenced valid archive files in existence at pmNewContext call
time". This would exclude archives popping up in the middle, or brand
new archives. It would constitute the simplest implementation. (In
this scenario, a pmapi client that encounters the last record of the
originally-last archive will know it. To keep up with even newer
archives, it would have to reopen the archive-directory, pmSetMode
time-warp to the last timestamp it saw, and resume fetching.) This
seems reasonable & pleasantly simple, and well-suited for pmapi
applications that run for a short time.
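A rough sketch of that resume step, assuming the client tracked
the timestamp of the last record it consumed (resume_after is a
hypothetical helper on the client side):

    #include <sys/time.h>
    #include <pcp/pmapi.h>

    /*
     * Reopen the archive directory and warp to the last timestamp
     * already processed.  Records at exactly last_seen may be
     * re-delivered; the caller can skip those duplicates.
     */
    int
    resume_after(const char *archdir, const struct timeval *last_seen)
    {
        int sts = pmNewContext(PM_CONTEXT_ARCHIVE, archdir);
        if (sts < 0)
            return sts;
        return pmSetMode(PM_MODE_FORW, last_seen, 0);
    }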
For another example, the definition could be as above, "PLUS all
newer-time archives found whenever a pmFetch goes past the then-known
last archive." It would identify the pmFetch-at-end-of-time as the
trigger for the rescanning operation, handling the above case
automatically within libpcp. It would not include newly found
older-time archives, so it could not change history / "mess up the
timeline". This seems reasonable too, not as simple, but helpful for
pmapi applications that are meant to process archives for a long time.
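In pseudo-C, that second definition boils down to something like
this in the fetch path (__pmRescanArchiveDir is an entirely
hypothetical helper name):

    #include <pcp/pmapi.h>

    /* Hypothetical: rescan the directory for newer-time archives
     * and append them to the context; > 0 means some were added. */
    extern int __pmRescanArchiveDir(void);

    /* A fetch that runs off the end of the known-last archive
     * triggers one rescan, then retries; else PM_ERR_EOL stands. */
    int
    fetch_with_rescan(int numpmid, pmID *pmidlist, pmResult **rp)
    {
        int sts = pmFetch(numpmid, pmidlist, rp);
        if (sts == PM_ERR_EOL && __pmRescanArchiveDir() > 0)
            sts = pmFetch(numpmid, pmidlist, rp);
        return sts;
    }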
Maybe there are other workable definitions, but I would caution
against being overly ambitious in the sense of accepting too many
runtime archive changes, beyond what a pmapi archive-processing app
could reasonably want to subject itself to.
Drastically changed archives (deleted, renamed, pmlogrewrite'd,
merged, or pmlogreduce'd) would constitute runtime archive changes
that should probably be rendered as errors, rather than motivating
heroic attempts to reverse-engineer an original consistent view.
> [...] All of this requires that we, at a minimum store the start and
> end time of each archive in the active set. [...]
Note that the archive end-time is not stored formally &
efficiently, but only heuristically in the (optional!) .index
files, and indirectly in the filesystem inode stat timestamps.
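So any end-time we compute up front is an estimate; a stat(2)
mtime gives a cheap upper bound, e.g.:

    #include <sys/stat.h>
    #include <time.h>

    /* Heuristic: the newest data volume's mtime is an upper bound
     * on the archive's last record time when no usable .index
     * file exists. */
    time_t
    approx_end_time(const char *volpath)
    {
        struct stat sb;
        if (stat(volpath, &sb) != 0)
            return (time_t)-1;
        return sb.st_mtime;
    }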
Regarding inotify: it's Linux-only and local-filesystem-only, so
it has adverse portability implications. If we define the times
for directory rescanning to be rare & clear, it is probably not
any win over a directory-fstat plus directory-traversal,
especially if we cache the fstat results of files we've already
scanned.
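Something like this is cheap enough to run at each rescan
opportunity (cached_mtime is assumed to be kept per directory):

    #include <time.h>
    #include <sys/stat.h>

    /*
     * Compare the directory's mtime against a cached value; only
     * when it changed (entries added/removed) do we pay for a
     * readdir(3) traversal, and per-file cached stats then let us
     * skip archives we've already scanned.
     */
    int
    dir_changed(const char *dirpath, struct timespec *cached_mtime)
    {
        struct stat sb;
        if (stat(dirpath, &sb) != 0)
            return -1;
        if (sb.st_mtim.tv_sec == cached_mtime->tv_sec &&
            sb.st_mtim.tv_nsec == cached_mtime->tv_nsec)
            return 0;               /* nothing new to scan */
        *cached_mtime = sb.st_mtim; /* remember for next time */
        return 1;                   /* caller should traverse now */
    }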
- FChE