When pmwebd scans a directory of archives that contain compressed
volumes, libpcp causes each file to be decompressed, briefly analyzed,
and then tossed away.  Even one round of bunzip2'ing a single file
takes too long; a whole directory of them multiplies that cost.
It would be nice if we had some combination of:
- a way of directly accessing compressed files in situ,
as zlib or libbz2 allow ... except those streams are not normally
seekable (see the first sketch after this list)
- use of a seekable compression tool/library like dictzip
(it's in the dictd package, dictd_data_read_ etc.,
licensed GPL1+, so usable)
http://stackoverflow.com/questions/429987/compression-formats-with-good-support-for-random-access-within-archives/4010096
- tweak libpcp so it does not decompress whole volume files
just to answer basic queries like PMNS enumeration and log start and
*end* times, which are generally answerable from the .meta / .index.
(The archive end is not so nigh: the end time currently requires
reading to the end of the last data volume -- maybe we need a
pcp-archive format tweak, or perhaps an fstat()-based heuristic?
The second sketch below shows the queries in question.)
- tweak libpcp so that decompressed files are kept in a cache
for a while, to avoid repeated decompression (a possible shape is
sketched third, below)
- a way of letting pmwebd open numerous long-lived pmcontexts,
so any explicit decompression step would have to be paid for only
once (see the last sketch below)
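
For reference, zlib does let us read a compressed file in situ, but a
gzseek() on a file opened for reading is emulated by decompressing and
discarding everything up to the target offset, so random access into a
large volume stays expensive.  A minimal sketch (the archive path is
made up):

    /* cc gzpeek.c -lz */
    #include <stdio.h>
    #include <zlib.h>

    int main(void)
    {
        /* hypothetical compressed archive volume */
        gzFile f = gzopen("/var/log/pcp/pmlogger/somehost/20140101.0.gz", "rb");
        char buf[512];
        int n;

        if (f == NULL)
            return 1;
        /* zlib satisfies this by inflating and throwing away the first
         * megabyte, so the cost grows with the seek offset */
        if (gzseek(f, 1024 * 1024, SEEK_SET) < 0) {
            gzclose(f);
            return 1;
        }
        n = gzread(f, buf, sizeof(buf));
        printf("read %d bytes after seek\n", n);
        gzclose(f);
        return 0;
    }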
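
To make the "basic queries" item concrete, here is roughly what pmwebd
needs at scan time, expressed as ordinary PMAPI calls; today the
pmNewContext() alone triggers the decompress/analyze/discard cycle on a
compressed archive.  (The archive path is made up.)

    /* cc scanone.c -lpcp */
    #include <stdio.h>
    #include <pcp/pmapi.h>

    static void dometric(const char *name)
    {
        puts(name);                     /* PMNS enumeration */
    }

    int main(void)
    {
        pmLogLabel label;
        struct timeval end;
        int ctx, sts;

        /* hypothetical archive base name */
        ctx = pmNewContext(PM_CONTEXT_ARCHIVE,
                           "/var/log/pcp/pmlogger/somehost/20140101");
        if (ctx < 0) {
            fprintf(stderr, "pmNewContext: %s\n", pmErrStr(ctx));
            return 1;
        }
        /* start time: comes from the archive label (.meta / .index) */
        sts = pmGetArchiveLabel(&label);
        /* end time: currently means reading to the end of the last
         * (possibly compressed) data volume */
        sts = pmGetArchiveEnd(&end);
        /* namespace: served from the .meta file */
        sts = pmTraversePMNS("", dometric);
        (void) sts;
        pmDestroyContext(ctx);
        return 0;
    }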
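
The cache idea could live in libpcp's volume-open path: remember which
compressed volumes were already decompressed to a temp file, keyed by
path plus mtime/size so a changed volume is noticed.  Nothing like this
exists yet; the names below are invented just to show the shape:

    #include <limits.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* hypothetical: one slot per recently decompressed volume */
    struct decomp_cache_entry {
        char    path[PATH_MAX];     /* compressed volume, e.g. foo.0.bz2 */
        time_t  mtime;              /* detect the volume changing under us */
        off_t   size;
        int     fd;                 /* open fd on the decompressed temp file */
    };
    static struct decomp_cache_entry cache[16];

    /* return a reusable fd if this volume was already decompressed, else -1 */
    static int decomp_cache_lookup(const char *path, const struct stat *sb)
    {
        unsigned i;
        for (i = 0; i < sizeof(cache) / sizeof(cache[0]); i++) {
            if (strcmp(cache[i].path, path) == 0 &&
                cache[i].mtime == sb->st_mtime &&
                cache[i].size == sb->st_size)
                return cache[i].fd;
        }
        return -1;
    }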
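
And for the last item: pmwebd could keep one PMAPI context per archive
alive across web requests and hop between them with pmUseContext(), so
whatever decompression (or future cache fill) pmNewContext() implies is
paid on first open only.  The lookup table below is invented; the PMAPI
calls are real.

    #include <string.h>
    #include <pcp/pmapi.h>

    /* hypothetical pmwebd-side map: archive name -> open context handle */
    struct archive_ctx { char *name; int ctx; };
    static struct archive_ctx contexts[64];
    static int ncontexts;

    /* open an archive once; later requests reuse the same context */
    static int get_archive_context(const char *name)
    {
        int i, ctx;

        for (i = 0; i < ncontexts; i++) {
            if (strcmp(contexts[i].name, name) == 0)
                return pmUseContext(contexts[i].ctx) < 0 ? -1 : contexts[i].ctx;
        }
        ctx = pmNewContext(PM_CONTEXT_ARCHIVE, name);  /* decompression paid here, once */
        if (ctx >= 0 && ncontexts < 64) {
            contexts[ncontexts].name = strdup(name);
            contexts[ncontexts].ctx = ctx;
            ncontexts++;
        }
        return ctx;
    }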