Hi -
> >>[...]
> >>+ page alignment means that the buffers should be sized in units of
> >>multiple pages also [...]
> >
> >Wouldn't valloc() do that, without rounding-up on our side?
>
> I don't think so. valloc(size) is equivalent to
> memalign(sysconf(_SC_PAGESIZE),size) which enforces alignment, but does
> no size rounding below the call AFAIK.
Right, it returns a page-aligned memory block of at least 'size' bytes.
(For traditional direct I/O, the I/O size would have to match some
multiple of disk sector or kernel page size, but we don't do that - just
the exact record sizes.)
My guess is that the rounding-up was not for this purpose, but for the
hypothetical easier reuse of the PDUbufs after unpinning &
free-listing - i.e., trying to avoid fragmentation.
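
To make the alignment-vs-size distinction concrete, here is a minimal
sketch of what "aligned and also rounded to whole pages" would look like.
The helper names round_up_to_page() and alloc_page_rounded() are made up
for illustration and are not libpcp functions; posix_memalign() stands in
for valloc(), and the page size is assumed to be a power of two.

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round size up to the next multiple of pagesize.
 * Assumes pagesize is a power of two. */
static size_t round_up_to_page(size_t size, size_t pagesize)
{
    return (size + pagesize - 1) & ~(pagesize - 1);
}

/* Hypothetical helper: page-aligned allocation whose length is also a
 * whole number of pages.  The rounded length is stored in *actual.
 * Returns NULL on failure; release the block with free(). */
static void *alloc_page_rounded(size_t size, size_t *actual)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (ps <= 0)
        return NULL;
    *actual = round_up_to_page(size, (size_t)ps);
    /* The alignment comes from posix_memalign(); the size rounding is ours. */
    if (posix_memalign(&p, (size_t)ps, *actual) != 0)
        return NULL;
    return p;
}
```

With a 4096-byte page, a 1025-byte request would come back as a 4096-byte
block, so more freed buffers become interchangeable on the free list.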
> [...] I'd be curious on the distribution of buffer sizes in the
> pool when pmwebd has reached some sort of steady state [...]
A steady state between active requests is all-zeroes :-). Will see
about getting a mid-run peak set of numbers.
> By comparison, the buffer pool for my pmcd looks like this:
> kenj@bozo:~/src/pcp/src/pcp2graphite$ pminfo -f pmcd.buf
>
> pmcd.buf.alloc
> inst [12 or "0012"] value 1
> inst [20 or "0020"] value 1
> inst [1024 or "1024"] value 1
> inst [2048 or "2048"] value 2
> inst [4196 or "4196"] value 0
> inst [8192 or "8192"] value 0
> inst [8193 or "8192+"] value 1
>
> pmcd.buf.free
> inst [12 or "0012"] value 1
> inst [20 or "0020"] value 1
> inst [1024 or "1024"] value 1
> inst [2048 or "2048"] value 2
> inst [4196 or "4196"] value 0
> inst [8192 or "8192"] value 0
> inst [8193 or "8192+"] value 1
Similar here, with the new code:
pmcd.buf.alloc
inst [12 or "0012"] value 2
inst [20 or "0020"] value 2
inst [1024 or "1024"] value 2
and all zeroes elsewhere. But that's in nearly-idle state. The
pdubufs get much busier mid-archive-processing.
> Once we have some distribution stats, I think it needs a purpose build
> simulation to seed the pool to a certain distribution, then time some N
> iterations of repeat K times (Find+Pin) repeat K times (Unpin)).
> If you can help with the stats, I can help with the simulation (I think
> I've got some code I wrote 44 years ago that I could redeploy, if I
> could just find a 7-track tape drive to read the backup).
Sure, we could microbenchmark, but it may be even better to designate
some big pdubuf-intensive realistic workload (some tiny job? a big
pmlogextract? pmwebd-graphite gigaquery?), and compare those.
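
If we do go the microbenchmark route, the loop shape from the proposal
(N iterations of K find+pin calls followed by K unpins) might look
roughly like the sketch below. It runs against a toy first-fit pool that
I made up for illustration; the toy_* names are hypothetical, and none of
this is libpcp's pdubuf code or its search policy.

```c
#include <stdlib.h>
#include <time.h>

/* Toy stand-in for a PDU buffer pool; NOT libpcp's implementation. */
struct toy_buf {
    struct toy_buf *next;
    size_t size;
    int pincnt;
    char *data;
};

static struct toy_buf *pool;

/* Reuse the first unpinned buffer of at least `size` bytes, or allocate
 * a new one.  The buffer is returned with a pin count of 1. */
static struct toy_buf *toy_find_pin(size_t size)
{
    struct toy_buf *b;

    for (b = pool; b != NULL; b = b->next) {
        if (b->pincnt == 0 && b->size >= size) {
            b->pincnt = 1;
            return b;
        }
    }
    b = malloc(sizeof(*b));
    if (b == NULL)
        return NULL;
    b->data = malloc(size);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->size = size;
    b->pincnt = 1;
    b->next = pool;
    pool = b;
    return b;
}

/* Drop one pin; the buffer becomes reusable once the count reaches 0. */
static void toy_unpin(struct toy_buf *b)
{
    if (b->pincnt > 0)
        b->pincnt--;
}

/* Number of buffers currently in the pool, pinned or not. */
static size_t toy_pool_len(void)
{
    size_t n = 0;
    struct toy_buf *b;

    for (b = pool; b != NULL; b = b->next)
        n++;
    return n;
}

/* Number of buffers that are still pinned. */
static size_t toy_pinned_count(void)
{
    size_t n = 0;
    struct toy_buf *b;

    for (b = pool; b != NULL; b = b->next)
        if (b->pincnt > 0)
            n++;
    return n;
}

/* n iterations of: k x find+pin, then k x unpin.  Request sizes cycle
 * through sizes[0..nsizes-1].  Returns elapsed CPU seconds, or -1.0 on
 * allocation failure. */
static double toy_bench(int n, int k, const size_t *sizes, int nsizes)
{
    struct toy_buf **held = malloc((size_t)k * sizeof(*held));
    clock_t start;
    int i, j;

    if (held == NULL)
        return -1.0;
    start = clock();
    for (i = 0; i < n; i++) {
        for (j = 0; j < k; j++) {
            held[j] = toy_find_pin(sizes[j % nsizes]);
            if (held[j] == NULL) {
                free(held);
                return -1.0;
            }
        }
        for (j = 0; j < k; j++)
            toy_unpin(held[j]);
    }
    free(held);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Seeding `sizes` from a measured mid-run distribution, and comparing
toy_pool_len() before and after, would show how much the pool grows when
request sizes do not line up with what is on the free list.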
- FChE