
Re: multithreading bottleneck: pdubuf.c

To: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Subject: Re: multithreading bottleneck: pdubuf.c
From: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Date: Tue, 3 Mar 2015 17:19:29 -0500
Cc: pcp developers <pcp@xxxxxxxxxxx>
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <54F6313C.1040408@xxxxxxxxxxxxxxxx>
References: <20150302015436.GB21203@xxxxxxxxxx> <54F61811.3090400@xxxxxxxxxxxxxxxx> <y0md24p7pj2.fsf@xxxxxxxx> <54F6313C.1040408@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.4.2.2i
Hi -

> >>[...]
> >>+ page alignment means that the buffers should be sized in units of
> >>multiple pages also [...]
> >
> >Wouldn't valloc() do that, without rounding-up on our side?
> 
> I don't think so.  valloc(size) is equivalent to 
> memalign(sysconf(_SC_PAGESIZE),size) which enforces alignment, but does 
> no size rounding below the call AFAIK.

It returns a page-aligned memory block of at least 'size' bytes.  (For
traditional direct I/O, the I/O size would have to match some multiple
of the disk sector or kernel page size, but we don't do that - we use
just the exact record sizes.)
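
For concreteness, here's a minimal sketch (not PCP code) of what that
buys: posix_memalign() with the page size enforces the same alignment
guarantee as valloc(), and any rounding of the size up to whole pages
is left to the caller:

#include <stdlib.h>
#include <unistd.h>

/* page-aligned allocation; rounding the size up to whole pages is
 * optional and entirely the caller's business */
static void *
pagealloc(size_t size, int roundup)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (roundup)
        size = ((size + pagesize - 1) / pagesize) * pagesize;

    /* same alignment guarantee as valloc(size), but explicit */
    if (posix_memalign(&p, pagesize, size) != 0)
        return NULL;
    return p;
}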

My guess is that the rounding-up was not for this purpose, but to make
reuse of the PDU buffers easier after unpinning & free-listing -
i.e., trying to avoid fragmentation.


> [...]  I'd be curious on the distribution of buffer sizes in the
> pool when pmwebd has reached some sort of steady state [...]

A steady state between active requests is all-zeroes :-).  Will see
about getting a mid-run peak set of numbers.


> By comparison, the buffer pool for my pmcd looks like this:
> kenj@bozo:~/src/pcp/src/pcp2graphite$ pminfo -f pmcd.buf
> 
> pmcd.buf.alloc
>     inst [12 or "0012"] value 1
>     inst [20 or "0020"] value 1
>     inst [1024 or "1024"] value 1
>     inst [2048 or "2048"] value 2
>     inst [4196 or "4196"] value 0
>     inst [8192 or "8192"] value 0
>     inst [8193 or "8192+"] value 1
> 
> pmcd.buf.free
>     inst [12 or "0012"] value 1
>     inst [20 or "0020"] value 1
>     inst [1024 or "1024"] value 1
>     inst [2048 or "2048"] value 2
>     inst [4196 or "4196"] value 0
>     inst [8192 or "8192"] value 0
>     inst [8193 or "8192+"] value 1

Similar here, with the new code:

pmcd.buf.alloc
    inst [12 or "0012"] value 2
    inst [20 or "0020"] value 2
    inst [1024 or "1024"] value 2

and all zeroes elsewhere.  But that's in a nearly-idle state; the
pdubufs get much busier in mid-archive-processing.


> Once we have some distribution stats, I think it needs a purpose-built
> simulation to seed the pool to a certain distribution, then time some N
> iterations of (repeat K times (Find+Pin); repeat K times (Unpin)).
> If you can help with the stats, I can help with the simulation (I think 
> I've got some code I wrote 44 years ago that I could redeploy, if I 
> could just find a 7-track tape drive to read the backup).

Sure, we could microbenchmark, but it may be even better to designate
some big, pdubuf-intensive, realistic workload (some tiny job?  a big
pmlogextract?  a pmwebd-graphite gigaquery?) and compare timings of
that before and after the change.
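
If we do go the microbenchmark route, something like the following
might serve as a starting point - just a sketch, assuming the libpcp
__pmFindPDUBuf()/__pmUnpinPDUBuf() entry points, with made-up values
for N, K and the size mix:

#include <stdio.h>
#include <time.h>
#include <pcp/pmapi.h>
#include <pcp/impl.h>

#define N 1000          /* outer timing iterations - made up */
#define K 64            /* buffers held pinned at once - made up */

int
main(void)
{
    static const int sizes[] = { 12, 20, 1024, 2048, 8192 }; /* seed mix - made up */
    int         nsizes = sizeof(sizes) / sizeof(sizes[0]);
    void        *buf[K];
    clock_t     start = clock();
    int         i, k;

    for (i = 0; i < N; i++) {
        for (k = 0; k < K; k++)
            buf[k] = __pmFindPDUBuf(sizes[k % nsizes]); /* comes back pinned */
        for (k = 0; k < K; k++)
            __pmUnpinPDUBuf(buf[k]);
    }
    printf("%d x %d find/unpin cycles: %.3f sec\n",
           N, K, (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}

(Link against libpcp; the interesting comparison would be the same run
against the old and the new pdubuf code.)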


- FChE
