
Re: multithreading bottleneck: pdubuf.c

To: "Frank Ch. Eigler" <fche@xxxxxxxxxx>
Subject: Re: multithreading bottleneck: pdubuf.c
From: Ken McDonell <kenj@xxxxxxxxxxxxxxxx>
Date: Wed, 04 Mar 2015 09:10:04 +1100
Cc: pcp@xxxxxxxxxxx
Delivered-to: pcp@xxxxxxxxxxx
In-reply-to: <y0md24p7pj2.fsf@xxxxxxxx>
References: <20150302015436.GB21203@xxxxxxxxxx> <54F61811.3090400@xxxxxxxxxxxxxxxx> <y0md24p7pj2.fsf@xxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
On 04/03/15 08:13, Frank Ch. Eigler wrote:
..
OK, I see the valloc(), but not direct I/O (in the sense of fcntl
O_DIRECT or mmap), so there's going to be some user->kernel buffer
copying regardless of alignment.

Not all operating systems are Linux ... 8^)> ... in the current mix of operating systems I don't think this matters any more.

[...]
+ page alignment means that the buffers should be sized in units of
multiple pages also [...]

Wouldn't valloc() do that, without rounding-up on our side?

I don't think so. valloc(size) is equivalent to memalign(sysconf(_SC_PAGESIZE), size), which enforces alignment but, AFAIK, does not round the size up inside the call.
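To make the point concrete, here is a minimal sketch of rounding on the caller's side before asking for page-aligned memory. The helper names round_up_to_page() and alloc_page_buffer() are made up for illustration and are not taken from pdubuf.c, and posix_memalign() is used here in place of valloc().

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Round size up to the next multiple of pagesize (hypothetical helper). */
static size_t round_up_to_page(size_t size, size_t pagesize)
{
    assert(pagesize > 0);
    return ((size + pagesize - 1) / pagesize) * pagesize;
}

/*
 * Allocate a page-aligned buffer whose length is a whole number of pages.
 * The rounded length is returned via *actual; NULL is returned on failure.
 * Hypothetical helper, not the libpcp implementation.
 */
static void *alloc_page_buffer(size_t need, size_t *actual)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    *actual = round_up_to_page(need, pagesize);
    if (posix_memalign(&buf, pagesize, *actual) != 0)
        return NULL;
    return buf;
}
```

With a 4096-byte page, a request for 4097 bytes would come back as an 8192-byte buffer, where the allocator alone would only guarantee the alignment.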

...
Right - some early results were posted on the thread a few days back,
looking promising.  Can you suggest some specific benchmarking
scenarios expected to stress this area?

I'm behind on email, I had not got down to the performance data you posted. Looks good.

I'd be curious about the distribution of buffer sizes in the pool when pmwebd has reached some sort of steady state ... calling __pmFindPDUBuf(-1) will dump the current pool contents to stderr, so could you add that call for the purposes of collecting info?

By comparison, the buffer pool for my pmcd looks like this:

kenj@bozo:~/src/pcp/src/pcp2graphite$ pminfo -f pmcd.buf

pmcd.buf.alloc
    inst [12 or "0012"] value 1
    inst [20 or "0020"] value 1
    inst [1024 or "1024"] value 1
    inst [2048 or "2048"] value 2
    inst [4196 or "4196"] value 0
    inst [8192 or "8192"] value 0
    inst [8193 or "8192+"] value 1

pmcd.buf.free
    inst [12 or "0012"] value 1
    inst [20 or "0020"] value 1
    inst [1024 or "1024"] value 1
    inst [2048 or "2048"] value 2
    inst [4196 or "4196"] value 0
    inst [8192 or "8192"] value 0
    inst [8193 or "8192+"] value 1

which pretty much matches our historical assumptions.

Once we have some distribution stats, I think it needs a purpose-built simulation to seed the pool to a certain distribution, then time some N iterations of (repeat K times (Find+Pin), repeat K times (Unpin)).
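A rough sketch of the shape such a timing loop might take is below. It is only a skeleton under stated assumptions: sim_find_and_pin() and sim_unpin() are hypothetical stand-ins built on malloc()/free() so the code is self-contained, and a real run would swap in the libpcp buffer pool calls instead. The sizes array stands in for whatever distribution the stats turn up, and seeding the pool beforehand is left out.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define K_MAX 64                /* upper bound on K for this sketch */

static long live_bufs = 0;      /* buffers currently held */

/* Hypothetical stand-in for Find+Pin on the real pool. */
static void *sim_find_and_pin(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_bufs++;
    return p;
}

/* Hypothetical stand-in for Unpin on the real pool. */
static void sim_unpin(void *p)
{
    if (p != NULL) {
        free(p);
        live_bufs--;
    }
}

/*
 * Time n iterations of: k x (Find+Pin) followed by k x (Unpin).
 * Request sizes cycle through sizes[0..nsizes-1].
 * Returns elapsed seconds, or -1.0 on bad arguments.
 */
static double run_sim(int n, int k, const size_t *sizes, int nsizes)
{
    void *held[K_MAX];
    struct timespec t0, t1;

    if (k < 1 || k > K_MAX || nsizes < 1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < k; j++)
            held[j] = sim_find_and_pin(sizes[(i + j) % nsizes]);
        for (int j = 0; j < k; j++)
            sim_unpin(held[j]);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling run_sim(N, K, sizes, nsizes) with a few values of K and a sizes array drawn from the observed pool contents would give per-configuration timings to compare before and after a change to the pool.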

If you can help with the stats, I can help with the simulation (I think I've got some code I wrote 44 years ago that I could redeploy, if I could just find a 7-track tape drive to read the backup).
