iomap infrastructure and multipage writes V2
Christoph Hellwig
hch at lst.de
Mon May 2 13:23:41 CDT 2016
Hi Dave,
sorry for taking forever to get back to this - travel to LSF, some
other meetings and a deadline last week didn't leave me any time for
XFS work.
On Thu, Apr 14, 2016 at 07:54:42AM +1000, Dave Chinner wrote:
> Christoph, have you done any perf testing of this patchset yet to
> check that it does indeed reduce the CPU overhead of large write
> operations? I'd also be interested to know if there is any change in
> overhead for single page (4k) IOs as well, even though I suspect
> there won't be.
I did a lot of testing earlier, and this version also looks very
promising. On the sort of hardware I have access to now, the 4k
numbers don't change much, but with 1M writes we both increase the
write bandwidth a little bit and significantly lower the CPU usage.
The simple test that demonstrates this is shown below; the runs are
from a 4p VM with 4G of RAM, access to a fast NVMe SSD, and a data
size small enough that writeback shouldn't throttle the buffered
write path:
MNT=/mnt
PERF="perf_3.16" # soo smart to have tools in the kernel tree..
# 4k case: 65536 * 4k = 256MB; 1M case: 256 * 1M = 256MB
#BS=4k
#COUNT=65536
BS=1M
COUNT=256
$PERF stat dd if=/dev/zero of=$MNT/testfile bs=$BS count=$COUNT
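
To see where the saved cycles actually come from, the same workload can
also be profiled. This is only a sketch along the lines of the test above
(perf record with call graphs, then perf report), not the exact commands
used for the numbers below:

# record call graphs for the same write workload, then look at the hot symbols
$PERF record -g dd if=/dev/zero of=$MNT/testfile bs=$BS count=$COUNT
$PERF report --sort symbol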
with the baseline for-next tree I get the following bandwidth and
CPU utilization:
BS=4k: ~600MB/s 0.856 CPUs utilized ( +- 0.32% )
BS=1M: 1.45GB/s 0.820 CPUs utilized ( +- 0.77% )
with all patches applied:
BS=4k: ~610MB/s 0.848 CPUs utilized ( +- 0.36% )
BS=1M: ~1.55GB/s 0.615 CPUs utilized ( +- 0.80% )
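
The +- percentages are the run-to-run variance perf stat prints for
repeated runs; a minimal sketch of such an invocation (the repeat count
here is an assumption, not necessarily the one used above):

# repeat the measurement 10 times; perf stat reports the mean and +- variance
$PERF stat -r 10 dd if=/dev/zero of=$MNT/testfile bs=$BS count=$COUNT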
This is also visible in the wall time:
baseline, 4k:
real 0m0.540s
user 0m0.000s
sys 0m0.533s
baseline, 1M:
real 0m0.310s
user 0m0.000s
sys 0m0.313s
multipage, 4k:
real 0m0.541s
user 0m0.010s
sys 0m0.527s
multipage, 1M:
real 0m0.272s
user 0m0.000s
sys 0m0.263s
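
The wall clock numbers above look like plain time(1) output for the same
dd run; a sketch of how to reproduce them, assuming the same variables as
in the script above:

time dd if=/dev/zero of=$MNT/testfile bs=$BS count=$COUNT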