Hi folks,
This is version 2 of the delayed logging series.
I won't repeat everything about what it is, just point you
here:
http://marc.info/?l=linux-xfs&m=126862777118946&w=2
for the description, and here:
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfs.git delayed-logging
for the current code.
To address the known issues from the first posting:
1. xfslogd spinning for long periods - unable to reproduce
2. memory leaks - some fixed, could be others.
3. recovery failure in 121 - NOT FIXED, in progress
4. Checkpoint log ticket allocation - fixed
5. stress testing - I can't break it anymore with fs_mark,
postmark, dbench, bonnie++ or xfsqa, so it is much better
than the previous posting.
6. Scalability - good enough for now - see results below.
7. checkpoint sizing - good enough for now.
There are no new known issues with this release.
To address the algorithmic optimisations:
1. busy extent tracking - separated and posted for review.
2. log IO barriers -> later
3. commit record synchronisation -> later
4. AIL pushing causing log forces -> later
5. CPU usage optimisations -> later
The change has reduced in size now that much of the preliminary log
and transaction changes are in the main tree. Both sets of numbers
still include the busy extent tracking work:
Version 1: 19 files changed, 2594 insertions(+), 580 deletions(-)
Version 2: 22 files changed, 2188 insertions(+), 377 deletions(-)
Anyway, at this point I'd like to have this considered for inclusion
in the dev tree as an experimental feature to get it out to a wider
testing audience. I'm working on the recovery issue, but I don't
want that to hold up the review process. The full pull request is
below.
Scalability:
These tests were run on a VM with 8p and 4GB RAM, with a 10GB
filesystem that can do about 5k IOPS and 530MB/s.
$ sudo mkfs.xfs -f -l size=128m /dev/vdb
meta-data=/dev/vdb               isize=256    agcount=4, agsize=655360 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
$ sudo mount -o logbsize=262144,nobarrier /dev/vdb /mnt/scratch
For delayed logging, the mount options were the only difference. The
test creates a directory per thread, creates 100,000 zero-length
files in each directory, then removes them. The work per thread is
the same and there is no contention between threads until the log is
reached, so the number of files/s should increase in proportion to
the number of active threads until the log subsystem becomes the
bottleneck. The command line for each test looks like:
$ fs_mark -S0 -s 0 -n 100000 -d /mnt/scratch/0 -d ...
Results:
              files/s           log IOPS/MB/s
Threads  vanilla  delaylog    vanilla  delaylog
      1     6190      6790     300/70      20/5
      2    11700     12400    600/140     50/15
      4    20790     23292   1000/250    120/30
      8    19760     21960   1400/350    140/35
     16    12210     15723    650/150    120/30
Running the same test on the same VM, but with a block device that
can do 100MB/s and maybe 500 IOPS, we see:
              files/s           log IOPS/MB/s
Threads  vanilla  delaylog    vanilla  delaylog
      1     6430      6650     300/70      20/5
      2     7870     12150    400/100     50/15
      4     8830     22130    500/120    120/30
      8     8010     21000    400/100    140/35
     16     5560     14560     250/70    120/30
These results tell me that without any special analysis or tuning,
delayed logging is showing equivalent performance and scalability on
high end storage, and significantly better scalability on low-end
storage.
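To put rough numbers on that comparison, the ratio of delaylog to
vanilla files/s can be computed straight from the two tables above.
This is only a sketch for reading the results: the dictionary names
and the speedups() helper are made up for illustration, and the
figures are simply copied from the tables.

```python
# files/s copied from the two result tables: threads -> (vanilla, delaylog)
FAST_STORAGE = {1: (6190, 6790), 2: (11700, 12400), 4: (20790, 23292),
                8: (19760, 21960), 16: (12210, 15723)}
SLOW_STORAGE = {1: (6430, 6650), 2: (7870, 12150), 4: (8830, 22130),
                8: (8010, 21000), 16: (5560, 14560)}


def speedups(results):
    """Return {threads: delaylog files/s divided by vanilla files/s}."""
    return {t: delaylog / vanilla for t, (vanilla, delaylog) in results.items()}


if __name__ == "__main__":
    for name, results in (("fast", FAST_STORAGE), ("slow", SLOW_STORAGE)):
        for threads, ratio in speedups(results).items():
            print(f"{name} storage, {threads:2d} threads: {ratio:.2f}x")
```

On the fast device the ratio stays between roughly 1.06x and 1.29x.
On the slow device it grows from about 1.03x at one thread to about
2.6x at 8 and 16 threads.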
The drop-off at higher thread counts is not a transaction/log
subsystem limitation - it's caused by the fact that the creation of
800k files takes long enough for background writeback to kick in, so
new creates compete with inode cluster writeback and other metadata
for IO. Further, at 16 threads, the 1.6M inodes did not all fit in
the cache, so about 25% of them ended up getting re-read from disk
during the unlink phase, slowing that down further.
However, the results are good enough for me at this point.
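The file counts quoted above, plus a rough scaling-efficiency
measure, can be sketched as follows. The efficiency definition
(files/s at N threads divided by N times the single-thread rate) is
an assumption picked for illustration, not something the test
reports. The helper names are hypothetical and the input numbers are
the delaylog column of the first results table.

```python
FILES_PER_THREAD = 100_000  # from "-n 100000" in the fs_mark command line

# delaylog files/s on the fast device, copied from the first results table
DELAYLOG_FAST = {1: 6790, 2: 12400, 4: 23292, 8: 21960, 16: 15723}


def total_files(threads):
    """Files created (and later removed) by one run."""
    return threads * FILES_PER_THREAD


def scaling_efficiency(rates):
    """files/s at N threads relative to N times the single-thread rate."""
    base = rates[1]
    return {n: rate / (n * base) for n, rate in rates.items()}


if __name__ == "__main__":
    print(total_files(8))        # 800000, the "800k files" case
    print(total_files(16))       # 1600000, the "1.6M inodes" case
    print(total_files(16) // 4)  # 400000, the ~25% re-read from disk
    for n, eff in scaling_efficiency(DELAYLOG_FAST).items():
        print(f"{n:2d} threads: {eff:.0%} of linear scaling")
```

By this measure efficiency stays above 85% up to 4 threads, then
falls to about 40% at 8 threads and about 14% at 16, which is where
the writeback and cache effects described above take over.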
-----
The following changes since commit 29db3370a1369541d58d692fbfb168b8a0bd7f41:
  Alex Elder (1):
        xfs: kill off l_sectbb_mask
are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfs.git delayed-logging
Dave Chinner (14):
      xfs: Improve scalability of busy extent tracking
      xfs: allow log ticket allocation to take allocation flags
      xfs: Delayed logging design documentation
      xfs: introduce delayed logging mount option
      xfs: Introduce the Committed Item List
      xfs: Add delayed logging checkpoint context infrastructure
      xfs: introduce new chained log vector transaction formatting code
      xfs: format and insert log vectors into the CIL
      xfs: attach transactions to the checkpoint context
      xfs: checkpoint transaction infrastructure
      xfs: Allow multiple in-flight checkpoints
      xfs: forced unmounts need to push the CIL
      xfs: enable background pushing of the CIL
      xfs: modify buffer item reference counting for delayed logging
.../filesystems/xfs-delayed-logging-design.txt | 819 ++++++++++++++++++++
fs/xfs/Makefile | 1 +
fs/xfs/linux-2.6/xfs_buf.c | 9 +
fs/xfs/linux-2.6/xfs_quotaops.c | 1 +
fs/xfs/linux-2.6/xfs_super.c | 9 +
fs/xfs/linux-2.6/xfs_trace.h | 80 ++-
fs/xfs/support/debug.c | 1 +
fs/xfs/xfs_ag.h | 21 +-
fs/xfs/xfs_alloc.c | 272 ++++---
fs/xfs/xfs_alloc.h | 5 +-
fs/xfs/xfs_buf_item.c | 33 +-
fs/xfs/xfs_filestream.c | 1 +
fs/xfs/xfs_log.c | 113 ++-
fs/xfs/xfs_log.h | 11 +-
fs/xfs/xfs_log_cil.c | 685 ++++++++++++++++
fs/xfs/xfs_log_priv.h | 118 +++-
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_trans.c | 207 ++++-
fs/xfs/xfs_trans.h | 44 +-
fs/xfs/xfs_trans_extfree.c | 1 +
fs/xfs/xfs_trans_item.c | 114 +---
fs/xfs/xfs_trans_priv.h | 19 +-
22 files changed, 2188 insertions(+), 377 deletions(-)
create mode 100644 Documentation/filesystems/xfs-delayed-logging-design.txt
create mode 100644 fs/xfs/xfs_log_cil.c
Cheers,
Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx