GNU 'tar', Schilling's 'tar', write-cache/barrier
Peter Grandi
pg_xf2 at xf2.for.sabi.co.UK
Sat Mar 24 11:27:19 CDT 2012
>> [ ... ] there has been quite some other metadata related
>> performance improvements. Thus IMHO reducing the recent
>> improvements in metadata performance is underselling XFS and
>> overselling delaylog. [ ... ]
> That's a good way of putting it, and I am pleased that I finally
> get a reasonable comment on this story, and one that agrees with
> one of my previous points in this thread: [ ... ]
[ ... ]
> http://xfs.org/images/d/d1/Xfs-scalability-lca2012.pdf
> «* Ext4 can be up 20-50x times than XFS when data is also being
> written as well (e.g. untarring kernel tarballs).
> * This is XFS @ 2009-2010.
> * Unless you have seriously fast storage, XFS just won't
> perform well on metadata modification heavy workloads.»
> It is never mentioned that 'ext4' is 20-50x faster on metadata
> modification workloads because it implements much weaker
> semantics than «XFS @ 2009-2010», and that 'delaylog' matches
> 'ext4' because it implements similarly weaker semantics, by
> reducing the frequency of commits, as the XFS FAQ briefly
> summarizes: [ ... ]
As to this, I have realized that there is a very big detail that
I have taken as implicit but that perhaps at this point should
be made explicit, as to the deliberately misleading propaganda
that «Ext4 can be up 20-50x times than XFS when data is also
being written as well (e.g. untarring kernel tarballs).»:
Almost all «untarring kernel tarballs» "benchmarks" are done
with GNU 'tar', and it does not 'fsync'.
This matters because XFS has done the "right thing" with 'fsync'
for a long time, and if the application does 'fsync' then 'ext4',
XFS without and with 'delaylog' are mostly equivalent.
Conversely Schilling's 'tar' does 'fsync' and as a result it is
often considered (by the gullible crowd to which the presentation
propaganda referred to above is addressed) to have less
"performance" than GNU 'tar'.
To illustrate I have made a tiny test '.tar' file containing a
directory with two files in it, and this is what happens with
Schilling's 'tar':
$ strace -f -e trace=file,fsync,fdatasync,read,write star xf d.tar
open("d.tar", O_RDONLY) = 7
read(7, "d/\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
Process 8201 attached
[ ... ]
[pid 8200] lstat("d/", 0x7fff174d9490) = -1 ENOENT (No such file or directory)
[pid 8200] lstat("d/", 0x7fff174d9330) = -1 ENOENT (No such file or directory)
[pid 8200] access("d", F_OK) = -1 ENOENT (No such file or directory)
[pid 8200] mkdir("d", 0700) = 0
[pid 8200] lstat("d/", {st_mode=S_IFDIR|0700, st_size=6, ...}) = 0
[pid 8200] lstat("d/f1", 0x7fff174d9490) = -1 ENOENT (No such file or directory)
[pid 8200] open("d/f1", O_WRONLY|O_CREAT|O_TRUNC, 0600) = 4
[pid 8200] write(4, "3\275@&{U(\356\332\25z\250\236\256v\6U[5\334\265\313\206:\351\335\366Q\21\231\210H"..., 128) = 128
[pid 8200] fsync(4 <unfinished ...>
[pid 8201] <... write resumed> ) = 1
[pid 8201] read(7, "", 10240) = 0
Process 8201 detached
<... fsync resumed> ) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
utimes("d/f1", {{1332588240, 0}, {1332588240, 0}}) = 0
utimes("d/f1", {{1332588240, 0}, {1332588240, 0}}) = 0
lstat("d/f2", 0x7fff174d9490) = -1 ENOENT (No such file or directory)
open("d/f2", O_WRONLY|O_CREAT|O_TRUNC, 0600) = 4
write(4, "\377\325\253\257,\210\2719e\24\347*P\325x\357\345\220\375Ei\375\355\22063\17\355\312.\6\347"..., 4096) = 4096
fsync(4) = 0
utimes("d/f2", {{1332588257, 0}, {1332588257, 0}}) = 0
utimes("d/f2", {{1332588257, 0}, {1332588257, 0}}) = 0
utimes("d", {{1332588242, 0}, {1332588242, 0}}) = 0
write(2, "star: 1 blocks + 0 bytes (total "..., 58star: 1 blocks + 0 bytes (total of 10240 bytes = 10.00k).
) = 58
Compare with GNU 'tar':
$ strace -f -e trace=file,fsync,fdatasync,read,write tar xf d.tar
[ ... ]
open("d.tar", O_RDONLY) = 3
read(3, "d/\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 10240) = 10240
[ ... ]
mkdir("d", 0700) = -1 EEXIST (File exists)
stat("d", {st_mode=S_IFDIR|0700, st_size=24, ...}) = 0
open("d/f1", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EEXIST (File exists)
unlink("d/f1") = 0
open("d/f1", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
write(4, "3\275@&{U(\356\332\25z\250\236\256v\6U[5\334\265\313\206:\351\335\366Q\21\231\210H"..., 128) = 128
close(4) = 0
utimensat(AT_FDCWD, "d/f1", {{1332589368, 193330071}, {1332588240, 0}}, 0) = 0
open("d/f2", O_WRONLY|O_CREAT|O_EXCL, 0600) = -1 EEXIST (File exists)
unlink("d/f2") = 0
open("d/f2", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
write(4, "\377\325\253\257,\210\2719e\24\347*P\325x\357\345\220\375Ei\375\355\22063\17\355\312.\6\347"..., 4096) = 4096
close(4) = 0
utimensat(AT_FDCWD, "d/f2", {{1332589368, 193330071}, {1332588257, 0}}, 0) = 0
close(3) = 0
utimensat(AT_FDCWD, "d", {{1332589368, 193330071}, {1332588242, 0}}, 0) = 0
close(1) = 0
close(2) = 0
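The essential difference between the two traces above can be
reduced to a minimal sketch. This is not taken from the source of
either program; the function name 'extract' and its parameters are
hypothetical, and it only models the write-then-maybe-'fsync'
step, not ownership, timestamps or error recovery:

```python
import os
import tempfile


def extract(members, dest, do_fsync):
    """Write each (relative_path, data) pair under dest.

    do_fsync=True mimics the write + fsync pattern seen in the
    Schilling 'tar' trace; do_fsync=False mimics the write + close
    pattern seen in the GNU 'tar' trace, leaving the data in the
    page cache until the kernel decides to write it back.
    """
    for rel_path, data in members:
        path = os.path.join(dest, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
            if do_fsync:
                f.flush()             # push Python's buffer to the kernel
                os.fsync(f.fileno())  # wait until the kernel reports it as stored


if __name__ == "__main__":
    members = [("d/f1", b"x" * 128), ("d/f2", b"y" * 4096)]
    with tempfile.TemporaryDirectory() as tmp:
        extract(members, os.path.join(tmp, "like-gnu-tar"), do_fsync=False)
        extract(members, os.path.join(tmp, "like-star"), do_fsync=True)
```

Both calls leave identical files behind; what differs is only
whether the caller waited for each file to reach the storage
device before moving on to the next one.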
In effect running 'tar x ...' with GNU 'tar' is the same as running
'eatmydata tar x ...'; and indeed as its documentation says,
'eatmydata' is designed to achieve higher "performance" by
turning programs that behave like Schilling's 'tar' into programs
that behave like GNU 'tar'.
When GNU 'tar' is used as a "benchmark" for 'delaylog' and there
are no 'fsync's, the longer the interval between commits (and
thus the implicit unsafety) the higher the "performance", or at
least that's the argument I think propagandists and buffoons may
be using.
That's one important reason why I mentioned 'eatmydata' as one
performance enhancing technique in a group with 'nobarrier' and
'delaylog'; and why I was amused by this buffoonery:
«So you're comparing delaylog's volatile buffer architecture to
software that *intentionally and transparently disables fsync*?»
Because when the 'delaylog' propagandists write that:
«Ext4 can be up 20-50x times than XFS when data is also being
written as well (e.g. untarring kernel tarballs).»
it is they who are comparing "performance" using GNU 'tar', which
intentionally and transparently does not use 'fsync' at all.
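For those unfamiliar with what 'eatmydata' does, its effect can be
imitated inside a single Python process as below. This is only an
analogy and an assumption-laden sketch: the real tool, as far as I
know, works as an 'LD_PRELOAD' library at the C library level and
covers more calls than 'fsync' alone, while the hypothetical
'eat_my_data' helper here merely swaps 'os.fsync' for a no-op:

```python
import contextlib
import os


@contextlib.contextmanager
def eat_my_data():
    """Temporarily replace os.fsync with a function that does nothing."""
    saved = os.fsync
    os.fsync = lambda fd: None  # the caller believes the data is safe; it is not
    try:
        yield
    finally:
        os.fsync = saved  # restore the real fsync on exit


def careful_save(path, data):
    """A writer that behaves like Schilling's 'tar': fsync before close."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

Calling 'careful_save' inside 'with eat_my_data():' makes it behave
like a writer that never called 'fsync' in the first place, which
is the sense in which a non-'fsync'ing extractor and an
'eatmydata'-wrapped one are comparable.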
To illustrate here are some "benchmarks", which hopefully should
be revealing as to the merit of the posturings of some of the
buffoons or propagandists that have been discontributing to this
discussion (note that there are somewhat subtle details both as
to the setup and the results):
--------------------------------------------------------------
# uname -a
Linux base.ty.sabi.co.uk 2.6.18-274.18.1.el5 #1 SMP Thu Feb 9 12:20:03 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
# egrep ' (/tmp|/tmp/(ext4|xfs))' /proc/mounts; sysctl vm | egrep '_(bytes|centisecs)' | sort
none /tmp tmpfs rw 0 0
/dev/sdd8 /tmp/xfs xfs rw,nouuid,attr2,inode64,logbsize=256k,sunit=8,swidth=8,noquota 0 0
/dev/sdd3 /tmp/ext4 ext4 rw,barrier=1,data=ordered 0 0
vm.dirty_background_bytes = 900000000
vm.dirty_bytes = 500000000
vm.dirty_expire_centisecs = 2000
vm.dirty_writeback_centisecs = 1000
--------------------------------------------------------------
# (cd /tmp/ext4; rm -rf linux-2.6.32; sync; time tar -x -f /tmp/linux-2.6.32.tar; egrep 'Dirty|Writeback' /proc/meminfo; time sync)
real 0m1.027s
user 0m0.105s
sys 0m0.922s
Dirty: 419700 kB
Writeback: 0 kB
real 0m5.163s
user 0m0.000s
sys 0m0.473s
--------------------------------------------------------------
# (cd /tmp/ext4; rm -rf linux-2.6.32; sync; time star -no-fsync -x -f /tmp/linux-2.6.32.tar; egrep 'Dirty|Writeback' /proc/meminfo; time sync)
star: 37343 blocks + 0 bytes (total of 382392320 bytes = 373430.00k).
real 0m1.204s
user 0m0.139s
sys 0m1.270s
Dirty: 419456 kB
Writeback: 0 kB
real 0m5.012s
user 0m0.000s
sys 0m0.458s
--------------------------------------------------------------
# (cd /tmp/ext4; rm -rf linux-2.6.32; sync; time star -x -f /tmp/linux-2.6.32.tar; egrep 'Dirty|Writeback' /proc/meminfo; time sync)
star: 37343 blocks + 0 bytes (total of 382392320 bytes = 373430.00k).
real 23m29.346s
user 0m0.327s
sys 0m2.280s
Dirty: 108 kB
Writeback: 0 kB
real 0m0.236s
user 0m0.000s
sys 0m0.199s
--------------------------------------------------------------
# (cd /tmp/xfs; rm -rf linux-2.6.32; sync; time tar -x -f /tmp/linux-2.6.32.tar; egrep 'Dirty|Writeback' /proc/meminfo; time sync)
real 0m46.554s
user 0m0.107s
sys 0m1.271s
Dirty: 415168 kB
Writeback: 0 kB
real 1m54.913s
user 0m0.000s
sys 0m0.325s
----------------------------------------------------------------
# (cd /tmp/xfs; rm -rf linux-2.6.32; sync; time star -x -f /tmp/linux-2.6.32.tar; egrep 'Dirty|Writeback' /proc/meminfo; time sync)
star: 37343 blocks + 0 bytes (total of 382392320 bytes = 373430.00k).
real 60m15.723s
user 0m0.442s
sys 0m7.009s
Dirty: 4 kB
Writeback: 0 kB
real 0m0.222s
user 0m0.000s
sys 0m0.194s
----------------------------------------------------------------
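One possible way to read the figures above is to add each
extraction time to the time of the 'sync' that follows it, so that
runs which leave hundreds of MB dirty in memory are charged for
writing it out. The sketch below only does that arithmetic on the
'real' times copied from the runs above; the choice of "extract
plus sync" as the figure of merit is my assumption, and the helper
name 'seconds' is hypothetical:

```python
def seconds(minutes, secs):
    """Convert a 'time' style XmY.YYYs figure to seconds."""
    return minutes * 60 + secs


# (extract 'real', following sync 'real'), copied from the runs above
runs = {
    ("ext4", "tar"): (seconds(0, 1.027), seconds(0, 5.163)),
    ("ext4", "star -no-fsync"): (seconds(0, 1.204), seconds(0, 5.012)),
    ("ext4", "star"): (seconds(23, 29.346), seconds(0, 0.236)),
    ("xfs", "tar"): (seconds(0, 46.554), seconds(1, 54.913)),
    ("xfs", "star"): (seconds(60, 15.723), seconds(0, 0.222)),
}

# Total time until the data is actually on the storage device
totals = {key: extract + sync for key, (extract, sync) in runs.items()}

for (fs, tool), total in totals.items():
    print(f"{fs:5s} {tool:15s} {total:10.3f} s")

# XFS-to-ext4 ratio of the totals, without and with per-file fsync
ratio_no_fsync = totals[("xfs", "tar")] / totals[("ext4", "tar")]
ratio_fsync = totals[("xfs", "star")] / totals[("ext4", "star")]
print(f"xfs/ext4 with GNU tar: {ratio_no_fsync:.1f}")
print(f"xfs/ext4 with star:    {ratio_fsync:.1f}")
```

This only summarizes the particular runs shown, on the kernel and
mount options listed in the setup; it does not cover the subtler
details of the setup mentioned earlier.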