On 2011.05.01 at 12:55 -0400, Christoph Hellwig wrote:
> On Sun, May 01, 2011 at 06:52:46PM +1000, Dave Chinner wrote:
> > > > more than likely your problem is that barriers have been enabled for
> > > > MD/DM devices on the new kernel, and they aren't on the old kernel.
> > > > XFS uses barriers by default, ext3 does not. Hence XFS performance
> > > > will change while ext3 will not. Check dmesg output when mounting
> > > > the filesystems on the different kernels.
> > >
> > > But didn't 2.6.38 replace barriers by explicit flushes the
> > > filesystem has to wait for - mitigating most of the performance
> > > problems with barriers?
> > IIRC, it depends on whether the hardware supports FUA or not. If it
> > doesn't then device cache flushes are used to emulate FUA and so
> > performance can still suck. Christoph will no doubt correct me if I
> > got that wrong ;)
> Mitigating most of the barrier performance issues is a bit of a strong
> claim. Yes, it removes useless ordering requirements, but fundamentally
> you still have to flush the disk cache to the physical medium, which
> is always going to be slower than just filling up a DRAM cache like
> ext3's default behaviour in mainline does (interestingly both SLES
> and RHEL have patched it to provide safe behaviour by default).
> Both the old barrier and new flush code will use the FUA bit if
> available, and those optimize the post-flush for a log write out.
> Note that currently libata by default always disables FUA support,
> even if the disk supports it, so you'll need a SAS/FC/iSCSI/etc
> device to actually see FUA requests, which is quite sad as it
> should provide a nice speedup especially for SATA where the cache
> flush command is not queueable and thus requires us to still
> drain any outstanding I/O at least for a short duration.
I've recently asked on the IDE list why FUA is disabled by default in
libata and this is what Tejun Heo had to say (calling it a misfeature):
»The way flushes are used by filesystems is that FUA is usually only
used right after another FLUSH. ie. Using FUA replaces the FLUSH + commit
block write + FLUSH sequence with FLUSH + FUA commit block write. Due
to the preceding FLUSH, the cache is already empty, so the only
difference between WRITE + FLUSH and FUA WRITE becomes the extra
command issue overhead which is usually almost unnoticeable compared
to the actual IO.
Another thing is that with the recent updates to block FLUSH handling,
using FUA might even be less efficient. The new implementation
aggressively merges those commit writes and flushes. IOW, depending
on timing, multiple consecutive commit writes can be merged as:
FLUSH + commit writes + FLUSH
FLUSH + some commit writes + FLUSH + other commit writes + FLUSH
and so on.
These merges will happen with fsync-heavy workloads where FLUSH
performance actually matters and, in these scenarios, FUA writes are
less effective because FUA puts an extra ordering restriction on each
write. ie. With surrounding FLUSHes, the drive is free to reorder
commit writes to maximize performance; with FUA, the disk has to jump
around all over the place to execute each command in the exact issue
order.
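The ordering argument can be illustrated with a toy seek model (the sector numbers and the linear-distance seek metric are invented assumptions, not real drive behaviour):

```python
def seek_distance(sectors, start=0):
    """Total head travel when writes are serviced in the given order,
    using simple linear distance as a stand-in for seek cost."""
    pos, total = start, 0
    for s in sectors:
        total += abs(s - pos)
        pos = s
    return total

# Commit writes as issued by the filesystem, scattered across the disk:
issue_order = [900, 10, 880, 20, 860, 30]

# Between two FLUSHes the drive may complete plain writes in any order,
# e.g. sorted by sector; FUA writes must hit the medium in issue order.
drive_order = sorted(issue_order)
```

Servicing the sorted order travels far less than slavishly following the issue order, which is the heavier restriction on actual IO order that Tejun objects to.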
I personally think FUA is a misfeature. It's a micro-optimization with
shallow benefits even when used properly, while putting a much heavier
restriction on actual IO order, which usually is the slow part.
That said, if someone can show FUA actually brings noticeable
performance benefits, sure, let's do it, but till then, I think it
would be best to leave it up in the attic.«