[Top] [All Lists]

Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]

To: Bill Davidsen <davidsen@xxxxxxx>
Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 19 Dec 2008 09:26:21 +1100
Cc: Peter Grandi <pg_xf2@xxxxxxxxxxxxxxxxxx>, Linux RAID <linux-raid@xxxxxxxxxxxxxxx>, Linux XFS <xfs@xxxxxxxxxxx>
In-reply-to: <494971B2.1000103@xxxxxxx>
Mail-followup-to: Bill Davidsen <davidsen@xxxxxxx>, Peter Grandi <pg_xf2@xxxxxxxxxxxxxxxxxx>, Linux RAID <linux-raid@xxxxxxxxxxxxxxx>, Linux XFS <xfs@xxxxxxxxxxx>
References: <alpine.DEB.1.10.0812060928030.14215@xxxxxxxxxxxxxxxx> <1229225480.16555.152.camel@localhost> <18757.4606.966139.10342@xxxxxxxxxxxxxxxxxx> <200812141912.59649.Martin@xxxxxxxxxxxx> <18757.33373.744917.457587@xxxxxxxxxxxxxxxxxx> <494971B2.1000103@xxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Wed, Dec 17, 2008 at 04:40:02PM -0500, Bill Davidsen wrote:
> What really bothers me is that there's no obvious need for
> barriers at the device level if the file system is just a bit
> smarter and does it's own async io (like aio_*), because you can
> track writes outstanding on a per-fd basis, so instead of stopping
> the flow of data to the drive, you can just block a file
> descriptor and wait for the count of outstanding i/o to drop to
> zero. That provides the order semantics of barriers as far as I
> can see, having tirelessly thought about it for ten minutes or so.

Well, you've pretty much described the algorithm XFS uses in it's
transaction system - it's entirely asynchronous - and it's been
clear for many, many years that this model is broken when you have
devices with volatile write caches and internal re-ordering.  I/O
completion on such devices does not guarantee data is safe on stable

If the device does not commit writes to stable storage in the same
order they are signalled as complete (i.e. internal device
re-ordering occurred after completion), then the device violates
fundamental assumptions about I/O completion that the above model
relies on.

XFS uses barriers to guarantee that the devices don't lie about the
completion order of critical I/O, not that the I/Os are on stable
storage. The fact that this causes cache flushes to stable storage
is result of the implementation of that guarantee of ordering. I'm
sure the linux barrier implementation could be smarter and faster
(for some hardware), but for an operation that is used to guarantee
integrity I'll take conservative and safe over smart and fast any
day of the week....


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>