
Re: XFS and write barriers.

To: Neil Brown <neilb@xxxxxxx>
Subject: Re: XFS and write barriers.
From: David Chinner <dgc@xxxxxxx>
Date: Sun, 25 Mar 2007 15:17:55 +1100
Cc: David Chinner <dgc@xxxxxxx>, xfs@xxxxxxxxxxx, hch@xxxxxxxxxxxxx
In-reply-to: <17923.34462.210758.852042@xxxxxxxxxxxxxx>
References: <17923.11463.459927.628762@xxxxxxxxxxxxxx> <20070323053043.GD32602149@xxxxxxxxxxxxxxxxx> <17923.34462.210758.852042@xxxxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/
On Fri, Mar 23, 2007 at 06:49:50PM +1100, Neil Brown wrote:
> On Friday March 23, dgc@xxxxxxx wrote:
> > On Fri, Mar 23, 2007 at 12:26:31PM +1100, Neil Brown wrote:
> > > Secondly, if a barrier write fails due to EOPNOTSUPP, it should be
> > > retried without the barrier (after possibly waiting for dependent
> > > requests to complete).  This is what other filesystems do, but I
> > > cannot find the code in xfs which does this.
> > 
> > XFS doesn't handle this - I was unaware that the barrier status of the
> > underlying block device could change....
> > 
> > OOC, when did this behaviour get introduced?
> Probably when md/raid1 started supporting barriers....
> The problem is that this interface is (as far as I can see) undocumented
> and not fully specified.

And not communicated very far, either.
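
For reference, the fallback you're describing - catch EOPNOTSUPP on the
barrier write and reissue it as a plain write - looks roughly like this
against the 2.6-era bio interface. A sketch only: the handler name is
made up and this is not the actual XFS (or any other filesystem's) code.

#include <linux/bio.h>
#include <linux/fs.h>

/*
 * Sketch of a 2.6-era bi_end_io handler (the signature changed in
 * later kernels).  If the device rejected the barrier, drop the
 * barrier flag and reissue the buffer as a normal write.  A real
 * implementation must re-initialise the bio before resubmitting it,
 * and should record that barriers are unsupported so it stops
 * issuing them.
 */
static int demo_end_io(struct bio *bio, unsigned int bytes_done, int error)
{
        if (bio->bi_size)
                return 1;       /* not fully completed yet */

        if (error == -EOPNOTSUPP && (bio->bi_rw & (1 << BIO_RW_BARRIER))) {
                bio->bi_rw &= ~(1 << BIO_RW_BARRIER);
                submit_bio(WRITE, bio); /* retry without the barrier */
                return 0;
        }

        /* normal completion handling would go here */
        return 0;
}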

> Barriers only make sense inside drive firmware.

I disagree. Barriers also have to be handled by the block layer,
e.g. to prevent reordering of I/O in the request queues. The
block layer is responsible for ensuring that barrier I/Os, as
flagged by the filesystem, act as real barriers.
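
To make the division of labour concrete, this is roughly how a
filesystem hands a barrier write to the block layer under the 2.6-era
bio interface (a minimal sketch; the wrapper name is made up):

#include <linux/bio.h>
#include <linux/fs.h>

/*
 * Sketch: the filesystem only marks the bio as a barrier.  The
 * request queue below it is what must complete all earlier writes
 * first and must not move later writes ahead of this one.
 */
static void demo_submit_barrier_write(struct bio *bio)
{
        bio->bi_rw |= (1 << BIO_RW_BARRIER);    /* 2.6-era barrier flag */
        submit_bio(WRITE, bio);
}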

> Trying to emulate it
> in the md layer doesn't make any sense as the filesystem is in a much
> better position to do any emulation required.

You're saying that the emulation of block layer functionality is the
responsibility of layers above the block layer. Why is this not
considered a layering violation?

> > > This is particularly important for md/raid1 as it is quite possible
> > > that barriers will be supported at first, but after a failure a
> > > different device on a different controller could be swapped in that
> > > does not support barriers.
> > 
> > I/O errors are not the way this should be handled. What happens if
> > the opposite happens? A drive that needs barriers is used as a
> > replacement on a filesystem that has barriers disabled because they
> > weren't needed? Now a crash can result in filesystem corruption, but
> > the filesystem has not been able to warn the admin that this
> > situation occurred. 
> There should never be a possibility of filesystem corruption.
> If a barrier request fails, the filesystem should:
>   wait for any dependent request to complete
>   call blkdev_issue_flush
>   schedule the write of the 'barrier' block
>   call blkdev_issue_flush again.

IOWs, the filesystem has to use block device calls to emulate a block device
barrier I/O. Why can't the block layer, on receiving a barrier write
and detecting that barriers are no longer supported by the underlying
device (i.e. in MD), do:

        wait for all queued I/Os to complete
        call blkdev_issue_flush
        schedule the write of the 'barrier' block
        call blkdev_issue_flush again.

And not involve the filesystem at all? i.e. why should the filesystem
have to do this?
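
A minimal sketch of that sequence, sitting in the block layer itself
(single member device, no error handling, and using the two-argument
blkdev_issue_flush() of this era; the function name is made up):

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/fs.h>

/*
 * Sketch: emulate a barrier write with cache flushes once the driver
 * (e.g. md) has discovered that the underlying device no longer
 * accepts barrier requests.  Waiting for outstanding I/O and for the
 * write itself is driver-specific and not shown.
 */
static void demo_emulate_barrier(struct block_device *bdev, struct bio *bio)
{
        sector_t error_sector;

        /* 1. wait for all previously queued I/O to complete (not shown) */

        /* 2. flush the device's write cache */
        blkdev_issue_flush(bdev, &error_sector);

        /* 3. issue the 'barrier' block as a plain write and wait for it */
        bio->bi_rw &= ~(1 << BIO_RW_BARRIER);
        submit_bio(WRITE, bio);

        /* 4. flush again so that write itself reaches stable storage */
        blkdev_issue_flush(bdev, &error_sector);
}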

> My understanding is that that sequence is as safe as a barrier, but maybe
> not as fast.

Yes, and my understanding is that the block device is perfectly capable
of implementing this just as safely as the filesystem.

> The patch looks at least believable.  As you can imagine it is awkward
> to test thoroughly.

As well as being pretty much impossible to test reliably with an
automated testing framework. Hence ongoing test coverage will
approach zero.....


Dave Chinner
Principal Engineer
SGI Australian Software Group
