[Top] [All Lists]

Re: [PATCH] dio: track and serialise unaligned direct IO

To: Mingming Cao <cmm@xxxxxxxxxx>
Subject: Re: [PATCH] dio: track and serialise unaligned direct IO
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 4 Aug 2010 08:56:58 +1000
Cc: Matthew Wilcox <matthew@xxxxxx>, linux-fsdevel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, sandeen@xxxxxxxxxxx
In-reply-to: <1280856865.2436.31.camel@mingming-laptop>
References: <1280443516-14448-1-git-send-email-david@xxxxxxxxxxxxx> <20100730025324.GO25774@xxxxxxxxxxxxxxxx> <20100730045331.GA2126@dastard> <1280856865.2436.31.camel@mingming-laptop>
User-agent: Mutt/1.5.20 (2009-06-14)
On Tue, Aug 03, 2010 at 10:34:25AM -0700, Mingming Cao wrote:
> On Fri, 2010-07-30 at 14:53 +1000, Dave Chinner wrote:
> > On Thu, Jul 29, 2010 at 08:53:24PM -0600, Matthew Wilcox wrote:
> > > On Fri, Jul 30, 2010 at 08:45:16AM +1000, Dave Chinner wrote:
> > > > If we get two unaligned direct IO's to the same filesystem block
> > > > that is marked as a new allocation (i.e. buffer_new), then both IOs will
> > > > zero the portion of the block they are not writing data to. As a
> > > > result, when the IOs complete there will be a portion of the block
> > > > that contains zeros from the last IO to complete rather than the
> > > > data that should be there.
> > I don't want any direct IO for XFS to go through the page cache -
> > unaligned or not. using the page cache for the unaligned blocks
> > would also be much worse for performance that this method because it
> > turns unaligned direct IO into 3 IOs - the unaligned head block, the
> > aligned body and the unaligned tail block. It would also be a
> > performance hit you take on every single dio, whereas this way the
> > hit is only taken when an overlap is detected.
> Does this problem also possible for DIO and non AIO case? (Ext4 case
> this only happy with AIO+DIO+unaligned).  If not, could we simply force
> unaligned AIO+DIO to be synchronous? Still direct IO...

There is nothing specific to AIO about this bug. XFS (at least)
allows concurrent DIO writes to the same inode regardless of whether
they are dispatched via AIO or multiple separate threads and so the
race condition exists outside just the AIO context...

> > And besides, such decisions on whether to use buffered IO have to be
> > made high up in the filesystem when we are deciding how to lock the
> > inode for the dio - buffered IO requires exclusive locking, not the
> > shared locking we do for dio writes. That further reduces the
> > performance of unaligned direct IO even when there are no overlaps,
> > and it's a solution that every filesystem needs to implement
> > themselves in some way.
> > 
> > I've looked at a couple of different ways to fix this (e.g. passing
> > the unaligned state to get_blocks() to allow the filesystem to
> > serialise there) but they've all died a horrible death of dodgy
> > locking and hacky IO completion detection. not to mention
> > requiring a different solution in every filesystem.
> I also have been thinking other ways to fix this, initial thoughts about
> this but concerned about the scalability. 
> I was looking at ways to use buffer head state to indicate any unaligned
> AIO DIO write to buffer_new buffers(do_direct_IO set the state), then
> for any futhur unaligned IO need to wait for the AIO+DIO complete on
> that buffer.

That's effectively what my fix does, except it does not overload
bufferheads to do it.

> but the buffer head passed from do_direct_IO to
> get_blocks() is a dummy buffer, have to get the buffer head from
> device...

Which, IMO, seems like a dangerous layering violation - it's mixing
device associated bufferheads with filesystem IO (i.e. from
different address spaces) and, as such, the two are never guaranteed
to be coherent.  To make matters more complex, XFS can have at least two
block devices (data and real-time) that direct IO is targeted at, so
doing something that is block device specific seems much more
complex than just tracking the IO in a separate list...

> this would only impact to all IOs that are really conflict with
> pending AIO DIOs... it should work for ext4 case...would this something
> usable for XFS? I have got the approach started last week but not very
> far.

It might, but I don't think it's a viable approach.


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>