
Re: kernel bug in xfs_lrw.c (centos v5.5, directio, aio)

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: kernel bug in xfs_lrw.c (centos v5.5, directio, aio)
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 19 Aug 2010 11:34:33 +1000
Cc: Nohez <nohez@xxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <341C002D-C57C-4F73-8B36-5D12B0B91CD5@xxxxxxxxxxx>
References: <alpine.LNX.2.00.1008171906220.21398@xxxxxxxxxxxxxxxxxx> <20100818114305.GR10429@dastard> <341C002D-C57C-4F73-8B36-5D12B0B91CD5@xxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Wed, Aug 18, 2010 at 07:47:09PM -0500, Eric Sandeen wrote:
> On Aug 18, 2010, at 6:43 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Tue, Aug 17, 2010 at 07:12:12PM +0530, Nohez wrote:
> >> 
> >> Hi,
> >> 
> >> I had a kernel bug today when running xfs on CentOS v5.5. I moved to
> >> xfs from ext3 today.
> >> 
> >> The only application accessing the xfs filesystem is Sybase ASE v15.x.
> >> Database has been configured to use directio with native kernel
> >> asynchronous disk i/o enabled.
> > 
> > The warning is being issued because the application is mixing
> > buffered IO with direct IO on the same file. i.e. data corruption
> > waiting to happen. This is an application bug - the responsibility
> > for ensuring data coherency and integrity is assumed by the
> > application issuing the direct IO.
> > 
> You know... A clearer kernel message might help a lot here...

Yeah, it probably would, given we've had more reports of this in the
last month or two than we've had in the last five years. What sort
of text do you think we should add? I'd argue for the scary side:

"XFS: filesystem <blah>: detected potential data corruption issue
caused by application(s) mixing concurrent buffered and direct IO to
the same inode. Inode #12345, pid 6789. Please report this issue
to your application vendor."

What do you think?

As it is, I suspect that the test for this race condition will
need to change somewhat with range-based flushing now working.
Just checking mapping->nr_pages is not sufficient anymore, I think.
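
To make the distinction concrete, here is a small userspace sketch, not
the XFS code. It assumes one reading of the problem: once flushing is
done per byte range, cached pages outside the direct IO range can
legitimately remain, so a whole-file page count no longer indicates an
overlap. The struct, the function names and the 4 KiB page size are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for an inode's page cache contents. */
struct cache_sketch {
    const unsigned long *pages; /* indices of cached pages */
    size_t nr_pages;            /* number of entries in pages[] */
};

/* Coarse test: true if any page of the file is cached at all. */
static bool any_pages_cached(const struct cache_sketch *c)
{
    assert(c != NULL);
    return c->nr_pages != 0;
}

/* Range test: true only if a cached page overlaps [pos, pos + len). */
static bool range_has_cached_pages(const struct cache_sketch *c,
                                   unsigned long long pos,
                                   unsigned long long len)
{
    assert(c != NULL);
    if (len == 0)
        return false;

    unsigned long long first = pos >> SKETCH_PAGE_SHIFT;
    unsigned long long last = (pos + len - 1) >> SKETCH_PAGE_SHIFT;

    for (size_t i = 0; i < c->nr_pages; i++) {
        if (c->pages[i] >= first && c->pages[i] <= last)
            return true;
    }
    return false;
}
```

With pages 100 and 101 cached, the coarse test fires for a direct write
to page 0, while the range test only fires when the IO actually touches
page 100 or 101.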


Dave Chinner
