To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: kernel bug in xfs_lrw.c (centos v5.5, directio, aio)
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Wed, 18 Aug 2010 20:38:33 -0500
Cc: Nohez <nohez@xxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <20100819013433.GP7362@dastard>
References: <alpine.LNX.2.00.1008171906220.21398@xxxxxxxxxxxxxxxxxx> <20100818114305.GR10429@dastard> <341C002D-C57C-4F73-8B36-5D12B0B91CD5@xxxxxxxxxxx> <20100819013433.GP7362@dastard>

On Aug 18, 2010, at 8:34 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:

> On Wed, Aug 18, 2010 at 07:47:09PM -0500, Eric Sandeen wrote:
>> On Aug 18, 2010, at 6:43 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> 
>>> On Tue, Aug 17, 2010 at 07:12:12PM +0530, Nohez wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I hit a kernel bug today while running xfs on CentOS v5.5; I had
>>>> switched to xfs from ext3 earlier today.
>>>> 
>>>> The only application accessing the xfs filesystem is Sybase ASE v15.x.
>>>> Database has been configured to use directio with native kernel
>>>> asynchronous disk i/o enabled.
>>> 
>>> The warning is being issued because the application is mixing
>>> buffered IO with direct IO on the same file, i.e. data corruption
>>> waiting to happen. This is an application bug - the responsibility
>>> for ensuring data coherency and integrity is assumed by the
>>> application issuing the direct IO.
>>> 
>> You know, a clearer kernel message might help a lot here.
> 
> Yeah, it probably would, given we've had more reports of this in the
> last month or two than we've had in the last five years. What sort
> of text do you think we should add? I'd argue on the scary side,
> say:
> 
> "XFS: filesystem <blah>: detected potential data corruption issue
> caused by application(s) mixing concurrent buffered and direct IO to
> the same inode. Inode #12345, pid 6789. Please report this issue
> to your application vendor."
> 
> What do you think?
> 
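Dave's point that mixing buffered and direct IO on the same file is "data corruption waiting to happen" can be seen in a toy user-space model. This is a hypothetical sketch, not XFS or kernel code: struct toy_file, buffered_write(), direct_write(), writeback() and mixed_io_demo() are made-up names, and the model deliberately leaves out the flushing and invalidation the kernel attempts around direct IO, so it shows what goes wrong when buffered IO to the same range slips in concurrently.

```c
#include <assert.h>
#include <string.h>

/* Toy model, NOT kernel code: one file block with a "disk" copy and an
 * optional page-cache copy. All names here are hypothetical. */
struct toy_file {
    char disk[16];  /* contents on stable storage */
    char cache[16]; /* page-cache copy used by buffered I/O */
    int cached;     /* cache holds a copy of the block */
    int dirty;      /* cache is newer than disk (not yet written back) */
};

/* Buffered write: lands only in the page cache; writeback happens later. */
static void buffered_write(struct toy_file *f, const char *data)
{
    assert(strlen(data) < sizeof f->cache);
    strcpy(f->cache, data);
    f->cached = 1;
    f->dirty = 1;
}

/* Buffered read: served from the cache if the block is cached. */
static const char *buffered_read(struct toy_file *f)
{
    if (!f->cached) {
        memcpy(f->cache, f->disk, sizeof f->cache);
        f->cached = 1;
    }
    return f->cache;
}

/* Direct write/read: go straight to "disk" and neither flush nor
 * invalidate the cache, which is what makes mixing them unsafe. */
static void direct_write(struct toy_file *f, const char *data)
{
    assert(strlen(data) < sizeof f->disk);
    strcpy(f->disk, data);
}

static const char *direct_read(const struct toy_file *f)
{
    return f->disk;
}

/* Writeback: copy a dirty cached block back to disk. */
static void writeback(struct toy_file *f)
{
    if (f->dirty) {
        memcpy(f->disk, f->cache, sizeof f->disk);
        f->dirty = 0;
    }
}

/* Runs three hazards in order and records what each step observed. */
static void mixed_io_demo(char seen[3][16])
{
    struct toy_file f = { .disk = "old" };

    buffered_write(&f, "buffered");
    strcpy(seen[0], direct_read(&f));   /* direct read misses dirty cache */

    direct_write(&f, "direct");
    strcpy(seen[1], buffered_read(&f)); /* buffered read sees stale cache */

    writeback(&f);
    strcpy(seen[2], direct_read(&f));   /* writeback clobbers direct write */
}
```

Calling mixed_io_demo() records "old", "buffered" and "buffered": the direct read misses the dirty cached data, the buffered read returns data older than the last (direct) write, and the later writeback silently overwrites the direct write.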
Plenty verbose; might want to rate-limit/throttle it, but sure. Maybe include
current->comm?

-Eric
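Eric's suggestions (add current->comm, the task's command name, and throttle the message) could be sketched in user space as follows. The struct and function names, the fixed-window limiter and its parameters are assumptions for illustration; the message text is Dave's proposal with the task name appended in parentheses. The kernel already has its own printk rate-limiting helpers, so real kernel code would use those rather than a hand-rolled counter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-window rate limiter: at most `burst` messages per
 * `interval` seconds. Timestamps are passed in to keep it testable. */
struct ratelimit {
    long interval;     /* window length, seconds */
    int burst;         /* messages allowed per window */
    long window_start; /* start of the current window */
    int printed;       /* messages emitted in this window */
    int suppressed;    /* messages dropped in this window */
};

/* Returns 1 if a message may be printed at time `now`, else 0. */
static int ratelimit_ok(struct ratelimit *rl, long now)
{
    assert(rl->interval > 0);
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now; /* start a new window */
        rl->printed = 0;
        rl->suppressed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    rl->suppressed++;
    return 0;
}

/* Formats the proposed warning (plus task name) into buf; returns the
 * snprintf result, i.e. the length the full message would need. */
static int format_mixed_io_warning(char *buf, size_t len, const char *fsname,
                                   unsigned long long ino, int pid,
                                   const char *comm)
{
    return snprintf(buf, len,
        "XFS: filesystem %s: detected potential data corruption issue "
        "caused by application(s) mixing concurrent buffered and direct IO "
        "to the same inode. Inode #%llu, pid %d (%s). Please report this "
        "issue to your application vendor.",
        fsname, ino, pid, comm);
}
```

With interval = 5 and burst = 2, the first two calls in a 5-second window return 1 and the third returns 0 (and is counted in suppressed), so a misbehaving application cannot flood the log.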

> As it is, I suspect that the test for this race condition will
> need to change somewhat with range-based flushing now working.
> Just checking mapping->nr_pages is not sufficient anymore, I think.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
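One reading of Dave's last point: once direct IO flushes and invalidates only the byte range it touches, pages elsewhere in the file can legitimately stay cached, so a non-zero page count on the mapping (the struct address_space field is spelled nrpages) no longer implies a conflict. A hypothetical range-aware check might look like this; struct toy_mapping, the 4 KiB page size and both helper names are assumptions, and this is not the check XFS actually adopted.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT_TOY 12 /* hypothetical: 4 KiB pages */

/* Hypothetical model of a file's page cache: indices of cached pages. */
struct toy_mapping {
    const unsigned long long *pages; /* cached page indices */
    size_t nrpages;                  /* number of cached pages */
};

/* Whole-file test: any cached page at all counts as a conflict. */
static int conflict_whole_file(const struct toy_mapping *m)
{
    return m->nrpages != 0;
}

/* Range test: only cached pages overlapping [pos, pos + count) count. */
static int conflict_in_range(const struct toy_mapping *m,
                             unsigned long long pos, unsigned long long count)
{
    if (count == 0)
        return 0;
    assert(pos + (count - 1) >= pos); /* range must not wrap around */
    unsigned long long first = pos >> PAGE_SHIFT_TOY;
    unsigned long long last = (pos + count - 1) >> PAGE_SHIFT_TOY;
    for (size_t i = 0; i < m->nrpages; i++)
        if (m->pages[i] >= first && m->pages[i] <= last)
            return 1;
    return 0;
}
```

For cached pages 0 and 100 and a direct IO covering pages 10 to 11, conflict_whole_file() reports a conflict while conflict_in_range() does not.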
