kernel bug in xfs_lrw.c (centos v5.5, directio, aio)

Eric Sandeen sandeen at sandeen.net
Wed Aug 18 20:38:33 CDT 2010


On Aug 18, 2010, at 8:34 PM, Dave Chinner <david at fromorbit.com> wrote:

> On Wed, Aug 18, 2010 at 07:47:09PM -0500, Eric Sandeen wrote:
>> On Aug 18, 2010, at 6:43 AM, Dave Chinner <david at fromorbit.com> wrote:
>> 
>>> On Tue, Aug 17, 2010 at 07:12:12PM +0530, Nohez wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I had a kernel bug today when running xfs on CentOS v5.5. I moved to
>>>> xfs from ext3 today.
>>>> 
>>>> The only application accessing the xfs filesystem is Sybase ASE v15.x.
>>>> Database has been configured to use directio with native kernel
>>>> asynchronous disk i/o enabled.
>>> 
>>> The warning is being issued because the application is mixing
>>> buffered IO with direct IO on the same file, i.e. data corruption
>>> waiting to happen. This is an application bug - the responsibility
>>> for ensuring data coherency and integrity is assumed by the
>>> application issuing the direct IO.
>>> 
>> You know... A clearer kernel message might help a lot here...
> 
> Yeah, it probably would, given we've had more reports of this in the
> last month or two than we've had in the last five years. What sort
> of text do you think we should add? I'd argue on the scary side,
> say:
> 
> "XFS: filesystem <blah>: detected potential data corruption issue
> caused by application(s) mixing concurrent buffered and direct IO to
> the same inode. Inode #12345, pid 6789. Please report this issue
> to your application vendor."
> 
> What do you think?
> 
Plenty verbose, might want to limit/throttle it, but sure.  Maybe include current->comm?

-Eric
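Dave's proposed text, together with Eric's suggestions to throttle it and to include current->comm, can be modeled in a small userspace C sketch. The struct, the helper names, the interval/burst values, and the example filesystem and process names in the usage below are hypothetical. This is not XFS code; an in-kernel version would more likely use printk_ratelimited() and read current->comm directly than hand-roll a limiter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical userspace model of a rate-limited warning (not kernel code). */
struct warn_limit {
    time_t window_start;  /* start of the current interval */
    int    interval;      /* interval length in seconds */
    int    burst;         /* max messages allowed per interval */
    int    printed;       /* messages emitted in the current interval */
};

/* Returns 1 if a message may be emitted at time `now`, 0 if suppressed. */
static int warn_allowed(struct warn_limit *rl, time_t now)
{
    if (now - rl->window_start >= rl->interval) {
        rl->window_start = now;   /* start a new interval */
        rl->printed = 0;
    }
    if (rl->printed >= rl->burst)
        return 0;                 /* over budget: drop the message */
    rl->printed++;
    return 1;
}

/* Formats the proposed warning, including the command name, into buf. */
static int format_mixed_io_warning(char *buf, size_t len, const char *fsname,
                                   unsigned long long ino, int pid,
                                   const char *comm)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
        "XFS: filesystem %s: detected potential data corruption issue "
        "caused by application(s) mixing concurrent buffered and direct IO "
        "to the same inode. Inode #%llu, pid %d (%s). "
        "Please report this issue to your application vendor.",
        fsname, ino, pid, comm);
}
```

With a burst of 2 per 5-second interval, a third warning inside the same interval is suppressed and the next interval starts fresh, which keeps a misbehaving database from flooding the log while still naming the offending process.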

> As it is, I suspect that the test for this race condition will
> need to change somewhat with range-based flushing now working.
> Just checking mapping->nrpages is not sufficient anymore, I think.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david at fromorbit.com
> 
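One reading of Dave's point about range-based flushing: once flushing and invalidation operate on byte ranges, pages can legitimately stay cached elsewhere in the file, so a bare "are there any cached pages at all" test (the role the nrpages count plays) no longer says whether a given direct IO actually overlaps cached data. The toy model below contrasts that whole-file test with a range-overlap test. The struct, the function names, and the 4096-byte page size are hypothetical assumptions for illustration; this is not XFS code and not the fix that was eventually adopted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL  /* assumed page size for this sketch */

/* Hypothetical model of an inode's page cache: sorted indices of cached pages. */
struct cache_model {
    const uint64_t *pages;   /* cached page indices, sorted ascending */
    size_t          npages;  /* analogous to the mapping's nrpages count */
};

/* Whole-file test: true if any page of the file is cached. */
static int any_cached(const struct cache_model *c)
{
    return c->npages != 0;
}

/* Range test: true if a cached page overlaps the byte range [pos, pos + count). */
static int cached_in_range(const struct cache_model *c, uint64_t pos,
                           uint64_t count)
{
    assert(c != NULL);
    if (count == 0)
        return 0;
    uint64_t first = pos / MODEL_PAGE_SIZE;
    uint64_t last  = (pos + count - 1) / MODEL_PAGE_SIZE;

    /* Binary search for the first cached index >= first. */
    size_t lo = 0, hi = c->npages;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->pages[mid] < first)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < c->npages && c->pages[lo] <= last;
}
```

With pages 2 and 10 cached, any_cached() reports true for every IO, while cached_in_range() reports false for a direct IO covering pages 3 through 9 and true only when the range reaches page 2 or page 10.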
