xfs

Re: XFS buffer IO performance is very poor

To: "Dave Chinner"<david@xxxxxxxxxxxxx>
Subject: Re: XFS buffer IO performance is very poor
From: "yy" <yy@xxxxxxxxxxx>
Date: Fri, 13 Feb 2015 10:20:44 +0800
Cc: "xfs"<xfs@xxxxxxxxxxx>, "Eric Sandeen"<sandeen@xxxxxxxxxxx>, "bfoster"<bfoster@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx

Dave,

Thank you very much for your explanation.


I hit this issue when running MySQL on XFS. Direct IO is very important for MySQL on XFS, but I can't find any document explaining this problem. This may cause great confusion for other MySQL users too, so maybe this problem should be explained in the XFS documentation.


Best regards,
yy

 Original Message 
From: Dave Chinner<david@xxxxxxxxxxxxx>
To: yy<yy@xxxxxxxxxxx>
Cc: xfs<xfs@xxxxxxxxxxx>; Eric Sandeen<sandeen@xxxxxxxxxxx>; bfoster<bfoster@xxxxxxxxxx>
Sent: Fri, 13 Feb 2015 05:04
Subject: Re: XFS buffer IO performance is very poor

On Thu, Feb 12, 2015 at 02:59:52PM +0800, yy wrote:
> In function xfs_file_aio_read, will request XFS_IOLOCK_SHARED lock
> for both direct IO and buffered IO:

> so write will prevent read in XFS.
> 
> However, in function generic_file_aio_read for ext3, will not
> lock inode->i_mutex, so write will not prevent read in ext3.
> 
> I think this may be the reason for the poor performance of XFS. I do
> not know if this is a bug, or a design flaw of XFS.

This is a bug and design flaw in ext3, and most other Linux
filesystems. POSIX states that write() must execute atomically and
so no concurrent operation that reads or modifies data should
see a partial write. The Linux page cache doesn't enforce this - a
read to the same range as a write can return partially written data
on page granularity, as read/write only serialise on page locks in
the page cache.
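
To make the page-granularity point concrete, here is a toy Python model. It is not kernel code: the class ToyPageCache, the 4096-byte PAGE_SIZE and the generator-based write are all assumptions made for illustration. The generator pauses after each page to stand in for the point where a page lock would be dropped, so a reader can run in between.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyPageCache:
    """Hypothetical in-memory stand-in for a page cache (not kernel code)."""

    def __init__(self, npages):
        # Every page starts out zero-filled ("old" data).
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]

    def write_pages(self, first_page, payload):
        """Copy payload one page at a time, yielding after each page.

        Each yield models the gap between releasing one page lock and
        taking the next, where another task may run.
        """
        for i in range(0, len(payload), PAGE_SIZE):
            page = self.pages[first_page + i // PAGE_SIZE]
            chunk = payload[i:i + PAGE_SIZE]
            page[:len(chunk)] = chunk
            yield

    def read(self, first_page, npages):
        """Return the current contents of npages pages as one bytes object."""
        return b"".join(bytes(p) for p in self.pages[first_page:first_page + npages])


def demo_torn_read():
    """Interleave a two-page read with a two-page write."""
    cache = ToyPageCache(2)
    writer = cache.write_pages(0, b"N" * (2 * PAGE_SIZE))
    next(writer)               # the write has finished page 0 only
    seen = cache.read(0, 2)    # the reader runs before page 1 is written
    for _ in writer:           # let the write finish
        pass
    return seen
```

In this model the reader gets one page of new data followed by one page of old data, which is exactly the kind of partial write that should not be visible.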

XFS is the only Linux filesystem that actually follows POSIX
requirements here - the shared/exclusive locking guarantees that a
buffered write completes wholly before a read is allowed to access the
data. There is a downside - you can't run concurrent buffered reads
and writes to the same file - if you need to do that then that's
what direct IO is for, and coherency between overlapping reads and
writes is then the application's problem, not the filesystem...
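
The shared/exclusive scheme can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the XFS implementation: SharedExclusiveLock, ToyFile and count_torn_reads are hypothetical names, reads take the lock shared and writes take it exclusive, and the byte-by-byte copy exists only to widen the window in which a torn read could happen if the lock were missing.

```python
import threading


class SharedExclusiveLock:
    """Toy lock: any number of shared holders, or one exclusive holder."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # current shared holders
        self._writer = False   # True while held exclusively

    def acquire_shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class ToyFile:
    """Hypothetical in-memory file whose reads and writes take the lock."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._iolock = SharedExclusiveLock()

    def write(self, offset, payload):
        self._iolock.acquire_exclusive()
        try:
            # Deliberately slow copy; readers are excluded for all of it.
            for i, b in enumerate(payload):
                self._data[offset + i] = b
        finally:
            self._iolock.release_exclusive()

    def read(self, offset, length):
        self._iolock.acquire_shared()
        try:
            return bytes(self._data[offset:offset + length])
        finally:
            self._iolock.release_shared()


def count_torn_reads(rounds=50, size=1024):
    """Run one writer thread against a reader; count mixed-content reads."""
    f = ToyFile(size)
    torn = 0

    def writer():
        for i in range(rounds):
            fill = b"A" if i % 2 == 0 else b"B"
            f.write(0, fill * size)

    t = threading.Thread(target=writer)
    t.start()
    for _ in range(rounds):
        buf = f.read(0, size)
        if len(set(buf)) != 1:   # a whole write is always a single byte value
            torn += 1
    t.join()
    return torn
```

Because every read waits for the exclusive holder to finish, each buffer the reader sees is entirely old or entirely new. The cost is the one described above: the reader and the writer never make progress on the same file at the same time.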

Maybe at some point in the future we might address this with ranged
IO locks, but there really aren't many multithreaded programs that
hit this issue...
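
For readers unfamiliar with the idea, a ranged lock lets IO to non-overlapping byte ranges proceed concurrently and only serialises IO that overlaps. The sketch below is one possible user-space shape of that idea; RangeLock and its methods are hypothetical names, and nothing here describes how XFS would or does implement it.

```python
import threading


class RangeLock:
    """Toy byte-range lock over half-open ranges [start, end)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = []  # list of (start, end, exclusive) tuples

    def _conflicts(self, start, end, exclusive):
        for s, e, x in self._held:
            overlaps = start < e and s < end
            # Two shared holders may overlap; anything exclusive may not.
            if overlaps and (exclusive or x):
                return True
        return False

    def try_acquire(self, start, end, exclusive):
        """Take the range if possible; return False instead of blocking."""
        with self._cond:
            if self._conflicts(start, end, exclusive):
                return False
            self._held.append((start, end, exclusive))
            return True

    def acquire(self, start, end, exclusive):
        """Block until the range can be taken."""
        with self._cond:
            while self._conflicts(start, end, exclusive):
                self._cond.wait()
            self._held.append((start, end, exclusive))

    def release(self, start, end, exclusive):
        """Drop a range previously taken with the same arguments."""
        with self._cond:
            self._held.remove((start, end, exclusive))
            self._cond.notify_all()


# Usage: a write to the first 4 KiB does not block a read of the next 4 KiB.
lock = RangeLock()
write_ok = lock.try_acquire(0, 4096, exclusive=True)        # writer
read_far = lock.try_acquire(4096, 8192, exclusive=False)    # disjoint reader
read_near = lock.try_acquire(100, 200, exclusive=False)     # overlapping reader
```

Here the disjoint read is admitted while the write is in flight, and only the overlapping read is refused until the writer releases its range.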

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx