Question about non asynchronous aio calls.
Avi Kivity
avi at scylladb.com
Wed Oct 7 13:13:06 CDT 2015
On 07/10/15 18:13, Eric Sandeen wrote:
>
> On 10/7/15 10:08 AM, Brian Foster wrote:
>> On Wed, Oct 07, 2015 at 09:24:15AM -0500, Eric Sandeen wrote:
>>>
>>> On 10/7/15 9:18 AM, Gleb Natapov wrote:
>>>> Hello XFS developers,
>>>>
>>>> We are working on the scylladb[1] database, which is written using
>>>> seastar[2], a highly asynchronous C++ framework. The code uses aio
>>>> heavily: no synchronous operation is allowed by the framework at all,
>>>> otherwise performance drops drastically. We noticed that the only
>>>> mainstream FS in Linux that takes aio seriously is XFS. So let me start
>>>> by thanking you guys for the great work! But unfortunately we also
>>>> noticed that sometimes io_submit() is executed synchronously even on XFS.
>>>>
>>>> Looking at the code I see two cases where this happens: unaligned
>>>> IO and writes past EOF. It looks like we hit both. For the first one we
>>>> make a special effort to never issue unaligned IO, and we use
>>>> XFS_IOC_DIOINFO to figure out what the alignment should be, but it does
>>>> not help. Looking at the code, though, xfs_file_dio_aio_write() checks
>>>> alignment against m_blockmask, which is set to sbp->sb_blocksize - 1, so
>>>> aio expects the buffer to be aligned to the filesystem block size, not to
>>>> the values that DIOINFO returns. Is this intentional? How should our code
>>>> know what to align buffers to?
>>> /* "unaligned" here means not aligned to a filesystem block */
>>> if ((pos & mp->m_blockmask) || ((pos + count) & mp->m_blockmask))
>>> unaligned_io = 1;
>>>
>>> It should be aligned to the filesystem block size.
>>>
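(For reference, a minimal sketch of querying both sets of numbers from
userspace - the dioattr fields via XFS_IOC_DIOINFO as documented in
xfsctl(3), and the filesystem block size via fstatvfs(). The file name,
error handling and output format are illustrative only:)

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/statvfs.h>
#include <xfs/xfs.h>            /* XFS_IOC_DIOINFO, struct dioattr */

int main(int argc, char **argv)
{
        struct dioattr da;
        struct statvfs vfs;
        int fd = open(argv[1], O_RDONLY);

        if (fd < 0 || ioctl(fd, XFS_IOC_DIOINFO, &da) < 0 ||
            fstatvfs(fd, &vfs) < 0) {
                perror("dioinfo");
                return 1;
        }

        /*
         * d_mem is the required memory (buffer) alignment, d_miniosz the
         * minimum I/O size and offset granularity, d_maxiosz the maximum
         * I/O size.  The check quoted above, however, tests the file offset
         * and length against the filesystem block size (f_bsize here, which
         * normally matches sb_blocksize), so staying off the serialised
         * path needs block-size alignment, not just d_miniosz alignment.
         */
        printf("d_mem=%u d_miniosz=%u d_maxiosz=%u fs_block=%lu\n",
               da.d_mem, da.d_miniosz, da.d_maxiosz,
               (unsigned long)vfs.f_bsize);
        close(fd);
        return 0;
}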
>> I'm not sure exactly what kinds of races would be opened if the above
>> locking were absent, but I'd guess it's related to the buffer/block state
>> management, block zeroing, and whatnot buried in the depths of the
>> generic dio code.
> Yep:
>
> commit eda77982729b7170bdc9e8855f0682edf322d277
> Author: Dave Chinner <dchinner at redhat.com>
> Date: Tue Jan 11 10:22:40 2011 +1100
>
> xfs: serialise unaligned direct IOs
>
> When two concurrent unaligned, non-overlapping direct IOs are issued
> to the same block, the direct IO layer will race to zero the block.
> The result is that one of the concurrent IOs will overwrite data
> written by the other IO with zeros. This is demonstrated by the
> xfsqa test 240.
>
> To avoid this problem, serialise all unaligned direct IOs to an
> inode with a big hammer. We need a big hammer approach as we need to
> serialise AIO as well, so we can't just block writes on locks.
> Hence, the big hammer is calling xfs_ioend_wait() while holding out
> other unaligned direct IOs from starting.
>
> We don't bother trying to serialise aligned vs unaligned IOs as
> they are overlapping IO and the result of concurrent overlapping IOs
> is undefined - the result of either IO is a valid result so we let
> them race. Hence we only penalise unaligned IO, which already has a
> major overhead compared to aligned IO so this isn't a major problem.
>
> Signed-off-by: Dave Chinner <dchinner at redhat.com>
> Reviewed-by: Alex Elder <aelder at sgi.com>
> Reviewed-by: Christoph Hellwig <hch at lst.de>
>
> I fixed something similar in ext4 at the time, FWIW.
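(To make the "unaligned" case concrete: on a filesystem with 4096-byte
blocks, the two AIO writes in the sketch below land in the same
filesystem block without overlapping each other, which is the pattern
the commit serialises. Purely illustrative - libaio, a 512-byte device
sector assumed, error handling omitted:)

#define _GNU_SOURCE             /* O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
        io_context_t ctx = 0;
        struct iocb cb[2], *cbs[2] = { &cb[0], &cb[1] };
        void *buf[2];
        int i, fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);

        io_setup(8, &ctx);
        for (i = 0; i < 2; i++) {
                posix_memalign(&buf[i], 512, 512);
                memset(buf[i], 'a' + i, 512);
        }

        /*
         * Two 512-byte writes at offsets 0 and 512: they do not overlap,
         * but both are sub-block on a 4096-byte-block filesystem, so both
         * count as "unaligned" by the test quoted earlier and are
         * serialised against each other instead of running concurrently.
         */
        io_prep_pwrite(&cb[0], fd, buf[0], 512, 0);
        io_prep_pwrite(&cb[1], fd, buf[1], 512, 512);
        io_submit(ctx, 2, cbs);
        /* ... io_getevents(), io_destroy(), cleanup ... */
        return 0;
}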
Makes sense.
Is there a way to relax this for reads? It's pretty easy to saturate
the disk read bandwidth with 4K reads, and there shouldn't be a race
there, at least for reads targeting already-written blocks. For us, at
least, small reads would be sufficient.
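(For concreteness, the read pattern meant here is roughly the sketch
below - many small direct reads in flight at once, at offsets that are
sector-aligned but not filesystem-block aligned, so they count as
"unaligned" by the check quoted earlier even though reads of
already-written blocks have nothing to zero. File name, sizes and queue
depth are illustrative, error handling omitted:)

#define _GNU_SOURCE             /* O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>

#define NR 32

int main(void)
{
        io_context_t ctx = 0;
        struct iocb cb[NR], *cbs[NR];
        struct io_event ev[NR];
        int i, fd = open("datafile", O_RDONLY | O_DIRECT);

        io_setup(NR, &ctx);
        for (i = 0; i < NR; i++) {
                void *buf;
                posix_memalign(&buf, 4096, 4096);
                /*
                 * 4096-byte reads at offsets that are 512-byte but not
                 * filesystem-block aligned.  There is no block zeroing to
                 * race with when the target blocks already hold data,
                 * which is why it would help if such reads did not have
                 * to take the serialised path.
                 */
                io_prep_pread(&cb[i], fd, buf, 4096, 512 + (long long)i * 4096);
                cbs[i] = &cb[i];
        }
        io_submit(ctx, NR, cbs);
        io_getevents(ctx, NR, NR, ev, NULL);
        return 0;
}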