On 1/16/14, 5:21 PM, Dave Chinner wrote:
> On Wed, Jan 15, 2014 at 04:52:05PM -0600, Eric Sandeen wrote:
>> On 1/15/14, 4:38 PM, Dave Chinner wrote:
>>> On Wed, Jan 15, 2014 at 11:59:45AM -0600, Eric Sandeen wrote:
>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>> index 33ad9a7..1f3431f 100644
>>>> --- a/fs/xfs/xfs_ioctl.c
>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>> @@ -1587,7 +1587,12 @@ xfs_file_ioctl(
>>>> XFS_IS_REALTIME_INODE(ip) ?
>>>> mp->m_rtdev_targp : mp->m_ddev_targp;
>>>> - da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
>>>> + /*
>>>> + * Report device physical sector size as "optimal" minimum,
>>>> + * unless blocksize is smaller than that.
>>>> + */
>>>> + da.d_miniosz = min(target->bt_pssize, target->bt_bsize);
>>> Just grab the filesystem block size from the xfs_mount:
>>> da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>>>> + da.d_mem = da.d_miniosz;
>>> I'd suggest that this should be PAGE_SIZE - it's for memory buffer
>>> alignment, not IO alignment, so using the IO alignment just seems
>>> wrong to me...
>> Ok. Was just sticking close to what we had before.
>> da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>> da.d_mem = PAGE_SIZE;
? Then we could have a minimum IO size of 512 but a memory alignment of
4k; isn't that a little odd?
>> (IOWs we could do 512-aligned memory before, right? What's the downside,
>> or the value in changing it now?)
> We can do arbitrary byte aligned buffers if I understand
> get_user_pages() correctly - it just maps the page under the buffer
> into the kernel address space and then the bio is pointed at it.
> AFAICT, the reason for the "memory buffer needs 512 byte alignment"
> is simply that this is the minimum IO size supported.
Actually, it's fs/direct-io.c which enforces this (not sure why I couldn't
find that before), in do_blockdev_direct_IO(); the *enforced* minimum
memory alignment is the bdev's logical block size.
> However, for large IOs, 512 byte alignment is not really optimal. If
> we don't align the buffer to PAGE_SIZE, then we have partial head
> and tail pages, so for a 512k IO we need to map 129 pages into a bio
> instead of 128. When you have hardware that can only handle 128
> segments in a DMA transfer, that means the 512k IO needs to be sent
> in two IOs rather than one.
Ok, but I have a bit of a problem with changing what XFS_IOC_DIOINFO
reports. (I had originally thought to change the minimum IO size, but
I have talked myself out of that, too).
The xfsctl(3) manpage says that XFS_IOC_DIOINFO: "Get(s) information
required to perform direct I/O on the specified file descriptor."
and "the user’s data buffer must conform to the same type of
constraints as required for accessing a raw disk partition."
IOWs, the ioctl is documented as returning minimum, not optimal,
requirements, and it has always been implemented as such. Changing
its meaning now seems wrong. At least, I would not like to do so
as part of this functional change; that would probably be best done
via a new ioctl which reports both minimum & optimal sizes. And at
that point we should just do a vfs interface. :)
Of course, applications don't have to use the minimum sizes reported
by the ioctl; they are free to be smarter and do larger sizes and
alignments. But if the ioctl was designed and documented to report
required *minimums*, I think we should leave it as such.
I'm going to resend the change, split up a bit more to separate
cleanups from functional changes, and maybe we can discuss the ioctl
change as a potential additional patch.
> There's quite a bit of hardware out there that have a limit of 128
> segments to each IO....