On Wed, Jan 15, 2014 at 04:52:05PM -0600, Eric Sandeen wrote:
> On 1/15/14, 4:38 PM, Dave Chinner wrote:
> > On Wed, Jan 15, 2014 at 11:59:45AM -0600, Eric Sandeen wrote:
> >> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> >> index 33ad9a7..1f3431f 100644
> >> --- a/fs/xfs/xfs_ioctl.c
> >> +++ b/fs/xfs/xfs_ioctl.c
> >> @@ -1587,7 +1587,12 @@ xfs_file_ioctl(
> >> XFS_IS_REALTIME_INODE(ip) ?
> >> mp->m_rtdev_targp : mp->m_ddev_targp;
> >> - da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
> >> + /*
> >> + * Report device physical sector size as "optimal" minimum,
> >> + * unless blocksize is smaller than that.
> >> + */
> >> + da.d_miniosz = min(target->bt_pssize, target->bt_bsize);
> > Just grab the filesysetm block size from the xfs_mount:
> > da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
> >> + da.d_mem = da.d_miniosz;
> > I'd suggest that this should be PAGE_SIZE - it's for memory buffer
> > alignment, not IO alignment, so using the IO alignment just seems
> > wrong to me...
> Ok. Was just sticking close to what we had before.
> da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
> da.d_mem = PAGE_SIZE;
> ? Then we can have a minimum IO size of 512, and a memory alignment of
> 4k, isn't that a little odd?
> (IOWs we could do 512-aligned memory before, right? What's the downside,
> or the value in changing it now?)
We can do arbitrary byte aligned buffers if I understand
get_user_pages() correctly - it just maps the page under the buffer
into the kernel address space and then the bio is pointed at it.
AFAICT, the reason for the "memory buffer needs 512 byte alignment"
is simply that this is the minimum IO size supported.
However, for large IOs, 512 byte alignment is not really optimal. If
we don't align the buffer to PAGE_SIZE, then we have partial head
and tail pages, so for a 512k IO we need to map 129 pages into a bio
instead of 128. When you have hardware that can only handle 128
segments in a DMA transfer, that means the 512k IO needs to be sent
in two IOs rather than one.
There's quite a bit of hardware out there that have a limit of 128
segments to each IO....