concurrent direct IO write in xfs
zhengda1936 at gmail.com
Mon Jan 23 13:34:34 CST 2012
On Mon, Jan 23, 2012 at 12:11 AM, Dave Chinner <david at fromorbit.com> wrote:
> > >
> > This is weird. Yes, I'm sure. I use pwrite() to write data to a 4G file,
> > and I check the offset of each write and they are always smaller than 4G.
> > I instrument the code with systemtap and it shows me that ip->i_new_size
> > and new_size in xfs_aio_write_newsize_update are both 0.
> > Since in my case there are only overwrites, ip->i_new_size will always
> > be 0 (ip->i_new_size is only updated when a write extends the file).
> > For the same reason, the new_size returned by
> > xfs_aio_write_newsize_update is always 0.
> > Is that what you expected?
> No idea. I don't know what the problem you are seeing is yet, or if
> indeed there even is a problem as I don't really understand what you
> are trying to do or what results you are expecting to see...
Here I was just wondering whether i_new_size is always 0 when there are
only overwrites. I think it has nothing to do with the pattern of my
workload or the device I used for the test.
> Indeed, have you run the test on something other than a RAM disk and
> confirmed that the problem exists on a block device that has real IO
> latency? If your IO takes close to zero time, then there isn't any
> IO level concurrency you can extract from single file direct IO; it
> will all just serialise on the extent tree lookups.
It's difficult to test the scalability problem on traditional disks;
they provide very low IOPS (IOs per second). Even two SSDs can't provide
enough IOPS to expose the problem.
I don't think all direct IO will serialize on the extent tree lookups.
Direct IO reads can be parallelized pretty well, and they also need
extent tree lookups.
> > > > 0xffffffff812829f4 : __xfs_get_blocks+0x94/0x4a0 [kernel]
> > >
> > > And for direct IO writes, this will be the block mapping lookup, so
> > > it is always hit.
> > >
> > >
> > > What this says to me is that you are probably doing lots of very
> > > small concurrent write IOs, but I'm only guessing. Can you provide
> > > your test case and a description of your test hardware so we can try
> > > to reproduce the problem?
> > >
> > I built XFS on top of a ramdisk. So yes, there are a lot of small
> > concurrent writes per second.
> > I create a 4GB file in XFS (the ramdisk has 5GB of space). My test
> > program overwrites the 4GB of data in the file, writing one page of
> > data at a random offset each time. It's always overwriting, never
> > appending. The offset of each write is always aligned to the page
> > size, and writes never overlap.
> Why are you using XFS for this? tmpfs was designed to do this sort
> of stuff as efficiently as possible....
OK, I can try that.
> > So the test case is pretty simple and I think it's easy to reproduce it.
> > It'll be great if you can try the test case.
> Can you post your test code so I know that what I test is exactly what
> you are running?
I can do that. My test code has become very complicated, so I need to
simplify it first.