concurrent direct IO write in xfs
zhengda1936 at gmail.com
Mon Jan 23 13:34:34 CST 2012
On Mon, Jan 23, 2012 at 12:11 AM, Dave Chinner <david at fromorbit.com> wrote:
> > >
> > This is weird. Yes, I'm sure. I use pwrite() to write data to a 4G file,
> > and I check the offset of each write and they are always smaller than 4G.
> > I instrument the code with systemtap and it shows me that ip->i_new_size
> > and new_size in xfs_aio_write_newsize_update are both 0.
> > Since in my case there are only overwrites, ip->i_new_size will always
> > be 0 (ip->i_new_size is only updated when a write extends the file).
> > For the same reason, the new_size returned by
> > xfs_aio_write_newsize_update is always 0.
> > Is that what you expected?
> No idea. I don't know what the problem you are seeing is yet, or if
> indeed there even is a problem as I don't really understand what you
> are trying to do or what results you are expecting to see...
Here I was just wondering whether i_new_size is always 0 when there are
only overwrites. I think it has nothing to do with the pattern of my
workload or the device I used for the test.
> Indeed, have you run the test on something other than a RAM disk and
> confirmed that the problem exists on a block device that has real IO
> latency? If your IO takes close to zero time, then there isn't any
> IO level concurrency you can extract from single file direct IO; it
> will all just serialise on the extent tree lookups.
It's difficult to test the scalability problem on traditional disks;
they provide very low IOPS (IOs per second). Even two SSDs can't provide
enough IOPS to expose the problem.
I don't think all direct IO will serialize on the extent tree lookups.
Direct IO reads can be parallelized pretty well, and they also need
extent tree lookups.
> > > > 0xffffffff812829f4 : __xfs_get_blocks+0x94/0x4a0 [kernel]
> > >
> > > And for direct IO writes, this will be the block mapping lookup, so
> > > it is always hit.
> > >
> > >
> > > What this says to me is that you are probably doing lots of very
> > > small concurrent write IOs, but I'm only guessing. Can you provide
> > > your test case and a description of your test hardware so we can try
> > > to reproduce the problem?
> > >
> > I built XFS on top of a ramdisk. So yes, there are a lot of small
> > concurrent writes per second.
> > I create a 4GB file in XFS (the ramdisk has 5GB of space). My test
> > program overwrites the 4GB of data in the file, writing one page of
> > data at a random offset each time. It's always overwriting, never
> > appending. The offset of each write is always aligned to the page
> > size, and writes never overlap.
> Why are you using XFS for this? tmpfs was designed to do this sort
> of stuff as efficiently as possible....
OK, I can try that.
> > So the test case is pretty simple and I think it's easy to reproduce it.
> > It'll be great if you can try the test case.
> Can you post your test code so I know that what I test is exactly what
> you are running?
I can do that. My test code has become very complicated, so I need to
simplify it first.