On Tue, Jan 17, 2012 at 02:19:52PM -0500, Zheng Da wrote:
> On Mon, Jan 16, 2012 at 6:25 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > 0xffffffff81288b6a : xfs_aio_write_newsize_update+0x3a/0x90 [kernel]
> > Only ever taken when doing appending writes. Are you -sure- you are
> > not doing appending writes?
> This is weird. Yes, I'm sure. I use pwrite() to write data to a 4G file,
> and I check the offset of each write and they are always smaller than 4G.
> I instrumented the code with systemtap and it shows me that ip->i_new_size
> and new_size in xfs_aio_write_newsize_update are both 0.
> Since in my case there are only overwrites, ip->i_new_size will always be 0
> (the only place that updates ip->i_new_size is xfs_file_aio_write_checks).
> For the same reason, new_size returned by xfs_file_aio_write_checks
> is always 0.
> Is this what you expected?
No idea. I don't know what the problem you are seeing is yet, or if
indeed there even is a problem as I don't really understand what you
are trying to do or what results you are expecting to see...
Indeed, have you run the test on something other than a RAM disk and
confirmed that the problem exists on a block device that has real IO
latency? If your IO takes close to zero time, then there isn't any
IO level concurrency you can extract from single file direct IO; it
will all just serialise on the extent tree lookups.
> > > 0xffffffff812829f4 : __xfs_get_blocks+0x94/0x4a0 [kernel]
> > And for direct IO writes, this will be the block mapping lookup so
> > always hit.
> > What this says to me is that you are probably doing lots of very
> > small concurrent write IOs, but I'm only guessing. Can you provide
> > your test case and a description of your test hardware so we can try
> > to reproduce the problem?
> I built XFS on top of a ramdisk. So yes, there are a lot of small
> concurrent writes per second.
> I create a 4GB file in XFS (the ramdisk has 5GB of space). My test
> program overwrites 4G of data in the file, each time writing one page of
> data at a random offset. It's always overwriting, never appending. The
> offset of each write is always aligned to the page size. The writes do
> not overlap.
Why are you using XFS for this? tmpfs was designed to do this sort
of stuff as efficiently as possible....
> So the test case is pretty simple and I think it's easy to reproduce.
> It would be great if you could try the test case.
Can you post your test code so I know that what I test is exactly what
you are running?
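
For reference, the workload as described in the quoted text (page-sized,
page-aligned, non-overlapping random overwrites of an existing file via
pwrite(), issued concurrently) could be sketched roughly as below. This is
not the actual test program, which has not been posted. The function names,
the thread count, the fill byte and the optional O_DIRECT switch are all
assumptions made for illustration.

```python
import mmap
import os
import random
import threading

PAGE = mmap.PAGESIZE  # system page size; every write is one page


def page_offsets(file_size, seed=0):
    """Return each page-aligned offset below file_size once, shuffled."""
    offsets = list(range(0, file_size, PAGE))
    random.Random(seed).shuffle(offsets)
    return offsets


def overwrite_randomly(path, threads=4, seed=0, direct=False):
    """Overwrite an existing file one page at a time from several threads.

    Offsets never reach the current file size, so no write appends.
    Returns the number of pages written.
    """
    # O_DIRECT is optional here (assumption); it is Linux-specific and
    # some filesystems reject it.
    flags = os.O_WRONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    try:
        size = os.fstat(fd).st_size
        offsets = page_offsets(size - size % PAGE, seed)

        def worker(chunk):
            # An anonymous mmap is page-aligned, which direct IO requires.
            buf = mmap.mmap(-1, PAGE)
            buf.write(b"\xab" * PAGE)
            for off in chunk:
                os.pwrite(fd, buf, off)
            buf.close()

        # Each thread gets a disjoint slice, so writes never overlap.
        pool = [threading.Thread(target=worker, args=(offsets[i::threads],))
                for i in range(threads)]
        for t in pool:
            t.start()
        for t in pool:
            t.join()
        return len(offsets)
    finally:
        os.close(fd)
```

Running this against a preallocated file on the filesystem under test and
then checking that the file size is unchanged is a quick way to confirm
that no write in the run was an appending write.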