Hello,

On Mon, Jan 23, 2012 at 12:11 AM, Dave Chinner <david@fromorbit.com> wrote:
> > >
> > This is weird. Yes, I'm sure. I use pwrite() to write data to a 4G file,
> > and I check the offset of each write, and they are always smaller than 4G.
> > I instrumented the code with systemtap and it shows that ip->i_new_size
> > and new_size in xfs_aio_write_newsize_update are both 0.
> > Since in my case there are only overwrites, ip->i_new_size will always be 0
> > (the only place that updates ip->i_new_size is xfs_file_aio_write_checks).
> > For the same reason, new_size returned by xfs_file_aio_write_checks
> > is always 0.
> > Is that what you expected?
>
> No idea. I don't know what the problem you are seeing is yet, or if
> indeed there even is a problem, as I don't really understand what you
> are trying to do or what results you are expecting to see...

Here I was just wondering whether i_new_size is always 0 when there are only
overwrites. I think that is independent of my workload pattern and of the
device I used for the test.
> Indeed, have you run the test on something other than a RAM disk and
> confirmed that the problem exists on a block device that has real IO
> latency? If your IO takes close to zero time, then there isn't any
> IO level concurrency you can extract from single file direct IO; it
> will all just serialise on the extent tree lookups.

It's difficult to test this scalability problem on traditional disks: they
provide very low IOPS (IOs per second), and even two SSDs can't deliver
enough IOPS.
I don't think all direct IO has to serialize on the extent tree lookups:
direct IO reads parallelize pretty well, and they need extent tree lookups too.
> > > > 0xffffffff812829f4 : __xfs_get_blocks+0x94/0x4a0 [kernel]
> > >
> > > And for direct IO writes, this will be the block mapping lookup so
> > > always hit.
> > >
> > > What this says to me is that you are probably doing lots of very
> > > small concurrent write IOs, but I'm only guessing. Can you provide
> > > your test case and a description of your test hardware so we can try
> > > to reproduce the problem?
> > >
> > I built XFS on top of a ramdisk, so yes, there are a lot of small
> > concurrent writes per second.
> > I create a 4GB file in XFS (the ramdisk has 5GB of space). My test
> > program overwrites 4GB of data in the file, writing one page of data at
> > a time to a random offset. It's always overwriting, never appending. The
> > offset of each write is always aligned to the page size, and writes never
> > overlap.
>
> Why are you using XFS for this? tmpfs was designed to do this sort
> of stuff as efficiently as possible....

OK, I can try that.

> > So the test case is pretty simple and I think it's easy to reproduce.
> > It would be great if you could try the test case.
>
> Can you post your test code so I know that what I test is exactly what
> you are running?

I can do that. My test code has grown quite complicated, so I need to
simplify it before posting. In the meantime, the sketch below shows roughly
what it does.
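This is only a minimal sketch of what the workload boils down to, not the
real program: the file path, thread count, and the use of rand_r are
placeholders I picked for illustration. Each thread opens the same
preallocated 4GB file with O_DIRECT and issues page-sized pwrite()s at
random page-aligned offsets, so every IO is an overwrite.

/*
 * Minimal sketch of the random-overwrite direct IO workload.
 * Placeholder values throughout -- not the real test program.
 */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_SIZE   (4ULL << 30)               /* 4GB file, fully allocated up front */
#define PAGE_SZ     4096ULL                    /* write unit: one page */
#define NR_PAGES    (FILE_SIZE / PAGE_SZ)
#define NR_THREADS  8                          /* placeholder thread count */
#define WRITES_PER_THREAD (NR_PAGES / NR_THREADS)

static const char *path = "/mnt/xfs/testfile"; /* placeholder path */

static void *writer(void *arg)
{
        unsigned int seed = (unsigned int)(unsigned long)arg;
        unsigned long i;
        void *buf;
        int fd;

        /* each thread opens the same file with O_DIRECT */
        fd = open(path, O_WRONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return NULL;
        }
        /* O_DIRECT needs an aligned buffer */
        if (posix_memalign(&buf, PAGE_SZ, PAGE_SZ)) {
                close(fd);
                return NULL;
        }
        memset(buf, 'x', PAGE_SZ);

        for (i = 0; i < WRITES_PER_THREAD; i++) {
                /*
                 * Random page-aligned offset inside the file: always an
                 * overwrite, never an append, never past EOF.
                 */
                off_t off = (off_t)((rand_r(&seed) % NR_PAGES) * PAGE_SZ);

                if (pwrite(fd, buf, PAGE_SZ, off) != (ssize_t)PAGE_SZ) {
                        perror("pwrite");
                        break;
                }
        }

        free(buf);
        close(fd);
        return NULL;
}

int main(void)
{
        pthread_t tid[NR_THREADS];
        long t;

        for (t = 0; t < NR_THREADS; t++)
                pthread_create(&tid[t], NULL, writer, (void *)t);
        for (t = 0; t < NR_THREADS; t++)
                pthread_join(tid[t], NULL);

        return 0;
}

The file is created and fully written once before the run, so every
pwrite() above lands on already-allocated blocks.

Thanks,
Da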