Hello Dave,

On Mon, Jan 23, 2012 at 10:54 PM, Dave Chinner <david@fromorbit.com> wrote:
> > >> > So the test case is pretty simple and I think it's easy to reproduce it.
> > >> > It'll be great if you can try the test case.
> > >>
> > >> Can you post your test code so I know what I test is exactly what
> > >> you are running?
> > >>
> > > I can do that. My test code gets very complicated now. I need to simplify
> > > it.
> > >
> > Here is the code. It's still a bit long. I hope it's OK.
> > You can run the code like "rand-read file option=direct pages=1048576
> > threads=8 access=write/read".
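The test code itself is not reproduced in the quote above. As a rough guide
to what it does, here is a simplified sketch -- purely illustrative, not the
actual rand-read tool: N threads split a fixed number of page-sized O_DIRECT
IOs at random page-aligned offsets across a preallocated file.

/*
 * Sketch of a rand-read-style tester (illustrative only).  The target
 * file must be preallocated to at least npages * 4096 bytes.
 * Build with: gcc -O2 -o rand-read-sketch rand-read-sketch.c -lpthread
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SZ		4096
#define MAX_THREADS	64

static int fd;
static long npages = 262144;		/* pages= argument */
static long iters_per_thread;
static int do_write;			/* access=write vs access=read */

static void *worker(void *arg)
{
	unsigned int seed = (unsigned long)arg;
	void *buf;
	long i;

	/* O_DIRECT requires an aligned buffer */
	if (posix_memalign(&buf, PAGE_SZ, PAGE_SZ))
		return NULL;
	memset(buf, 0xa5, PAGE_SZ);

	for (i = 0; i < iters_per_thread; i++) {
		off_t off = (off_t)(rand_r(&seed) % npages) * PAGE_SZ;

		if (do_write)
			pwrite(fd, buf, PAGE_SZ, off);
		else
			pread(fd, buf, PAGE_SZ, off);
	}
	free(buf);
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid[MAX_THREADS];
	int nthreads, i;

	if (argc != 4) {
		fprintf(stderr, "usage: %s <file> <threads> <read|write>\n",
			argv[0]);
		return 1;
	}
	nthreads = atoi(argv[2]);
	if (nthreads < 1 || nthreads > MAX_THREADS)
		return 1;
	do_write = !strcmp(argv[3], "write");
	iters_per_thread = npages / nthreads;

	fd = open(argv[1], O_RDWR | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, worker,
			       (void *)(unsigned long)(i + 1));
	for (i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	close(fd);
	return 0;
}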
>
> With 262144 pages on a 2GB ramdisk, the results I get on 3.2.0 are:
>
> Threads    Read     Write
>    1       0.92s    1.49s
>    2       0.51s    1.20s
>    4       0.31s    1.34s
>    8       0.22s    1.59s
>   16       0.23s    2.24s
>
> The contention is on the ip->i_ilock, and the newsize update is one
> of the offenders. It probably needs this change to
> xfs_aio_write_newsize_update():
>
> -	if (new_size == ip->i_new_size) {
> +	if (new_size && new_size == ip->i_new_size) {
>
> to avoid the lock being taken here.
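For context, here is roughly how that guard sits in the 3.2-era helper.
This is reconstructed from memory, so treat the exact body as an
assumption; only the added "new_size &&" test comes from the diff above:

/*
 * Reconstruction of the 3.2-era fs/xfs/xfs_file.c helper (body assumed).
 * For a pure overwrite that never extends the file, both new_size and
 * ip->i_new_size are 0, so the unpatched comparison is always true and
 * every IO takes the exclusive ilock just to clear a field that is
 * already 0.  The "new_size &&" test skips the lock in that case.
 */
static void
xfs_aio_write_newsize_update(
	struct xfs_inode	*ip,
	xfs_fsize_t		new_size)
{
	if (new_size && new_size == ip->i_new_size) {
		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
		if (new_size == ip->i_new_size)
			ip->i_new_size = 0;
		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
	}
}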
>
> But all that newsize crap is gone in the current git Linus tree,
> so how much does that gain us:
>
> Threads    Read     Write
>    1       0.88s    0.85s
>    2       0.54s    1.20s
>    4       0.31s    1.23s
>    8       0.27s    1.40s
>   16       0.25s    2.36s
>
> Pretty much nothing. IOWs, it's just like I suspected - you are
> doing so many write IOs that you are serialising on the extent
> lookup and write checks, which use exclusive locking.
>
> Given that it is 2 lock traversals per write IO, we're limiting at
> about 400,000-500,000 exclusive lock grabs per second, and decreasing
> as contention goes up.
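Sanity-checking that against the table above, assuming the run issues one
IO per page (262144 IOs in total -- my assumption about the workload):
262144 / 0.85s is about 308,000 write IOs/s at 1 thread, and
262144 / 1.40s is about 187,000 IOs/s at 8 threads. At 2 exclusive lock
traversals per IO that is roughly 370,000-620,000 grabs per second, which
brackets the estimate above.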
>
> For reads, we are doing 2 shared (nested) lookups per read IO, and we
> appear to be limiting at around 2,000,000 shared lock grabs per
> second. Amdahl's law is kicking in here, but it means that if we could
> make the writes use a shared lock, they would at least scale like
> the reads for this "no metadata modification except for mtime"
> overwrite case.
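The same arithmetic for reads: 262144 / 0.25s is about 1,050,000 read
IOs/s at 16 threads, and at 2 shared lookups per IO that is about
2,100,000 shared lock grabs per second -- consistent with the 2,000,000
figure above (again assuming one IO per page).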
>
> I don't think that the generic write checks absolutely need
> exclusive locking - we could probably get away with a shared lock
> and only fall back to exclusive when we need to do EOF zeroing.
> Similarly, for the block mapping code, if we don't need to do
> allocation, a shared lock is all we need. So maybe in that case, for
> direct IO when create == 1, we can do a read lookup first and only
> grab the lock exclusively if that falls in a hole and requires
> allocation...

Do you think you could provide a patch for these changes?

Thanks,
Da
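P.S. The "read lookup first, upgrade to exclusive only for allocation"
pattern in your last paragraph might be sketched like this. It is purely
a sketch of the idea, not a patch; extent_covers_range() and
allocate_blocks() are invented placeholders, not XFS functions:

/*
 * Sketch of "shared lookup first, exclusive only when allocating".
 * extent_covers_range() and allocate_blocks() are placeholders.
 */
static int
map_blocks_for_dio(
	struct xfs_inode	*ip,
	xfs_off_t		offset,
	size_t			count)
{
	int			error = 0;

	/* Optimistic path: a shared lock is enough for a pure lookup. */
	xfs_ilock(ip, XFS_ILOCK_SHARED);
	if (extent_covers_range(ip, offset, count)) {
		/* Common overwrite case: blocks are already allocated. */
		xfs_iunlock(ip, XFS_ILOCK_SHARED);
		return 0;
	}
	xfs_iunlock(ip, XFS_ILOCK_SHARED);

	/*
	 * The range falls in a hole, so allocation is needed.  Retake
	 * the lock exclusively and recheck, because another thread may
	 * have allocated this range while the lock was dropped.
	 */
	xfs_ilock(ip, XFS_ILOCK_EXCL);
	if (!extent_covers_range(ip, offset, count))
		error = allocate_blocks(ip, offset, count);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	return error;
}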