On Thu, Jun 04, 2015 at 12:26:19AM -0700, Shrinand Javadekar wrote:
> I made two changes based on the suggestions above:
> 1. Reverted the agcount back to the default: 4.
> 2. Bumped the directory block size to 8k (-n size=8k)
> This definitely has made things better. My throughput for one run of
> my 40GB (5GB on each disk) test has gone up from ~70MB/s to 88MB/s.
> The pauses started off being very small: ~1 sec. Right now, with 20GB
> data in each disk, I see the pauses are ~4 seconds.
> I ran echo w > /proc/sysrq-trigger as soon as the system went into one
> of these pauses. Attached here is the output of dmesg after that. I'm
Ok, it didn't catch anything blocked, just dumped scheduler info for
each CPU. But the fact that the changes had a positive impact means we
are probably on the right track.
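
Catching a blocked task by hand is a matter of timing, so it may be
easier to let a watchdog fire the sysrq while the stalled operation is
still in flight. Below is a minimal sketch of that idea, not something
taken from your test harness: the helper names (dump_blocked_tasks,
run_with_watchdog, write_and_sync), the 1 second threshold, and the
use of a small synchronous write as the workload are all assumptions
made for illustration.

```python
import os
import threading


def dump_blocked_tasks(trigger_path="/proc/sysrq-trigger"):
    # Same effect as `echo w > /proc/sysrq-trigger`; needs root on a real system.
    with open(trigger_path, "w") as trigger:
        trigger.write("w")


def run_with_watchdog(operation, threshold_s=1.0, on_stall=dump_blocked_tasks):
    """Run operation(); call on_stall() if it is still running after threshold_s.

    The timer fires in a separate thread, so on_stall() runs while the
    operation is still blocked rather than after it has returned.
    """
    watchdog = threading.Timer(threshold_s, on_stall)
    watchdog.daemon = True
    watchdog.start()
    try:
        return operation()
    finally:
        # No-op if the timer has already fired.
        watchdog.cancel()


def write_and_sync(path, payload=b"\0" * 4096):
    # Hypothetical stand-in for the real workload: one small synchronous write.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)
    finally:
        os.close(fd)
```

Wrapping each write of the workload as
run_with_watchdog(lambda: write_and_sync("/mnt/disk1/probe")) (the path
here is a placeholder) would then leave the blocked-task dump in dmesg
for any write that takes longer than the threshold.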
> going to run a test overnight to see how it performs. Especially, how
> big do the pauses get as more and more data is written into the
> Also, unfortunately, I don't have a kernel dev setup ready to try out
> the patch immediately. I will try to set up the environment to try it
Ok, I'll be doing more testing here on it, but it would be great if
you could see what difference it makes and report back. No hurry,
such a change is probably too late for the next merge window, so
there's plenty of time to get it right...
Thanks for all the time you've spent triaging this problem so far!