On Mon, Jul 18, 2011 at 10:03:17PM -0400, Christoph Hellwig wrote:
> Generally looks okay, but doing a context switch in every log force
> might bite us. Less the general context switch overhead, but more
> the nasty interactions with cfq, which are causing huge problems
> for ext3/4,
Quite frankly, I don't recommend CFQ unless you need block level
throttling or use IO prioritisation seriously. CFQ is way too smart
for its own good trying to do everything for everyone, and as such
suffers from different regressions every release. It has weird
workload specific heuristics in it to try to address issues that
don't solve the general class of problem, and so is always being
patched to fix the next occurrence of the same problem. e.g. the IO
stalls caused by dependent IOs being issued by different threads
that ext3/4 fsync hits all the time.
> with no good way to fix them for workqueues.
Right, which I pointed out to them in the last round of ext4-specific
hacks that tried to tell the journal thread that its IO had
And let's face it - every time we move IO into a workqueue, we
introduce new cases of IO dependencies between threads. e.g.
anything waiting on a log force in progress is already dependent on
dispatch from a different thread, so the xfssyncd, xfsaild and busy
extent log forces will all suffer to some extent from CFQ's existing
deficiencies in this regard. Moving the log IO into a workqueue
doesn't change this at all....
I'm of the opinion that anyone with a RAID controller with a BBWC
doesn't need the smarts in CFQ because the BBWC provides a much
larger and smarter IO re-order window than the Linux IO schedulers
and hence does a better job of IO scheduling than Linux can ever do.
We shouldn't penalise the target market for XFS for having fast
storage by catering to deficiencies of IO schedulers that are
mostly redundant for the hardware XFS typically runs on....
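
To check which IO scheduler a block device is actually using, the
kernel exposes it in /sys/block/<dev>/queue/scheduler, with the active
one shown in square brackets (e.g. "noop deadline [cfq]"). The
following is only a minimal sketch of reading that file; the helper
names parse_scheduler and read_scheduler are made up for illustration,
and the device name passed in (e.g. "sda") is an assumption about the
local system.

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse the contents of a queue/scheduler sysfs file.

    Returns (active, available), where active is the bracketed entry
    (or None if nothing is bracketed) and available lists every name.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # strip the brackets marking the active one
            active = name
        available.append(name)
    return active, available


def read_scheduler(device):
    """Read the scheduler setting for a block device such as 'sda'.

    Hypothetical helper: assumes sysfs is mounted at /sys and that the
    device exists.
    """
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())
```

Switching away from CFQ is then a matter of writing another listed
name into the same file as root, e.g.
"echo deadline > /sys/block/sda/queue/scheduler".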