Thanks for the numbers, though there are enough variables here that it's
hard to draw any firm conclusions.
When I've seen these comparisons in the past, it turned out to be one of
two things:
1) The system with the smaller I/Os (I/O = unit seen by the device) had
more CPU time per megabyte in the code path to start I/O, so it started
less I/O. The small I/Os are a consequence of the lower throughput, not
a cause. You can often rule this out just by looking at CPU
utilization; a quick way to sample that is sketched after this list.
2) The system with the smaller I/Os had a window tuning problem: it was
waiting for previous I/O to complete before starting more, with queues
not full, and thus starting less I/O. Some devices, with good
intentions, suck the Linux queue dry, one tiny I/O at a time, and then
perform miserably processing those tiny I/Os. Properly tuned, the
device would buffer fewer I/Os, letting the queues build inside Linux
so that Linux sends larger I/Os. A quick way to check the relevant
queue settings is also sketched below.
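
Here is a minimal sketch of the check for case 1, assuming a Linux
/proc/stat in the usual format; it samples the aggregate CPU counters
twice and reports how busy the CPUs were in between. If the small-I/O
system is near 100% busy while the other one is not, case 1 is the
likely explanation.

# Sketch only: sample /proc/stat twice and report CPU busy time.
# Counts idle + iowait as "not busy", everything else as busy.
import time

def cpu_counters():
    with open("/proc/stat") as f:
        # First line: "cpu  user nice system idle iowait irq softirq ..."
        return [int(x) for x in f.readline().split()[1:]]

def busy_fraction(interval=5.0):
    before = cpu_counters()
    time.sleep(interval)
    after = cpu_counters()
    delta = [b - a for a, b in zip(before, after)]
    not_busy = delta[3] + (delta[4] if len(delta) > 4 else 0)  # idle + iowait
    return 1.0 - float(not_busy) / sum(delta)

print("CPU busy over sample: %.1f%%" % (100 * busy_fraction()))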
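
And a sketch of the check for case 2. The device name sdc is taken from
the iostat output below, and the sysfs paths are an assumption about
where your kernel exposes these knobs. The point is just to compare how
many requests the device may pull down (queue_depth) with how many the
block layer will queue (nr_requests); if the device can drain the whole
queue, nothing is left behind it to merge.

# Sketch only: compare the device's queue depth with the block layer's
# request queue size.  Paths assume a 2.6-style sysfs; adjust as needed.
dev = "sdc"   # from the iostat output below

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

queue_depth = read_int("/sys/block/%s/device/queue_depth" % dev)
nr_requests = read_int("/sys/block/%s/queue/nr_requests" % dev)

print("device queue_depth=%d, block layer nr_requests=%d"
      % (queue_depth, nr_requests))
if queue_depth >= nr_requests:
    print("device can drain the Linux queue dry; try lowering queue_depth")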
People have done ugly queue plugging algorithms to try to defeat this
queue sucking by withholding I/O from a device willing to take it.
Others defeat it by withholding I/O from a willing Linux block layer
instead, saving up I/O and submitting it in large bios (a userspace
analogue of the same idea is sketched below).
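
For illustration only, here is a userspace analogue of that "save it up
and submit it big" idea; the file name and sizes are made up, and the
real technique described above happens inside the kernel with large
bios. Instead of handing the kernel 4KB at a time, buffer the data and
issue one large write per megabyte.

# Sketch only: coalesce many small writes into one large write() per MB,
# so each system call hands the kernel 1MB (2048 sectors) at a time.
import os

class CoalescingWriter(object):
    def __init__(self, fd, chunk=1024 * 1024):
        self.fd = fd
        self.chunk = chunk
        self.pieces = []
        self.buffered = 0

    def write(self, data):
        self.pieces.append(data)
        self.buffered += len(data)
        if self.buffered >= self.chunk:
            self.flush()

    def flush(self):
        if self.pieces:
            os.write(self.fd, b"".join(self.pieces))
            self.pieces = []
            self.buffered = 0

fd = os.open("/tmp/bigfile", os.O_WRONLY | os.O_CREAT, 0o644)
w = CoalescingWriter(fd)
for i in range(4096):
    w.write(b"x" * 4096)      # many 4KB writes from the application...
w.flush()                     # ...become 1MB submissions to the kernel
os.close(fd)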
>Ext3 (writeback mode)
>
>Device:  rrqm/s   wrqm/s    r/s    w/s     rsec/s  wsec/s     rkB/s  wkB/s      avgrq-sz  avgqu-sz  await    svctm  %util
>sdc      0.00     21095.60  21.00  244.40  168.00  170723.20  84.00  85361.60   643.90    11.15     42.15    3.45   91.60
>
>We see 21k merges per second going on, and an average request size of
>only 643 sectors where the device can handle up to 1MB (2048 sectors).
>
>Here is iostat from the same test w/ JFS instead:
>
>Device:  rrqm/s   wrqm/s    r/s    w/s     rsec/s  wsec/s     rkB/s  wkB/s      avgrq-sz  avgqu-sz  await    svctm  %util
>sdc      0.00     1110.58   0.00   97.80   0.00    201821.96  0.00   100910.98  2063.53   117.09    1054.11  10.21  99.84
>
>So, in this case I think it is making a difference: about 1.1k merges
>per second instead of 21k, and a big difference in throughput, though
>there could be other issues.
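
For what it's worth, the avgrq-sz numbers above are consistent with the
sector columns, which is a quick way to sanity-check iostat output:

# avgrq-sz = (rsec/s + wsec/s) / (r/s + w/s), in 512-byte sectors
ext3 = (168.00 + 170723.20) / (21.00 + 244.40)   # ~643.9 sectors (~322KB)
jfs  = (0.00 + 201821.96) / (0.00 + 97.80)       # ~2063.5 sectors (~1MB)
print("ext3: %.1f sectors/request, jfs: %.1f sectors/request" % (ext3, jfs))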
--
Bryan Henderson IBM Almaden Research Center
San Jose CA Filesystems