I will look at the hardware, but I think there is also a possible
software problem here.
If you look at the sequence of events, first a tmp file is created in
<mount-point>/tmp/tmp_blah. After a few writes, this file is renamed
to a different path in the filesystem.
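To make the sequence concrete, here is a minimal sketch of that
write-then-rename pattern in Python. It is only an illustration, not
the actual application code: the function name, the "tmp_" prefix and
the example paths are made up, and it assumes the temp file and the
final path live on the same filesystem.

```python
import os
import tempfile


def write_via_tmp(mount_point, final_relpath, chunks):
    """Write chunks to <mount_point>/tmp/tmp_XXXX, then rename into place.

    All names here are hypothetical; this only mirrors the sequence
    described above (create in tmp, a few writes, rename elsewhere).
    """
    tmp_dir = os.path.join(mount_point, "tmp")
    os.makedirs(tmp_dir, exist_ok=True)  # created once, reused afterwards

    fd, tmp_path = tempfile.mkstemp(prefix="tmp_", dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:  # "a few writes"
                f.write(chunk)

        final_path = os.path.join(mount_point, final_relpath)
        os.makedirs(os.path.dirname(final_path), exist_ok=True)
        # Rename within the same filesystem: only directory entries
        # change, the file keeps the blocks it was allocated under tmp/.
        os.rename(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

The point of the sketch is that every file starts its life in the same
"tmp" directory, whatever its final destination is.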
The "tmp" directory above is created only once. Temp files get created
inside it and then get renamed. We wondered if this causes disk layout
issues resulting in slower performance. We then came across a report
from someone complaining about the exact same problem.
One quick way to validate this was to delete the "tmp" directory
periodically and see whether the numbers improve. And they do. Over 15
runs of writing 80K objects each, our throughput dropped from
~100MB/s to 30MB/s. When we deleted the tmp directory after each run,
throughput dropped only from ~100MB/s to 80MB/s.
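For reference, the shape of that experiment can be sketched roughly as
follows in Python. write_run is a hypothetical callable standing in
for the object writer (it is not a real API), and the throughput
figures it yields depend entirely on what that callable does;
percent_drop only expresses the slowdown relative to the starting
throughput.

```python
import os
import shutil
import time


def percent_drop(start_mbps, end_mbps):
    """Slowdown from start to end, as a percentage of the start value."""
    return 100.0 * (start_mbps - end_mbps) / start_mbps


def run_experiment(mount_point, write_run, runs=15, reset_tmp=False):
    """Run write_run `runs` times and return per-run throughput in MB/s.

    write_run(mount_point) is a hypothetical callable that writes one
    batch of objects and returns the number of bytes it wrote.
    If reset_tmp is true, <mount_point>/tmp is deleted after each run.
    """
    tmp_dir = os.path.join(mount_point, "tmp")
    throughputs = []
    for _ in range(runs):
        os.makedirs(tmp_dir, exist_ok=True)
        start = time.perf_counter()
        nbytes = write_run(mount_point)
        # Guard against a zero interval on coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        throughputs.append(nbytes / elapsed / 1e6)
        if reset_tmp:
            shutil.rmtree(tmp_dir, ignore_errors=True)
    return throughputs
```

With the figures above, percent_drop(100, 30) gives 70.0 and
percent_drop(100, 80) gives 20.0, so roughly a 70% slowdown without
the cleanup versus about 20% with it.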
The explanation in the link below says that when xfs does not find
free extents in an existing allocation group, it frees up the extents
by copying data from existing extents to their target allocation group
(which happens because of renames). Is that explanation still valid?
Thanks in advance.
On Thu, Apr 23, 2015 at 11:15 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Apr 23, 2015 at 04:48:51PM -0700, Shrinand Javadekar wrote:
>> > from the iostat log:
>> > Device: rrqm/s wrqm/s  r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> > .....
>> > dm-6      0.00   0.00 0.20  22.40  0.00  0.09     8.00    22.28  839.01 1224.00  835.57  44.25 100.00
>> > dm-7      0.00   0.00 0.00   1.20  0.00  0.00     8.00     2.82 1517.33    0.00 1517.33 833.33 100.00
>> > dm-8      0.00   0.00 0.00 195.20  0.00  0.76     8.00  1727.51 4178.89    0.00 4178.89   5.12 100.00
>> > ...
>> > dm-7      0.00   0.00 0.00   0.00  0.00  0.00     0.00     1.00    0.00    0.00    0.00   0.00 100.00
>> > dm-8      0.00   0.00 0.00   0.00  0.00  0.00     0.00  1178.85    0.00    0.00    0.00   0.00 100.00
>> > dm-7 is showing almost a second for single IO wait times, when it is
>> > actually completing IO. dm-8 has a massive queue depth - I can only
>> > assume you've tuned sys/block/*/queue/nr_requests to something
>> > really large? But like dm-7, it's showing very long IO times, and
>> > that's likely the source of your latency problems.
>> I see that /sys/block/*/queue/nr_requests is set to 128 which is way
>> less than the queue depth shown in the iostat numbers. What gives?
> No idea, but it's indicative of a problem below XFS. Work out what
> is happening with your storage hardware first, then work your way up
> the stack...
>> One other observation we had was that xfs shows a large amount of
>> directory fragmentation. Directory fragmentation was shown at ~40%
>> whereas file fragmentation was very low at 0.1%.
> Pretty common. Directories are only accessed a single block at a
> time, and sequential offset reads are pretty rare, so fragmentation
> makes little difference to performance. You're seeing almost zero
> read IO load, so the directory layout is not a concern for this
> Dave Chinner