
To: stan@xxxxxxxxxxxxxxxxx
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
From: Stefan Ring <stefanrin@xxxxxxxxx>
Date: Tue, 10 Apr 2012 08:11:24 +0200
Cc: Linux fs XFS <xfs@xxxxxxxxxxx>
In-reply-to: <4F8372DC.7030405@xxxxxxxxxxxxxxxxx>
References: <CAAxjCEwBMbd0x7WQmFELM8JyFu6Kv_b+KDe3XFqJE6shfSAfyQ@xxxxxxxxxxxxxx> <20350.9643.379841.771496@xxxxxxxxxxxxxxxxxx> <20350.13616.901974.523140@xxxxxxxxxxxxxxxxxx> <CAAxjCEzkemiYin4KYZX62Ei6QLUFbgZESdwS8krBy0dSqOn6aA@xxxxxxxxxxxxxx> <4F7F7C25.8040605@xxxxxxxxxxxxxxxxx> <CAAxjCEyJW1b4dbKctbrgdWjykQt8Hb4Sw1RKdys3oUsehNHCcQ@xxxxxxxxxxxxxx> <4F8055E4.1000808@xxxxxxxxxxxxxxxxx> <CAAxjCEz8TpRvjvbuYPp1xf9X2HwskN5AuPak62R5Jhkg+mmFHA@xxxxxxxxxxxxxx> <4F8372DC.7030405@xxxxxxxxxxxxxxxxx>
> 150MB/s isn't correct.  Should be closer to 450MB/s.  This makes it
> appear that you're writing all these files to a single directory.  If
> you're writing them fairly evenly to 3 directories or a multiple of 3,
> you should see close to 450MB/s, if using mdraid linear over 3 P400
> RAID1 pairs.  If this is what you're doing then something seems wrong
> somewhere.  Try unpacking a kernel tarball.  Lots of subdirectories to
> exercise all 3 AGs thus all 3 spindles.

The spindles were exercised; I watched them with iostat. Maybe I could
have achieved more with more parallelism, but that wasn’t my goal at
all. In fact, over the course of these experiments, I came to doubt
that the controller could even handle that data rate.

>> simple copy of the tar onto the XFS file system yields the same linear
>> performance, the same as with ext4, btw. So 150 MB/sec seems to be the
>> best these disks can do, meaning that theoretically, with 3 AGs, it
>> should be able to reach 450 MB/sec under optimal conditions.
> The optimal condition, again, requires writing 3 of this file to 3
> directories to hit ~450MB/s, which you should get close to if using
> mdraid linear over RAID1 pairs.  XFS is a filesystem after all, so its
> parallelism must come from manipulating usage of filesystem structures.
> I thought I explained all of this previously when I introduced the "XFS
> concat" into this thread.

The optimal condition would be 3 parallel writes of huge files, which
can be easily written linearly. Not thousands of tiny files.
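For reference, that kind of workload could be sketched as below: three
concurrent sequential writers, one per directory, so that XFS can place
each file in a different allocation group (and thus, with an md concat,
on a different spindle). Paths and sizes here are illustrative, not from
the actual test:

```shell
# Sketch (assumed paths/sizes): one big sequential writer per directory.
# Separate top-level directories tend to land in separate XFS AGs.
base=$(mktemp -d)
mkdir -p "$base/dir1" "$base/dir2" "$base/dir3"
for d in dir1 dir2 dir3; do
    # A real test would write multi-GB files; 4 MiB keeps the sketch quick.
    dd if=/dev/zero of="$base/$d/big.img" bs=1M count=4 status=none &
done
wait
```

On a real array one would point `base` at the XFS mount and watch the
per-device throughput with iostat while this runs.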

>> But then I guess I’m back to ext4 land. XFS just doesn’t offer enough
>> benefits in this case to justify the hassle.
> If you were writing to only one directory I can understand this
> sentiment.  Again, if you were writing 3 directories fairly evenly, with
> the md concat, then your sentiment here should be quite different.

Haha, I made a U-turn on this one. XFS is back on the table (and on
the disks now) ;). When I thought I was done, I wanted to restore a
few large KVM images that had been on the disks prior to the RAID
reconfiguration. With ext4, iostat showed writes at 130 MB/s for a
while. After 2 or 3 minutes, throughput broke down completely and
languished at 30-40 MB/s for many minutes, even after I had SIGSTOPed
the writing process. During that time it was nearly impossible to use
vim to edit a file on the ext4 partition; it would pause for tens of
seconds at a time. It’s not even clear why it broke down so badly:
from another seekwatcher sample I took, the writing looked fairly
linear.

So I threw XFS back in, restarted the restore, and it went very
smoothly while still providing acceptable interactivity.

XFS is not a panacea (obviously); it may be a bit slower in many
cases, and it doesn’t seem to cope well with fragmented free space
(which is what this entire thread is really about), but overall it
feels better rounded. After all, I don’t really care how much it
writes per unit of time, as long as it’s not ridiculously little and
it doesn’t bring everything else to a halt.
