[Top] [All Lists]

Re: XFS: Abysmal write performance because of excessive seeking (allocat

To: Stefan Ring <stefanrin@xxxxxxxxx>
Subject: Re: XFS: Abysmal write performance because of excessive seeking (allocation groups to blame?)
From: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Date: Mon, 09 Apr 2012 18:38:04 -0500
Cc: Linux fs XFS <xfs@xxxxxxxxxxx>
In-reply-to: <CAAxjCEz8TpRvjvbuYPp1xf9X2HwskN5AuPak62R5Jhkg+mmFHA@xxxxxxxxxxxxxx>
References: <CAAxjCEwBMbd0x7WQmFELM8JyFu6Kv_b+KDe3XFqJE6shfSAfyQ@xxxxxxxxxxxxxx> <20350.9643.379841.771496@xxxxxxxxxxxxxxxxxx> <20350.13616.901974.523140@xxxxxxxxxxxxxxxxxx> <CAAxjCEzkemiYin4KYZX62Ei6QLUFbgZESdwS8krBy0dSqOn6aA@xxxxxxxxxxxxxx> <4F7F7C25.8040605@xxxxxxxxxxxxxxxxx> <CAAxjCEyJW1b4dbKctbrgdWjykQt8Hb4Sw1RKdys3oUsehNHCcQ@xxxxxxxxxxxxxx> <4F8055E4.1000808@xxxxxxxxxxxxxxxxx> <CAAxjCEz8TpRvjvbuYPp1xf9X2HwskN5AuPak62R5Jhkg+mmFHA@xxxxxxxxxxxxxx>
Reply-to: stan@xxxxxxxxxxxxxxxxx
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
On 4/9/2012 6:02 AM, Stefan Ring wrote:
>> Not at all.  You can achieve this performance with the 6 300GB spindles
>> you currently have, as Christoph and I both mentioned.  You simply lose
>> one spindle of capacity, 300GB, vs your current RAID6 setup.  Make 3
>> RAID1 pairs in the p400 and concatenate them.  If the p400 can't do this
>> concat the mirror pair devices with md --linear.  Format the resulting
>> Linux block device with the following and mount with inode64.
>> $ mkfs.xfs -d agcount=3 /dev/[device]
>> That will give you 1 AG per spindle, 3 horizontal AGs total instead of 4
>> vertical AGs as you get with default striping setup.  This is optimal
>> for your high IOPS workload as it eliminates all 'extraneous' seeks
>> yielding a per disk access pattern nearly identical to EXT4.  And it
>> will almost certainly outrun EXT4 on your RAID6 due mostly to the
>> eliminated seeks, but also to elimination of parity calculations.
>> You've wiped the array a few times in your testing already right, so one
>> or two more test setups should be no sweat.  Give it a go.  The results
>> will be pleasantly surprising.
> Well I had to move around quite a bit of data, but for the sake of
> completeness, I had to give it a try.
> With a nice and tidy fresh XFS file system, performance is indeed
> impressive – about 16 sec for the same task that would take 2 min 25
> before. So that’s about 150 MB/sec, which is not great, but for many
> tiny files it would perhaps be a bit unreasonable to expect more. A

150MB/s isn't correct.  Should be closer to 450MB/s.  This makes it
appear that you're writing all these files to a single directory.  If
you're writing them fairly evenly to 3 directories or a multiple of 3,
you should see close to 450MB/s, if using mdraid linear over 3 P400
RAID1 pairs.  If this is what you're doing then something seems wrong
somewhere.  Try unpacking a kernel tarball.  Lots of subdirectories to
exercise all 3 AGs thus all 3 spindles.

> simple copy of the tar onto the XFS file system yields the same linear
> performance, the same as with ext4, btw. So 150 MB/sec seems to be the
> best these disks can do, meaning that theoretically, with 3 AGs, it
> should be able to reach 450 MB/sec under optimal conditions.

The optimal condition, again, requires writing 3 of this file to 3
directories to hit ~450MB/s, which you should get close to if using
mdraid linear over RAID1 pairs.  XFS is a filesystem after all, so it's
parallelism must come from manipulating usage of filesystem structures.
 I thought I explained all of this previously when I introduced the "XFS
concat" into this thread.

> I will still do a test with the free space fragmentation priming on
> the concatenated AG=3 volume, because it seems to be rather slow as
> well.

> But then I guess I’m back to ext4 land. XFS just doesn’t offer enough
> benefits in this case to justify the hassle.

If you were writing to only one directory I can understand this
sentiment.  Again, if you were writing 3 directories fairly evenly, with
the md concat, then your sentiment here should be quite different.


<Prev in Thread] Current Thread [Next in Thread>