
Re: Filesystem writes on RAID5 too slow

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Filesystem writes on RAID5 too slow
From: Martin Boutin <martboutin@xxxxxxxxx>
Date: Thu, 21 Nov 2013 04:50:51 -0500
Cc: Eric Sandeen <sandeen@xxxxxxxxxx>, "Kernel.org-Linux-RAID" <linux-raid@xxxxxxxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>, "Kernel.org-Linux-EXT4" <linux-ext4@xxxxxxxxxxxxxxx>
On Thu, Nov 21, 2013 at 4:26 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Nov 21, 2013 at 04:11:41AM -0500, Martin Boutin wrote:
>> On Mon, Nov 18, 2013 at 7:57 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Mon, Nov 18, 2013 at 12:28:21PM -0600, Eric Sandeen wrote:
>> >> On 11/18/13, 10:02 AM, Martin Boutin wrote:
>> >> > Dear list,
>> >> >
>> >> > I am writing about an apparent issue (or maybe it is normal, that's
>> >> > my question) regarding filesystem write speed on a Linux RAID device.
>> >> > More specifically, I have linux-3.10.10 running in an Intel Haswell
>> >> > embedded system with 3 HDDs in a RAID-5 configuration.
>> >> > The hard disks have 4k physical sectors, reported as 512-byte
>> >> > logical sectors. I made sure the partitions underlying the raid
>> >> > device start at sector 2048.
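
For reference, the sector sizes and the partition alignment can be
double-checked with something like the commands below, with /dev/sda
standing in for any of the three member disks:

$ cat /sys/block/sda/queue/physical_block_size   # 4096 on these disks
$ cat /sys/block/sda/queue/logical_block_size    # 512
$ parted /dev/sda unit s print                   # start sectors should be multiples of 8

A start of sector 2048 is a multiple of 8 (4096/512), so the partitions
are aligned to the 4k physical sectors.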
>> >>
>> >> (fixed cc: to xfs list)
>> >>
>> >> > The RAID device has version 1.2 metadata and a 4k (byte) data
>> >> > offset, so the data should also be 4k-aligned. The raid chunk
>> >> > size is 512K.
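
Both values can be read back from the array, with /dev/sda1 standing in
for any member partition:

$ mdadm --detail /dev/md0 | grep 'Chunk Size'
$ mdadm --examine /dev/sda1 | grep -E 'Data Offset|Chunk Size'

mdadm --examine reports the data offset in 512-byte sectors, so a 4k
offset shows up as 8 sectors.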
>> >> >
>> >> > I have the md0 raid device formatted as ext3 with a 4k block size,
>> >> > with stride and stripe-width chosen to match the raid chunk size,
>> >> > that is, stride=128,stripe-width=256.
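
For completeness, that corresponds to a mkfs invocation along these
lines (stride = 512K chunk / 4k block = 128; stripe-width = 128 x 2
data disks = 256 on a 3-disk RAID-5):

$ mkfs.ext3 -b 4096 -E stride=128,stripe-width=256 /dev/md0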
>> >> >
>> >> > While I was working on a small university project, I noticed that
>> >> > write speeds when using a filesystem over raid are *much* slower
>> >> > than when writing directly to the raid device (or even compared to
>> >> > filesystem read speeds).
>> >> >
>> >> > The command line for measuring filesystem read and write speeds was:
>> >> >
>> >> > $ dd if=/tmp/diskmnt/filerd.zero of=/dev/null bs=1M count=1000 iflag=direct
>> >> > $ dd if=/dev/zero of=/tmp/diskmnt/filewr.zero bs=1M count=1000 oflag=direct
>> >> >
>> >> > The command line for measuring raw read and write speeds was:
>> >> >
>> >> > $ dd if=/dev/md0 of=/dev/null bs=1M count=1000 iflag=direct
>> >> > $ dd if=/dev/zero of=/dev/md0 bs=1M count=1000 oflag=direct
>> >> >
>> >> > Here are some speed measurements using dd (each an average of 20 runs):
>> >> >
>> >> > device      raw/fs  mode   speed (MB/s)  slowdown (%)
>> >> > /dev/md0    raw     read   207           -
>> >> > /dev/md0    raw     write  209           -
>> >> > /dev/md1    raw     read   214           -
>> >> > /dev/md1    raw     write  212           -
>> >
>> > So, that's writing to the first 1GB of /dev/md0, and all the writes
>> > are going to be aligned to the MD stripe.
>> >
>> >> > /dev/md0    xfs     read   188           9
>> >> > /dev/md0    xfs     write  35            83
>> >
>> > And these will not be written to the first 1GB of the block device
>> > but somewhere else - most likely a region that hasn't otherwise been
>> > used, and so won't be overwriting the same blocks the way the
>> > /dev/md0 case does. Perhaps there's some kind of stripe caching
>> > effect going on here? Was the md device fully initialised before you
>> > ran these tests?
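
For reference, both of those can be checked without stopping the array:

$ cat /proc/mdstat                          # a resync/recovery line means initialisation is still running
$ cat /sys/block/md0/md/stripe_cache_size   # raid456 stripe cache entries, 256 by default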
>> >
>> >> >
>> >> > /dev/md1    ext3    read   199           7
>> >> > /dev/md1    ext3    write  36            83
>> >> >
>> >> > /dev/md0    ufs     read   212           0
>> >> > /dev/md0    ufs     write  53            75
>> >> >
>> >> > /dev/md0    ext2    read   202           2
>> >> > /dev/md0    ext2    write  34            84
>> >
>> > I suspect what you are seeing here is either the latency introduced
>> > by having to allocate blocks before issuing the IO, or a file layout
>> > resulting from allocation that is not ideal. Single-threaded direct
>> > IO is latency bound, not bandwidth bound and, as such, is IO size
>> > sensitive. Allocation for direct IO is also IO size sensitive -
>> > there's typically an allocation per IO, so the more IOs you have to
>> > do, the more allocations occur.
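
If allocation latency is the culprit, one quick way to isolate it would
be to run the write test twice on the same file; with conv=notrunc the
second run overwrites already-allocated blocks, so no allocation should
happen:

$ dd if=/dev/zero of=/tmp/diskmnt/filewr.zero bs=1M count=1000 oflag=direct
$ dd if=/dev/zero of=/tmp/diskmnt/filewr.zero bs=1M count=1000 oflag=direct conv=notrunc

A fast second run would point at allocation; a slow one would point
elsewhere.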
>>
>> I just did a few more tests, this time with ext4:
>>
>> device      raw/fs  mode   speed (MB/s)  slowdown (%)
>> /dev/md0    ext4    read   199           4
>> /dev/md0    ext4    write  210           0
>>
>> This time, no slowdown at all on ext4. I believe this is due to
>> ext4's multiblock allocation: I'm using O_DIRECT, so each 1M write
>> can be allocated in a single call rather than block by block. So I
>> guess for the other filesystems it was indeed the latency introduced
>> by block allocation.
>
> Except that XFS does extent-based allocation as well, so that's not
> likely the reason. The fact that ext4 doesn't see a slowdown like
> every other filesystem really doesn't make a lot of sense to me,
> either from an IO dispatch point of view or an IO alignment point of
> view.
>
> Why? Because all the filesystems align identically to the underlying
> device and all should be doing 4k block aligned IO, and XFS has
> roughly the same allocation overhead for this workload as ext4.
> Did you retest XFS or any of the other filesystems directly after
> running the ext4 tests (i.e. confirm you are testing apples to
> apples)?

Yes I did; the performance figures did not change for either XFS or ext3.
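
For what it's worth, I could also compare the extent layout of the test
files across filesystems; if the allocation theory holds, ext4 should
show fewer, larger extents than the others:

$ filefrag -v /tmp/diskmnt/filewr.zero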
>
> What we need to determine why the other filesystems are slow (and why
> ext4 is fast) is more information about your configuration, and block
> traces showing what is happening at the IO level, as was requested in
> a previous email...

Ok, I'm going to try coming up with meaningful data. Thanks.
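
Presumably something like the following blktrace capture is what's
wanted (a 30-second window on the md device while the dd write test
runs):

$ blktrace -d /dev/md0 -w 30 -o md0trace
$ blkparse -i md0trace | less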
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx



-- 
Martin Boutin
