
Re: mkfs.xfs states log stripe unit is too large

To: Ingo Jürgensmann <ij@xxxxxxxxxxxxxxxxxx>
Subject: Re: mkfs.xfs states log stripe unit is too large
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 26 Jun 2012 12:30:59 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <d71834a062ffd666ab53a4695eb643e9@xxxxxxxxxxxxxxxxxxxx>
References: <D3F781FA-CEB0-4896-9441-772A9E533354@xxxxxxxxxxxxxxxxxx> <20120623234445.GZ19223@dastard> <4FE67970.2030008@xxxxxxxxxxx> <4FE710B7.5010704@xxxxxxxxxxxxxxxxx> <d71834a062ffd666ab53a4695eb643e9@xxxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sun, Jun 24, 2012 at 05:03:47PM +0200, Ingo Jürgensmann wrote:
> On 2012-06-24 15:05, Stan Hoeppner wrote:
> >The log stripe unit mismatch error is a direct result of Ingo
> >manually choosing a rather large chunk size for his two stripe
> >spindle
> >md array, yielding a 1MB stripe, and using an internal log with it.
> >Maybe there is a good reason for this, but I'm going to challenge it.
> To cite man mdadm:
>        -c, --chunk=
>               Specify chunk size of kibibytes.  The  default  when
>               creating an array is 512KB.  To ensure compatibility
>               with earlier versions, the default when Building and
>               array  with no persistent metadata is 64KB.  This is
>               only meaningful for RAID0, RAID4, RAID5, RAID6,  and
>               RAID10.
> So, actually there's a mismatch with the default of mdadm and
> mkfs.xfs. Maybe it's worthwhile to think of raising the log stripe
> maximum size to at least 512 kiB? I don't know what implications
> this could have, though...

You can't, simple as that. The maximum supported is 256k. As it is,
a default chunk size of 512k is probably harmful to most workloads -
large chunk sizes mean that just about every write will trigger a
RMW cycle in the RAID because it is pretty much impossible to issue
full stripe writes. Writeback doesn't do any alignment of IO (the
generic page cache writeback path is the problem here), so we will
almost always be doing unaligned IO to the RAID, and there will be
little opportunity for sequential IOs to merge and form full stripe
writes (24 disks @ 512k each on RAID6 is a 11MB full stripe write).
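To make the alignment point concrete, here is a minimal sketch of what "full stripe write" means for the 24-disk RAID6 example above. The helper names (full_stripe_bytes, is_full_stripe_write) are made up for illustration and are not part of MD or XFS; the check only models the geometry, not what the kernel actually does.

```python
KIB = 1024
MIB = 1024 * KIB


def full_stripe_bytes(disks, parity_disks, chunk_bytes):
    """Data carried by one full stripe: one chunk per data disk."""
    return (disks - parity_disks) * chunk_bytes


def is_full_stripe_write(offset, length, stripe_bytes):
    """True only if the write starts on a stripe boundary and covers
    a whole number of stripes, so parity needs no old data."""
    return (length > 0
            and offset % stripe_bytes == 0
            and length % stripe_bytes == 0)


# 24 disks, RAID6 (2 parity), 512k chunk: the example from the text.
stripe = full_stripe_bytes(24, 2, 512 * KIB)
print(stripe // MIB)                                  # 11

# A write of exactly one aligned stripe qualifies...
print(is_full_stripe_write(0, stripe, stripe))        # True
# ...a typical unaligned 64k writeback IO does not.
print(is_full_stripe_write(4096, 64 * KIB, stripe))   # False
```

With an 11MB stripe, only IOs that happen to start on an 11MB boundary and span a multiple of 11MB pass the check, which is why unaligned writeback rarely gets there.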

IOWs, every time you do a small isolated write, the MD RAID volume
will do a RMW cycle, reading 11MB and writing 12MB of data to disk.
Given that most workloads are not doing lots and lots of large
sequential writes this is, IMO, a pretty bad default given typical
RAID5/6 volume configurations we see....
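The 11MB/12MB figures follow directly from the geometry. A small sketch of the arithmetic, assuming the whole-stripe RMW model described above (read all data chunks, write all data plus parity chunks); rmw_traffic_mib is a hypothetical helper, and real MD behaviour may touch less than this:

```python
def rmw_traffic_mib(disks, parity_disks, chunk_kib):
    """Return (MiB read, MiB written) for one whole-stripe RMW cycle,
    under the simplified model: read every data chunk, then write
    every data chunk plus the parity chunks."""
    data_mib = (disks - parity_disks) * chunk_kib / 1024
    total_mib = disks * chunk_kib / 1024
    return data_mib, total_mib


# 24-disk RAID6 with the 512k default chunk.
print(rmw_traffic_mib(24, 2, 512))   # (11.0, 12.0)

# Same array with the older 64k chunk, for comparison.
print(rmw_traffic_mib(24, 2, 64))    # (1.375, 1.5)
```

Under this model the cost of a small isolated write scales linearly with the chunk size, so going from 64k to 512k chunks makes each such write eight times more expensive.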

Without the warning, nobody would have noticed this. I think the
warning has value - even if it is just to indicate MD now uses a
bad default value for common workloads.


Dave Chinner
