
Re: relationship of nested stripe sizes, was: Question regarding XFS on

To: "stan@xxxxxxxxxxxxxxxxx Hoeppner" <stan@xxxxxxxxxxxxxxxxx>
Subject: Re: relationship of nested stripe sizes, was: Question regarding XFS on LVM over hardware RAID.
From: Chris Murphy <lists@xxxxxxxxxxxxxxxxx>
Date: Sun, 2 Feb 2014 11:09:11 -0700
Cc: xfs <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <52ED6AAF.6030703@xxxxxxxxxxxxxxxxx>
References: <7A732267-B34F-4286-9B49-3AF8767C0B89@xxxxxxxxxxxxxxxxx> <52ED4143.6090303@xxxxxxxxxxxxxxxxx> <EDBD7355-F1EC-4773-9138-CA864CB2E84B@xxxxxxxxxxxxxxxxx> <52ED6AAF.6030703@xxxxxxxxxxxxxxxxx>
On Feb 1, 2014, at 2:44 PM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:

> On 2/1/2014 2:55 PM, Chris Murphy wrote:
>> On Feb 1, 2014, at 11:47 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
>> wrote:
>>> On 1/31/2014 12:35 AM, Chris Murphy wrote:
>>>> Hopefully this is an acceptable way to avoid thread jacking, by
>>>> renaming the subject…
>>>> On Jan 30, 2014, at 10:58 PM, Stan Hoeppner
>>>> <stan@xxxxxxxxxxxxxxxxx> wrote:
>>>>> RAID60 is a nested RAID level just like RAID10 and RAID50.  It
>>>>> is a stripe, or RAID0, across multiple primary array types,
>>>>> RAID6 in this case.  The stripe width of each 'inner' RAID6
>>>>> becomes the stripe unit of the 'outer' RAID0 array:
>>>>> RAID6 geometry   128KB * 12 = 1536KB
>>>>> RAID0 geometry  1536KB *  3 = 4608KB
>>>> My question is on this particular point. If this were hardware
>>>> raid6, but I wanted to then stripe using md raid0, using the
>>>> numbers above would I choose a raid0 chunk size of 1536KB? How
>>>> critical is this value for, e.g. only large streaming read/write
>>>> workloads? If it were smaller, say 256KB or even 32KB, would
>>>> there be a significant performance consequence?
>>> You say 'if it were smaller...256/32KB'.  What is "it"
>>> referencing?
>> it = chunk size for md raid0.
>> So chunk size 128KB * 12 disks, hardware raid6. Chunk size 32KB [1]
>> striping the raid6's with md raid0.
> Frankly, I don't know whether you're pulling my chain, or really don't
> understand the concept of nested striping.  I'll assume the latter.

The former would be inappropriate, and the latter is more plausible anyway, so 
this is the better assumption.

> When nesting stripes, the chunk size of the outer stripe is -always-
> equal to the stripe width of each inner striped array, as I clearly
> demonstrated earlier:

Except when it's hardware raid6 under software raid0, and the user doesn't 
know they need to specify the chunk size in this manner, so they use the 
mdadm default instead. What you're saying makes complete sense, but I don't 
think it's widely known, or documented anywhere that regular end users would 
be likely to find it.
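
For reference, the arithmetic behind the numbers quoted in this thread can be
sketched as below. This is only an illustration: the function and parameter
names are made up here, and the 12 is taken as given from the thread as the
number of chunks in one raid6 stripe.

```python
def nested_stripe_geometry(inner_chunk_kb, inner_chunks_per_stripe, inner_arrays):
    """Return (inner stripe width, outer stripe width) in KB."""
    # Stripe width of one inner raid6 array; per the rule quoted in this
    # thread, this is also the chunk size of the outer raid0.
    inner_width_kb = inner_chunk_kb * inner_chunks_per_stripe
    # Stripe width of the outer raid0 across all inner arrays.
    outer_width_kb = inner_width_kb * inner_arrays
    return inner_width_kb, outer_width_kb


print(nested_stripe_geometry(128, 12, 3))  # (1536, 4608)
```

The first number is the value that would have to be given explicitly as the
md raid0 chunk size (mdadm's --chunk option, which takes KiB) rather than
leaving the default in place.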

> 3 RAID6 arrays
> RAID6  geometry   128KB * 12 = 1536KB
> RAID60 geometry  1536KB *  3 = 4608KB
>
> mdadm allows you enough rope to hang yourself in this situation because
> it doesn't know the geometry of the underlying hardware arrays, and has
> no code to do sanity checking even if it did.  Thus it can't save you
> from yourself.

That's right, and this is the exact scenario I'm suggesting. Depending on 
version, mdadm has two possible default chunk sizes, either 64KB or 512KB.

How bad is the resulting performance hit? Would a 64KB chunk be as bad as a 
512KB chunk? Or is this only quantifiable with testing (i.e. it could be a 
negligible performance hit, or it could be huge)?
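
One way to reason about this before benchmarking is to look at where raid0
places the data. The sketch below is an assumption-laden illustration, not
md's actual code: it assumes equal-size members and plain round-robin chunk
placement, and the function names are hypothetical. It shows which pieces of
a write land on which inner array; it says nothing about how large the
resulting hit is.

```python
def raid0_extents(offset_kb, length_kb, chunk_kb, n_devices):
    """Map a logical write onto per-device (start, length) extents, in KB.

    Assumes equal-size members, so chunk k lands on device k % n_devices
    at device offset (k // n_devices) * chunk_kb.
    """
    extents = {d: [] for d in range(n_devices)}
    pos, end = offset_kb, offset_kb + length_kb
    while pos < end:
        chunk_index, within = divmod(pos, chunk_kb)
        run = min(chunk_kb - within, end - pos)
        dev = chunk_index % n_devices
        dev_start = (chunk_index // n_devices) * chunk_kb + within
        runs = extents[dev]
        if runs and runs[-1][0] + runs[-1][1] == dev_start:
            # Merge with the previous run when they are contiguous on disk.
            runs[-1] = (runs[-1][0], runs[-1][1] + run)
        else:
            runs.append((dev_start, run))
        pos += run
    return extents


def full_stripe_only(extents, inner_width_kb):
    """True if every extent starts and ends on an inner stripe boundary."""
    return all(start % inner_width_kb == 0 and length % inner_width_kb == 0
               for runs in extents.values() for start, length in runs)


# Aligned 1536KB write, outer chunk 1536KB vs. 64KB, three inner arrays.
matched = raid0_extents(0, 1536, 1536, 3)
mismatched = raid0_extents(0, 1536, 64, 3)
print(matched, full_stripe_only(matched, 1536))
print(mismatched, full_stripe_only(mismatched, 1536))
```

With these numbers, the aligned 1536KB write with a 1536KB chunk lands on one
array as a single full stripe, while the same write with a 64KB chunk puts
512KB on each of the three arrays, a partial stripe on every raid6. An
aligned 4608KB write covers full stripes on all three arrays with either
chunk size. Whether the controller cache hides the partial-stripe cost is
something only testing would show.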

> RAID HBA and SAN controller firmware simply won't allow this.  They
> configure the RAID60 chunk size automatically equal to the RAID6 stripe
> width.  If some vendor's firmware allows one to manually enter the
> RAID60 chunk size with a value different from the RAID6 stripe width,
> stay away from that vendor.

I understand that, but the scenario and question I'm posing is for multiple 
hardware raid6's striped with md raid0. The use case is enclosures that offer 
raid6 but not raid60, so the enclosures are striped using software raid. I'm 
trying to understand the magnitude of the consequence when choosing an md 
raid0 chunk size other than the correct one. Is this a 5% performance hit, or 
a 30% performance hit?

Chris Murphy
