
Re: swidth in RAID

To: stan@xxxxxxxxxxxxxxxxx
Subject: Re: swidth in RAID
From: aurfalien <aurfalien@xxxxxxxxx>
Date: Sun, 30 Jun 2013 19:54:39 -0700
Cc: xfs@xxxxxxxxxxx
In-reply-to: <51D0EDC8.2090706@xxxxxxxxxxxxxxxxx>
References: <557F888F-34EA-4669-B861-C0B684DAD13D@xxxxxxxxx> <51D0A62E.2020309@xxxxxxxxxxxxxxxxx> <20130701013851.GC27780@dastard> <0DD94D98-18AA-441B-8F41-AD3AC1BCEC60@xxxxxxxxx> <20130701020939.GF27780@dastard> <51D0EDC8.2090706@xxxxxxxxxxxxxxxxx>

On Jun 30, 2013, at 7:47 PM, Stan Hoeppner wrote:

> On 6/30/2013 9:09 PM, Dave Chinner wrote:
>> On Sun, Jun 30, 2013 at 06:54:31PM -0700, aurfalien wrote:
>>> 
>>> On Jun 30, 2013, at 6:38 PM, Dave Chinner wrote:
>>> 
>>>> On Sun, Jun 30, 2013 at 04:42:06PM -0500, Stan Hoeppner wrote:
>>>>> On 6/30/2013 1:43 PM, aurfalien wrote:
>>>>> 
>>>>>> I understand swidth should = #data disks.
>>>>> 
>>>>> No.  "swidth" is a byte value specifying the number of 512 byte blocks
>>>>> in the data stripe.
>>>>> 
>>>>> "sw" is #data disks.
>>>>> 
>>>>>> And the docs say for RAID 6 of 8 disks, that means 6.
>>>>>> 
>>>>>> But parity is distributed and you actually have 8 disks/spindles working 
>>>>>> for you and a bit of parity on each.
>>>>>> 
>>>>>> So shouldn't swidth equal the number of disks in the RAID when it
>>>>>> concerns distributed parity RAID?
>>>>> 
>>>>> No.  Let's try visual aids.
>>>>> 
>>>>> Set 8 coffee cups (disk drives) on a table.  Grab a bag of m&m's.
>>>>> Separate 24 blues (data) and 8 reds (parity).
>>>>> 
>>>>> Drop a blue m&m in cups 1-6 and a red into 7-8.  You just wrote one RAID
>>>>> stripe.  Now drop a blue into cups 3-8 and a red in 1-2.  Your second
>>>>> write, this time rotating two cups (drives) to the right.  Now drop
>>>>> blues into cups 5-8 and 1-2 (wrapping around) and reds into 3-4.
>>>>> You've written your third stripe,
>>>>> rotating by two cups (disks) again.
>>>>> 
>>>>> This is pretty much how RAID6 works.  Each time we wrote we dropped 8
>>>>> m&m's into 8 cups, 6 blue (data chunks) and 2 red (parity chunks).
>>>>> Every RAID stripe you write will be constructed of 6 blues and 2 reds.
>>>> 
>>>> Right, that's how they are constructed, but not all RAID distributes
>>>> parity across different disks in the array. Some are symmetric, some
>>>> are asymmetric, some rotate right, some rotate left, and some use
>>>> statistical algorithms to give an overall distribution without being
>>>> able to predict where a specific parity block might lie within a
>>>> stripe...
>>>> 
>>>> And at the other end of the scale, isochronous RAID arrays tend to
>>>> have dedicated parity disks so that data read and write behaviour is
>>>> deterministic and therefore predictable from a high level....
>>>> 
>>>> So, assuming that a RAID5/6 device has a specific data layout (be it
>>>> distributed or fixed) at the filesystem level is just a bad idea. We
>>>> simply don't know. Even if we did, the only thing we can optimise is
>>>> the thing that is common between all RAID5/6 devices - writing full
>>>> stripe widths is the most optimal method of writing to them....
>>> 
>>> Am I interpreting this to say:
>>> 
>>> 16 disks is sw=16 regardless of parity?
>> 
>> No. I'm just saying that parity layout is irrelevant to the
>> filesystem and that all we care about is sw does not include parity
>> disks.
> 
> So, here's the formula, aurfalien, where #disks is the total number of
> active disks (excluding spares) in the RAID array.  In the case of
> 
> RAID5 sw = (#disks - 1)
> RAID6 sw = (#disks - 2)
> RAID10  sw = (#disks / 2) [1]
> 
> [1] If using the Linux md/RAID10 driver with one of the non-standard
> layouts such as n2 or f2, the formula may change.  This is beyond the
> scope of this discussion.  Visit the linux-raid mailing list for further
> details.
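
Spelling that out for the 8-disk RAID6 discussed earlier, and assuming,
purely for illustration, a 64KiB per-disk chunk size (neither the chunk
size nor the /dev/sdX device name come from this thread), the mkfs.xfs
call would look something like:

  # sw counts only the 6 data disks; su is the per-disk chunk size
  mkfs.xfs -d su=64k,sw=6 /dev/sdX

  # the same geometry expressed in 512-byte block units:
  # sunit = 65536 / 512 = 128, swidth = 128 * 6 = 768
  mkfs.xfs -d sunit=128,swidth=768 /dev/sdX

Either form describes the same layout; the two parity drives never enter
into sw or swidth.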

I totally got your original post with the cup o M&Ms.

Just wanted his take on it, is all.

And I'm on too many mailing lists as it is :)

- aurf