
Re: Question regarding XFS on LVM over hardware RAID.

To: "C. Morgan Hamill" <chamill@xxxxxxxxxxxx>
Subject: Re: Question regarding XFS on LVM over hardware RAID.
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 31 Jan 2014 07:28:19 +1100
Cc: stan <stan@xxxxxxxxxxxxxxxxx>, xfs <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1391090527-sup-4664@xxxxxxxxxxxxxxx>
References: <1391005406-sup-1881@xxxxxxxxxxxxxxx> <52E91923.4070706@xxxxxxxxxxx> <1391022066-sup-5863@xxxxxxxxxxxxxxx> <52E99504.4030902@xxxxxxxxxxxxxxxxx> <1391090527-sup-4664@xxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Jan 30, 2014 at 09:28:45AM -0500, C. Morgan Hamill wrote:
> First, thanks very much for your help.  We're weaning ourselves off
> unnecessarily expensive storage and as such I unfortunately haven't had
> as much experience with physical filesystems as I'd like.  I am also
> unfamiliar with XFS.  I appreciate the help immensely.
> Excerpts from Stan Hoeppner's message of 2014-01-29 18:55:48 -0500:
> > This is not correct.  You must align to either the outer stripe or the
> > inner stripe when using a nested array.  In this case it appears your
> > inner stripe is RAID6 su 128KB * sw 12 = 1536KB.  You did not state your
> > outer RAID0 stripe geometry.  Which one you align to depends entirely on
> > your workload.
> Ahh this makes sense; it had occurred to me that something like this
> might be the case.  I'm not exactly sure what you mean by inner and
> outer; I can imagine it going both ways.
> Just to clarify, it looks like this:
>      XFS     |      XFS    |     XFS      |      XFS
> ---------------------------------------------------------
>                     LVM volume group
> ---------------------------------------------------------
>                          RAID 0
> ---------------------------------------------------------
> RAID 6 (14 disks) | RAID 6 (14 disks) | RAID 6 (14 disks)
> ---------------------------------------------------------
>                     42 4TB SAS disks

So optimised for sequential IO. The time-honoured method of setting
up XFS for this if the workload is large files is to use a stripe
unit that is equal to the width of the underlying RAID6 volumes with
a stripe width of 3. That way XFS tries to align files to the start
of each RAID6 volume, and allocate in full RAID6 stripe chunks. This
mostly avoids RMW cycles for large files and sequential IO. i.e. su
= 1536k, sw = 3.
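For concreteness, the su/sw values Dave describes could be passed to mkfs
along these lines (the LV path is hypothetical, and the arithmetic just
restates the thread's geometry: 14-disk RAID6 = 12 data disks x 128k chunk):

```shell
# Each RAID6 has 12 data disks with a 128k chunk, so one full RAID6
# stripe is 1536k; three RAID6 volumes sit under the RAID0 layer.
SU_KB=$((128 * 12))   # stripe unit = full width of one RAID6 = 1536k
SW=3                  # stripe width = number of RAID6 volumes

# Print the mkfs invocation rather than running it (device is hypothetical):
echo mkfs.xfs -d su=${SU_KB}k,sw=${SW} /dev/vg0/backup0
```

After mkfs, `xfs_info` on the mounted filesystem shows the resulting
sunit/swidth (in 512-byte sectors) so the alignment can be verified.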

> ...more or less.
> I agree that it's quite weird, but I'll describe the workload and the
> constraints.


summary: concurrent (initially slow) sequential writes of ~4GB files.

> Now, here's the constraints, which is why I was planning on setting
> things up as above:
>   - This is a budget job, so sane things like RAID 10 are out.  RAID
>     6 or 60 are (as far as I can tell, correct me if I'm wrong) our only
>     real options here, as anything else either sacrifices too much
>     storage or is too susceptible to failure from UREs.

RAID6 is fine for this.

>   - I need to expose, in the end, three-ish (two or four would be OK)
>     filesystems to the backup software, which should come fairly close
>     to minimizing the effects of the archive maintenance jobs (integrity
>     checks, mostly).  CrashPlan will spawn 2 jobs per store point, so
>     a max of 8 at any given time should be a nice balance between
>     under-utilizing and saturating the IO.

So concurrency is up to 8 files being written at a time. That's
pretty much on the money for striped RAID. Much more than this and
you end up with performance being limited by seeking on the slowest
disk in the RAID sets.

> So I had thought LVM over RAID 60 would make sense because it would give
> me the option of leaving a bit of disk unallocated and being able to
> tweak filesystem sizes a bit as time goes on.


And it allows you, in future, to add more disks and grow across them
via linear concatenation of more RAID60 luns of the same layout...
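That grow path might look like the following (LUN, VG, LV and mount names
are all hypothetical; the commands are only echoed here as a sketch, since
the thread does not spell them out):

```shell
# After attaching another RAID60 LUN built with the same geometry
# (su=1536k, sw=3), concatenate it into the VG and grow XFS online.
echo vgextend vg0 /dev/mapper/raid60_lun1    # add the new LUN to the VG
echo lvextend -l +100%FREE /dev/vg0/backup0  # extend the logical volume
echo xfs_growfs /mnt/backup0                 # XFS grows while mounted
```

Because XFS alignment is set per-filesystem at mkfs time, keeping every
added LUN's geometry identical is what keeps the existing su/sw settings
valid after the grow.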

> Now that I think of it though, perhaps something like 2 or 3 RAID6
> volumes would make more sense, with XFS directly on top of them.  In
> that case I have to balance number of volumes against the loss of
> 2 parity disks, however.

Probably not worth the complexity.


Dave Chinner
