Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)

To: Ralf Gross <Ralf-Lists@xxxxxxxxxxxx>
Subject: Re: mkfs options for a 16x hw raid5 and xfs (mostly large files)
From: "Bryan J. Smith" <thebs413@xxxxxxxxx>
Date: Tue, 25 Sep 2007 10:41:56 -0700 (PDT)
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20070925172535.GD20499@p15145560.pureserver.info>
Reply-to: b.j.smith@xxxxxxxx
Sender: xfs-bounce@xxxxxxxxxxx

Ralf Gross <Ralf-Lists@xxxxxxxxxxxx> wrote:
> Thanks for all the details. Before I leave the office (it's getting
> dark here): I think the Overland RAID we have (48x Disk) is from
> the same manufacturer (Xyratex) that builds some devices for
> NetApp.

There's a lot of cross-fabbing these days.  I was referring more to
NetApp's combined hardware-OS-volume approach, although that was
clearly a poor tangent on my part.

> Our profile is not that performance driven, thus the ~200MB/s
> read/write performance is ok. We just need cheap storage ;)

For what application?  That is the question.  I mean, sustained
software RAID-5 writes can be a PITA.  E.g., the earlier dd example
doesn't even do any XOR recalculation; it merely copies the existing
parity block along with the data.  Sustained software RAID-5 writes
can easily drop under 50MBps, because the PC interconnect was not
designed to stream data (programmed I/O), only to direct it (Direct
Memory Access).
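
As a rough illustration (the mount point and sizes below are just
placeholders, not a recommendation), a sustained-write check with dd
against the mounted array would look something like this:

  # hypothetical test file on the array's mount point; direct I/O keeps
  # the page cache from hiding the parity (read-modify-write) cost
  dd if=/dev/zero of=/mnt/array/ddtest bs=1M count=16384 oflag=direct

  # buffered run with a final fdatasync for comparison; the difference is
  # largely cache effects plus the parity recalculation overhead
  dd if=/dev/zero of=/mnt/array/ddtest bs=1M count=16384 conv=fdatasync

  rm -f /mnt/array/ddtest

Even that only exercises big sequential writes, which says nothing
about how your actual clients will hit the array.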

> Still I'm wondering how other people saturate a 4 Gb FC controller
> with one single RAID 5. At least that's what I've seen in some
> benchmarks and here on the list.

Depends on the solution, the benchmark, etc...

> If dd doesn't give me more than 200MB/s, the problem could only be
> the array, the controller or the FC connection.

I think you're getting confused.

There are many factors in how dd performs.  Using an OS-managed
volume gives you non-blocking I/O, on which dd will scream,
especially when the OS knows it is merely copying one block to
another (unlike the FC array) and doesn't need to recalculate the
parity block.  I know software RAID proponents like to show those
numbers, but they are far removed from the "real world"; they
literally leverage the fact that parity doesn't need to be
recalculated for the blocks moved.

You need to benchmark from your application -- e.g., clients.  If you
want "raw" disk access benchmarks, then build a software RAID volume
with a massive number of SATA channels using "dumb" SATA ASICs.
Don't even use an intelligent hardware RAID card in JBOD mode; that
will only slow the DTR (data transfer rate).
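
Just as a sketch (device names, chunk size, and disk count below are
assumptions, not something sized for your hardware), a raw md RAID-5
test volume with stripe-aligned XFS would be built roughly like this:

  # hypothetical 16-disk software RAID-5 across "dumb" SATA channels
  mdadm --create /dev/md0 --level=5 --raid-devices=16 --chunk=256 /dev/sd[b-q]

  # align XFS to the stripe: su = chunk size, sw = data disks (16 drives - 1 parity)
  mkfs.xfs -d su=256k,sw=15 /dev/md0

That gives you a "raw" DTR baseline, but it is still no substitute
for benchmarking with your actual clients.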

> Given that other setups are similar and not using different
> controllers and stripes.

Again, benchmark from your application -- e.g., clients.  Everything
else means squat.

I cannot stress this enough.  The only way I can show otherwise is
with hardware taps (e.g., PCI-X, PCIe).  I literally couldn't explain
"well enough" to one client, who was getting only 60MBps at just 10%
CPU utilization, why their software RAID was the bottleneck until I
put in a PCI-X card and showed them the amount of traffic on the bus.
And even that wasn't measuring the system interconnect (although it
should be possible with an HTX card on an AMD solution, though the
card would probably cost 5 figures and have some limits).


-- 
Bryan J. Smith   Professional, Technical Annoyance
b.j.smith@xxxxxxxx    http://thebs413.blogspot.com
--------------------------------------------------
     Fission Power:  An Inconvenient Solution

