Re: XFS/Linux Sanity check

To: xfs@xxxxxxxxxxx
Subject: Re: XFS/Linux Sanity check
From: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Date: Wed, 04 May 2011 01:18:16 -0500
In-reply-to: <20110503031856.GA9114@dastard>
References: <BANLkTik4YjSr7-VA+f9Sh+UxvKfFKMy=+w@xxxxxxxxxxxxxx> <20110503031856.GA9114@dastard>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10

On 5/2/2011 10:18 PM, Dave Chinner wrote:

> Also, knowing how you spread out the disks in each RAID-6 group
> between controllers, trays, etc as that has important performance
> and failure implications.
>
> e.g. I'm guessing that you are taking 6 drives from each enclosure
> for each 18-drive raid-6 group, which would split the RAID-6 group
> across all three SAS controllers and enclosures. That means if you
> lose a SAS controller or enclosure you lose all RAID-6 groups at
> once which is effectively catastrophic from a recovery point of view.
> It also means that one slow controller slows down everything so load
> balancing is difficult.

Assuming Paul's SC847 SAS chassis have the standard EL1 backplanes, his bandwidth profile per chassis is:

Front: 24 x 6Gb/s drives on 4 x 6Gb/s host ports via 36 port LSI expander
Rear:  21 x 6Gb/s drives on 4 x 6Gb/s host ports via 36 port LSI expander
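
As a rough sketch of what that imbalance amounts to, the snippet below computes the link-rate oversubscription of each backplane from the figures above. The function name and the front/rear labels are mine, and this compares nominal link rates only; spinning disks stream well below 6Gb/s, so the real contention is lower than these ratios suggest.

```python
# Oversubscription of each SC847 backplane: total drive-side link bandwidth
# divided by total host-side link bandwidth, using the figures quoted above.
LINK_GBPS = 6  # nominal SAS link rate for both drives and host ports


def oversubscription(drives: int, host_ports: int) -> float:
    """Ratio of aggregate drive link rate to aggregate host port link rate."""
    return (drives * LINK_GBPS) / (host_ports * LINK_GBPS)


# (drive count, host port count) per backplane
backplanes = {"front": (24, 4), "rear": (21, 4)}
ratios = {name: oversubscription(d, p) for name, (d, p) in backplanes.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}:1")
```

That works out to 6:1 on the front backplane and 5.25:1 on the rear.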

Not balanced, but not horribly bad. I recommend using one LSI 9285-8E RAID card per SC847 chassis, with one SFF-8088 cable connected to the front backplane and the other connected to the rear. Create two 21 drive RAID6 arrays, taking care that one array consists only of drives on the front backplane and the other only of drives on the rear backplane. Configure the remaining 3 drives on the front backplane as cold spares. Not perfect, but I think it's the best solution given the unbalanced nature of the chassis backplanes.
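
To make the drive accounting explicit, here is a minimal sketch of the per-chassis layout just described. The variable names are illustrative only; the drive counts come from the backplane figures above, and the 2 parity drives per array is simply what RAID6 implies.

```python
# Per-chassis drive accounting for the layout described above.
RAID6_PARITY_DRIVES = 2          # RAID6 spends two drives' worth on parity

front_drives, rear_drives = 24, 21
array_size = 21                  # one array per backplane
arrays_per_chassis = 2

spares = front_drives + rear_drives - arrays_per_chassis * array_size
data_per_array = array_size - RAID6_PARITY_DRIVES
data_per_chassis = arrays_per_chassis * data_per_array

print(f"cold spares per chassis:   {spares}")
print(f"data spindles per array:   {data_per_array}")
print(f"data spindles per chassis: {data_per_chassis}")
```

Each chassis ends up with 3 spares and 38 data spindles behind its RAID card.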

> Large stripes might look like a good idea, but when you get to this
> scale concatenation of high throughput LUNs provides better
> throughput because of less contention through the storage
> controllers and enclosures.

Now create an LVM or mdraid concatenated device of the 6 hardware RAID6 LUNs. Format the resulting device with mkfs.xfs defaults, allowing XFS allocation groups to drive your parallelism and throughput instead of a big stripe, just as Dave recommends. Each 9285-8E should be able to pump streaming reads at about 3.2 to 3.5GB/s, a little less than the aggregate streaming capability of the 38 RAID6 data spindles behind it. At this throughput level you're bumping against the PCIe 2.0 x8 one-way bandwidth limit after encoding and error correction overhead. So overall I think you're fairly well balanced now, overcoming the slight imbalance of the disk chassis configuration.
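
For anyone who wants to check the numbers, the sketch below works through the PCIe 2.0 x8 budget and the per-spindle rate implied by the 3.2 to 3.5GB/s estimate. The 5 GT/s lane rate and 8b/10b encoding are the standard PCIe 2.0 figures; the variable names and the per-spindle breakdown are my own illustration, not measurements.

```python
# PCIe 2.0 x8 one-way bandwidth and the per-spindle rate needed to fill it.
GT_PER_LANE = 5.0        # PCIe 2.0 signalling rate, gigatransfers/s per lane
ENCODING = 8 / 10        # 8b/10b line encoding: 8 payload bits per 10 sent
LANES = 8
DATA_SPINDLES = 38       # two 21-drive RAID6 arrays, 19 data drives each

# GB/s in one direction after encoding, before packet overhead
raw_gb_per_s = GT_PER_LANE * ENCODING * LANES / 8

# The estimated usable range quoted above, in MB/s
usable_mb_per_s = (3200, 3500)

# Fraction of the post-encoding link rate that the estimate represents
efficiency = [mb / (raw_gb_per_s * 1000) for mb in usable_mb_per_s]

# Streaming rate each data spindle must sustain to reach the estimate
per_spindle_mb = [mb / DATA_SPINDLES for mb in usable_mb_per_s]

print(f"x8 link after 8b/10b: {raw_gb_per_s:.1f} GB/s")
print(f"estimate is {efficiency[0]:.0%} to {efficiency[1]:.0%} of that")
print(f"needs {per_spindle_mb[0]:.0f} to {per_spindle_mb[1]:.0f} MB/s per spindle")
```

So the card tops out before the disks do as long as each data spindle can stream somewhat more than about 92 MB/s.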

Assuming you're able to load balance interrupts and tune things optimally, and assuming the Intel chipset in the R810 is up to the task, the above recommended setup should be capable of 8-10GB/s throughput with a parallel workload. Newegg carries both the 9285-8E and the cache battery unit, ~$1200 total for the pair. So it'll run you about $18,000 for 15 units for 5 servers, about 3x what you spent on the 9200-8E cards, and worth every sweet penny.
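
The totals are easy to sanity check. In the sketch below, 3 cards per server is inferred from 15 units across 5 servers (one card per chassis), and the per-card rates are the 3.2 to 3.5GB/s estimate from earlier; the names are illustrative.

```python
# Back-of-envelope check of the per-server ceiling and the total card cost.
SERVERS = 5
CARDS_TOTAL = 15
COST_PER_CARD_USD = 1200            # 9285-8E plus cache battery, approximate
PER_CARD_MB_S = (3200, 3500)        # estimated streaming read rate per card

cards_per_server = CARDS_TOTAL // SERVERS
total_cost_usd = CARDS_TOTAL * COST_PER_CARD_USD

# Sum of the card ceilings per server; real throughput lands below this
ceiling_mb_s = tuple(cards_per_server * rate for rate in PER_CARD_MB_S)

print(f"cards per server: {cards_per_server}")
print(f"total cost: ${total_cost_usd:,}")
print(f"per-server card ceiling: {ceiling_mb_s[0] / 1000:.1f} "
      f"to {ceiling_mb_s[1] / 1000:.1f} GB/s")
```

The summed card ceiling of 9.6 to 10.5GB/s per server sits just above the 8-10GB/s I expect in practice, which leaves some room for chipset and interrupt handling losses.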

--
Stan
