XFS/Linux Sanity check

Stan Hoeppner stan at hardwarefreak.com
Wed May 4 01:18:16 CDT 2011


On 5/2/2011 10:18 PM, Dave Chinner wrote:

> Also, knowing how you spread out the disks in each RAID-6 group
> between controllers, trays, etc as that has important performance
> and failure implications.
>
> e.g. I'm guessing that you are taking 6 drives from each enclosure
> for each 18-drive raid-6 group, which would split the RAID-6 group
> across all three SAS controllers and enclosures. That means if you
> lose a SAS controller or enclosure you lose all RAID-6 groups at
> once which is effectively catastrophic from a recovery point of view.
> It also means that one slow controller slows down everything so load
> balancing is difficult.

Assuming Paul's SC847 SAS chassis have the standard EL1 backplanes, his 
bandwidth profile per chassis is:

24 x 6Gb/s drives (front backplane) on 4 x 6Gb/s host ports via 36 port LSI expander
21 x 6Gb/s drives (rear backplane) on 4 x 6Gb/s host ports via 36 port LSI expander

Not balanced but not horribly bad.  I recommend using one LSI 9285-8E 
RAID card per SC847 chassis, with one SFF8088 cable connected to the 
front backplane and the other to the rear.  Create two 21-drive RAID6 
arrays, taking care that one array consists only of drives on the front 
backplane and the other only of drives on the rear backplane. 
Configure the remaining 3 drives on the front backplane as cold 
spares.  Not perfect, but I think it's the best solution given the 
unbalanced nature of the chassis backplanes.
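For what it's worth, creating those two logical drives with MegaCLI 
would look roughly like the following.  The enclosure IDs (8 for the 
front backplane, 9 for the rear) and slot numbers are placeholders -- 
read the real ones back with -PDList -- and option syntax varies a bit 
between MegaCLI releases, so treat this as a sketch, not a recipe:

  # front backplane: 21-drive RAID6 logical drive, write-back, read-ahead
  MegaCli64 -CfgLdAdd -r6 [8:0,8:1,8:2,8:3,8:4,8:5,8:6,8:7,8:8,8:9,8:10,8:11,8:12,8:13,8:14,8:15,8:16,8:17,8:18,8:19,8:20] WB RA Cached -a0

  # rear backplane: same again on the second enclosure
  MegaCli64 -CfgLdAdd -r6 [9:0,9:1,9:2,9:3,9:4,9:5,9:6,9:7,9:8,9:9,9:10,9:11,9:12,9:13,9:14,9:15,9:16,9:17,9:18,9:19,9:20] WB RA Cached -a0

  # slots 8:21 through 8:23 stay unconfigured -- those are the cold spares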

> Large stripes might look like a good idea, but when you get to this
> scale concatenation of high throughput LUNs provides better
> throughput because of less contention through the storage
> controllers and enclosures.

Now create an LVM or mdraid concatenated device of the 6 hardware RAID6 
LUNs.  Format the resulting device with mkfs.xfs defaults, allowing XFS 
allocation groups to drive your parallelism and throughput instead of a 
big stripe, just as Dave recommends.  Each 9285-8E should be able to 
pump streaming reads at about 3.2 to 3.5GB/s, a little less than the 
aggregate streaming capability of the 38 data spindles in its two RAID6 
arrays.  At that throughput level you're bumping against the one-way 
bandwidth limit of a PCIe 2.0 x8 slot: 5GT/s per lane with 8b/10b 
encoding gives 500MB/s per lane, roughly 4GB/s across 8 lanes before 
protocol and error correction overhead.  So overall I think you're 
fairly well balanced now, overcoming the slight imbalance of the disk 
chassis configuration.
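
Concretely, the LVM route looks something like this.  The /dev/sdb 
through /dev/sdg names are placeholders for whatever device nodes the 
three 9285-8Es actually present (the mdraid alternative is simply 
mdadm --create --level=linear over the same six devices):

  # one PV per hardware RAID6 LUN, all six in one volume group
  pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
  vgcreate vg_big /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg

  # default (linear) allocation concatenates the LUNs end to end
  lvcreate -l 100%FREE -n lv_big vg_big

  # mkfs.xfs defaults: no sunit/swidth, so allocation groups, not a
  # stripe geometry, provide the parallelism
  mkfs.xfs /dev/vg_big/lv_big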

Assuming you're able to load balance interrupts and tune things 
optimally, and assuming the Intel chipset in the R810 is up to the task, 
the above recommended setup should be capable of 8-10GB/s throughput 
with a parallel workload.  Newegg carries both the 9285-8E and the cache 
battery unit, ~$1200 total.  So it'll run you about $18,000 for 15 units 
for 5 servers, about 3x what you spent on the 9200-8E cards, and worth 
every sweet penny.
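
On the interrupt balancing point: with three HBAs per server you want 
each card's interrupts landing on different cores, either by letting 
irqbalance do its thing or by pinning them by hand.  The IRQ number 
and CPU mask below are made-up examples, not values from Paul's 
machines:

  # see which IRQs the megaraid_sas driver owns
  grep -i mega /proc/interrupts

  # pin, say, IRQ 66 to CPU core 2 (mask 0x4)
  echo 4 > /proc/irq/66/smp_affinity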

-- 
Stan



