On 4/7/2012 12:10 PM, Joe Landman wrote:
> On 04/07/2012 12:50 PM, Peter Grandi wrote:
>> * Your storage layer does not seem to deliver parallel
>> operations: as the ~100MB/s overall 'ext4' speed and the
>> seek graphs show, in effect your 4+2 RAID6 performs in this
>> case as if it were a single drive with a single arm.
> This is what leapt out at me. I retried a very similar test (pulled
> Icedtea 2.1, compiled it, tarred it, measured untar on our boxen). I
> was getting a fairly consistent 4 +/- delta seconds.
That's an interesting point. I guess I'd chalked the low throughput up
to high seeks.
> 100MB/s on some supposedly fast drives with a RAID card indicates that
> either the RAID is badly implemented, the RAID layout is suspect, or
> similar. He should be getting closer to N(data disks) * BW(single disk)
> for something "close" to a streaming operation.
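To put a rough number on that rule of thumb: a 4+2 RAID6 has 4 data disks. The per-drive figure below is an assumption (about 100MB/s streaming, in line with Peter's remark that the array behaves like a single drive), not something measured on Stefan's hardware:

```shell
# Rough streaming ceiling: N(data disks) * BW(single disk)
data_disks=4        # 4+2 RAID6: four data disks, two parity
per_disk_mbs=100    # ASSUMPTION: ~100 MB/s streaming per drive
expected_mbs=$((data_disks * per_disk_mbs))
echo "expected streaming ceiling: ~${expected_mbs} MB/s"
```

Under that assumption the array should be capable of something on the order of 400MB/s for near-streaming work, several times the ~100MB/s reported.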
Reading this thread seems to indicate you're onto something, Joe:
Add this to the mix:
"The HP Smart Array P400 is HP's first PCI-Express (PCIe) serial
attached SCSI (SAS) RAID controller"
First gen products aren't always duds, but the likelihood is often much
higher. Everyone posting to that forum is getting low throughput, and
most of them are testing streaming reads/writes, not massively random IO
as in Stefan's case.
> This isn't suggesting that he didn't hit some bug which happens to over
> specify use of ag=0, but he definitely had a weak RAID system (at best).
> If he retries with a more capable system, or one with a saner RAID
> layout (16k chunk size? For spinning rust? Seriously? Short stroking
> DB layout?), an agcount of 32 or higher, and still sees similar issues,
> then I'd be more suspicious of a bug.
Or merely a weak/old product. The P400 was an entry-level RAID HBA,
HP's first PCIe/SAS RAID card. It was discontinued quite some time ago.
The use of DDR2/533 memory indicates its design stage probably started
somewhere around 2004, 8 years ago.
Now that I've researched the P400, and assuming Stefan currently has the
card firmware optimally configured, I'd bet this workload is simply
overwhelming the RAID ASIC. To confirm this, configure each drive as its
own single-drive RAID0 array on the controller, so all 6 drives are
exported as individual block devices. Then assemble them into an md
RAID6 and test the workload. Be sure to change
the Linux elevator to noop first since you're using hardware write cache:
$ echo noop > /sys/block/sda/queue/scheduler
Execute this 6 times, once for each of the 6 drives, changing the device
name each time, obviously. This is not a persistent change.
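If you'd rather not type it 6 times, a small loop does the same thing. This is only a sketch: set_elevator and SYSFS_ROOT are names I made up for illustration (SYSFS_ROOT just lets you try the loop against a scratch directory), and the sda..sdf device names are an assumption about how the 6 drives show up on your box:

```shell
# Hypothetical helper: write a scheduler name into each listed device's
# queue/scheduler file. SYSFS_ROOT defaults to the real /sys.
set_elevator() {
    sched=$1
    shift
    for dev in "$@"; do
        echo "$sched" > "${SYSFS_ROOT:-/sys}/block/$dev/queue/scheduler"
    done
}

# As root, ASSUMING the six exported drives appear as sda..sdf:
#   set_elevator noop sda sdb sdc sdd sde sdf
```

Check the result with cat /sys/block/sda/queue/scheduler; the active elevator is the one shown in brackets.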
The gap between EXT4 and XFS will likely still exist, but overall
numbers should jump substantially northward if the problem is indeed a
slow RAID ASIC.