On Wed, Nov 24, 2010 at 01:58:25AM +0100, Spelic wrote:
> On 11/23/2010 09:46 PM, Dave Chinner wrote:
> >Hmmmm. We get plenty of reports about problems with 3ware RAID
> >controllers, many of which are RAID controller problems. Can you
> >make sure you are running the latest firmware on the controller?
> No, sorry, my firmware is: FE9X 4.06.00.004
> But when controllers hang, there is usually something in dmesg, and
> in my case there wasn't. Then after a while it resets (it has
> something like a watchdog in it).
> In the past during testing I did have reproducible hangups on high
> load with these controllers (seemed like a lost interrupt), but they
> were fixed by disabling NCQ.
> The controller would reset in those cases, the drive caches would be
> reset to "off", and there were entries in dmesg.
> But that issue was definitely fixed by disabling NCQ: I tested many
> times with and without NCQ with reproducible results; and after that
> we had reliable operation for more than 1 year on that machine.
> >I've been unable to reproduce the problem with your test case (been
> >running over night) on a 12-disk, 16TB dm RAID0 array, but I'll keep
> >trying to reproduce it for a while.
> It seems to me that a 12-disk dm RAID0 array is quite different from
> a 16-disk md RAID5 array: you don't have the stripe cache, and there
> are likely to be fewer in-flight operations. If the problem is some
> pool of resources being drained, you might not hit it...
But if that is the cause, then it would indicate an MD problem
rather than an XFS problem. Testing on a similar but slightly
different configuration helps isolate where the problem may lie.
As it is, the hang is probably caused by the same bug as the one Nick
reported. See my last post in the thread "XFS performance oddity"
for the patch that fixes the hang Nick reported.