188.8.131.52 kernel bug in XFS or megaraid driver with heavy I/O load
jesper at krogh.cc
Tue Oct 11 11:07:40 CDT 2011
On 2011-10-11 16:13, Anders Ossowicki wrote:
> On Tue, Oct 11, 2011 at 03:34:48PM +0200, Christoph Hellwig wrote:
>> This is core VM code, and operates purely on on-stack variables except
>> for the page cache radix tree nodes / pages. So this either could be a
>> core VM bug that no one has noticed yet, or memory corruption. Can you
>> run memtest86 on the box?
> Unfortunately not, as it is a production server. Pulling it out to memtest 256G
> properly would take too long. But it seems unlikely to me that it should be
> memory corruption. The machine has been running with the same (ecc) memory for
> more than a year and neither the service processor nor the kernel (according to
> dmesg) has caught anything before this. It would be a rare (though I admit not
> impossible) coincidence if we got catastrophic, undetected memory corruption a
> week after attaching a new raid controller with a new disk array.
A side note that Anders forgot: the system was stable for a very long time,
but on a 2.6.37 kernel. We upgraded to 2.6.38 to get support for the RAID
controller, and then it crashed.
Now we're trying to get the new hardware up and running on 2.6.37 with the
megaraid driver for the RAID controller.