On Tuesday, 15 July 2008, Eric Sandeen wrote:
> Martin Steigerwald wrote:
> > Okay... we recommended that the customer do it the safe way, unmounting the
> > filesystem completely. He did, and the filesystem appears to be intact
> > *phew*. XFS appears to have detected the in-memory corruption early enough.
> >
> > It's a bit strange, however, because we now know that the server has ECC
> > RAM. Well, we will see what memtest86+ has to say about it.
>
> in-memory corruption could mean, but certainly does not absolutely mean,
> problematic memory. It could be, and usually is, a plain ol' bug (in
> xfs or elsewhere).
Okay, just as a follow-up:

We have now seen similar XFS errors on the second backend server. This time
the filesystem was on a local hardware RAID 1. On the first backend server it
was on logical volumes on a software RAID spanning two physically separate
external hardware RAID boxes.

So this looks like an XFS bug to me. Perhaps it corrupts its in-memory
structures after running for a long time. Fortunately, we have not seen any
errors in the on-disk structures.

A colleague updated the inactive backend 1 server from kernel 2.6.21 to the
2.6.26 kernel from backports.org. Backend 2 will follow tomorrow. Let's see
whether that solves the issue.

Anyway, it seems to be a hard-to-trigger bug. Before bothering you with
something in kernel 2.6.21, we will at least update to the latest
backports.org kernel.
--
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90