Re: Is it possible the check an frozen XFS filesytem to avoid downtime

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Is it possible the check an frozen XFS filesytem to avoid downtime
From: Martin Steigerwald <ms@xxxxxxxxx>
Date: Mon, 27 Oct 2008 17:57:09 +0100
Cc: Timothy Shimmin <tes@xxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <487CC1EB.6030100@xxxxxxxxxxx>
Organization: team(ix) GmbH
References: <200807141542.51613.ms@xxxxxxxxx> <200807150944.13277.ms@xxxxxxxxx> <487CC1EB.6030100@xxxxxxxxxxx>
User-agent: KMail/1.9.9
On Tuesday, 15 July 2008, Eric Sandeen wrote:
> Martin Steigerwald wrote:
> > Okay... we recommended that the customer do it the safe way, unmounting
> > the filesystem completely. He did, and the filesystem appears to be intact
> > *phew*. XFS appeared to detect the in-memory corruption early enough.
> >
> > It's a bit strange, however, because we now know that the server sports
> > ECC RAM. Well, we will see what memtest86+ has to say about it.
>
> in-memory corruption could mean, but certainly does not absolutely mean,
> problematic memory.  It could be, and usually is, a plain ol' bug (in
> xfs or elsewhere).

OK, just as a follow-up:

We have now seen similar XFS errors on the second backend server, this time
on a local hardware RAID1. On the first backend server the errors occurred
on logical volumes on a software RAID spread over two physically separate
external hardware RAID boxes.

So this appears to be an XFS bug to me. Maybe it corrupts its in-memory
structures when running for a long time. Fortunately, we did not see errors
in the on-disk structures.

A colleague did a kernel update on the inactive backend 1 server from 2.6.21
to the 2.6.26 kernel from backports.org; tomorrow backend 2 will follow.
Let's see whether that solves the issue.

Anyway, it seems to be a hard-to-trigger bug, and before bugging you with
something in kernel 2.6.21, we will at least update to the latest
backports.org kernel.

-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90
