
Re: Is it possible to check a frozen XFS filesystem to avoid downtime

To: Timothy Shimmin <tes@xxxxxxx>
Subject: Re: Is it possible to check a frozen XFS filesystem to avoid downtime
From: Martin Steigerwald <ms@xxxxxxxxx>
Date: Tue, 15 Jul 2008 09:44:12 +0200
Cc: xfs@xxxxxxxxxxx
In-reply-to: <487C1BAF.2030404@sgi.com>
Organization: team(ix) GmbH
References: <200807141542.51613.ms@teamix.de> <487C1BAF.2030404@sgi.com>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: KMail/1.9.9
On Tuesday, 15 July 2008 05:38:23, Timothy Shimmin wrote:
> Hi Martin,

Hi Tim,

> Martin Steigerwald wrote:
> > Hi!
> >
> > We have seen in-memory corruption on two XFS filesystems on a Heartbeat
> > server cluster belonging to one of our customers:
> >
> >
> > XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1563 of file
> > fs/xfs/xfs_alloc.c.  Caller 0xffffffff8824eb5d
> >
> > Call Trace:
> >  [<ffffffff8824cff3>] :xfs:xfs_free_ag_extent+0x1a6/0x6b5
> >  [<ffffffff8824eb5d>] :xfs:xfs_free_extent+0xa9/0xc9
> >  [<ffffffff88258636>] :xfs:xfs_bmap_finish+0xf0/0x169
> >  [<ffffffff88278b4c>] :xfs:xfs_itruncate_finish+0x180/0x2c1
> >  [<ffffffff8829071a>] :xfs:xfs_setattr+0x841/0xe59
> >  [<ffffffff8022e868>] sock_common_recvmsg+0x30/0x45
> >  [<ffffffff8829adc8>] :xfs:xfs_vn_setattr+0x121/0x144
> >  [<ffffffff8022a06d>] notify_change+0x156/0x2ef
> >  [<ffffffff883bf9c6>] :nfsd:nfsd_setattr+0x334/0x4b1
> >  [<ffffffff883c61d6>] :nfsd:nfsd3_proc_setattr+0xa2/0xae
> >  [<ffffffff883bb24d>] :nfsd:nfsd_dispatch+0xdd/0x19e
> >  [<ffffffff8833a10e>] :sunrpc:svc_process+0x3cb/0x6d9
> >  [<ffffffff8025b20b>] __down_read+0x12/0x9a
> >  [<ffffffff883bb816>] :nfsd:nfsd+0x192/0x2b0
> >  [<ffffffff80255f38>] child_rip+0xa/0x12
> >  [<ffffffff883bb684>] :nfsd:nfsd+0x0/0x2b0
> >  [<ffffffff80255f2e>] child_rip+0x0/0x12
> >
> > xfs_force_shutdown(dm-1,0x8) called from line 4261 of file
> > fs/xfs/xfs_bmap.c. Return address = 0xffffffff88258673
> > Filesystem "dm-1": Corruption of in-memory data detected.  Shutting down
> > filesystem: dm-1
> > Please umount the filesystem, and rectify the problem(s)
> >
> > on
> >
> > Linux version 2.6.21-1-amd64 (Debian 2.6.21-4~bpo.1)
> > (nobse@xxxxxxxxxxxxx) (gcc version 4.1.2 20061115 (prerelease) (Debian
> > 4.1.1-21)) #1 SMP Tue Jun 5 07:43:32 UTC 2007
> >
> >
> > We plan to do a takeover so that the server which appears to have memory
> > errors can be memtested.
> >
> > After the takeover we would like to make sure that the XFS filesystems
> > are intact. Is it possible to do so without taking the filesystem
> > completely offline?
> >
> > I thought about mounting read-only, and it might be the best choice
> > available, but then write accesses will *fail*. I would prefer that
> > they just stall.
> >
> > I tried xfs_freeze -f on my laptop's home directory, but then did not
> > manage to get it checked via xfs_check or xfs_repair -nd... is it
> > possible at all?
> >
> > Ciao,
>
> When I last tried (and I don't think Barry has done anything since to
> change things), it wouldn't work.
> However, I think it could/should be changed to make it work.
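The freeze, check and thaw sequence Martin describes can be sketched in Python as below. This is a hypothetical helper: `check_while_frozen`, `run_command` and the injectable `run` callback are illustrative names, not part of xfsprogs. The `xfs_freeze -f` and `xfs_repair -n -d` invocations are the ones mentioned in this thread, and `xfs_freeze -u` is the matching thaw. Per Tim's reply, the check step itself was not expected to succeed on a frozen, read-write mounted filesystem with the tools of the time, so the sketch mainly shows the ordering and the need to thaw unconditionally.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argument vector and returns an exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def check_while_frozen(mountpoint: str, device: str,
                       run: Runner = run_command) -> int:
    """Freeze the filesystem, run a no-modify check, and always thaw it.

    Returns the exit status of the check. While the filesystem is frozen,
    writers block instead of failing, which is the behaviour wanted here.
    """
    if run(["xfs_freeze", "-f", mountpoint]) != 0:
        raise RuntimeError(f"could not freeze {mountpoint}")
    try:
        # -n: no-modify mode; -d: the "dangerous" mode meant for a
        # filesystem that is mounted read-only.
        return run(["xfs_repair", "-n", "-d", device])
    finally:
        # Thaw even if the check failed or raised, or writers stay blocked.
        run(["xfs_freeze", "-u", mountpoint])
```

Passing a fake `run` function lets the sequence be exercised without touching a real device. The `finally` clause is the important design choice: every writer on the mount point stays blocked for as long as the filesystem remains frozen.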

Okay... we recommended that the customer do it the safe way and unmount the
filesystem completely. He did, and the filesystem appears to be intact *phew*.
XFS appears to have detected the in-memory corruption early enough.

It's a bit strange, however, because we now know that the server has ECC RAM.
Well, we will see what memtest86+ has to say about it.

> My notes from the SGI bug:
>
> 958642: running xfs_check and "xfs_repair -n" on a frozen xfs filesystem
>
> > We've been asked a few times about the possibility of running xfs_check
> > or xfs_repair -n on a frozen filesystem.
> > And a while back I looked into what some of the hindrances were.
> > And now I've forgotten ;-))
> >
> > I think there are hindrances in libxfs_init (check_open()) and
> > in having a dirty log.
> >
> > For libxfs_init, I found that I couldn't run the tools without erroring
> > out. I think I found out that I needed the INACTIVE flag,
> > without READONLY/DANGEROUSLY, like xfs_logprint does.
> >
> > ----------------------------------------
> > Date: Thu, 19 Oct 2006 11:24:06 +1000
> > From: Timothy Shimmin <tes@xxxxxxx>
> > To: lachlan@xxxxxxx
> > cc: xfs-dev@xxxxxxx
> > Subject: Re: init.c patch
> > ------------------------------------------------------
> >   Ok, my understanding of the READONLY/DANGEROUSLY flags was wrong.
> >   I thought they were just overriding flags for when you were guaranteeing
> > you were only reading, making things more permissive,
> >   but they are for doing stuff on read-only (ro) mounts.
> >
> >   They are rather confusing to me. When you go with the defaults for repair
> > and db, the INACTIVE flag is not set.
> >   It means that if I do _not_ want it to be fatal, I need to set INACTIVE
> > but not READONLY or DANGEROUSLY, which is what logprint does.
> >
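The flag behaviour in Tim's notes can be summarised as a small decision model. This is a simplified sketch, not xfsprogs' `check_open()`: `model_check_open`, `OpenFlag` and `Verdict` are hypothetical names, the flag bit values are arbitrary rather than taken from the xfsprogs headers, and the rule that any mounted filesystem is refused when INACTIVE is absent is an assumption based on the note that repair and db do not set INACTIVE by default. A frozen filesystem is treated as still mounted read-write.

```python
from enum import Enum, IntFlag


class OpenFlag(IntFlag):
    """Stand-ins for the libxfs_init flags named in the notes.

    The bit values are arbitrary, not the ones in the xfsprogs headers.
    """
    NONE = 0
    READONLY = 1
    INACTIVE = 2
    DANGEROUSLY = 4


class Verdict(Enum):
    OK = "ok"          # open the device without complaint
    WARN = "warn"      # complain but carry on (the logprint behaviour)
    REFUSE = "refuse"  # error out


def model_check_open(flags: OpenFlag, mounted: bool, mounted_rw: bool) -> Verdict:
    """Simplified model of the mount check described in the notes.

    A frozen filesystem is still mounted read-write, so freezing alone
    does not get it past the mounted_rw branch.
    """
    if not mounted:
        return Verdict.OK
    if not flags & OpenFlag.INACTIVE:
        # Assumed default for repair and db: any mounted filesystem is fatal.
        return Verdict.REFUSE
    if not mounted_rw:
        # With INACTIVE, a read-only mount is tolerated.
        return Verdict.OK
    # Read-write (or frozen) mount: fatal if READONLY or DANGEROUSLY is
    # also set, otherwise only a warning (INACTIVE alone, as logprint uses).
    if flags & (OpenFlag.READONLY | OpenFlag.DANGEROUSLY):
        return Verdict.REFUSE
    return Verdict.WARN
```

Under these assumptions, `model_check_open(OpenFlag.INACTIVE, mounted=True, mounted_rw=True)` yields `Verdict.WARN`, the logprint-style behaviour, while adding `READONLY` or `DANGEROUSLY` turns the same case into `Verdict.REFUSE`.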

I think there should be separate options for read-only / frozen filesystem
checking and for dangerous repair, since read-only checks are a different
thing from repairing a mounted filesystem and hoping that the running XFS
will not choke on the filesystem that xfs_repair changes under its hood.

I expected a "-r" option for read-only checking in xfs_check and xfs_repair,
but for xfs_repair that option is already taken for specifying the realtime
volume.

Ciao,
-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90

