Re: xfs_repair hung...safe to terminate?

To: Jon Lewis <jlewis@xxxxxxxxx>
Subject: Re: xfs_repair hung...safe to terminate?
From: Net Llama! <netllama@xxxxxxxxxxxxx>
Date: Thu, 16 Jun 2005 08:36:16 -0500 (EST)
Cc: linux-xfs <linux-xfs@xxxxxxxxxxx>
In-reply-to: <Pine.LNX.4.58.0506160950230.21993@web1.mmaero.com>
References: <Pine.LNX.4.58.0506152011310.21993@web1.mmaero.com> <Pine.LNX.4.58.0506160950230.21993@web1.mmaero.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
You realize you're using a truly ancient kernel, right?  Are you at least
using a relatively current version of xfsprogs?
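
For what it's worth, both are quick to check from a shell (the rpm query
assumes xfsprogs went on as a package, which is the norm on a Red Hat box):

    $ uname -r          # running kernel version
    $ xfs_repair -V     # xfs_repair prints the xfsprogs version it came with
    $ rpm -q xfsprogs   # packaged version, on RPM-based systems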

On Thu, 16 Jun 2005, Jon Lewis wrote:

> This xfs_repair did eventually finish after spinning in Phase 5 for a
> couple of hours.  Unfortunately, it doesn't appear to have done us any
> good, and the system is still logging
>
> xfs_force_shutdown(md(9,2),0x8) called from line 1071 of file xfs_trans.c.  Return address = 0xf8dbe6eb
> Filesystem "md(9,2)": Corruption of in-memory data detected.  Shutting down filesystem: md(9,2)
> Please umount the filesystem, and rectify the problem(s)
>
> pretty frequently.  I've tried running a non-SMP kernel, but that didn't
> help.  Next, I decreased the read/write block sizes for the NFS clients
> mounting this filesystem, but I don't know yet whether that makes any
> difference.
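
(For the archives: the NFS read/write block sizes are just the rsize/wsize
mount options on the clients, e.g. something along the lines of

    mount -o rsize=8192,wsize=8192 server:/export /mnt/export

where server:/export, the mount point, and the 8192 byte values are only
placeholders for whatever is actually being tested.)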
>
> On Wed, 15 Jun 2005, Jon Lewis wrote:
>
> > After the system crashed twice today with messages like this (from the
> > first crash):
> >
> > xfs_iget_core: ambiguous vns: vp/0xc6f0e680, invp/0xecbed200
> > ------------[ cut here ]------------
> > kernel BUG at debug.c:106!
> > invalid operand: 0000
> > nfsd lockd sunrpc autofs eepro100 mii ipt_REJECT iptable_filter ip_tables
> > xfs raid5 xor ext3 jbd raid1 isp_mod sd_mod scsi_mod
> > CPU:    1
> > EIP:    0010:[<f8dbf16e>]    Not tainted
> > EFLAGS: 00010246
> >
> > EIP is at cmn_err [xfs] 0x9e (2.4.20-35_39.rh8.0.atsmp)
> > eax: 00000000   ebx: 00000000   ecx: 00000096   edx: 00000001
> > esi: f8dd9412   edi: f8dec63e   ebp: 00000293   esp: f5d2bd44
> > ds: 0018   es: 0018   ss: 0018
> > Process nfsd (pid: 661, stackpage=f5d2b000)
> > Stack: f8dd9412 f8dd93e8 f8dec600 ecbed220 7b1f202d 00000000 e4cca100 f8d8aeac
> >        00000000 f8dda160 c6f0e680 ecbed200 f65d0c00 7b1f202d f7bfcc38 c62aea90
> >        f65d0924 00000000 00000003 c62aea8c 00000000 00000000 e4cca100 ecbed220
> > Call Trace:   [<f8dd9412>] .rodata.str1.1 [xfs] 0x11c2 (0xf5d2bd44))
> > [<f8dd93e8>] .rodata.str1.1 [xfs] 0x1198 (0xf5d2bd48))
> > [<f8dec600>] message [xfs] 0x0 (0xf5d2bd4c))
> > [<f8d8aeac>] xfs_iget_core [xfs] 0x45c (0xf5d2bd60))
> > [<f8dda160>] .rodata.str1.32 [xfs] 0x5a0 (0xf5d2bd68))
> > [<f8d8b0c3>] xfs_iget [xfs] 0x143 (0xf5d2bdb0))
> > [<f8da8247>] xfs_vget [xfs] 0x77 (0xf5d2bdf0))
> > [<f8dbe563>] vfs_vget [xfs] 0x43 (0xf5d2be20))
> > [<f8dbdc9d>] linvfs_fh_to_dentry [xfs] 0x5d (0xf5d2be30))
> > [<f8e3a8c6>] nfsd_get_dentry [nfsd] 0xb6 (0xf5d2be5c))
> > [<f8e3ad17>] find_fh_dentry [nfsd] 0x57 (0xf5d2be80))
> > [<f8e3b1b9>] fh_verify [nfsd] 0x189 (0xf5d2beb0))
> > [<f8e19616>] svc_sock_enqueue [sunrpc] 0x1b6 (0xf5d2befc))
> > [<f8e42bdf>] nfsd3_proc_getattr [nfsd] 0x6f (0xf5d2bf10))
> > [<f8e44a93>] nfs3svc_decode_fhandle [nfsd] 0x33 (0xf5d2bf28))
> > [<f8e4b384>] nfsd_procedures3 [nfsd] 0x24 (0xf5d2bf3c))
> > [<f8e3863e>] nfsd_dispatch [nfsd] 0xce (0xf5d2bf48))
> > [<f8e4ac98>] nfsd_version3 [nfsd] 0x0 (0xf5d2bf5c))
> > [<f8e38570>] nfsd_dispatch [nfsd] 0x0 (0xf5d2bf60))
> > [<f8e1927f>] svc_process_Rsmp_9d8bc81a [sunrpc] 0x45f (0xf5d2bf64))
> > [<f8e4b384>] nfsd_procedures3 [nfsd] 0x24 (0xf5d2bf84))
> > [<f8e4acb8>] nfsd_program [nfsd] 0x0 (0xf5d2bf88))
> > [<f8e38404>] nfsd [nfsd] 0x224 (0xf5d2bfa4))
> > [<c010758e>] arch_kernel_thread [kernel] 0x2e (0xf5d2bff0))
> > [<f8e381e0>] nfsd [nfsd] 0x0 (0xf5d2bff8))
> >
> >
> > Code: 0f 0b 6a 00 08 94 dd f8 83 c4 0c 5b 5e 5f 5d c3 89 f6 55 b8
> >  <5>xfs_force_shutdown(md(9,2),0x8) called from line 1071 of file xfs_trans.c.  Return address = 0xf8dbe6eb
> > Filesystem "md(9,2)": Corruption of in-memory data detected.  Shutting down filesystem: md(9,2)
> > Please umount the filesystem, and rectify the problem(s)
> >
> > I figured it'd be a good idea to run xfs_repair on it.  That was a little
> > more than 4 hours ago.  The fs is a software RAID5:
> > md2 : active raid5 sdn2[13] sdg2[12] sdm2[11] sdl2[10] sdk2[9] sdj2[8] sdi2[7] sdh2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
> >       385414656 blocks level 5, 64k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
> > md0 : active raid1 sdn1[1] sdg1[0]
> >       803136 blocks [2/2] [UU]
> >
> > xfs_repair [version 2.6.9] has gotten to:
> >
> > Phase 5 - rebuild AG headers and trees...
> >
> > and seems to have stopped progressing.
> >
> > root       798 91.8  1.0 45080 41576 pts/1   R    15:57 242:04 xfs_repair -l /dev/md0 /dev/md2
> >
> > It's still using lots of CPU, but there is no disk activity.  Further
> > searching suggests this might be a kernel issue and not actual fs
> > corruption.  I'd like to upgrade from 2.4.20-35_39.rh8.0.atsmp to
> > 2.4.20-43_41.rh8.0.atsmp, but the question is: is it safe to stop (kill)
> > xfs_repair?  Will the fs be mountable if I interrupt xfs_repair at this
> > point?
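
(One way to tell whether a spinning xfs_repair is still doing real work is
to attach strace and watch for system calls, e.g.

    strace -p 798

using the pid from the ps output above.  Seeing no syscalls isn't proof of
a hang by itself, though, since phase 5, rebuilding the AG headers and
trees, can be CPU-bound for long stretches.)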
>
> ----------------------------------------------------------------------
>  Jon Lewis                   |  I route
>  Senior Network Engineer     |  therefore you are
>  Atlantic Net                |
> _________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
>
>

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lonni J Friedman                        netllama@xxxxxxxxxxxxx
LlamaLand                               http://netllama.linux-sxs.org

