
Re: xfs_repair hung...safe to terminate?

To: Net Llama! <netllama@xxxxxxxxxxxxx>
Subject: Re: xfs_repair hung...safe to terminate?
From: Jon Lewis <jlewis@xxxxxxxxx>
Date: Thu, 16 Jun 2005 10:53:49 -0400 (EDT)
Cc: linux-xfs <linux-xfs@xxxxxxxxxxx>
In-reply-to: <Pine.LNX.4.58.0506160835510.13228@xxxxxxxxxxxxxxxxxx>
References: <Pine.LNX.4.58.0506152011310.21993@xxxxxxxxxxxxxxx> <Pine.LNX.4.58.0506160950230.21993@xxxxxxxxxxxxxxx> <Pine.LNX.4.58.0506160835510.13228@xxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
It's an old server.  I'm willing to try (and have already compiled) 2.4.31
(unmodified source from kernel.org), but from searches of the list archive
and SGI's bugzilla, this looks like a long-standing, unresolved bug
affecting XFS in both 2.4 and 2.6 kernels.  At this point, I'm mostly
curious whether anyone has ideas for solutions or workarounds, or if it's
time to simply abandon XFS.

http://oss.sgi.com/bugzilla/show_bug.cgi?id=375
http://oss.sgi.com/bugzilla/show_bug.cgi?id=272
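
For anyone comparing notes, a quick sanity check of the versions in play
before deciding whether an upgrade is worth trying (these are just the
standard commands; run them on the affected box):

```shell
# Running kernel version (the Red Hat 8.0 errata kernel in our case)
uname -r

# xfsprogs version that xfs_repair comes from
xfs_repair -V

# If XFS is built as a module, show its info; silent if built in
modinfo xfs 2>/dev/null | head
```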

On Thu, 16 Jun 2005, Net Llama! wrote:

> You realize you're using a truly ancient kernel, right?  Are you at least
> using a relatively current version of xfsprogs?
>
> On Thu, 16 Jun 2005, Jon Lewis wrote:
>
> > This xfs_repair did eventually finish after spinning in Phase 5 for a
> > couple hours.  Unfortunately, it doesn't appear to have done us any good,
> > and the system is still doing
> >
> > xfs_force_shutdown(md(9,2),0x8) called from line 1071 of file
> > xfs_trans.c.  Return address = 0xf8dbe6eb
> > Filesystem "md(9,2)": Corruption of in-memory data detected.  Shutting down
> > filesystem: md(9,2)
> > Please umount the filesystem, and rectify the problem(s)
> >
> > pretty frequently.  I've tried running a non-SMP kernel, but that didn't
> > help.  Next, I decreased the read/write block sizes for the NFS clients
> > mounting this xfs.  I don't know yet if that makes any difference.
> >
> > On Wed, 15 Jun 2005, Jon Lewis wrote:
> >
> > > After having a system crash twice today with messages like (from the first
> > > crash):
> > >
> > > xfs_iget_core: ambiguous vns: vp/0xc6f0e680, invp/0xecbed200
> > > ------------[ cut here ]------------
> > > kernel BUG at debug.c:106!
> > > invalid operand: 0000
> > > nfsd lockd sunrpc autofs eepro100 mii ipt_REJECT iptable_filter ip_tables
> > > xfs raid5 xor ext3 jbd raid1 isp_mod sd_mod scsi_mod
> > > CPU:    1
> > > EIP:    0010:[<f8dbf16e>]    Not tainted
> > > EFLAGS: 00010246
> > >
> > > EIP is at cmn_err [xfs] 0x9e (2.4.20-35_39.rh8.0.atsmp)
> > > eax: 00000000   ebx: 00000000   ecx: 00000096   edx: 00000001
> > > esi: f8dd9412   edi: f8dec63e   ebp: 00000293   esp: f5d2bd44
> > > ds: 0018   es: 0018   ss: 0018
> > > Process nfsd (pid: 661, stackpage=f5d2b000)
> > > Stack: f8dd9412 f8dd93e8 f8dec600 ecbed220 7b1f202d 00000000 e4cca100
> > > f8d8aeac
> > >        00000000 f8dda160 c6f0e680 ecbed200 f65d0c00 7b1f202d f7bfcc38
> > > c62aea90
> > >        f65d0924 00000000 00000003 c62aea8c 00000000 00000000 e4cca100
> > > ecbed220
> > > Call Trace:   [<f8dd9412>] .rodata.str1.1 [xfs] 0x11c2 (0xf5d2bd44))
> > > [<f8dd93e8>] .rodata.str1.1 [xfs] 0x1198 (0xf5d2bd48))
> > > [<f8dec600>] message [xfs] 0x0 (0xf5d2bd4c))
> > > [<f8d8aeac>] xfs_iget_core [xfs] 0x45c (0xf5d2bd60))
> > > [<f8dda160>] .rodata.str1.32 [xfs] 0x5a0 (0xf5d2bd68))
> > > [<f8d8b0c3>] xfs_iget [xfs] 0x143 (0xf5d2bdb0))
> > > [<f8da8247>] xfs_vget [xfs] 0x77 (0xf5d2bdf0))
> > > [<f8dbe563>] vfs_vget [xfs] 0x43 (0xf5d2be20))
> > > [<f8dbdc9d>] linvfs_fh_to_dentry [xfs] 0x5d (0xf5d2be30))
> > > [<f8e3a8c6>] nfsd_get_dentry [nfsd] 0xb6 (0xf5d2be5c))
> > > [<f8e3ad17>] find_fh_dentry [nfsd] 0x57 (0xf5d2be80))
> > > [<f8e3b1b9>] fh_verify [nfsd] 0x189 (0xf5d2beb0))
> > > [<f8e19616>] svc_sock_enqueue [sunrpc] 0x1b6 (0xf5d2befc))
> > > [<f8e42bdf>] nfsd3_proc_getattr [nfsd] 0x6f (0xf5d2bf10))
> > > [<f8e44a93>] nfs3svc_decode_fhandle [nfsd] 0x33 (0xf5d2bf28))
> > > [<f8e4b384>] nfsd_procedures3 [nfsd] 0x24 (0xf5d2bf3c))
> > > [<f8e3863e>] nfsd_dispatch [nfsd] 0xce (0xf5d2bf48))
> > > [<f8e4ac98>] nfsd_version3 [nfsd] 0x0 (0xf5d2bf5c))
> > > [<f8e38570>] nfsd_dispatch [nfsd] 0x0 (0xf5d2bf60))
> > > [<f8e1927f>] svc_process_Rsmp_9d8bc81a [sunrpc] 0x45f (0xf5d2bf64))
> > > [<f8e4b384>] nfsd_procedures3 [nfsd] 0x24 (0xf5d2bf84))
> > > [<f8e4acb8>] nfsd_program [nfsd] 0x0 (0xf5d2bf88))
> > > [<f8e38404>] nfsd [nfsd] 0x224 (0xf5d2bfa4))
> > > [<c010758e>] arch_kernel_thread [kernel] 0x2e (0xf5d2bff0))
> > > [<f8e381e0>] nfsd [nfsd] 0x0 (0xf5d2bff8))
> > >
> > >
> > > Code: 0f 0b 6a 00 08 94 dd f8 83 c4 0c 5b 5e 5f 5d c3 89 f6 55 b8
> > >  <5>xfs_force_shutdown(md(9,2),0x8) called from line 1071 of file
> > > xfs_trans.c.  Return address = 0xf8dbe6eb
> > > Filesystem "md(9,2)": Corruption of in-memory data detected.  Shutting
> > > down
> > > filesystem: md(9,2)
> > > Please umount the filesystem, and rectify the problem(s)
> > >
> > > I figured it'd be a good idea to xfs_repair it.  That was a little more
> > > than 4 hours ago.  The fs is a software RAID5:
> > > md2 : active raid5 sdn2[13] sdg2[12] sdm2[11] sdl2[10] sdk2[9] sdj2[8]
> > > sdi2[7] sdh2[6] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
> > >       385414656 blocks level 5, 64k chunk, algorithm 2 [12/12]
> > > [UUUUUUUUUUUU]
> > > md0 : active raid1 sdn1[1] sdg1[0]
> > >       803136 blocks [2/2] [UU]
> > >
> > > xfs_repair [version 2.6.9] has gotten to:
> > >
> > > Phase 5 - rebuild AG headers and trees...
> > >
> > > and seems to have stopped progressing.
> > >
> > > root       798 91.8  1.0 45080 41576 pts/1   R    15:57 242:04 xfs_repair 
> > > -l /dev/md0 /dev/md2
> > >
> > > It's still using lots of CPU, but there is no disk activity.  Further
> > > searching suggests this might be a kernel issue and not an actual fs
> > > corruption issue.  I'd like to upgrade from 2.4.20-35_39.rh8.0.atsmp to
> > > 2.4.20-43_41.rh8.0.atsmp, but the question is, is it safe to stop (kill)
> > > xfs_repair?  Will the fs be mountable if I interrupt xfs_repair at this
> > > point?
> >
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Lonni J Friedman                        netllama@xxxxxxxxxxxxx
> LlamaLand                             http://netllama.linux-sxs.org
>

----------------------------------------------------------------------
 Jon Lewis                   |  I route
 Senior Network Engineer     |  therefore you are
 Atlantic Net                |
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
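
P.S. For reference, the NFS block-size workaround mentioned above amounts
to remounting the export on the clients with smaller rsize/wsize values.
The server name, export path, and mount point below are placeholders, and
8192 is just an example size:

```shell
# Unmount and remount the NFS export with reduced read/write block sizes
# (changing rsize/wsize generally requires a full remount, not -o remount)
umount /mnt/export
mount -t nfs -o rsize=8192,wsize=8192 fileserver:/export /mnt/export

# Or persistently, the equivalent /etc/fstab line:
# fileserver:/export  /mnt/export  nfs  rsize=8192,wsize=8192  0 0
```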

