
Re: Log corruption?

To: James Pearson <james-p@xxxxxxxxxxxxxxxxxx>
Subject: Re: Log corruption?
From: Eric Sandeen <sandeen@xxxxxxx>
Date: 11 Oct 2002 12:30:48 -0500
Cc: Stephen Lord <lord@xxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <3DA707BC.F6F56592@moving-picture.com>
References: <3DA181D2.B78A9C41@moving-picture.com> <1033996292.1053.32.camel@laptop.americas.sgi.com> <3DA19858.75C9E674@moving-picture.com> <3DA707BC.F6F56592@moving-picture.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
James - I think Steve previously pointed out that there was a recent fix
that may address this...  We'll get a new 1.2 prerelease spin out there
soon which will contain it.  It would probably also be fairly easy to
get you a patch for 1.1 if you'd prefer.

-Eric

On Fri, 2002-10-11 at 12:17, James Pearson wrote:
> It's just happened on one of my workstations - at bootup I get
> (2.4.18-xfs [XFS 1.1] kernel):
> 
> XFS mounting filesystem sd(8,2)
> XFS: WARNING: recovery required on readonly filesystem.
> XFS: write access will be enabled during mount.
> Starting XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> xfs_inotobp: xfs_imap()  returned error 22 on sd(8,2).  Returning error.
> xfs_iunlink_remove: xfs_inotobp()  returned error 22 on sd(8,2).  Returning error
> xfs_inactive:: xfs_ifree() returned error = 22 on sd(8,2)
> xfs_force_shutdown(sd(8,2),0x1) called from line 1962 of file xfs_vnodeops.c   Return address = 0xc01cd7a2
> I/O Error Detected.  Shutting down filesystem: sd(8,2)
> Please umount the filesystem, and rectify the problem(s)
> Ending XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> pivotroot: pivot_root(/sysroot,/sysroot/initrd) failed: 2
> Freeing unused kernel memory: 252k freed
> Kernel panic: No init found.  Try passing init= option to kernel
> 
> 
> If I boot off floppy/CD in rescue mode and try to mount the root
> partition by hand I get (2.4.7-10SGI_XFS_PR1BOOT kernel):
> 
> XFS mounting filesystem sd(8,17)
> Starting XFS recovery on filesystem: sd(8,17) (dev: 8/17)
> Ending XFS recovery on filesystem: sd(8,17) (dev: 8/17)
> XFS mounting filesystem sd(8,2)
> Starting XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> Unable to handle kernel NULL pointer dereference at virtual address 00000152
>  printing eip:
> fc93faf2
> *pde = 00000000
> Oops: 0000
> CPU:    0
> EIP:    0010:[<fc93faf2>]
> EFLAGS: 00010246
> eax: 00000000   ebx: ffffffe8   ecx: c0226d84   edx: fc96e2c0
> esi: f6aa17e4   edi: f6a6ec00   ebp: 00000000   esp: f7fd58b4
> ds: 0018   es: 0018   ss: 0018
> Process mount (pid: 102, stackpage=f7fd5000)
> Stack: 41d20700 00000000 f6a6ec16 41d20700 fc94cbd0 f6a6ec00 00000000 41d20700
>        00000000 00000000 f7fd5924 00000000 00000000 c21c2b60 00000000 00000000
>        00000000 f6a6ed64 f6a6ed64 41d20700 00000000 c21c2b60 f7fd5924 0187d281
> Call Trace: [<fc94cbd0>] [<fc94d627>] [<fc94734c>] [<fc94f061>] [<c0112f97>]
>    [<fc92b270>] [<fc94dc43>] [<fc9572e6>] [<c0131522>] [<fc95745c>] [<fc96ebc0>]
>    [<fc96ebc0>] [<fc95748b>] [<fc96ebc0>] [<fc969098>] [<fc96ebc0>] [<fc96e808>]
>    [<c012bcfd>] [<c0122467>] [<c012bcb0>] [<c01256ee>] [<c01353c9>] [<c01355bb>]
>    [<fc96e808>] [<c0135d70>] [<fc96e808>] [<fc96e808>] [<c0136074>] [<c0135f3c>]
>    [<c0136108>] [<c0106ddb>]
> 
> Code: 66 83 bb 6a 01 00 00 00 75 10 80 a3 50 01 00 00 f7 53 e8 6b
> 
> Running xfs_repair -L 'fixes' the problem.
> 
> James Pearson
> 
> James Pearson wrote:
> > 
> > The sequence of events is:
> > 
> > Machine locks up - probably related to some Xwindows/application problem
> > (we use the Nvidia drivers)
> > 
> > Machine is reset
> > 
> > Kernel boots
> > 
> > Fails to mount the root (XFS) file system - either with an oops or some
> > error telling us the file system is corrupt etc.
> > 
> > Further attempts to reset produce the same results as above.
> > 
> > Booting in rescue mode, running 'xfs_repair -L' and rebooting "fixes"
> > the problem. xfs_repair finds some lost files and puts them in lost+found
> > - these are usually files from /tmp or /var/tmp.
> > 
> > This doesn't happen every time a machine locks up, but it occurs maybe
> > once a week or so on one or another of our 60 or so workstations.
> > 
> > James Pearson
> > 
> > Stephen Lord wrote:
> > >
> > > On Mon, 2002-10-07 at 07:45, James Pearson wrote:
> > > > We have a number of workstations running RedHat 7.2 with a 2.4.18 XFS
> > > > 1.1 kernel - every now and then a (different) machine will crash/hang
> > > > and fail to boot with a kernel oops and/or with XFS errors when it tries
> > > > to mount the root file system.
> > > >
> > > > The fix is to boot from floppy/CD in rescue mode and run 'xfs_repair -L'
> > > > on the root partition. The root file system is then mountable and the
> > > > machine reboots OK.
> > > >
> > > > I don't have exact error messages (don't have time to write down the
> > > > exact errors, as the priority is to get the machine up and running ...)
> > > >
> > > > Is this a known problem? If it isn't, I'll attempt to get more
> > > > information when it happens again.
> > > >
> > > > James Pearson
> > > >
> > >
> > > Actually, a change just went into the cvs tree this weekend which might
> > > be related to this. There is some zeroing of part of the log which is
> > > always supposed to happen during mount. For a readonly mount this was
> > > not happening - and the root is mounted this way. Should the machine
> > > be shut down and rebooted very shortly after this, there is a possibility
> > > of the second mount getting confused by the log contents.
> > >
> > > Is there any way this could be what is happening? Is this happening
> > > on the second of two boots which are close together?
> > >
> > > Currently there is no way to get this code except from a cvs kernel.
> > > We just put out some images of the first alpha of xfs 1.2, and the next
> > > spin of these should include this fix (hint hint Eric).
> > >
> > > Steve
> 
-- 
Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
sandeen@xxxxxxx   SGI, Inc.         651-683-3102
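
For anyone reading the quoted logs: the sd(8,2) and sd(8,17) names are
major/minor device pairs, and the "error 22" values look like ordinary Linux
errno numbers. The sketch below decodes both. It is only an illustration, not
part of XFS or xfsprogs; the helper names (sd_name, describe_errno,
devices_in_log) are made up here, and it assumes the usual Linux layout of
SCSI disk major 8 with 16 minors per disk, and that XFS prints standard errno
values.

```python
import errno
import re

SD_MAJOR = 8          # first SCSI disk major on Linux (assumed layout)
MINORS_PER_DISK = 16  # minor 0 of each group is the whole disk, 1..15 are partitions


def sd_name(major, minor):
    """Map a (major, minor) pair on SCSI disk major 8 to a device name."""
    if major != SD_MAJOR or not 0 <= minor < 256:
        raise ValueError("only major 8 with minors 0..255 is handled here")
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else name + str(part)


def describe_errno(code):
    """Return the symbolic errno name, assuming the log prints plain errno values."""
    return errno.errorcode.get(code, "unknown")


def devices_in_log(text):
    """Collect the distinct sd(major,minor) devices mentioned in a log excerpt."""
    pairs = re.findall(r"sd\((\d+),(\d+)\)", text)
    return sorted({sd_name(int(major), int(minor)) for major, minor in pairs})


if __name__ == "__main__":
    log = (
        "XFS mounting filesystem sd(8,17)\n"
        "Starting XFS recovery on filesystem: sd(8,2) (dev: 8/2)\n"
        "xfs_inotobp: xfs_imap()  returned error 22 on sd(8,2).  Returning error.\n"
    )
    print(devices_in_log(log))   # ['sda2', 'sdb1']
    print(describe_errno(22))    # EINVAL
```

Under those assumptions, the root filesystem that fails recovery is the second
partition of the first SCSI disk, and the filesystem that recovers cleanly in
rescue mode is the first partition of the second disk.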

