xfs

Re: Log corruption?

To: Eric Sandeen <sandeen@xxxxxxx>
Subject: Re: Log corruption?
From: James Pearson <james-p@xxxxxxxxxxxxxxxxxx>
Date: Mon, 14 Oct 2002 12:57:13 +0100
Cc: Stephen Lord <lord@xxxxxxx>, linux-xfs@xxxxxxxxxxx
Organization: Moving Picture Company
References: <3DA181D2.B78A9C41@moving-picture.com> <1033996292.1053.32.camel@laptop.americas.sgi.com> <3DA19858.75C9E674@moving-picture.com> <3DA707BC.F6F56592@moving-picture.com> <1034357448.13979.9.camel@stout.americas.sgi.com> <3DA70DEE.190877FF@moving-picture.com> <1034358502.13979.18.camel@stout.americas.sgi.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Unfortunately it doesn't work - I get (with XFS 1.1):

xfs_log_recover.c: In function `xlog_find_tail':
xfs_log_recover.c:841: structure has no member named `m_logdev_targp'

I had a quick look through the source of v1.1 and the recent 2.4.19
patch - but got lost ...

James Pearson

Eric Sandeen wrote:
> 
> How about a pseudo-patch, since I don't actually have 1.1 source handy
> at the moment...
> 
> look in xlog_find_tail, in linux/fs/xfs/xfs_log_recover.c
> 
> Change the bit that says:
> 
>         if (!readonly)
>                 error = xlog_clear_stale_blocks(log, tail_lsn);
> 
> to
> 
>         if (!is_read_only(log->l_mp->m_logdev_targp->pbr_kdev)) {
>                 error = xlog_clear_stale_blocks(log, tail_lsn);
>         }
> 
> Note that this won't fix your filesystems where you've already seen
> corruption, but it will hopefully prevent corruption in the future.
> 
> -Eric
> 
> On Fri, 2002-10-11 at 12:44, James Pearson wrote:
> > If a patch against XFS 1.1 is easy to do, then that'll be fine for the
> > moment...
> >
> > Thanks
> >
> > James Pearson
> >
> > Eric Sandeen wrote:
> > >
> > > James - I think Steve previously pointed out that there was a recent fix
> > > that may address this...  We'll get a new 1.2 prerelease spin out there
> > > soon which will contain it.  It would probably also be fairly easy to
> > > get you a patch for 1.1 if you'd prefer.
> > >
> > > -Eric
> > >
> > > On Fri, 2002-10-11 at 12:17, James Pearson wrote:
> > > > It's just happened on one of my workstations - at bootup I get
> > > > (2.4.18-xfs [XFS 1.1] kernel):
> > > >
> > > > XFS mounting filesystem sd(8,2)
> > > > XFS: WARNING: recovery required on readonly filesystem.
> > > > XFS: write access will be enabled during mount.
> > > > Starting XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> > > > xfs_inotobp: xfs_imap()  returned error 22 on sd(8,2).  Returning error.
> > > > > xfs_iunlink_remove: xfs_inotobp()  returned error 22 on sd(8,2).  Returning error
> > > > xfs_inactive:: xfs_ifree() returned error = 22 on sd(8,2)
> > > > > xfs_force_shutdown(sd(8,2),0x1) called from line 1962 of file xfs_vnodeops.c   Return address = 0xc01cd7a2
> > > > I/O Error Detected.  Shutting down filesystem: sd(8,2)
> > > > Please umount the filesystem, and rectify the problem(s)
> > > > Ending XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> > > > pivotroot: pivot_root(/sysroot,/sysroot/initrd) failed: 2
> > > > Freeing unused kernel memory: 252k freed
> > > > Kernel panic: No init found.  Try passing init= option to kernel
> > > >
> > > >
> > > > If I boot off floppy/CD in rescue mode and try to mount the root
> > > > partition by hand I get (2.4.7-10SGI_XFS_PR1BOOT kernel):
> > > >
> > > > XFS mounting filesystem sd(8,17)
> > > > Starting XFS recovery on filesystem: sd(8,17) (dev: 8/17)
> > > > Ending XFS recovery on filesystem: sd(8,17) (dev: 8/17)
> > > > XFS mounting filesystem sd(8,2)
> > > > Starting XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> > > > > Unable to handle kernel NULL pointer dereference at virtual address 00000152
> > > >  printing eip:
> > > > fc93faf2
> > > > *pde = 00000000
> > > > Oops: 0000
> > > > CPU:    0
> > > > EIP:    0010:[<fc93faf2>]
> > > > EFLAGS: 00010246
> > > > eax: 00000000   ebx: ffffffe8   ecx: c0226d84   edx: fc96e2c0
> > > > esi: f6aa17e4   edi: f6a6ec00   ebp: 00000000   esp: f7fd58b4
> > > > ds: 0018   es: 0018   ss: 0018
> > > > Process mount (pid: 102, stackpage=f7fd5000)
> > > > Stack: 41d20700 00000000 f6a6ec16 41d20700 fc94cbd0 f6a6ec00 00000000
> > > > 41d20700
> > > >        00000000 00000000 f7fd5924 00000000 00000000 c21c2b60 00000000
> > > > 00000000
> > > >        00000000 f6a6ed64 f6a6ed64 41d20700 00000000 c21c2b60 f7fd5924
> > > > 0187d281
> > > > Call Trace: [<fc94cbd0>] [<fc94d627>] [<fc94734c>] [<fc94f061>]
> > > > [<c0112f97>]
> > > >    [<fc92b270>] [<fc94dc43>] [<fc9572e6>] [<c0131522>] [<fc95745c>]
> > > > [<fc96ebc0>]
> > > >    [<fc96ebc0>] [<fc95748b>] [<fc96ebc0>] [<fc969098>] [<fc96ebc0>]
> > > > [<fc96e808>]
> > > >    [<c012bcfd>] [<c0122467>] [<c012bcb0>] [<c01256ee>] [<c01353c9>]
> > > > [<c01355bb>]
> > > >    [<fc96e808>] [<c0135d70>] [<fc96e808>] [<fc96e808>] [<c0136074>]
> > > > [<c0135f3c>]
> > > >    [<c0136108>] [<c0106ddb>]
> > > >
> > > > Code: 66 83 bb 6a 01 00 00 00 75 10 80 a3 50 01 00 00 f7 53 e8 6b
> > > >
> > > > Running xfs_repair -L 'fixes' the problem.
> > > >
> > > > James Pearson
> > > >
> > > > James Pearson wrote:
> > > > >
> > > > > The sequence of events is:
> > > > >
> > > > > > Machine locks up - probably related to some Xwindows/application
> > > > > > problem (we use the Nvidia drivers)
> > > > >
> > > > > Machine is reset
> > > > >
> > > > > Kernel boots
> > > > >
> > > > > > Fails to mount the root (XFS) file system - either with an oops or
> > > > > > some error telling us the file system is corrupt etc.
> > > > >
> > > > > > Attempts to reset again produce the same results as above.
> > > > >
> > > > > > Booting in rescue mode, running 'xfs_repair -L' and rebooting "fixes"
> > > > > > the problem. xfs_repair finds some lost files and puts them in
> > > > > > lost+found - these are usually files from /tmp or /var/tmp.
> > > > >
> > > > > > This doesn't happen every time a machine locks up, but it occurs
> > > > > > maybe once a week or so on one or another of our 60 or so
> > > > > > workstations.
> > > > >
> > > > > James Pearson
> > > > >
> > > > > Stephen Lord wrote:
> > > > > >
> > > > > > On Mon, 2002-10-07 at 07:45, James Pearson wrote:
> > > > > > > > We have a number of workstations running RedHat 7.2 with a
> > > > > > > > 2.4.18 XFS 1.1 kernel - every now and then a (different)
> > > > > > > > machine will crash/hang and fail to boot with a kernel oops
> > > > > > > > and/or with XFS errors when it tries to mount the root file
> > > > > > > > system.
> > > > > > >
> > > > > > > > The fix is to boot from floppy/CD in rescue mode and run
> > > > > > > > 'xfs_repair -L' on the root partition. The root file system
> > > > > > > > is then mountable and the machine reboots OK.
> > > > > > >
> > > > > > > > I don't have exact error messages (don't have time to write
> > > > > > > > down the exact errors, as the priority is to get the machine
> > > > > > > > up and running ...)
> > > > > > >
> > > > > > > Is this a known problem? If it isn't, I'll attempt to get more
> > > > > > > information when it happens again.
> > > > > > >
> > > > > > > James Pearson
> > > > > > >
> > > > > >
> > > > > > > Actually, a change just went into the cvs tree this weekend
> > > > > > > which might be related to this: there is some zeroing of part
> > > > > > > of the log which is always supposed to happen during mount.
> > > > > > > For a readonly mount this was not happening - and the root is
> > > > > > > mounted this way. Should the machine be shut down and rebooted
> > > > > > > very shortly after this, there is a possibility of the second
> > > > > > > mount getting confused by the log contents.
> > > > > >
> > > > > > Is there any way this could be what is happening? Is this happening
> > > > > > on the second of two boots which are close together?
> > > > > >
> > > > > > Currently there is no way to get this code except from a cvs kernel,
> > > > > > we just put out some images of the first alpha of xfs 1.2, the next
> > > > > > spin of these should include this fix (hint hint Eric).
> > > > > >
> > > > > > Steve
> > > >
> > > --
> > > Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
> > > sandeen@xxxxxxx   SGI, Inc.         651-683-3102
> --
> Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
> sandeen@xxxxxxx   SGI, Inc.         651-683-3102
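
For readers following the thread, the difference between the two checks in
Eric's pseudo-patch can be sketched as a toy model in plain C. This is only
an illustration of the logic described above, not XFS code: the struct and
function names below are hypothetical, and the assumption is that the old
`readonly` test followed the mount flag while the proposed `is_read_only()`
test follows the state of the log device itself.

```c
#include <stdbool.h>

/* Toy model only: hypothetical names, not real XFS structures. */
struct toy_mount {
    bool mounted_readonly;  /* filesystem mounted "ro", as root is at boot */
    bool device_readonly;   /* the block device itself refuses writes */
};

/* Old check as described in the thread: any read-only mount skips
 * the clearing of stale log blocks. */
bool clears_stale_blocks_old(const struct toy_mount *m)
{
    return !m->mounted_readonly;
}

/* Check suggested by the pseudo-patch: skip the clearing only when
 * the device cannot be written to at all. */
bool clears_stale_blocks_new(const struct toy_mount *m)
{
    return !m->device_readonly;
}
```

In this model, a root filesystem mounted read-only on a writable disk skips
the clearing under the old check but performs it under the new one, which is
the case Steve describes. A device that really is read-only skips it either
way.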
