
Re: Log corruption?

To: Eric Sandeen <sandeen@xxxxxxx>
Subject: Re: Log corruption?
From: James Pearson <james-p@xxxxxxxxxxxxxxxxxx>
Date: Mon, 14 Oct 2002 17:59:02 +0100
Cc: Stephen Lord <lord@xxxxxxx>, linux-xfs@xxxxxxxxxxx
Organization: Moving Picture Company
References: <Pine.LNX.4.44.0210140848450.19698-100000@xxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Thanks, that seems to compile OK.

However, I have a couple of questions:

When I was looking through the CVS history for xfs_log_recover.c, I
noticed that this (similar) one-line change came after a previous, bigger
change to handle read-only mounts - does XFS 1.1 need something similar
as well?

Also, when I boot the system from CD/floppy in rescue mode and attempt
to mount the root file system, I get the oops described below - but in
this case the file system is being mounted read-write - so I'm a bit
confused as to why this read-only mount fix will help?
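
A minimal sketch of the difference between the two checks quoted below,
written with stand-in types rather than the real XFS structures. It
assumes that the old `readonly` flag reflects how the mount was requested
and that is_read_only() reports whether the block device itself rejects
writes; the struct and function names here are hypothetical and exist only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount state; not the real XFS structures. */
struct sketch_mount {
    bool mounted_readonly;  /* mount was requested read-only (e.g. root at boot) */
    bool device_readonly;   /* underlying block device rejects writes */
};

/* Old check (assumed): skip the stale-block clearing on any read-only mount. */
static bool clears_stale_blocks_old(const struct sketch_mount *m)
{
    return !m->mounted_readonly;
}

/* Patched check (assumed): skip only when the device itself is read-only. */
static bool clears_stale_blocks_new(const struct sketch_mount *m)
{
    return !m->device_readonly;
}

/* Walks through the two mount scenarios mentioned in this thread. */
static void sketch_self_check(void)
{
    /* Root mounted read-only at boot on a writable disk. */
    struct sketch_mount root_boot = { true, false };
    assert(!clears_stale_blocks_old(&root_boot)); /* log left uncleared */
    assert(clears_stale_blocks_new(&root_boot));  /* log now cleared */

    /* Rescue-mode read-write mount: both checks behave the same. */
    struct sketch_mount rescue_rw = { false, false };
    assert(clears_stale_blocks_old(&rescue_rw));
    assert(clears_stale_blocks_new(&rescue_rw));
}
```

Under these assumptions the patch only changes what happens on the
read-only root mount at boot; a read-write mount from rescue mode takes
the same path before and after the change.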

Thanks

James Pearson

Eric Sandeen wrote:
> 
> Whoops, sorry, that changed since 1.1.
> 
> Try this, then...
> 
>          if (!is_read_only(log->l_mp->m_logdev)) {
>                  error = xlog_clear_stale_blocks(log, tail_lsn);
>          }
> 
> On Mon, 14 Oct 2002, James Pearson wrote:
> 
> > Unfortunately it doesn't work - I get (with XFS 1.1):
> >
> > xfs_log_recover.c: In function `xlog_find_tail':
> > xfs_log_recover.c:841: structure has no member named `m_logdev_targp'
> >
> > I had a quick look through the source of v1.1 and the recent 2.4.19
> > patch - but got lost ...
> >
> > James Pearson
> >
> > Eric Sandeen wrote:
> > >
> > > How about a pseudo-patch, since I don't actually have 1.1 source handy
> > > at the moment...
> > >
> > > look in xlog_find_tail, in linux/fs/xfs/xfs_log_recover.c
> > >
> > > Change the bit that says:
> > >
> > >         if (!readonly)
> > >                 error = xlog_clear_stale_blocks(log, tail_lsn);
> > >
> > > to
> > >
> > >         if (!is_read_only(log->l_mp->m_logdev_targp->pbr_kdev)) {
> > >                 error = xlog_clear_stale_blocks(log, tail_lsn);
> > >         }
> > >
> > > Note that this won't fix your filesystems where you've already seen
> > > corruption, but it will hopefully prevent corruption in the future.
> > >
> > > -Eric
> > >
> > > On Fri, 2002-10-11 at 12:44, James Pearson wrote:
> > > > If a patch against XFS 1.1 is easy to do, then that'll be fine for the
> > > > moment...
> > > >
> > > > Thanks
> > > >
> > > > James Pearson
> > > >
> > > > Eric Sandeen wrote:
> > > > >
> > > > > > James - I think Steve previously pointed out that there was a recent
> > > > > > fix that may address this...  We'll get a new 1.2 prerelease spin out
> > > > > > there soon which will contain it.  It would probably also be fairly
> > > > > > easy to get you a patch for 1.1 if you'd prefer.
> > > > >
> > > > > -Eric
> > > > >
> > > > > On Fri, 2002-10-11 at 12:17, James Pearson wrote:
> > > > > > It's just happened on one of my workstations - at bootup I get
> > > > > > (2.4.18-xfs [XFS 1.1] kernel):
> > > > > >
> > > > > > XFS mounting filesystem sd(8,2)
> > > > > > XFS: WARNING: recovery required on readonly filesystem.
> > > > > > XFS: write access will be enabled during mount.
> > > > > > Starting XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> > > > > > > xfs_inotobp: xfs_imap()  returned error 22 on sd(8,2).  Returning error.
> > > > > > > xfs_iunlink_remove: xfs_inotobp()  returned error 22 on sd(8,2).  Returning error
> > > > > > > xfs_inactive:: xfs_ifree() returned error = 22 on sd(8,2)
> > > > > > > xfs_force_shutdown(sd(8,2),0x1) called from line 1962 of file xfs_vnodeops.c   Return address = 0xc01cd7a2
> > > > > > I/O Error Detected.  Shutting down filesystem: sd(8,2)
> > > > > > Please umount the filesystem, and rectify the problem(s)
> > > > > > Ending XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> > > > > > pivotroot: pivot_root(/sysroot,/sysroot/initrd) failed: 2
> > > > > > Freeing unused kernel memory: 252k freed
> > > > > > Kernel panic: No init found.  Try passing init= option to kernel
> > > > > >
> > > > > >
> > > > > > If I boot off floppy/CD in rescue mode and try to mount the root
> > > > > > partition by hand I get (2.4.7-10SGI_XFS_PR1BOOT kernel):
> > > > > >
> > > > > > XFS mounting filesystem sd(8,17)
> > > > > > Starting XFS recovery on filesystem: sd(8,17) (dev: 8/17)
> > > > > > Ending XFS recovery on filesystem: sd(8,17) (dev: 8/17)
> > > > > > XFS mounting filesystem sd(8,2)
> > > > > > Starting XFS recovery on filesystem: sd(8,2) (dev: 8/2)
> > > > > > > Unable to handle kernel NULL pointer dereference at virtual address 00000152
> > > > > >  printing eip:
> > > > > > fc93faf2
> > > > > > *pde = 00000000
> > > > > > Oops: 0000
> > > > > > CPU:    0
> > > > > > EIP:    0010:[<fc93faf2>]
> > > > > > EFLAGS: 00010246
> > > > > > eax: 00000000   ebx: ffffffe8   ecx: c0226d84   edx: fc96e2c0
> > > > > > esi: f6aa17e4   edi: f6a6ec00   ebp: 00000000   esp: f7fd58b4
> > > > > > ds: 0018   es: 0018   ss: 0018
> > > > > > Process mount (pid: 102, stackpage=f7fd5000)
> > > > > > > Stack: 41d20700 00000000 f6a6ec16 41d20700 fc94cbd0 f6a6ec00 00000000 41d20700
> > > > > > >        00000000 00000000 f7fd5924 00000000 00000000 c21c2b60 00000000 00000000
> > > > > > >        00000000 f6a6ed64 f6a6ed64 41d20700 00000000 c21c2b60 f7fd5924 0187d281
> > > > > > > Call Trace: [<fc94cbd0>] [<fc94d627>] [<fc94734c>] [<fc94f061>] [<c0112f97>]
> > > > > > >    [<fc92b270>] [<fc94dc43>] [<fc9572e6>] [<c0131522>] [<fc95745c>] [<fc96ebc0>]
> > > > > > >    [<fc96ebc0>] [<fc95748b>] [<fc96ebc0>] [<fc969098>] [<fc96ebc0>] [<fc96e808>]
> > > > > > >    [<c012bcfd>] [<c0122467>] [<c012bcb0>] [<c01256ee>] [<c01353c9>] [<c01355bb>]
> > > > > > >    [<fc96e808>] [<c0135d70>] [<fc96e808>] [<fc96e808>] [<c0136074>] [<c0135f3c>]
> > > > > > >    [<c0136108>] [<c0106ddb>]
> > > > > >
> > > > > > Code: 66 83 bb 6a 01 00 00 00 75 10 80 a3 50 01 00 00 f7 53 e8 6b
> > > > > >
> > > > > > Running xfs_repair -L 'fixes' the problem.
> > > > > >
> > > > > > James Pearson
> > > > > >
> > > > > > James Pearson wrote:
> > > > > > >
> > > > > > > > The sequence of events is:
> > > > > > > >
> > > > > > > > Machine locks up - probably related to some Xwindows/application
> > > > > > > > problem (we use the Nvidia drivers)
> > > > > > > >
> > > > > > > > Machine is reset
> > > > > > > >
> > > > > > > > Kernel boots
> > > > > > > >
> > > > > > > > Fails to mount the root (XFS) file system - either with an oops
> > > > > > > > or some error telling us the file system is corrupt etc.
> > > > > > > >
> > > > > > > > Further attempts to reset produce the same results.
> > > > > > > >
> > > > > > > > Booting in rescue mode, running 'xfs_repair -L' and rebooting
> > > > > > > > "fixes" the problem. xfs_repair finds some lost files and puts
> > > > > > > > them in lost+found - these are usually files from /tmp or
> > > > > > > > /var/tmp.
> > > > > > > >
> > > > > > > > This doesn't happen every time a machine locks up, but it occurs
> > > > > > > > maybe once a week or so on one or another of our 60 or so
> > > > > > > > workstations.
> > > > > > >
> > > > > > > James Pearson
> > > > > > >
> > > > > > > Stephen Lord wrote:
> > > > > > > >
> > > > > > > > On Mon, 2002-10-07 at 07:45, James Pearson wrote:
> > > > > > > > > > We have a number of workstations running RedHat 7.2 with a
> > > > > > > > > > 2.4.18 XFS 1.1 kernel - every now and then a (different)
> > > > > > > > > > machine will crash/hang and fail to boot with a kernel oops
> > > > > > > > > > and/or with XFS errors when it tries to mount the root file
> > > > > > > > > > system.
> > > > > > > > > >
> > > > > > > > > > The fix is to boot from floppy/CD in rescue mode and run
> > > > > > > > > > 'xfs_repair -L' on the root partition. The root file system
> > > > > > > > > > is then mountable and the machine reboots OK.
> > > > > > > > > >
> > > > > > > > > > I don't have exact error messages (don't have time to write
> > > > > > > > > > down the exact errors, as the priority is to get the machine
> > > > > > > > > > up and running ...)
> > > > > > > > > >
> > > > > > > > > > Is this a known problem? If it isn't, I'll attempt to get more
> > > > > > > > > > information when it happens again.
> > > > > > > > >
> > > > > > > > > James Pearson
> > > > > > > > >
> > > > > > > >
> > > > > > > > > Actually, a change just went into the cvs tree this weekend
> > > > > > > > > which might be related to this. There is some zeroing of part
> > > > > > > > > of the log which is always supposed to happen during mount.
> > > > > > > > > For a readonly mount this was not happening - and the root is
> > > > > > > > > mounted this way. Should the machine be shut down and rebooted
> > > > > > > > > very shortly after this, there is a possibility of the second
> > > > > > > > > mount getting confused by the log contents.
> > > > > > > > >
> > > > > > > > > Is there any way this could be what is happening? Is this
> > > > > > > > > happening on the second of two boots which are close together?
> > > > > > > > >
> > > > > > > > > Currently there is no way to get this code except from a cvs
> > > > > > > > > kernel. We just put out some images of the first alpha of
> > > > > > > > > xfs 1.2; the next spin of these should include this fix
> > > > > > > > > (hint hint Eric).
> > > > > > > >
> > > > > > > > Steve
> > > > > >
> > > > > --
> > > > > Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
> > > > > sandeen@xxxxxxx   SGI, Inc.         651-683-3102
> > > --
> > > Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
> > > sandeen@xxxxxxx   SGI, Inc.         651-683-3102
> >

