On Thu, 26 Apr 2001, Steve Lord wrote:
>
> Having thought about this some more, I am almost certain that the problem
> is arising because of the mix of local and remote access to the filesystem.
>
> I would still like to see stack traces if you get a chance, but I think I
> know what happened.
>
> Steve
Okay -- I have had a chance to hook up a serial console to my machine and
have been able to get it to crash again. The error messages and stack
traces are as follows:
xfs_iget_core: ambiguous vns: vp/0xc3fd43a8, invp/0xc6da98c8
Unable to handle kernel NULL pointer dereference at virtual address
00000008
printing eip:
d08f90bf
*pde = 00000000
Entering kdb (current=0xce112000, pid 859) on processor 1
Oops: Oops
due to oops @ 0xd08f90bf
eax = 0x00000080 ebx = 0xc6da98c8 ecx = 0xc158a800 edx = 0x00000000
esi = 0xc6da98c8 edi = 0xcf84c710 esp = 0xce113a2c eip = 0xd08f90bf
ebp = 0x00000001 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010282
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xce1139f8
kdb> bt
EBP EIP Function(args)
0x00000001 0xd08f90bf [xfs]vn_revalidate+0x1f (0xc6da98c8, 0x80)
xfs .text 0xd0896060 0xd08f90a0 0xd08f9158
0xd08cf0f0 [xfs]xfs_iget_core+0x77c (0xc6da98c8, 0xc158a800, 0x0, 0x)
xfs .text 0xd0896060 0xd08ce974 0xd08cf118
0xd08cf188 [xfs]xfs_vn_iget+0x34 (0xc6da98c8, 0xc158a800, 0x0, 0x5c0)
xfs .text 0xd0896060 0xd08cf154 0xd08cf190
0xd08f8e81 [xfs]vn_initialize+0xd5 (0xc144aa60, 0xc6da97c0, 0x1)
xfs .text 0xd0896060 0xd08f8dac 0xd08f8efc
0xd08f8366 [xfs]linvfs_read_inode+0x1e (0xc6da97c0)
xfs .text 0xd0896060 0xd08f8348 0xd08f8398
0xc0148227 get_new_inode+0xe3 (0xcf860000, 0x5c02354, 0xc1462228, 0x)
kernel .text 0xc0100000 0xc0148144 0xc01482bc
0xc0148575 iget4+0xdd (0xcf860000, 0x5c02354, 0x0, 0x0, 0xcf867508)
kernel .text 0xc0100000 0xc0148498 0xc0148580
0xd08f15c7 [xfs]xfs_open_by_handle+0x113 (0xc01c586b, 0xbfff9a80, 0x)
xfs .text 0xd0896060 0xd08f14b4 0xd08f17d0
0xd08f28fd [xfs]xfs_ioctl+0xbb9 (0xcf66faf0, 0xcf867400, 0xcdfb2da0,)
xfs .text 0xd0896060 0xd08f1d44 0xd08f2c40
0xd08f10eb [xfs]linvfs_ioctl+0x2f (0xcf867400, 0xcdfb2da0, 0xc01c586)
xfs .text 0xd0896060 0xd08f10bc 0xd08f10f4
0xc0142056 sys_ioctl+0x1ea (0x5, 0xc01c586b, 0xbfff9a80, 0x80e7a40, )
kernel .text 0xc0100000 0xc0141e6c 0xc01420b0
0xc010701f system_call+0x33
kernel .text 0xc0100000 0xc0106fec 0xc0107024
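As a sanity check on the trace, each EIP should equal the function's start address plus the printed offset. The sketch below assumes that the middle address on each "xfs .text" line is the symbol start; I have not confirmed that against the kdb documentation. The values are copied from the backtrace above:

```python
# (function, assumed symbol start, offset, EIP) copied from the kdb backtrace.
# Assumption: the middle address on each "xfs .text" line is the symbol start.
frames = [
    ("vn_revalidate", 0xd08f90a0, 0x1f, 0xd08f90bf),
    ("xfs_iget_core", 0xd08ce974, 0x77c, 0xd08cf0f0),
    ("xfs_vn_iget", 0xd08cf154, 0x34, 0xd08cf188),
    ("vn_initialize", 0xd08f8dac, 0xd5, 0xd08f8e81),
    ("linvfs_read_inode", 0xd08f8348, 0x1e, 0xd08f8366),
]


def check(frames):
    """Return (name, True/False) for whether start + offset matches the EIP."""
    return [(name, start + off == eip) for name, start, off, eip in frames]


for name, ok in check(frames):
    print(f"{name:20s} {'ok' if ok else 'MISMATCH'}")
```

If any frame reported a mismatch, I would suspect a transcription error in the trace rather than a real discrepancy.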
A ps of the current processes shows that pid 859 is xfsdump:
0xce112000 00000859 00000857 1 001 run 0xce112350*xfsdump
The xfsdump output suggests that it was dumping the regular files (i.e.
non-directory files) at the time:
/usr/sbin/xfsdump: version 3.0 - Running single-threaded
/usr/sbin/xfsdump: WARNING: most recent level 0 dump was interrupted, but
not resuming that dump since resume (-R) option not specified
/usr/sbin/xfsdump: level 0 dump of fpga:/ibm1
/usr/sbin/xfsdump: dump date: Mon May 7 14:35:41 2001
/usr/sbin/xfsdump: session id: fc05be66-5713-4a8e-961c-de1bf6ed607c
/usr/sbin/xfsdump: session label: ""
/usr/sbin/xfsdump: ino map phase 1: skipping (no subtrees specified)
/usr/sbin/xfsdump: ino map phase 2: constructing initial dump list
/usr/sbin/xfsdump: ino map phase 3: skipping (no pruning necessary)
/usr/sbin/xfsdump: ino map phase 4: skipping (size estimated in phase 2)
/usr/sbin/xfsdump: ino map phase 5: skipping (only one dump stream)
/usr/sbin/xfsdump: ino map construction complete
/usr/sbin/xfsdump: estimated dump size: 1836213504 bytes
/usr/sbin/xfsdump: creating dump session media file 0 (media 0, file 0)
/usr/sbin/xfsdump: dumping ino map
/usr/sbin/xfsdump: dumping directories
/usr/sbin/xfsdump: dumping non-directory files
I am not able to make this fail consistently, but it does seem to fail
fairly regularly. Any help would be greatly appreciated.
.justin.
> >
> > I have been trying to back up XFS partitions using amanda, and there seems
> > to be a problem with xfsdump. Amanda surprisingly recognized xfsdump and
> > seems to have tried to do the backup correctly. On the other hand, whenever
> > we attempted a backup, the machine would go belly-up hard.
> >
> > The problem is vexing, because sometimes it will fail consistently and
> > quickly, yet other times it seems to take all day to fail. I am doing the
> > following to replicate the amanda backup:
> >
> > ssh xfs_machine -l root "/usr/sbin/xfsdump -F -l 0 - /dev/sda1" | gzip -6
> > - > file.gz
> >
> > At the same time, the xfs_machine is serving up a partition over NFS, and
> > the partition is being read and written by two independent news spools.
> > Running a usenet news spool over NFS onto XFS may not be the best
> > performance-wise, but I think most would agree that a news server can hit
> > disks pretty hard when it comes to file ops.
> >
> > After xfsdump caused the machine to fail, I got the following error
> > message on the console:
> >
> > xfs_iget_core: ambiguous vns: vp/0xc3a543c8 invp/0xcf89d948
> >
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 00000008
> >
> > printing eip:
> > d08f90bf
> > *pde = 00000000
> >
> > Entering kdb (current=0xc96dc00, pid 803) on processor 1
> > Oops: Oops
> > due to oops @ 0xd08490bf ...
> >
> > The process listing showed that xfs was pid 803.
> >
> > I have been unable to reliably recreate the failure. Sometimes it fails,
> > and sometimes it does not. (It does seem to fail more reliably in the
> > morning :) ) I am backing up about 2G of files produced by the news
> > servers. Any ideas?
> >
> > The machine is a Dual 500 MHz PIII, and the filesystems run on top of the
> > 3ware IDE raid card with 4 46G disks running in RAID level 5. (138G
> > filesystem available...) The XFS is the 2.4.3 version from April 5th.
> >
> >
> >
> > .justin.
> >
> > ------------------------------------------------------------------------
> > Justin Leonard Tripp justin@xxxxxxxxxx
> > Configurable Computing Laboratory Research Assistant CB 461 x8-7206
> > Electrical and Computer Engineering Department Brigham Young University
> >
> >
>
>
>