
Kernel Oops - still not solved

To: linux-xfs@xxxxxxxxxxx
Subject: Kernel Oops - still not solved
From: Gaspar Bakos <gbakos@xxxxxxxxxxxxxxx>
Date: Fri, 10 Oct 2003 20:14:04 -0400 (EDT)
Reply-to: gbakos@xxxxxxxxxxxxxxx
Sender: linux-xfs-bounce@xxxxxxxxxxx
Hi,

Some time ago I reported a problem I had encountered. I still have not been
able to solve it, but I thought the details might help someone to help me...

System: i386 PIV, ASUS mobo, 512 MB RAM, 2 x 80 GB Seagate disks, hda and
hdc, mirrored as RAID-1. There are about 10 partitions on each disk, in the
same layout, and each partition runs as RAID-1 with its counterpart on the
other disk. Everything is XFS, except for 2 x 1 GB of swap.
Kernel: 2.4.18, XFS = 1.1, XFS utils: xfsprogs-2.0.3-0.
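
To illustrate the layout, one mirror pair expressed as a raidtools-style
/etc/raidtab entry would look roughly like this (partition numbers here are
just an example, not my exact table):

raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        persistent-superblock   1
        device                  /dev/hda1
        raid-disk               0
        device                  /dev/hdc1
        raid-disk               1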

I have this exact configuration running on two other PCs, and they have been
running for half a year now with no hint of a problem. But there is this one
nasty PC, which crashes: it prints an ugly kernel oops in /var/log/messages,
kswapd becomes a zombie, sometimes other processes (kupdated) as well, the
system starts to behave erratically (shutdown segfaults, CTRL+ALT+DEL
segfaults, etc.), and then it finally dies.

I booted the XFS RH7.3 CD and issued an xfs_repair on all the devices (none
of them mounted, of course), such as:
xfs_repair /dev/md0
....
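
That is, essentially this loop over the arrays (a sketch; the actual device
numbering on my box may differ):

for i in 0 1 2 3 4 5 6 7 8 9; do
        xfs_repair /dev/md$i        # repair each unmounted array in turn
done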
On some of the /dev/md devices, and notably on a third disk (/dev/hdd, not
part of the RAID), I get quite a few messages, but xfs_repair wades through
them and returns. What I would expect is that it repaired the filesystem, so
that if I run it again I no longer see any "disconnected inode ..., bad
fork, ..." messages. But this is not the case; a new xfs_repair run produces
almost exactly the same messages.
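
To be concrete, the test amounts to capturing two consecutive runs and
comparing them (log file names are just examples):

xfs_repair /dev/md0 2>&1 | tee repair-pass1.log
xfs_repair /dev/md0 2>&1 | tee repair-pass2.log
diff repair-pass1.log repair-pass2.log   # near-identical complaints each time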

Any clue on this?
Any idea or suggestion as to what else I should try?

I also issued xfs_check, which seemed to repair things, or at least report
them. I could not clearly tell whether xfs_check also WRITEs to the device,
or solely READs from it.
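
One crude way to settle the READ/WRITE question would be to checksum the
device before and after a run (painfully slow on a big disk, but
unambiguous; just a sketch):

md5sum /dev/md0 > before.md5
xfs_check /dev/md0
md5sum /dev/md0 > after.md5
diff before.md5 after.md5   # identical checksums => xfs_check wrote nothing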

The mystifying thing is that if I unplug these two hard drives, hda and hdc,
and place them in another, identically configured PC (really, all components
are the same brand/type), then I see no crashes. It might be just luck,
although I applied quite heavy disk-I/O and CPU-load tests.

I replaced the motherboard and VGA card in the original system and, moreover,
got rid of all unnecessary cards (SCSI, watchdog, etc.). The crash still
happens if I copy a few gigabytes back and forth and then delete them,
roughly as sketched below.
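
For reference, the triggering load is nothing fancier than this (mount points
and sizes are examples, not my exact paths):

dd if=/dev/zero of=/mnt/md5/bigfile bs=1024k count=2048   # ~2 GB of data
cp /mnt/md5/bigfile /mnt/md6/bigfile                      # copy to another array
cp /mnt/md6/bigfile /mnt/md5/bigfile.back                 # ...and back
rm /mnt/md5/bigfile /mnt/md5/bigfile.back /mnt/md6/bigfile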

Cheers
Gaspar

Sep 28 21:50:10 hat7 kernel: EFSCORRUPTED returned from file xfs_bmap.c line 4678
Sep 28 21:50:10 hat7 kernel: Unable to handle kernel paging request at virtual address ff00001c
Sep 28 21:50:10 hat7 kernel:  printing eip:
Sep 28 21:50:10 hat7 kernel: c01afd76
Sep 28 21:50:10 hat7 kernel: *pde = 00000000
Sep 28 21:50:10 hat7 kernel: Oops: 0000
Sep 28 21:50:10 hat7 kernel: CPU:    0
Sep 28 21:50:10 hat7 kernel: EIP:    0010:[<c01afd76>]    Not tainted
Sep 28 21:50:10 hat7 kernel: EFLAGS: 00010286
Sep 28 21:50:10 hat7 kernel: eax: 6cc1842c   ebx: 6cc1842c   ecx: 00000200   edx: 00000200
Sep 28 21:50:10 hat7 kernel: esi: ff000000   edi: 00000000   ebp: cf3dcc80   esp: c1831e9c
Sep 28 21:50:10 hat7 kernel: ds: 0018   es: 0018   ss: 0018
Sep 28 21:50:10 hat7 kernel: Process kswapd (pid: 5, stackpage=c1831000)
Sep 28 21:50:10 hat7 kernel: Stack: 00000004 cf3ddc6d c019b11b 6cc1842c ff000000 000003de c01c83d7 c01c60f6
Sep 28 21:50:10 hat7 kernel:        cf3ddc6d 00000004 cf3ddc84 00000000 cf3dcc80 00000000 c304c483 c01c51a6
Sep 28 21:50:10 hat7 kernel:        cf3ddd7c 00000000 cf3dcc80 c02ecc00 00000200 c01c83d7 00000fd8 00000000
Sep 28 21:50:10 hat7 kernel: Call Trace: [<c019b11b>] [<c01c83d7>] [<c01c60f6>] [<c01c51a6>] [<c01c83d7>]
Sep 28 21:50:10 hat7 kernel:    [<c01c912c>] [<c0129711>] [<c01c83d7>] [<c01428bf>] [<c014095d>] [<c0140c1b>]
Sep 28 21:50:10 hat7 kernel:    [<c012a6d5>] [<c012a721>] [<c012a7cf>] [<c012a83e>] [<c012a957>] [<c0105000>]
Sep 28 21:50:10 hat7 kernel:    [<c01054e3>]
Sep 28 21:50:10 hat7 kernel:
Sep 28 21:50:10 hat7 kernel: Code: f6 46 1c 01 74 26 f6 83 30 02 00 00 10 75 1d 8d 43 18 50 e8

