Re: Kernel Oops - still not solved

To: gbakos@xxxxxxxxxxxxxxx
Subject: Re: Kernel Oops - still not solved
From: Steve Lord <lord@xxxxxxx>
Date: 14 Oct 2003 15:44:46 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <Pine.SOL.4.53.0310101954340.8636@xxxxxxxxxxxxxxxxxxxx>
References: <Pine.SOL.4.53.0310101954340.8636@xxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Fri, 2003-10-10 at 19:14, Gaspar Bakos wrote:
> Hi,
> 
> Some time ago I reported on a problem I encountered. I could still not
> solve it, but I thought the details might help someone to help me...
> 
> System: i386 PIV, ASUS mobo, 512MB RAM, 2 x 80GB Seagate disks as RAID-1
> array, as hda and hdc. I have about 10 partitions on each disk, in the
> same configuration, and each is run as RAID-1 with its relevant pair on
> the other disk. Everything has XFS, except for 2 x 1GB swap.
> Kernel: 2.4.18, XFS = 1.1, XFS utils are: xfsprogs-2.0.3-0.
> 
> I have this exact configuration running on two other PCs, and they have
> been running for half a year now with no hint of a problem. Anyway, there
> is this nasty PC, which crashes and prints an ugly kernel oops in
> /var/log/messages; then kswapd becomes a zombie, sometimes other processes
> (kupdated) as well, the system starts to behave erratically (shutdown
> segfaults, CTRL+ALT+DEL segfaults, etc.), and then finally it dies.
> 
> I booted the XFS RH7.3 CD and ran xfs_repair on all the devices (none of
> them mounted yet, of course), such as:
> xfs_repair /dev/md0
> ....
> On some of the /dev/md devices, and notably on a third disk
> (/dev/hdd, not part of the RAID), I get quite a few messages, but
> xfs_repair wades through them and returns. I would expect that it has
> repaired the filesystem, so that if I run it again I no longer see any
> "disconnected inode ..., bad fork, ..." messages. But this is not the case;
> a new xfs_repair produces almost exactly the same messages.


If you run xfs_repair with files in lost+found, it blows them away
first, so the files that were in there show up as disconnected again,
and repeated repair runs which put stuff in lost+found will generate
output every time. This should probably be in a FAQ somewhere. Mounting
and unmounting before running repair is a good idea: the mount replays
the log, which repair itself does not do - anyone care to implement
that ;-).
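
A minimal sketch of that mount/unmount-then-repair sequence, in case it
helps. /dev/md0 is the device from the report; the mount point /mnt/tmp
and the function names are made up for illustration. The script only
prints the commands unless RUN=1 is set in the environment, so adjust
and check it before letting it touch a real device.

```shell
#!/bin/sh
# Sketch only: mount (which replays the XFS log), unmount, then repair.
# The default mount point /mnt/tmp is a placeholder, not from the report.

# Print the three commands, one per line, for a device and mount point.
repair_plan() {
    dev=$1
    mnt=$2
    echo "mount -t xfs $dev $mnt"   # mounting replays the log
    echo "umount $mnt"              # repair needs the fs unmounted
    echo "xfs_repair $dev"
}

# Echo each command; execute it only when RUN=1, stopping at the
# first command that fails.
run_plan() {
    repair_plan "$1" "$2" | while read -r cmd; do
        echo "+ $cmd"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || exit 1
        fi
    done
}

run_plan "${1:-/dev/md0}" "${2:-/mnt/tmp}"
```

Run it once per md device. With RUN unset it is a dry run and just
lists what it would do.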

> 
> Any clue on this?
> Any idea or suggestion what else should be used?
> 
> I also issued xfs_check, which seemed to repair things, or at least
> report. I did not clearly catch if the operations by xfs_check are also
> WRITE ones, or solely READ.
> 

xfs_check is a read-only command; it does not fix anything.


> The mystical thing is that if I unplug these two harddrives, hda and hdc,
> and place them in an identically configured other PC (really all
> components are the same brand/type), then I see no crashes. It might be
> only luck, although I applied quite heavy disk IO and CPU-load tests.
> 
> I replaced the motherboard and VGA card in the original system, moreover,
> got rid of all unnecessary cards, such as SCSI, watchdog, etc. Crash still
> happens if I copy a few gigs back and forth, then delete it.
> 

Hmm, did you try something like swapping the memory or the IDE cables
between the machines?

Steve


-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@xxxxxxx

