xfs

Re: Oops with 2.4.16

To: Stephen Lord <lord@xxxxxxx>
Subject: Re: Oops with 2.4.16
From: Pascal Haakmat <a.haakmat@xxxxxxxxx>
Date: Fri, 11 Jan 2002 04:53:57 +0100
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <3C3E5FEF.40509@sgi.com>; from lord@sgi.com on Thu, Jan 10, 2002 at 09:45:51PM -0600
References: <20020110221155.A912@awacs.dhs.org> <1010697908.2812.22.camel@stout.americas.sgi.com> <20020110225711.A1259@awacs.dhs.org> <1010702208.1772.98.camel@jen.americas.sgi.com> <20020111023859.A2413@awacs.dhs.org> <3C3E578B.7090309@sgi.com> <20020111043633.A791@awacs.dhs.org> <3C3E5FEF.40509@sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.2.5i
10/01/02 21:45, Stephen Lord wrote:

> Pascal Haakmat wrote:
> 
> >10/01/02 21:10, Stephen Lord wrote:
> >
> >>Pascal Haakmat wrote:
> >>
> >>>10/01/02 16:36, Steve Lord wrote:

[snip]

> >>I don't think fs corruption would have much to do with this one; it is
> >>a purely in-memory circular list. So far as I can see, it is always
> >>manipulated under the correct locking. I have a box running a debug
> >>kernel sitting in a loop doing the test which Adrian says makes this
> >>happen for him. It has been going for a few hours, and so far there
> >>have been no problems.
> >>
> >
> >Well, I've been doing the same, and after 68 iterations of his script I got
> >this pair of messages, repeating every three seconds or so (no Oops or
> >anything else):
> >
> >ide_dmaproc: chipset supported ide_dma_lostirq func only: 13
> >hdc: lost interrupt
> >
> >Looks like a kernel problem or bad hardware?
> >
> >>Would you be willing to turn on kdb? It only really makes sense if you
> >>are able to set up a serial console. There is a debugger command which
> >>will walk the complete list of inodes in the filesystem.
> >>
> >
> >The serial console won't happen, but I think it's no longer necessary
> >either. This is probably not an XFS bug, right? 
> >
> Well, in-memory corruption of xfs data structures should not be
> triggerable by losing an interrupt, so I would like to track it down
> some more. Forget kdb if you cannot do the console; we are talking about
> a lot of output here. I may ask you to run some sanity check code in the
> sync path. You said your oops was repeatable, correct?

Yes, it is, although it takes some time. I suppose that if I had not
rebooted the machine as soon as it gave me the "hdc: lost interrupt"
message, it might eventually have turned into an Oops.

Right now I have printk()s in every place where I think m_inext gets set,
and I didn't see it take any strange values up until the "lost interrupt"
message. Perhaps I should have waited a bit longer before rebooting. What
other code would you like me to add?

> Steve
> 
> p.s. Can you send me the script? I could look back in the xfs mail
> list, but I am feeling lazy. I am currently using something I wrote
> based on the brief description in this thread.

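# One-time setup: create an 8 MiB file of random data named "01"
dd if=/dev/urandom of=01 bs=1024 count=8192

#!/bin/bash
# Copy the file once in the foreground, then start 78 more copies
# (named 80 down to 3) in the background so they all run concurrently.
cp -fr 01 2

for (( i=80; i!=2; i-- )) ; do
    cp -fr 01 $i &
    # echo $i
done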
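Running the script in a loop by hand (68 iterations before the "lost interrupt" messages showed up here) can be automated. Below is a minimal sketch, not something used in this thread. The function name run_until_match and the script name copytest.sh are hypothetical. It assumes SCRIPT contains only the part of the script above from #!/bin/bash on, and that the kernel log is readable with dmesg. Clear old messages first (for example with `dmesg -c` as root), or an earlier "lost interrupt" will match on the first round.

```shell
#!/bin/bash
# Hypothetical helper (not from this thread): repeat the copy test until
# the kernel log shows a given message, or give up after MAX rounds.
#
# run_until_match MAX PATTERN LOGCMD SCRIPT
#   MAX     - maximum number of rounds
#   PATTERN - text to look for, e.g. "lost interrupt"
#   LOGCMD  - command that prints the kernel log, e.g. dmesg
#   SCRIPT  - path to the copy script
# Prints the round number and returns 0 on the first match; returns 1 if
# no round produced a match.
run_until_match() {
    local max=$1 pattern=$2 logcmd=$3 script=$4 round
    # The counter is not called "i" because the sourced copy script
    # uses (and overwrites) its own "i".
    for (( round=1; round<=max; round++ )); do
        # Source rather than execute, so the background cp jobs are
        # children of this shell and "wait" can collect them.
        source "$script"
        wait
        # LOGCMD is left unquoted on purpose so a plain command name works.
        if $logcmd 2>/dev/null | grep -q -- "$pattern"; then
            echo "$round"
            return 0
        fi
    done
    return 1
}

# Example (script name is hypothetical):
#   run_until_match 100 "lost interrupt" dmesg ./copytest.sh
```

The wait between rounds keeps one round's copies from overlapping the next. If the original test depended on overlapping runs, drop the wait.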

