Oops with 2.4.16

Stephen Lord
Thu, 10 Jan 2002 21:45:51 -0600
Pascal Haakmat wrote:

10/01/02 21:10, Stephen Lord wrote:

Pascal Haakmat wrote:

10/01/02 16:36, Steve Lord wrote:

On Thu, 2002-01-10 at 15:57, Pascal Haakmat wrote:

                ASSERT(ipointer_in == B_FALSE);
                ip = ip->i_mnext;
c01ccb34:       8b 4c 24 70             mov    0x70(%esp,1),%ecx
c01ccb38:       8b 76 08                mov    0x8(%esi),%esi
c01ccb3b:       8b 91 14 01 00 00       mov    0x114(%ecx),%edx

        } while (ip->i_mnext != mp->m_inodes);

[*ksymoops disassembly matches here*]

ip->i_mnext is NULL, which is never supposed to happen; next question is

FWIW, this happened just after rebooting using the XFS 1.01/RedHat boot CD
and running xfs_repair on the filesystem, which hopefully rules out an
inconsistent filesystem/filesystem errors.

I don't think fs corruption would have much to do with this one; it is a
purely in-memory circular list. So far as I can see, it is always
manipulated under the correct locking. I have a box running a debug kernel,
sitting in a loop doing the test which Adrian says makes this happen for
him. It has been going for a few hours, so far no problems.

Well, I've been doing the same, and after 68 iterations of his script I got
this pair of messages, repeating every three seconds or so (no Oops or
anything else):

ide_dmaproc: chipset supported ide_dma_lostirq func only: 13
hdc: lost interrupt

Looks like a kernel problem or bad hardware?

Would you be willing to turn on kdb? It only really makes sense if you are
able to set up a serial console. There is a debugger command which will walk
the complete list of inodes in the filesystem.

The serial console won't happen, but I think it's no longer necessary
either. This is probably not an XFS bug, right?

Well, in-memory corruption of XFS data structures should not be triggerable
by losing an interrupt, so I would like to track it down some more. Forget
kdb if you cannot do the console - we were talking a lot of output here. I
may ask you to run some sanity check code in the sync path - you said your
oops was repeatable, correct?
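
For illustration only, here is roughly the kind of sanity check I have in
mind, written as a standalone sketch rather than real XFS code. The struct
demo_inode type, the demo_check_ring() function and the max_nodes bound are
all made up for this example; only the i_mnext link name comes from the
code quoted above. It walks the circular list once and reports a NULL link
instead of dereferencing it.

```c
#include <stddef.h>

/* Hypothetical stand-in for an in-core inode; only the link matters here. */
struct demo_inode {
    struct demo_inode *i_mnext;  /* next inode on the mount's circular list */
    int                i_id;     /* illustrative identifier, not an XFS field */
};

/*
 * Walk the circular list once, starting at head.
 *
 * Returns the number of nodes if the walk gets back to head,
 * -1 if a node with a NULL i_mnext is found (stored in *broken if
 * broken is non-NULL), or -2 if more than max_nodes nodes are visited
 * without returning to head. An empty list (head == NULL) returns 0.
 */
static long demo_check_ring(struct demo_inode *head, long max_nodes,
                            struct demo_inode **broken)
{
    struct demo_inode *ip = head;
    long count = 0;

    if (broken)
        *broken = NULL;
    if (head == NULL)
        return 0;

    do {
        if (count >= max_nodes)
            return -2;           /* ring never closes back on head */
        count++;
        if (ip->i_mnext == NULL) {
            if (broken)
                *broken = ip;    /* remember where the ring is broken */
            return -1;
        }
        ip = ip->i_mnext;
    } while (ip != head);

    return count;
}
```

In a real kernel the walk would have to run under whatever lock protects
the list, and the result would be printed rather than returned. The
max_nodes bound is only there so that a corrupted ring cannot make the
check itself spin forever.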


P.S. Can you send me the script? I could look back in the xfs mailing list,
but I am feeling lazy. I am currently using something I wrote based on the
brief description in this

