linux-origin

Re: Crashes

To: ralf@xxxxxxxxxxx (Ralf Baechle)
Subject: Re: Crashes
From: Kanoj Sarcar <kanoj@xxxxxxxxxxxxxxxxxxx>
Date: Mon, 2 Oct 2000 14:05:35 -0700 (PDT)
Cc: linux-origin@xxxxxxxxxxx
In-reply-to: <20001002103334.A29695@xxxxxxxxxxxxxxxx> from "Ralf Baechle" at Oct 02, 2000 10:33:34 AM
Sender: owner-linux-origin@xxxxxxxxxxx
> 
> I've had a crash while doing cd /; find . -xdev -print0 | cpio -oumd0 /mnt.
> That copy went fine for a while, then I did cd /mnt/tmp; rm -rf * and
> shortly after this the machine crashed.  The epc is pointing to 800d430c:
> 
> [...]
>     800d42f8:   90a20000        lbu     $v0,0($a1)
>     800d42fc:   14540005        bne     $v0,$s4,800d4314 <isp1020_intr_handler+0x31c>
>     800d4300:   3c020007        lui     $v0,0x7
>     800d4304:   0c0350fb        jal     800d43ec <isp1020_return_status>
>     800d4308:   00a0202d        move    $a0,$a1
>     800d430c:   10000002        b       800d4318 <isp1020_intr_handler+0x320>
>     800d4310:   ae020228        sw      $v0,552($s0)
>     800d4314:   ae020228        sw      $v0,552($s0)
>     800d4318:   960200d6        lhu     $v0,214($s0)
>     800d431c:   5040000a        beqzl   $v0,800d4348 <isp1020_intr_handler+0x350>
> [...]
> 
> this is a branch, so the fault was caused by the following instruction which

I hope you verified the BD bit was set in the Cause register printed out
as part of the panic ...
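
For reference, a minimal sketch of that check. The helper names are made
up for illustration; the only hardware facts assumed are that BD is bit 31
of the MIPS Cause register, and that with BD set the EPC holds the address
of the branch rather than of the delay-slot instruction that faulted:

```c
#include <stdint.h>

/* BD (branch delay) is bit 31 of the MIPS CP0 Cause register. */
#define CAUSE_BD (UINT32_C(1) << 31)

/* Hypothetical helper: nonzero if the exception hit a branch delay slot. */
static int in_delay_slot(uint32_t cause)
{
    return (cause & CAUSE_BD) != 0;
}

/* Hypothetical helper: address of the instruction that actually faulted. */
static uint64_t faulting_pc(uint64_t epc, uint32_t cause)
{
    /* With BD set, EPC points at the branch; the fault is one slot later. */
    return in_delay_slot(cause) ? epc + 4 : epc;
}
```

With the epc from the report (800d430c) and BD set, this gives 800d4310,
the sw in the delay slot. With BD clear, the branch itself would be the
faulting instruction, which would not fit a NULL dereference.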

> was dereferencing a NULL pointer.  What makes me more worried about this kind
> of crash is I also keep receiving reports about I/O errors and data corruption

This looks like the isp1020_intr_handler() code to me, and I would be
surprised if the 32-bit guys are using the same driver. In any case, it
might be worthwhile to match this up with the C code and see which
variable/pointer was NULL; that might give us a clue.

AFAICS, the above asm code probably corresponds to this in 
isp1020_intr_handler:

                if (sts->hdr.entry_type == ENTRY_STATUS)
                        Cmnd->result = isp1020_return_status(sts);
                else
                        Cmnd->result = DID_ERROR << 16;
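
One quick cross-check on that match: the lui $v0,0x7 at 800d4300 loads
0x00070000, which is what DID_ERROR << 16 evaluates to if DID_ERROR is
0x07 as in the kernel's SCSI host byte codes. The value is restated
locally below as an assumption, and the helper name is made up:

```c
#include <stdint.h>

/* Assumed value, restated here from the kernel's SCSI host byte codes. */
#define DID_ERROR 0x07

/* What "lui rt, imm" leaves in the register: imm in the upper halfword. */
static uint32_t lui_value(uint16_t imm)
{
    return (uint32_t)imm << 16;
}
```

So lui_value(0x7) and DID_ERROR << 16 agree, which is consistent with the
else branch being the store at 800d4314.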

It seems to me that Cmnd turned out to be 0, and I see that 

                Cmnd = hostdata->cmd_slots[cmd_slot];

and the if/else above could in fact be the first place that dereferences
Cmnd.
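
If that is what happened, the sw $v0,552($s0) in the delay slot would have
faulted at address 552 (0x228), assuming $s0 holds Cmnd, so the bad address
in the panic output is worth comparing against that. A guard on the slot
lookup could look roughly like the sketch below. The struct layouts, the
slot count, and the function name are simplified stand-ins for
illustration, not the driver's real definitions:

```c
#include <stddef.h>

#define NUM_SLOTS 4                /* hypothetical; not the driver's value */

struct cmnd { int result; };       /* stand-in for the SCSI command struct */

struct hostdata {
    struct cmnd *cmd_slots[NUM_SLOTS];
};

/*
 * Look up the command for a completed slot and clear the slot.
 * Returns NULL when the index is out of range or the slot is already
 * empty, so the caller can bail out instead of dereferencing it.
 */
static struct cmnd *take_cmd(struct hostdata *h, unsigned int cmd_slot)
{
    struct cmnd *c;

    if (cmd_slot >= NUM_SLOTS)
        return NULL;
    c = h->cmd_slots[cmd_slot];
    h->cmd_slots[cmd_slot] = NULL;
    return c;
}
```

The caller would then skip (and log) the status entry when take_cmd()
returns NULL rather than storing into Cmnd->result. That would only hide
the crash, though; why the slot was empty is still the real question.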

Kanoj

> from users of the 32-bit kernel while copying from one disk to another
> physical disk, so exactly what also happened here.
> 
> Ideas?
> 
>   Ralf
> 

