>
> I've had a crash while doing cd /; find . -xdev -print0 | cpio -oumd0 /mnt.
> That copy went fine for a while, then I did cd /mnt/tmp; rm -rf * and
> shortly after this the machine crashed. The epc is pointing to 800d430c:
>
> [...]
> 800d42f8: 90a20000 lbu $v0,0($a1)
> 800d42fc: 14540005 bne $v0,$s4,800d4314 <isp1020_intr_handler+0x31c>
> 800d4300: 3c020007 lui $v0,0x7
> 800d4304: 0c0350fb jal 800d43ec <isp1020_return_status>
> 800d4308: 00a0202d move $a0,$a1
> 800d430c: 10000002 b 800d4318 <isp1020_intr_handler+0x320>
> 800d4310: ae020228 sw $v0,552($s0)
> 800d4314: ae020228 sw $v0,552($s0)
> 800d4318: 960200d6 lhu $v0,214($s0)
> 800d431c: 5040000a beqzl $v0,800d4348 <isp1020_intr_handler+0x350>
> [...]
>
> this is a branch, so the fault was caused by the following instruction which
I hope you verified the BD bit was set in the Cause register printed out
as part of the panic ...
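To spell out why BD matters: on MIPS, when an exception is taken in a
branch delay slot, EPC holds the address of the branch and bit 31 (BD)
of Cause is set, so the instruction that actually faulted sits at
EPC + 4. A minimal sketch of that rule follows; faulting_pc() is a
name made up for illustration, not a kernel function, and the
addresses are written as they appear in the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Cause.BD is bit 31: set when the exception hit a branch delay slot. */
#define CAUSE_BD (UINT32_C(1) << 31)

/*
 * Hypothetical helper (not a kernel function): given the EPC and Cause
 * values from a panic dump, return the address of the instruction that
 * actually took the exception.
 */
static uint64_t faulting_pc(uint64_t epc, uint32_t cause)
{
    /* With BD set, EPC points at the branch; the fault is in its delay slot. */
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/* Usage with the EPC from the report, for both possible BD values. */
static void faulting_pc_example(void)
{
    /* BD set: the faulting instruction is the sw in the delay slot. */
    assert(faulting_pc(UINT64_C(0x800d430c), CAUSE_BD) == UINT64_C(0x800d4310));
    /* BD clear: EPC itself is the faulting instruction. */
    assert(faulting_pc(UINT64_C(0x800d430c), 0) == UINT64_C(0x800d430c));
}
```

If BD was set, that gives 800d4310, the sw $v0,552($s0) in the delay
slot. If it was clear, the exception would have to be blamed on the
branch at 800d430c itself.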
> was dereferencing a NULL pointer. What makes me more worried about this kind
> of crash is I also keep receiving reports about I/O errors and data corruption
This looks like the isp1020_intr_handler() code to me; I would be surprised
if the 32-bit guys are using the same driver. In any case, it might be
worthwhile to match this up with the C code and see which variable/pointer
was NULL; that might give us a clue.
AFAICS, the above asm code probably corresponds to this in
isp1020_intr_handler:

    if (sts->hdr.entry_type == ENTRY_STATUS)
        Cmnd->result = isp1020_return_status(sts);
    else
        Cmnd->result = DID_ERROR << 16;

It seems to me that Cmnd turned out to be 0, and I see that

    Cmnd = hostdata->cmd_slots[cmd_slot];

and this in fact could be the first place that uses Cmnd.
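If an empty slot really is the cause, a guard before the first use of
Cmnd would at least turn the oops into a diagnosable error. The
following is only a standalone sketch of that idea, not a patch
against the driver: the struct names, NUM_SLOTS and complete_slot()
are all made up here, and clearing the slot after completion is an
assumption of the sketch, not something I have checked in the driver.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SLOTS 4                 /* hypothetical table size */

/* Stand-ins for the SCSI command and the per-host data (hypothetical). */
struct cmnd_sketch { int result; };
struct host_sketch { struct cmnd_sketch *cmd_slots[NUM_SLOTS]; };

/*
 * Store a completion result for the command in `slot`.
 * Returns 0 on success, -1 if the slot is out of range or empty
 * (for example a stale or duplicate completion from the adapter).
 */
static int complete_slot(struct host_sketch *h, unsigned int slot, int result)
{
    struct cmnd_sketch *c;

    if (slot >= NUM_SLOTS)
        return -1;

    c = h->cmd_slots[slot];
    if (c == NULL)
        return -1;                  /* would have been the NULL store */

    c->result = result;
    h->cmd_slots[slot] = NULL;      /* assumption: slot is freed on completion */
    return 0;
}

/* Usage: a second completion for the same slot is rejected, not dereferenced. */
static void complete_slot_example(void)
{
    struct cmnd_sketch cmd = { 0 };
    struct host_sketch host = { { NULL } };

    host.cmd_slots[1] = &cmd;
    assert(complete_slot(&host, 1, 0x07 << 16) == 0);
    assert(cmd.result == 0x70000);
    assert(complete_slot(&host, 1, 0) == -1);
}
```

The 0x07 << 16 in the usage is there because the lui $v0,0x7 in the
disassembly loads 0x70000, which is consistent with DID_ERROR << 16 if
DID_ERROR is 0x07, as I recall it being defined in scsi.h.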
Kanoj
> from users of the 32-bit kernel while copying from one disk to another
> physical disk, so exactly what also happened here.
>
> Ideas?
>
> Ralf
>