On Tue, 2001-10-30 at 15:09, Sean Kormilo wrote:
> Hi,
>
> I'm using XFS on a Linux PPC based system, with a Systran Fibre channel
> card (based on the Qlogic ISP2200A chip). I'm using Feral software
> device drivers (http://www.feral.com).
>
> Kernel version : 2.4.5 with the following sets of patches applied-
> XFS 1.0.1 - release version
> LVM 1.0.1-rc2
>
> I'm exporting an XFS filesystem over NFS to another linux based client.
> If files are being written to the filesystem over NFS and I pull the
> fibre channel link, I get a bunch of warning messages like:
>
> SCSI disk error : host 0 channel 0 id 3 lun 0 return code = 10000
> I/O error: dev 08:02, sector 22061432
>
> But in the end, I wind up with a kernel panic, the output of which
> follows:
>
> kernel BUG at sched.c:709!
> Oops: Exception in kernel mode, sig: 4
> NIP: C0011A18 XER: 20000000 LR: C0011A18 SP: C1113B70 REGS: c1113ac0 TRAP: 0700
> MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> Using defaults from ksymoops -t elf32-powerpc -a powerpc:common
> TASK = c1112000[8] 'isp_thrd0' Last syscall: -1
> last math c165c000 last altivec 00000000
> GPR00: C0011A18 C1113B70 C1112000 0000001B 00001032 00000001 C02A0000 C02A0000
> GPR08: C02A0000 00000000 00002CE3 C1113AB0 44822022 0184DC4C 00000000 00000000
> GPR16: 00000000 00000000 C02C0000 C1113E68 00000000 00000001 00000001 00000000
> GPR24: C01A0000 00000001 C1113C08 00000001 00000002 C0250000 00000000 C1113B70
> Call backtrace:
> C0011A18 C002879C C0027B08 C0027CFC C0094328 C00ED1B4 C00D2BD4
> C008FDDC C0090078 C01249D8 C0124AE4 C0124E88 C01394FC C011E674
> C011E464 C0019BB4 C0019A70 C0019870 C00119A0 C0009C28 C012DB58
> C0006664
> Kernel panic: Aiee, killing interrupt handler!
> Warning (Oops_read): Code line not seen, dumping what data is available
>
> >>NIP; c0011a18 <schedule+470/48c> <=====
> Trace; c0011a18 <schedule+470/48c>
> Trace; c002879c <___wait_on_page+9c/d0>
> Trace; c0027b08 <truncate_list_pages+9c/248>
> Trace; c0027cfc <truncate_inode_pages+48/94>
> Trace; c0094328 <pagebuf_target_clear+1c/30>
> Trace; c00ed1b4 <_xfs_force_shutdown+108/12c>
> Trace; c00d2bd4 <xlog_iodone+54/94>
> Trace; c008fddc <pagebuf_iodone+70/dc>
> Trace; c0090078 <_end_pagebuf_page_io+120/134>
> Trace; c01249d8 <__scsi_end_request+dc/1d0>
> Trace; c0124ae4 <scsi_end_request+18/28>
> Trace; c0124e88 <scsi_io_completion+2e4/300>
> Trace; c01394fc <rw_intr+1e4/1f4>
> Trace; c011e674 <scsi_finish_command+d4/e8>
> Trace; c011e464 <scsi_bottom_half_handler+74/130>
> Trace; c0019bb4 <bh_action+44/c4>
> Trace; c0019a70 <tasklet_hi_action+7c/bc>
> Trace; c0019870 <do_softirq+88/d0>
> Trace; c00119a0 <schedule+3f8/48c>
> Trace; c0009c28 <__down+54/b4>
> Trace; c012db58 <isp_task_thread+10c/768>
> Trace; c0006664 <kernel_thread+2c/38>
>
>
> 5 warnings issued. Results may not be reliable.
>
>
> If I remove the XFS patches from the tree and run the same test using
> ext2 based filesystems, the panic does not occur.
Not surprising: the XFS forced shutdown code is executed when you lose
the Fibre Channel connection. It looks like the forced shutdown code is
doing too much in interrupt context; your trace shows it reaching
schedule() via ___wait_on_page() from the SCSI bottom half handler. It
will take some thought to work out how to fix this.
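
To illustrate the general idea of not blocking in interrupt context, here
is a rough userspace sketch in C. It is not XFS code and not a proposed
patch: every name in it (force_shutdown, run_deferred, blocking_cleanup,
in_interrupt_ctx, MAX_DEFERRED) is hypothetical, and the "interrupt
context" is only a simulated flag. The completion path records the
request and returns, and the cleanup that may sleep runs later from a
context that is allowed to block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in the XFS tree. */
#define MAX_DEFERRED 8

typedef void (*deferred_fn)(void *arg);

struct deferred_item {
    deferred_fn fn;
    void *arg;
};

static struct deferred_item queue[MAX_DEFERRED];
static size_t queue_len;
static bool in_interrupt_ctx;   /* simulated flag, set by the caller */

/* Work that may sleep, e.g. waiting on pages during shutdown cleanup. */
static void blocking_cleanup(void *arg)
{
    assert(!in_interrupt_ctx);  /* sleeping here would be a bug */
    *(int *)arg += 1;           /* stand-in for the real cleanup */
}

/* Called from the I/O completion path. In simulated interrupt context it
 * only queues the cleanup; otherwise it runs the cleanup directly.
 * Returns false if the queue is full. */
static bool force_shutdown(void *arg)
{
    if (!in_interrupt_ctx) {
        blocking_cleanup(arg);
        return true;
    }
    if (queue_len == MAX_DEFERRED)
        return false;
    queue[queue_len].fn = blocking_cleanup;
    queue[queue_len].arg = arg;
    queue_len++;
    return true;
}

/* Run from a context that may sleep (a worker thread in a real kernel).
 * Returns the number of deferred items that were executed. */
static size_t run_deferred(void)
{
    size_t ran = 0;

    assert(!in_interrupt_ctx);
    for (size_t i = 0; i < queue_len; i++) {
        queue[i].fn(queue[i].arg);
        ran++;
    }
    queue_len = 0;
    return ran;
}
```

With the flag set, force_shutdown() returns without touching the counter
passed in; the counter is only incremented once run_deferred() is called
with the flag cleared. A real fix would also need locking around the
queue, which this sketch leaves out.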
>
> Interestingly, if I run the system with the XFS patches applied, but
> have no XFS based filesystems mounted, it still panics (I don't have
> the ksymoops output for this type of panic, however). I just get the
> following message on the console:
>
> Kernel panic: scsi_free:Bad offset
> In interrupt handler - not syncing
> Rebooting in 60 seconds..
>
In this case I have no idea. If you are not running on an XFS
filesystem then this has nothing to do with XFS; there are no
modifications in the code base that would get executed in this case.

I suspect the forced shutdown code in XFS is going to need some work
before this scenario will work for you. You could try editing the
_xfs_force_shutdown() function in fs/xfs/xfs_rw.c and removing this
chunk of code:

    while (xfs_incore_relse(&mp->m_ddev_targ, 1, 0)) {
        if (ntries >= XFS_MAX_DRELSE_RETRIES)
            break;
        delay(++ntries * 5);
    }

What it is doing in there looks radically different from the IRIX
version and is, I suspect, broken in this case. Let me know what this
does for you.

What happens with ext2? It should be getting I/O errors back from
the Fibre Channel driver.
> Any help or suggestions would be appreciated.
>
> Sean.
>
>
> --
>
> Sean C. Kormilo, Software Architect, Nortel Networks
> email: skormilo@xxxxxxxxxxxxxxxxxx
>
Steve
--
Steve Lord voice: +1-651-683-3511
Principal Engineer, Filesystem Software email: lord@xxxxxxx