On Tue, 2003-07-29 at 14:07, HABBINGA,ERIK (HP-Loveland,ex1) wrote:
> I just saw the same crash this morning, my first experience with
> 2.6.0-test2.
>
> 4 x Xeon processors
> 2 GB ram
> fibre channel drives
> more details upon request...
>
> I was running SPEC SFS: 144 NFS client processes spread across 8 NFS
> client machines, attached by gigabit Ethernet to my server under test. There
> were 36 file systems, and SPEC was trying to write 555 MB to each file system
> as fast as the NFS/network infrastructure would allow. I'm running stock
> 2.6.0-test2 with the QLogic qla2xxx fibre channel driver version 8.00.00b4
> (http://sourceforge.net/projects/linux-qla2xxx/) and the following patch to
> the driver code to make the LUNs visible.
>
> --- linux/drivers/scsi/qla2xxx/qla_os.c~ Tue Jul 29 08:39:12 2003
> +++ linux/drivers/scsi/qla2xxx/qla_os.c Tue Jul 29 08:38:56 2003
> @@ -2634,6 +2634,7 @@
> qla2x00_cfg_display_devices();
>
> scsi_add_host(host, &pdev->dev);
> + scsi_scan_host(host);
>
> return 0;
>
> Here's the oops/BUG listing.
>
> Thanks,
> Erik Habbinga
> Hewlett Packard
>
> # ------------[ cut here ]------------
> kernel BUG at fs/xfs/pagebuf/page_buf.c:1291!
> invalid operand: 0000 [#1]
> CPU: 6
> EIP: 0060:[<c0213b12>] Not tainted
> EFLAGS: 00010202
> EIP is at bio_end_io_pagebuf+0xc2/0x12e
> eax: 00000001 ebx: f6fb2380 ecx: 00000000 edx: c185a778
> esi: f060fbc0 edi: 00000000 ebp: ed2bf600 esp: f5f3d9fc
> ds: 007b es: 007b ss: 0068
> Process nfsd (pid: 5007, threadinfo=f5f3c000 task=f56e46a0)
> Stack: 00000001 00000000 00000046 c26a0a00 00000009 00001000 ed2bf600 00000000
>        00000200 00000200 c01562d7 ed2bf600 00000200 00000000 01801b1f 00000200
>        ed2bf600 c0269374 ed2bf600 00000200 00000000 c2654600 ed2bf600 00000000
> Call Trace:
> [<c01562d7>] bio_endio+0x55/0x7a
> [<c0269374>] __end_that_request_first+0x204/0x224
> [<c029519f>] scsi_end_request+0x3b/0xbc
> [<c0295502>] scsi_io_completion+0x144/0x442
> [<c0293648>] scsi_delete_timer+0x16/0x30
> [<c02dda90>] sd_rw_intr+0x4e/0x198
> [<c020501a>] xfs_trans_commit+0x116/0x3d4
> [<c02045e3>] xfs_trans_dup+0xed/0xfc
> [<c01efbd5>] xfs_itruncate_finish+0x24d/0x430
> [<c020c450>] xfs_inactive_free_eofblocks+0x26c/0x2ba
> [<c021007b>] xfs_rwunlock+0x1/0x3a
> [<c020cb4e>] xfs_release+0x94/0xdc
> [<c021665d>] linvfs_release+0x1d/0x24
> [<c01518d0>] close_private_file+0x28/0x2a
> [<c01954f5>] nfsd_close+0x1d/0x3c
> [<c0195c97>] nfsd_write+0x20d/0x348
> [<c0336975>] udp_push_pending_frames+0x12d/0x244
> [<c03374d3>] udp_sendpage+0xf3/0x2a6
> [<c035ebe2>] svcauth_unix_accept+0x26a/0x28e
> [<c01929cc>] nfsd_proc_write+0xa8/0x122
> [<c0191a74>] nfsd_dispatch+0xe8/0x1e5
> [<c019198c>] nfsd_dispatch+0x0/0x1e5
> [<c035ac4b>] svc_process+0x4eb/0x673
> [<c01917e2>] nfsd+0x1de/0x388
> [<c0191604>] nfsd+0x0/0x388
> [<c010703d>] kernel_thread_helper+0x5/0xc
>
> Code: 0f 0b 0b 05 4f a6 37 c0 eb a9 89 d0 e8 d3 12 f2 ff eb a0 81
> <0>Kernel panic: Fatal exception in interrupt
> In interrupt handler - not syncing
>
>
>
> Sorry for the repost; I forgot to cc the list.
>
> -----Original Message-----
> From: Kostadin Todorov Karaivanov [ <mailto:larry@xxxxxxxxx>]
> Sent: Tuesday, July 29, 2003 10:52 AM
> To: 'Nathan Scott'
> Subject: RE: The infamous BUG in page_buff.c
>
>
> > -----Original Message-----
> > From: Nathan Scott [ <mailto:nathans@xxxxxxx>]
> > Sent: Tuesday, July 29, 2003 10:19 AM
> > To: k.karaivanov@xxxxxxxxx
> > Cc: linux-xfs@xxxxxxxxxxx
> > Subject: Re: The infamous BUG in page_buff.c
> >
> >
> > On Tue, Jul 29, 2003 at 09:58:42AM +0300, Kostadin Todorov
> > Karaivanov wrote:
> > > Short summary:
> > > I have seen at least 3 reports of what I believe is the same BUG, one
> > > of which is mine. It has been present from 2.5.6x until now. I know you
> > > are focused on the 2.4 branch, but still...
> > >
> > > reference:
> > > <http://marc.theaimsgroup.com/?l=linux-kernel&m=105941333012271&w=2>
> > > <http://www.ussg.iu.edu/hypermail/linux/kernel/0306.3/0357.html>
> > > <http://marc.theaimsgroup.com/?l=linux-xfs&m=105410871804737&w=2>
> > >
> >
> > A reproducible test case would be a big help here.
>
> Alas, it's not so easy, at least for me.
> Sometimes it happens while I sit quietly and do nothing on that PC; other
> times it happens while I recompile the kernel. To catch my oops I had to
> run an endless loop of make bzImage; make modules; make clean. On the
> other hand, yesterday I tried the same, plus some PostgreSQL benchmarks
> in the background just to raise the load, and nothing happened. Two hours
> later, when I had given up and stopped everything and was doing REALLY
> nothing, the machine died 8-( .
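For anyone else trying to reproduce this, the build loop described above can be sketched as a small shell function. This is only a sketch under assumptions that are not in the original report: it is run from the top of an already-configured kernel tree, the names build_loop and the MAKE override are hypothetical, and an argument of 0 (the default) means loop forever, as in the original description.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): repeat the kernel
# build cycle to keep the machine under load while waiting for the oops.
# Assumes it is run from the top of an already-configured kernel tree.
# MAKE can be overridden (e.g. MAKE=true) for a dry run.

build_loop() {
    max_iter=${1:-0}        # 0 (the default) means loop until interrupted
    make_cmd=${MAKE:-make}  # left unquoted below so "make -j4" also works
    i=0
    while [ "$max_iter" -eq 0 ] || [ "$i" -lt "$max_iter" ]; do
        i=$((i + 1))
        echo "iteration $i"
        # Same sequence as described: keep going even if one step fails.
        $make_cmd bzImage
        $make_cmd modules
        $make_cmd clean
    done
}
```

Running build_loop with no argument loops until interrupted (or until the box dies); build_loop 20 stops after 20 cycles. Keeping the three make steps unconditional mirrors the "make bzImage; make modules; make clean" sequence, since the point is sustained I/O load rather than a successful build.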
>
> >
> > thanks.
> >
> > --
> > Nathan
> >
I hit it for the first time today, and in this case I actually
had kgdb in the kernel.
So at least I know what is going on; now it's just a
matter of fixing it. The BUG_ON may actually be bogus here,
but I think we need to forward-port some other logic from the 2.4
code.
Steve
--
Steve Lord voice: +1-651-683-3511
Principal Engineer, Filesystem Software email: lord@xxxxxxx