
re: FW: The infamous BUG in page_buff.c

To: "HABBINGA,ERIK ""(HP-Loveland,ex1)" <erik.habbinga@xxxxxx>
Subject: re: FW: The infamous BUG in page_buff.c
From: Steve Lord <lord@xxxxxxx>
Date: 30 Jul 2003 17:14:18 -0500
Cc: "'linux-xfs@xxxxxxxxxxx'" <linux-xfs@xxxxxxxxxxx>
In-reply-to: <F341E03C8ED6D311805E00902761278C0C35E6A6@xxxxxxxxxxxxxxx>
Organization:
References: <F341E03C8ED6D311805E00902761278C0C35E6A6@xxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Tue, 2003-07-29 at 14:07, HABBINGA,ERIK (HP-Loveland,ex1) wrote:
> I just saw the same crash this morning, my first experience with
> 2.6.0-test2.
> 
> 4 x Xeon processors
> 2 GB ram
> fibre channel drives
> more details upon request...
> 
> I was running SPEC SFS: 144 NFS client processes spread across 8 NFS
> client machines, attached by gigabit Ethernet to my server under test.
> There were 36 file systems, and SPEC was trying to write 555 MB to each
> file system as fast as the NFS/network infrastructure would allow. I'm
> running stock 2.6.0-test2 with the QLogic qla2xxx Fibre Channel driver
> version 8.00.00b4 (http://sourceforge.net/projects/linux-qla2xxx/) and
> the following patch to the driver code to make the LUNs visible.
> 
> --- linux/drivers/scsi/qla2xxx/qla_os.c~        Tue Jul 29 08:39:12 2003
> +++ linux/drivers/scsi/qla2xxx/qla_os.c Tue Jul 29 08:38:56 2003
> @@ -2634,6 +2634,7 @@
>                 qla2x00_cfg_display_devices();
> 
>         scsi_add_host(host, &pdev->dev);
> +       scsi_scan_host(host);
> 
>         return 0;
> 
> Here's the oops/BUG listing.
> 
> Thanks,
> Erik Habbinga
> Hewlett Packard
> 
> # ------------[ cut here ]------------
> kernel BUG at fs/xfs/pagebuf/page_buf.c:1291!
> invalid operand: 0000 [#1]
> CPU:    6
> EIP:    0060:[<c0213b12>]    Not tainted
> EFLAGS: 00010202
> EIP is at bio_end_io_pagebuf+0xc2/0x12e
> eax: 00000001   ebx: f6fb2380   ecx: 00000000   edx: c185a778
> esi: f060fbc0   edi: 00000000   ebp: ed2bf600   esp: f5f3d9fc
> ds: 007b   es: 007b   ss: 0068
> Process nfsd (pid: 5007, threadinfo=f5f3c000 task=f56e46a0)
> Stack: 00000001 00000000 00000046 c26a0a00 00000009 00001000 ed2bf600 00000000
>        00000200 00000200 c01562d7 ed2bf600 00000200 00000000 01801b1f 00000200
>        ed2bf600 c0269374 ed2bf600 00000200 00000000 c2654600 ed2bf600 00000000
> Call Trace:
>  [<c01562d7>] bio_endio+0x55/0x7a
>  [<c0269374>] __end_that_request_first+0x204/0x224
>  [<c029519f>] scsi_end_request+0x3b/0xbc
>  [<c0295502>] scsi_io_completion+0x144/0x442
>  [<c0293648>] scsi_delete_timer+0x16/0x30
>  [<c02dda90>] sd_rw_intr+0x4e/0x198
>  [<c020501a>] xfs_trans_commit+0x116/0x3d4
>  [<c02045e3>] xfs_trans_dup+0xed/0xfc
>  [<c01efbd5>] xfs_itruncate_finish+0x24d/0x430
>  [<c020c450>] xfs_inactive_free_eofblocks+0x26c/0x2ba
>  [<c021007b>] xfs_rwunlock+0x1/0x3a
>  [<c020cb4e>] xfs_release+0x94/0xdc
>  [<c021665d>] linvfs_release+0x1d/0x24
>  [<c01518d0>] close_private_file+0x28/0x2a
>  [<c01954f5>] nfsd_close+0x1d/0x3c
>  [<c0195c97>] nfsd_write+0x20d/0x348
>  [<c0336975>] udp_push_pending_frames+0x12d/0x244
>  [<c03374d3>] udp_sendpage+0xf3/0x2a6
>  [<c035ebe2>] svcauth_unix_accept+0x26a/0x28e
>  [<c01929cc>] nfsd_proc_write+0xa8/0x122
>  [<c0191a74>] nfsd_dispatch+0xe8/0x1e5
>  [<c019198c>] nfsd_dispatch+0x0/0x1e5
>  [<c035ac4b>] svc_process+0x4eb/0x673
>  [<c01917e2>] nfsd+0x1de/0x388
>  [<c0191604>] nfsd+0x0/0x388
>  [<c010703d>] kernel_thread_helper+0x5/0xc
> 
> Code: 0f 0b 0b 05 4f a6 37 c0 eb a9 89 d0 e8 d3 12 f2 ff eb a0 81
>  <0>Kernel panic: Fatal exception in interrupt
> In interrupt handler - not syncing
> 
> 
> 
> Sorry for the repost; I forgot to CC the list.
> 
> -----Original Message-----
> From: Kostadin Todorov Karaivanov [ <mailto:larry@xxxxxxxxx>] 
> Sent: Tuesday, July 29, 2003 10:52 AM
> To: 'Nathan Scott'
> Subject: RE: The infamous BUG in page_buff.c
> 
> 
> > -----Original Message-----
> > From: Nathan Scott [ <mailto:nathans@xxxxxxx>]
> > Sent: Tuesday, July 29, 2003 10:19 AM
> > To: k.karaivanov@xxxxxxxxx
> > Cc: linux-xfs@xxxxxxxxxxx
> > Subject: Re: The infamous BUG in page_buff.c
> > 
> > 
> > On Tue, Jul 29, 2003 at 09:58:42AM +0300, Kostadin Todorov
> > Karaivanov wrote:
> > > Short summary:
> > > I have seen at least 3 reports of what I believe is the same BUG,
> > > one of which is mine. It has been present from 2.5.6x until now. I
> > > know you are focused on the 2.4 branch, but still...
> > > 
> > > reference:
> > >  <http://marc.theaimsgroup.com/?l=linux-kernel&m=105941333012271&w=2>
> > >  <http://www.ussg.iu.edu/hypermail/linux/kernel/0306.3/0357.html>
> > >  <http://marc.theaimsgroup.com/?l=linux-xfs&m=105410871804737&w=2>
> > > 
> > 
> > A reproducible test case would be a big help here.
> 
> Alas, it's not so easy, at least for me.
> Sometimes it happens while I sit quietly doing nothing on that PC; other
> times it happens when I recompile the kernel. To catch my oops I had to
> run an endless loop of make bzImage; make modules; make clean. On the
> other hand, yesterday I tried the same, plus some PostgreSQL benchmarks
> in the background just to raise the load, and nothing happened. Two hours
> later, when I had given up and stopped everything and was doing REALLY
> nothing, the machine died 8-( .
> 
> > 
> > thanks.
> > 
> > --
> > Nathan
> > 

I hit it for the first time today - and in this case I actually
had kgdb in the kernel.

So at least I actually know what is going on; now it's just a
matter of fixing it. The BUG_ON may actually be bogus here,
but I think we need to forward port some other logic from the 2.4
code.

Steve

-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@xxxxxxx

