
Re: Kernel oops on new mailserver

To: Steve Lord <lord@xxxxxxx>
Subject: Re: Kernel oops on new mailserver
From: Paul Schutte <paul@xxxxxxxxxxx>
Date: Tue, 26 Mar 2002 13:24:37 +0200
Cc: XFS mailing list <linux-xfs@xxxxxxxxxxx>
References: <3C965130.2CAB0A19@xxxxxxxxxxx> <3C98ADE8.ACDBEF4C@xxxxxxxxxxx> <1016642563.28200.113.camel@xxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
Hi,

It happened again.

Dell PE2550
1G RAM
2 x 1.133GHz CPUs
4 x 18GB Seagate Cheetahs in RAID 10 on PERC hardware RAID

kernel 2.4.18 (checked out on 21 March 2002) after the following take:


> New vnode code left in an assert which is no longer valid
>
> Date:  Thu Mar 21 06:39:40 PST 2002
> Workarea:  jen.americas.sgi.com:/src/lord/xfs-newpagebuf
>
> The following file(s) were checked into:
>   bonnie.engr.sgi.com:/isms/slinx/2.4.x-xfs
>
>
> Modid:  2.4.x-xfs:slinx:114612a
> linux/fs/xfs/linux/xfs_vnode.c - 1.71
>         - remove bad assert
>
>
>



gcc version 2.95.4  (Debian prerelease)
modutils 2.4.13

CONFIG_HIGHMEM4G=y

I see that Nathan Scott has checked changes in for page_buf.
I am checking out the latest release to see if it helps.

Paul


kernel BUG at ll_rw_blk.c:902!
invalid operand: 0000
CPU:    1
EIP:    0010:[<c021bebc>]    Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 0000001f   ebx: ca3a77a0   ecx: c03cc4c0   edx: 00002dbd
esi: 00000008   edi: 00000001   ebp: c8fa3cbc   esp: c8fa3ca8
ds: 0018   es: 0018   ss: 0018
Process modprobe (pid: 471, stackpage=c8fa3000)
Stack: c02ff562 00000386 ca3a77a0 00000000 00000001 c8fa3ce4 c021c047 00000001
       ca3a77a0 c67171a0 ca3a77a0 c67171a0 00001000 ca3a77a0 00000200 c8fa3cfc
       c013582e 00000001 00000001 c8fa3d04 c67171a0 c8fa3efc c013677d ca3a77a0
Call Trace: [<c021c047>] [<c013582e>] [<c013677d>] [<c0190735>] [<c0190735>]
   [<c0190735>] [<c0111758>] [<c0190735>] [<c01d6600>] [<c0136b96>] [<c0136bb9>]
   [<c0136ae9>] [<c01e4c9c>] [<c01e4fbc>] [<c01e3f18>] [<c01e5652>] [<c01e5682>]
   [<c01d6633>] [<c01c311c>] [<c01e3fa1>] [<c01e6f27>] [<c01db5d1>] [<c01e6a32>]
   [<c0135fa6>] [<c010715b>]
Code: 0f 0b 83 c4 08 b8 03 00 00 00 f0 0f ab 43 18 0f b7 43 0c 66

>>EIP; c021bebc <submit_bh+54/98>   <=====
Trace; c021c046 <ll_rw_block+146/1b4>
Trace; c013582e <write_buffer+1a/58>
Trace; c013677c <fsync_inode_data_buffers+9c/164>
Trace; c0190734 <xfs_acl_iaccess+28/84>
Trace; c0190734 <xfs_acl_iaccess+28/84>
Trace; c0190734 <xfs_acl_iaccess+28/84>
Trace; c0111758 <do_page_fault+0/4e6>
Trace; c0190734 <xfs_acl_iaccess+28/84>
Trace; c01d6600 <xfs_trans_push_ail+1bc/1cc>
Trace; c0136b96 <__refile_buffer+56/60>
Trace; c0136bb8 <refile_buffer+18/24>
Trace; c0136ae8 <__mark_buffer_dirty+28/30>
Trace; c01e4c9c <set_buffer_dirty_uptodate+34/48>
Trace; c01e4fbc <__pb_block_commit_write_async+2c/50>
Trace; c01e3f18 <pagebuf_commit_write+48/b8>
Trace; c01e5652 <pagebuf_generic_file_write+296/300>
Trace; c01e5682 <pagebuf_generic_file_write+2c6/300>
Trace; c01d6632 <xfs_trans_unlocked_item+22/40>
Trace; c01c311c <xfs_iunlock+4c/58>
Trace; c01e3fa0 <pagebuf_flush+18/2c>
Trace; c01e6f26 <fs_flush_pages+2a/34>
Trace; c01db5d0 <xfs_fsync+e0/300>
Trace; c01e6a32 <linvfs_fsync+42/50>
Trace; c0135fa6 <sys_fdatasync+6a/b4>
Trace; c010715a <system_call+32/38>
Code;  c021bebc <submit_bh+54/98>
00000000 <_EIP>:
Code;  c021bebc <submit_bh+54/98>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c021bebe <submit_bh+56/98>
   2:   83 c4 08                  add    $0x8,%esp
Code;  c021bec0 <submit_bh+58/98>
   5:   b8 03 00 00 00            mov    $0x3,%eax
Code;  c021bec6 <submit_bh+5e/98>
   a:   f0 0f ab 43 18            lock bts %eax,0x18(%ebx)
Code;  c021beca <submit_bh+62/98>
   f:   0f b7 43 0c               movzwl 0xc(%ebx),%eax
Code;  c021bece <submit_bh+66/98>
  13:   66                        data16

Entering kdb (current=0xc8fa2000, pid 471) on processor 1 Oops: invalid operand

eax = 0x0000001f ebx = 0xca3a77a0 ecx = 0xc03cc4c0 edx = 0x00002dbd
esi = 0x00000008 edi = 0x00000001 esp = 0xc8fa3ca8 eip = 0xc021bebc
ebp = 0xc8fa3cbc xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010202

[1]kdb> bt
    EBP       EIP         Function(args)
0xc8fa3cbc 0xc021bebc submit_bh+0x54 (0x1, 0xca3a77a0, 0xc67171a0, 0xca3a77a0,
0xc67171a0)
                               kernel .text 0xc0100000 0xc021be68 0xc021bf00
0xc8fa3ce4 0xc021c047 ll_rw_block+0x147 (0x1, 0x1, 0xc8fa3d04, 0xc67171a0)
                               kernel .text 0xc0100000 0xc021bf00 0xc021c0b4
0xc8fa3cfc 0xc013582e write_buffer+0x1a (0xca3a77a0, 0xc67171a0, 0x1,
0xc03d3020, 0x0)
                               kernel .text 0xc0100000 0xc0135814 0xc013586c
0xc8fa3efc 0xc013677d fsync_inode_data_buffers+0x9d (0xc67171a0, 0xc6717254,
0x0)
                               kernel .text 0xc0100000 0xc01366e0 0xc0136844
0xc8fa3f10 0xc01e3fa1 pagebuf_flush+0x19 (0xc67171a0, 0x0, 0x0, 0x0)
                               kernel .text 0xc0100000 0xc01e3f88 0xc01e3fb4
0xc8fa3f28 0xc01e6f27 fs_flush_pages+0x2b (0xca9a8c64, 0x0, 0x0, 0xffffffff,
0xffffffff)
                               kernel .text 0xc0100000 0xc01e6efc 0xc01e6f30
0xc8fa3f64 0xc01db5d1 xfs_fsync+0xe1 (0xca9a8c64, 0x5, 0x0, 0x0, 0x0)
                               kernel .text 0xc0100000 0xc01db4f0 0xc01db7f0
0xc8fa3f90 0xc01e6a32 linvfs_fsync+0x42 (0xcc5227a0, 0xc6dac760, 0x1,
0xc6717254, 0xc8fa2000)
                               kernel .text 0xc0100000 0xc01e69f0 0xc01e6a40
0xc8fa3fbc 0xc0135fa6 sys_fdatasync+0x6a (0x0, 0x8063530, 0xbfffeca0,
0x8063530, 0x4013b6e0)
                               kernel .text 0xc0100000 0xc0135f3c 0xc0135ff0
           0xc010715b system_call+0x33
                               kernel .text 0xc0100000 0xc0107128 0xc0107160
[1]kdb> bh 0xca3a77a0
buffer_head at 0xca3a77a0
  next 0x00000000 bno 0 rsec 1054688 size 4096 dev 0x805 rdev 0x805
  count 2 state 0x5 [Uptodate Lock] ftime 0x899f47 b_list 1 b_reqnext 0x00000000 b_data 0xc41c2000
  b_page 0xc1107080 b_this_page 0xca3a77a0 b_private 0xcd388da0
[1]kdb> cpu
Currently on cpu 1
Available cpus: 0, 1
[1]kdb> cpu 0

Entering kdb (current=0xcf42e000, pid 171) on processor 0 due to cpu switch
[0]kdb> bt
    EBP       EIP         Function(args)
0xcf42ff84 0xc011558b do_syslog+0x14b (0x2, 0x804dcbf, 0xfff)
                               kernel .text 0xc0100000 0xc0115440 0xc0115804
0xcf42ff98 0xc01562da kmsg_read+0x12 (0xcffd7720, 0x804dca0, 0xfff, 0xcffd7740,
0xcf42e000)
                               kernel .text 0xc0100000 0xc01562c8 0xc01562e0
0xcf42ffbc 0xc013461d sys_read+0x91 (0x0, 0x804dca0, 0xfff, 0x0, 0x804eca0)
                               kernel .text 0xc0100000 0xc013458c 0xc01346a0
           0xc010715b system_call+0x33
                               kernel .text 0xc0100000 0xc0107128 0xc0107160
[0]kdb> lsmod
Module                  Size  modstruct     Used by
e100                   89752  0xd0850000     1
[0]kdb>
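
For anyone reading the bh output above, here is a small sketch that decodes the "state" and "dev" fields by hand. It is only an illustration: kdb itself names just Uptodate and Lock, and the other bit names (Dirty, Req, Mapped, New) are my assumption from the 2.4-era bh_state_bits enum in include/linux/fs.h. The helper names decode_bh_state and decode_kdev are made up for this example. The major/minor split assumes the 2.4 layout of 8 bits each.

```python
# Assumed bit positions, taken from the 2.4-era enum bh_state_bits.
# Only "Uptodate" and "Lock" are confirmed by the kdb output itself.
BH_STATE_BITS = {
    0: "Uptodate",
    1: "Dirty",
    2: "Lock",
    3: "Req",
    4: "Mapped",
    5: "New",
}


def decode_bh_state(state):
    """Return the names of the known bits set in a buffer_head state word."""
    return [name for bit, name in sorted(BH_STATE_BITS.items())
            if state & (1 << bit)]


def decode_kdev(dev):
    """Split a 2.4-style device number into (major, minor), 8 bits each."""
    return dev >> 8, dev & 0xFF


if __name__ == "__main__":
    print(decode_bh_state(0x5))   # state value from the bh output
    print(decode_kdev(0x805))     # dev value from the bh output
```

With the values from the dump, state 0x5 has only the Uptodate and Lock bits set, and dev 0x805 splits into major 8, minor 5. Whether one of the clear bits is what the check at ll_rw_blk.c:902 trips on is not something this output alone confirms.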


Steve Lord wrote:

>
> Well, I just rewrote this code (after the 14th) to clean up a number
> of problems in this area.
>
> Can you possibly try a current cvs tree. If you hit it again it will
> be in submit_bh this time. Can you run with kdb again, specify y
> for the KDB modules command. If it should happen again, run the
> bt command, take the second argument of the submit_bh function
> and use the bh command on it.
>
> Thanks
>
>    Steve
>
> --
>
> Steve Lord                                      voice: +1-651-683-3511
> Principal Engineer, Filesystem Software         email: lord@xxxxxxx

