RE: Linux 2.4.18 freeze running dbench 1.3

To: "Steve Lord" <lord@xxxxxxx>
Subject: RE: Linux 2.4.18 freeze running dbench 1.3
From: Christian Røsnes <christian@xxxxxxxxx>
Date: Mon, 4 Mar 2002 23:56:18 +0100
Cc: <linux-xfs@xxxxxxxxxxx>
Importance: Normal
In-reply-to: <1015257285.21528.55.camel@jen.americas.sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx

> From: Steve Lord [mailto:lord@xxxxxxx] Sent: 4 March 2002 16:55

> > On Mon, 2002-03-04 at 09:24, Christian Røsnes wrote:
> >
> > To follow up:
> >
> > I tested dbench on an ext2 partition (/home) on the same server,
> > and it worked. dbench on the XFS /usr partition still crashes
> > the server.
> > This kernel was made with gcc 2.96.
>
>
> Let me confirm something here - you originally reported a complete
> hang when running with xfs. Keith asked you to run with the nmi
> oopser enabled - is this oops coming out after you turned on the
> oopser (nmi_watchdog=1 on the kernel boot line)?


I've run some tests to be sure:
Running dbench on an XFS partition.
All kernel compilation done with kgcc - gcc version 2.91.66

1) Without kdb turned on in the kernel, the machine hung
completely: no oops on the console or in /var/log/messages
(running dbench on an ext2 partition works fine).

2) With kdb turned on, but without the nmi_watchdog=1 parameter
it oopsed and entered the debugger:

[root@dl02 dbench]# ./dbench 10
..invalid operand: 0000
CPU:    1
EIP:    0010:[<c01f1c47>]    Not tainted
EFLAGS: 00010086
eax: 0000004a   ebx: f75748f0   ecx: c03ed424   edx: 00002e72
esi: 00000297   edi: f7574800   ebp: 00000000   esp: f5f8da5c
ds: 0018   es: 0018   ss: 0018
Process dbench (pid: 1050, stackpage=f5f8d000)
Stack: c0315760 0000005a 00000000 00000005 00000010 f5f8dcc0 c01bb04d f7574800
       0000001e ffffffeb 00000000 00000010 00000000 c01ba988 f5f8dcc0 00000001
       00000000 00000000 f7c73218 00000000 f5f8db60 00000000 fffe0005 00000010
Call Trace: [<c01bb04d>] [<c01ba988>] [<c01b94e9>] [<c020fc49>] [<c01ba988>]
   [<c01ba988>] [<c01e4c00>] [<c01e4c00>] [<c0204182>] [<c01dfc6f>] [<c020d8b8>
   [<c020f38d>] [<c020e456>] [<c020cd6f>] [<c0208259>] [<c0206802>] [<c0208453>
   [<c020ccf4>] [<c020e0bf>] [<c020ccf4>] [<c0209a6e>] [<c0145717>] [<c01077bb>

Code: 0f 0b 83 c4 08 8d 74 26 00 c6 87 f0 00 00 00 01 56 9d 89 e8

Entering kdb (current=0xf5f8c000, pid 1050) on processor 1 Oops: invalid operand
due to oops @ 0xc01f1c47
eax = 0x0000004a ebx = 0xf75748f0 ecx = 0xc03ed424 edx = 0x00002e72
esi = 0x00000297 edi = 0xf7574800 esp = 0xf5f8da5c eip = 0xc01f1c47
ebp = 0x00000000 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010086
xds = 0xc0310018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf5f8da28
[1]kdb> bt
    EBP       EIP         Function(args)
           0xc01f1c47 xfs_mod_incore_sb+0x93 (0xf7574800, 0x1e, 0xffffffeb,
0x0)
                               kernel .text 0xc0100000 0xc01f1bb4 0xc01f1c60
0xf5f8dcc0 0xc01bb04d xfs_bmapi+0x6c5 (0x0, 0xf5fb7bdc, 0x80, 0x0, 0x10)
                               kernel .text 0xc0100000 0xc01ba988 0xc01bbca4
           0xc020fc49 xfs_iomap_write_delay+0x64d (0xf5fb7bdc, 0xc4,
0xf5fb7d14)
                               kernel .text 0xc0100000 0xc020f5fc 0xc020fe8c
           0xc020d8b8 xfs_zero_last_block+0x680 (0xf657cccc, 0x80000, 0x0,
0x10)
                               kernel .text 0xc0100000 0xc020d238 0xc020d8e0
           0xc0208259 _pagebuf_file_write+0xf1 (0xf6da1334, 0x804bde8,
0xffbb, )
                               kernel .text 0xc0100000 0xc0208168 0xc0208390
           0xc0208453 pagebuf_generic_file_write+0xc3 (0xf6da1334,
0x804bde8, 0)
                               kernel .text 0xc0100000 0xc0208390 0xc02086fc
           0xc020e0bf xfs_write+0x2bb (0xf5fb7bf4, 0xf5f8df7c, 0x2, 0x0,
0x0)
                               kernel .text 0xc0100000 0xc020de04 0xc020e320
           0xc0209a6e linvfs_write+0x2fe (0xf6da1334, 0x804bde0, 0xffc3,
0xf6da)
                               kernel .text 0xc0100000 0xc0209770 0xc0209abc
           0xc0145717 sys_write+0x8f (0x6, 0x804bde0, 0xffc3, 0x10, 0x105e)
                               kernel .text 0xc0100000 0xc0145688 0xc0145858
           0xc01077bb system_call+0x33
                               kernel .text 0xc0100000 0xc0107788 0xc01077c0
[1]kdb> cpu 0

Entering kdb (current=0xf5f8a000, pid 1049) on processor 0 due to cpu switch
[0]kdb> bt
    EBP       EIP         Function(args)
0xf5f8bcc0 0xc01f242d _text_lock_xfs_mount+0xf (0x0, 0xf5faf5f0, 0x90, 0x0,
0x1)
                               kernel .text 0xc0100000 0xc01f241e 0xc01f2480
           0xc020fc49 xfs_iomap_write_delay+0x64d (0xf5faf5f0, 0xc4,
0xf5faf728)
                               kernel .text 0xc0100000 0xc020f5fc 0xc020fe8c
           0xc020d8b8 xfs_zero_last_block+0x680 (0xf6c6a758, 0x90000, 0x0,
0x10)
                               kernel .text 0xc0100000 0xc020d238 0xc020d8e0
           0xc0208259 _pagebuf_file_write+0xf1 (0xf72b59f4, 0x804bde9,
0xffba, )
                               kernel .text 0xc0100000 0xc0208168 0xc0208390
           0xc0208453 pagebuf_generic_file_write+0xc3 (0xf72b59f4,
0x804bde9, 0)
                               kernel .text 0xc0100000 0xc0208390 0xc02086fc
           0xc020e0bf xfs_write+0x2bb (0xf5faf608, 0xf5f8bf7c, 0x2, 0x0,
0x0)
                               kernel .text 0xc0100000 0xc020de04 0xc020e320
           0xc0209a6e linvfs_write+0x2fe (0xf72b59f4, 0x804bde0, 0xffc3,
0xf72b)
                               kernel .text 0xc0100000 0xc0209770 0xc0209abc
           0xc0145717 sys_write+0x8f (0x6, 0x804bde0, 0xffc3, 0x10, 0x105e)
                               kernel .text 0xc0100000 0xc0145688 0xc0145858
           0xc01077bb system_call+0x33
                               kernel .text 0xc0100000 0xc0107788 0xc01077c0
[0]kdb>
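
A note on reading the oops above: the "Code:" line starts with the bytes
0f 0b, which is the x86 UD2 instruction. On 2.4 i386 kernels BUG() is
normally implemented with UD2, so the "invalid operand" here may be a
deliberate assertion firing rather than random corruption. That is an
inference, not something stated in this thread. A minimal Python sketch
for checking the pattern (the function name is hypothetical):

```python
UD2 = bytes([0x0F, 0x0B])  # x86 UD2 opcode


def oops_code_starts_with_ud2(code_line: str) -> bool:
    """Return True if an oops 'Code:' line begins with the UD2 opcode."""
    prefix, _, hex_part = code_line.partition(":")
    if prefix.strip() != "Code":
        raise ValueError("not a 'Code:' line")
    # Some kernels mark the faulting byte with <..>; strip those brackets.
    tokens = [t.strip("<>") for t in hex_part.split()]
    raw = bytes(int(t, 16) for t in tokens)
    return raw.startswith(UD2)


line = "Code: 0f 0b 83 c4 08 8d 74 26 00 c6 87 f0 00 00 00 01 56 9d 89 e8"
print(oops_code_starts_with_ud2(line))
```

The same check applies unchanged to the other "Code:" lines in this message.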


> From: Steve Lord [mailto:lord@xxxxxxx] Sent: 4 March 2002 16:55
>
> If this is the case I think I can deduce the lock xfs has got hung
> up on - and safely say I have never seen this before. I may try and
> come up with some tracing code for this one, it looks like a spinlock
> leak on the superblock counters.
>
> Could you possibly try this:
>
> Edit fs/xfs/Makefile and fs/xfs/linux/Makefile, look for the line like
> this:
>
>       EXTRA_CFLAGS +=  -I. -funsigned-char
>
> (It is a little different in the xfs/linux/Makefile) Remove the
>    -funsigned-char
>
> Also remove all the .o files in the xfs directories after doing this.
> Now in the config tool, turn on spinlock debugging in the kernel and
> try running xfs again.
>
> You have to remove the -funsigned-char to make xfs function with
> spinlock debugging. We know it mostly works this way, but it has not
> had much exposure.
>
> Hopefully this will report some misuse of the lock somewhere.
>
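
To illustrate why a flag like -funsigned-char could matter to a lock-state
check: if debugging code reads a lock byte through a plain char and tests
it with "<= 0", the answer depends on whether char is signed. Whether the
2.4.18 spinlock debugging code does exactly this is an assumption here,
not something confirmed in this thread, and the byte values below are
purely illustrative. A Python sketch of the arithmetic (the helper name is
hypothetical):

```python
def looks_locked(lock_byte: int, char_is_signed: bool) -> bool:
    """Apply a '<= 0 means locked' test to one byte read as a char."""
    value = int.from_bytes(bytes([lock_byte]), "little", signed=char_is_signed)
    return value <= 0


# Illustrative byte values only: 0x01, 0x00 and 0xff.
for b in (0x01, 0x00, 0xFF):
    print(f"{b:#04x}: signed={looks_locked(b, True)} "
          f"unsigned={looks_locked(b, False)}")
```

Only the 0xff case differs between the two readings: as a signed char it is
-1 and passes the test, as an unsigned char it is 255 and does not.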

3)
Ok, here goes:
Spinlock debugging turned on (it was also turned on in case 2 above)
Removed '-funsigned-char' from fs/xfs/Makefile and fs/xfs/linux/Makefile
make clean
make dep
make bzImage
make modules
make modules_install
reboot
(no nmi_watchdog=1 when booting kernel)

Test dbench:

[root@dl02 dbench]# ./dbench 10
10 clients started
..........

Result: A complete hang this time. No oops.

4) Tried the same kernel, but with nmi_watchdog=1.
Result: Complete hang. No oops.


5) Same as 3) above, but with '-funsigned-char' put back in the two
Makefiles, and with nmi_watchdog=1.

This time it crashed when I tried to 'rm' the /usr/src/dbench directory
in order to untar and rebuild the dbench test.

[root@dl02 root]# cd /usr/src
[root@dl02 src]# rm -rf dbench
invalid operand: 0000
CPU:    0
EIP:    0010:[<c01f6993>]    Not tainted
EFLAGS: 00010082
eax: 0000004a   ebx: f7c74014   ecx: c03ed424   edx: 00002eef
esi: 0000026e   edi: 000000d8   ebp: f7c74000   esp: f6de9d2c
ds: 0018   es: 0018   ss: 0018
Process rm (pid: 1108, stackpage=f6de9000)
Stack: c03160e0 0000005a 00000008 f7c73694 f7c73694 f7c73694 00000286 c01e7e23
       f7c74000 00000008 f7c73694 f7c73694 f7550000 f6de9d70 c01e9721 f7c73694
       00000000 00000296 f7c61930 c01ea74e f7c74000 00000000 00000008 f7c73694
Call Trace: [<c01e7e23>] [<c01e9721>] [<c01ea74e>] [<c01e8ecc>] [<c01e7ad4>]
   [<c01f6219>] [<c0212e48>] [<c01ffbed>] [<c01ffc23>] [<c020c7dd>] [<c0155cf4>
   [<c0155ea0>] [<c01077bb>]

Code: 0f 0b 83 c4 08 c6 45 14 01 ff 74 24 10 9d 89 f0 89 fa 5b 5e

Entering kdb (current=0xf6de8000, pid 1108) on processor 0 Oops: invalid operand
due to oops @ 0xc01f6993
eax = 0x0000004a ebx = 0xf7c74014 ecx = 0xc03ed424 edx = 0x00002eef
esi = 0x0000026e edi = 0x000000d8 esp = 0xf6de9d2c eip = 0xc01f6993
ebp = 0xf7c74000 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010082
xds = 0xc0310018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf6de9cf8
[0]kdb> bt
    EBP       EIP         Function(args)
0xf7c74000 0xc01f6993 xfs_trans_tail_ail+0xa3 (0xf7c74000)
                               kernel .text 0xc0100000 0xc01f68f0 0xc01f69ac
           0xc01e7e23 xlog_assign_tail_lsn+0x17 (0xf7c74000, 0x0)
                               kernel .text 0xc0100000 0xc01e7e0c 0xc01e7f5c
           0xc01ea74e xlog_state_release_iclog+0x22 (0xf7c73694, 0xf7550000)
                               kernel .text 0xc0100000 0xc01ea72c 0xc01ea88c
           0xc01e8ecc xlog_write+0x3ec (0xf7c74000, 0xf6de9e54, 0x8,
0xf7c61930)
                               kernel .text 0xc0100000 0xc01e8ae0 0xc01e8ed8
           0xc01e7ad4 xfs_log_write+0x38 (0xf7c74000, 0xf6de9e54, 0x8,
0xf7c619)
                               kernel .text 0xc0100000 0xc01e7a9c 0xc01e7af8
           0xc01f6219 xfs_trans_commit+0x19d (0xf7582e68, 0x4, 0x0)
                               kernel .text 0xc0100000 0xc01f607c 0xc01f6318
           0xc01ffc23 xfs_rmdir+0x523 (0xf6d7605c, 0xf7657948, 0x0, 0x0)
                               kernel .text 0xc0100000 0xc01ff700 0xc01ffd1c
           0xc020c7dd linvfs_rmdir+0x25 (0xf6decb50, 0xf76578ec)
                               kernel .text 0xc0100000 0xc020c7b8 0xc020c834
           0xc0155cf4 vfs_rmdir+0x2c8 (0xf6decb50, 0xf76578ec)
                               kernel .text 0xc0100000 0xc0155a2c 0xc0155de4
           0xc0155ea0 sys_rmdir+0xbc (0xbfffeeb0, 0x0, 0x1, 0x0, 0xbfffeeb0)
                               kernel .text 0xc0100000 0xc0155de4 0xc0155ee4
           0xc01077bb system_call+0x33
[0]more>
                               kernel .text 0xc0100000 0xc0107788 0xc01077c0
[0]kdb> cpu 1

Entering kdb (current=0xc2554000, pid 0) on processor 1 due to cpu switch
[1]kdb> bt
    EBP       EIP         Function(args)
0x000000d8 0xc01f68e0 _text_lock_xfs_trans (0xf764bc10, 0x0)
                               kernel .text 0xc0100000 0xc01f68e0 0xc01f68f0
           0xc01e944a xlog_state_do_callback+0x3aa (0xf7c73694, 0x0,
0xf7540000)
                               kernel .text 0xc0100000 0xc01e90a0 0xc01e9600
           0xc01e9702 xlog_state_done_syncing+0x102 (0xf7540000, 0x0)
                               kernel .text 0xc0100000 0xc01e9600 0xc01e970c
           0xc01e801b xlog_iodone+0x73 (0xf7765a34)
                               kernel .text 0xc0100000 0xc01e7fa8 0xc01e803c
           0xc0204fe8 pagebuf_iodone+0x38 (0xf7765a34)
                               kernel .text 0xc0100000 0xc0204fb0 0xc0205030
           0xc0205232 _end_pagebuf_page_io+0x15e (0xf71ccae4, 0x1)
                               kernel .text 0xc0100000 0xc02050d4 0xc020523c
           0xc024bc11 do_cciss_intr+0x241 (0x3, 0xf7c6a000, 0xc2555f7c,
0xc0444)
                               kernel .text 0xc0100000 0xc024b9d0 0xc024bce4
           0xc0108f88 handle_IRQ_event+0x50 (0x3, 0xc2555f7c, 0xc255f9ec,
0xc01)
                               kernel .text 0xc0100000 0xc0108f38 0xc0108fb4
0xc2555f74 0xc010929d do_IRQ+0x105 (0xc01054e0, 0x1, 0xc2554000, 0xc2554000,
0x)
                               kernel .text 0xc0100000 0xc0109198 0xc0109358
           0xc02eeaac call_do_IRQ+0x5
                               kernel .rodata 0xc02ec940 0xc02eeaa7
0xc02eeab4
           0xc0105572 cpu_idle+0x3e
[1]more>
                               kernel .text 0xc0100000 0xc0105534 0xc0105588
           0xc040bc86 start_secondary+0x26
                               kernel .text.init 0xc0406000 0xc040bc60
0xc040bc8
[1]kdb>



6) I then rebooted and did the following (with nmi_watchdog=1):

mv .config ..
make mrproper
mv ../.config .
make oldconfig
make dep clean bzImage modules
# install, boot

[root@dl02 dbench]# ./dbench 10
..invalid operand: 0000
CPU:    1
EIP:    0010:[<c01f1c47>]    Not tainted
EFLAGS: 00010086
eax: 0000004a   ebx: f7573cf0   ecx: c03ed424   edx: 00002ed2
esi: 00000297   edi: f7573c00   ebp: 00000000   esp: f6c8ba5c
ds: 0018   es: 0018   ss: 0018
Process dbench (pid: 1155, stackpage=f6c8b000)
Stack: c0315760 0000005a 00000000 00000005 00000010 f6c8bcc0 c01bb04d f7573c00
       0000001e ffffffeb 00000000 00000010 00000000 c01ba988 f6c8bcc0 00000001
       00000000 00000000 f7c73530 00000000 f6c8bb60 00000000 fffe0005 00000010
Call Trace: [<c01bb04d>] [<c01ba988>] [<c01b94e9>] [<c020fc49>] [<c01ba988>]
   [<c01ba988>] [<c01e4c00>] [<c01e4c00>] [<c0204182>] [<c01dfc6f>] [<c020d8b8>
   [<c020f38d>] [<c020e456>] [<c020cd6f>] [<c0208259>] [<c0206802>] [<c0208453>
   [<c020ccf4>] [<c020e0bf>] [<c020ccf4>] [<c0209a6e>] [<c0145717>] [<c01077bb>

Code: 0f 0b 83 c4 08 8d 74 26 00 c6 87 f0 00 00 00 01 56 9d 89 e8

Entering kdb (current=0xf6c8a000, pid 1155) on processor 1 Oops: invalid operand
due to oops @ 0xc01f1c47
eax = 0x0000004a ebx = 0xf7573cf0 ecx = 0xc03ed424 edx = 0x00002ed2
esi = 0x00000297 edi = 0xf7573c00 esp = 0xf6c8ba5c eip = 0xc01f1c47
ebp = 0x00000000 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010086
xds = 0xc0310018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xf6c8ba28
[1]kdb> bt
    EBP       EIP         Function(args)
           0xc01f1c47 xfs_mod_incore_sb+0x93 (0xf7573c00, 0x1e, 0xffffffeb,
0x0)
                               kernel .text 0xc0100000 0xc01f1bb4 0xc01f1c60
0xf6c8bcc0 0xc01bb04d xfs_bmapi+0x6c5 (0x0, 0xf6db2dc0, 0x80, 0x0, 0x10)
                               kernel .text 0xc0100000 0xc01ba988 0xc01bbca4
           0xc020fc49 xfs_iomap_write_delay+0x64d (0xf6db2dc0, 0xc4,
0xf6db2ef8)
                               kernel .text 0xc0100000 0xc020f5fc 0xc020fe8c
           0xc020d8b8 xfs_zero_last_block+0x680 (0xf6c7f044, 0x80000, 0x0,
0x10)
                               kernel .text 0xc0100000 0xc020d238 0xc020d8e0
           0xc0208259 _pagebuf_file_write+0xf1 (0xf72d14b4, 0x804bde8,
0xffbb, )
                               kernel .text 0xc0100000 0xc0208168 0xc0208390
           0xc0208453 pagebuf_generic_file_write+0xc3 (0xf72d14b4,
0x804bde8, 0)
                               kernel .text 0xc0100000 0xc0208390 0xc02086fc
           0xc020e0bf xfs_write+0x2bb (0xf6db2dd8, 0xf6c8bf7c, 0x2, 0x0,
0x0)
                               kernel .text 0xc0100000 0xc020de04 0xc020e320
           0xc0209a6e linvfs_write+0x2fe (0xf72d14b4, 0x804bde0, 0xffc3,
0xf72d)
                               kernel .text 0xc0100000 0xc0209770 0xc0209abc
           0xc0145717 sys_write+0x8f (0x6, 0x804bde0, 0xffc3, 0x10, 0x105d)
                               kernel .text 0xc0100000 0xc0145688 0xc0145858
           0xc01077bb system_call+0x33
                               kernel .text 0xc0100000 0xc0107788 0xc01077c0
[1]kdb> cpu 0

Entering kdb (current=0xf6c7c000, pid 1156) on processor 0 due to cpu switch
[0]kdb> bt
    EBP       EIP         Function(args)
0xf6c7dcc0 0xc01f242d _text_lock_xfs_mount+0xf (0x0, 0xf6c4d9d8, 0x90, 0x0,
0x1)
                               kernel .text 0xc0100000 0xc01f241e 0xc01f2480
           0xc020fc49 xfs_iomap_write_delay+0x64d (0xf6c4d9d8, 0xc4,
0xf6c4db10)
                               kernel .text 0xc0100000 0xc020f5fc 0xc020fe8c
           0xc020d8b8 xfs_zero_last_block+0x680 (0xf6c35c4c, 0x90000, 0x0,
0x10)
                               kernel .text 0xc0100000 0xc020d238 0xc020d8e0
           0xc0208259 _pagebuf_file_write+0xf1 (0xf737d644, 0x804bde9,
0xffba, )
                               kernel .text 0xc0100000 0xc0208168 0xc0208390
           0xc0208453 pagebuf_generic_file_write+0xc3 (0xf737d644,
0x804bde9, 0)
                               kernel .text 0xc0100000 0xc0208390 0xc02086fc
           0xc020e0bf xfs_write+0x2bb (0xf6c4d9f0, 0xf6c7df7c, 0x2, 0x0,
0x0)
                               kernel .text 0xc0100000 0xc020de04 0xc020e320
           0xc0209a6e linvfs_write+0x2fe (0xf737d644, 0x804bde0, 0xffc3,
0xf737)
                               kernel .text 0xc0100000 0xc0209770 0xc0209abc
           0xc0145717 sys_write+0x8f (0x6, 0x804bde0, 0xffc3, 0x10, 0x105d)
                               kernel .text 0xc0100000 0xc0145688 0xc0145858
           0xc01077bb system_call+0x33
                               kernel .text 0xc0100000 0xc0107788 0xc01077c0
[0]kdb>
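
For comparing backtraces like the ones above across CPUs or runs, a small
sketch that pulls the symbol names out of kdb "bt" output may help. The
regular expression is an assumption based only on the line shapes visible
in this message, not on any kdb format specification, and the function
name is hypothetical:

```python
import re

# Matches e.g. "0xc01f1c47 xfs_mod_incore_sb+0x93 (" and captures the symbol.
# The "+0x..." offset is optional because some frames are printed without one.
FRAME_RE = re.compile(
    r"0x[0-9a-f]{8}\s+([A-Za-z_]\w*)(?:\+0x[0-9a-f]+)?(?:\s|$)"
)


def bt_symbols(bt_text: str) -> list[str]:
    """Return the function names found in kdb 'bt' output, in order."""
    symbols = []
    for line in bt_text.splitlines():
        if "kernel ." in line:
            continue  # skip the "kernel .text 0x... 0x..." section lines
        match = FRAME_RE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols


sample = """\
           0xc01f1c47 xfs_mod_incore_sb+0x93 (0xf7574800, 0x1e, 0xffffffeb,
0x0)
                               kernel .text 0xc0100000 0xc01f1bb4 0xc01f1c60
0xf5f8dcc0 0xc01bb04d xfs_bmapi+0x6c5 (0x0, 0xf5fb7bdc, 0x80, 0x0, 0x10)
                               kernel .text 0xc0100000 0xc01ba988 0xc01bbca4
"""
print(bt_symbols(sample))
```

Running it on the traces from case 2 and case 6 gives a quick way to
confirm that both runs fail along the same call path.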


Also: after the system has rebooted, dbench will not run when I try it
on the XFS partition again, e.g.:

[root@dl02 dbench]# ./dbench 10
bash: ./dbench: cannot execute binary file

[root@dl02 dbench]# file dbench
dbench: ASCII text, with no line terminators
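
The "file" output above indicates that the dbench file no longer contains
an ELF executable. To check other recently written files on the same
partition for the same symptom, one option is to test for the ELF magic
bytes. A minimal Python sketch (both function names are hypothetical):

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def looks_like_elf(header: bytes) -> bool:
    """Return True if the given leading bytes carry the ELF magic number."""
    return header.startswith(ELF_MAGIC)


def is_elf_file(path: str) -> bool:
    """Read the first four bytes of a file and test them for the ELF magic."""
    with open(path, "rb") as handle:
        return looks_like_elf(handle.read(4))
```

For example, is_elf_file("./dbench") would be expected to return False for
the file shown above.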


Thank you for your assistance.

Christian


