Hello all,
I'm hitting the following kernel BUG when running the SPEC SFS NFS benchmark
on a 2.4.19-rc1 kernel with XFS from a July 9th CVS download:
Hardware:
4 Intel PIII 700 MHz 1MB cache
2 GB memory
whole bunch of fibre channel disks
ksymoops 2.4.0 on i686 2.4.2-2. Options used
-V (default)
-K (specified)
-L (specified)
-O (specified)
-m System.map (specified)
kernel BUG at filemap.c:843!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c012992c>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: c1369d10 ecx: 00000016 edx: c0338f54
esi: c2801268 edi: f6d152a0 ebp: e598e3a0 esp: f7bfbecc
ds: 0018 es: 0018 ss: 0018
Process ksoftirqd_CPU0 (pid: 3, stackpage=f7bfb000)
Stack: c470fb40 c470fb40 c01e316f f6a6c4b8 c470fb40 00000001 f6a6c400 c1369d10
       c01e31ad c470fb40 00000001 00000001 c024a30e c470fb40 00000001 f6a6c400
       f6a6c400 00000001 00000010 c024a612 f6a6c400 00000001 00000008 00000001
Call Trace: [<c01e316f>] [<c01e31ad>] [<c024a30e>] [<c024a612>] [<c027550b>] [<c02446f7>]
   [<c02445b6>] [<c011d190>] [<c011d063>] [<c011cdef>] [<c011d325>] [<c0107014>]
Code: 0f 0b 4b 03 a0 43 2e c0 8d 46 04 39 46 04 74 0e 31 c9 ba 03
>>EIP; c012992c <unlock_page+44/68> <=====
Trace; c01e316f <_end_pagebuf_page_io_multi+107/134>
Trace; c01e31ad <_end_io_multi_full+11/18>
Trace; c024a30e <__scsi_end_request+6a/130>
Trace; c024a612 <scsi_io_completion+166/378>
Trace; c027550b <rw_intr+197/1a0>
Trace; c02446f7 <scsi_finish_command+a7/b0>
Trace; c02445b6 <scsi_bottom_half_handler+c2/d8>
Trace; c011d190 <bh_action+4c/88>
Trace; c011d063 <tasklet_hi_action+67/a0>
Trace; c011cdef <do_softirq+6f/cc>
Trace; c011d325 <ksoftirqd+a9/c4>
Trace; c0107014 <kernel_thread+28/38>
Code; c012992c <unlock_page+44/68>
00000000 <_EIP>:
Code; c012992c <unlock_page+44/68> <=====
0: 0f 0b ud2a <=====
Code; c012992e <unlock_page+46/68>
2: 4b dec %ebx
Code; c012992f <unlock_page+47/68>
3: 03 a0 43 2e c0 8d add 0x8dc02e43(%eax),%esp
Code; c0129935 <unlock_page+4d/68>
9: 46 inc %esi
Code; c0129936 <unlock_page+4e/68>
a: 04 39 add $0x39,%al
Code; c0129938 <unlock_page+50/68>
c: 46 inc %esi
Code; c0129939 <unlock_page+51/68>
d: 04 74 add $0x74,%al
Code; c012993b <unlock_page+53/68>
f: 0e push %cs
Code; c012993c <unlock_page+54/68>
10: 31 c9 xor %ecx,%ecx
Code; c012993e <unlock_page+56/68>
12: ba 03 00 00 00 mov $0x3,%edx
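A note on reading the "Code:" bytes above. This is a minimal sketch, assuming that the i386 BUG() macro in this kernel emits ud2 (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name string. Under that assumption, the "dec %ebx" and "add" instructions shown after ud2a are inline data that the disassembler decoded as instructions, not real code. The struct and function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: what follows ud2 under the assumed layout
 *   ud2 ; .word line ; .long file_name_pointer
 */
struct bug_info {
    int valid;          /* 1 if the bytes start with ud2 (0f 0b) */
    uint16_t line;      /* source line number, little-endian */
    uint32_t file_ptr;  /* kernel address of the file name string */
};

/* Decode the first 8 bytes of an oops "Code:" line under the assumption
 * stated above. Returns valid == 0 if the bytes do not start with ud2. */
struct bug_info decode_bug_bytes(const uint8_t *code, size_t len)
{
    struct bug_info info = {0, 0, 0};

    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return info;

    info.valid = 1;
    info.line = (uint16_t)(code[2] | (code[3] << 8));
    info.file_ptr = (uint32_t)code[4] |
                    ((uint32_t)code[5] << 8) |
                    ((uint32_t)code[6] << 16) |
                    ((uint32_t)code[7] << 24);
    return info;
}
```

Fed the bytes 0f 0b 4b 03 a0 43 2e c0 from the oops, this yields line 843 (0x034b), which matches "kernel BUG at filemap.c:843!", and a pointer value of c02e43a0.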
<0>Kernel panic: Aiee, killing interrupt handler!
Here's the code from fs/filemap.c:unlock_page:
/*
 * Unlock the page and wake up sleepers in ___wait_on_page.
 */
void unlock_page(struct page *page)
{
        wait_queue_head_t *waitqueue = page_waitqueue(page);
        ClearPageLaunder(page);
        smp_mb__before_clear_bit();
        if (!test_and_clear_bit(PG_locked, &(page)->flags))
                BUG();          <=====
        smp_mb__after_clear_bit();
        if (waitqueue_active(waitqueue))
                wake_up_all(waitqueue);
}
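To make the failing check concrete: the BUG() fires when unlock_page() finds PG_locked already clear, i.e. the page is being unlocked without being locked. Below is a small user-space model of just that check, using C11 atomics in place of the kernel bit operations. The type, the bit value, and the function names are hypothetical. The double-unlock scenario in the usage note is only one possible way to reach this state, an assumption rather than a diagnosis.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PG_locked bit in page->flags. */
#define PG_LOCKED_MASK 1ul

/* Hypothetical user-space model of a page: only the flags word. */
struct page_model {
    atomic_ulong flags;
};

/* Try to take the page lock. Returns true if we set the bit,
 * false if it was already set by someone else. */
bool model_trylock_page(struct page_model *p)
{
    unsigned long old = atomic_fetch_or(&p->flags, PG_LOCKED_MASK);
    return (old & PG_LOCKED_MASK) == 0;
}

/* Clear the lock bit, mirroring the test_and_clear_bit() check.
 * Returns true for a legitimate unlock, false in the case where
 * the kernel code above would hit BUG(). */
bool model_unlock_page(struct page_model *p)
{
    unsigned long old = atomic_fetch_and(&p->flags, ~PG_LOCKED_MASK);
    return (old & PG_LOCKED_MASK) != 0;
}
```

In this model, lock followed by unlock returns true both times, and a second unlock of the same page returns false. That is the situation the I/O completion path in the trace would be in if a page were completed twice, or completed without ever having been locked.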
When I run the same test on the same kernel, but with an XFS CVS download
from Feb 7, 2002, the test runs to completion without any BUGs.
The SPEC SFS latency numbers also look worse with the latest XFS than with
the Feb 7 version (the third column is the average response time per
operation, in milliseconds):
linux-2.4.19-rc1-xfs_020702
 500   495   0.8  148535  300  3  U   5071248  4  18  2  2  3.0
1000  1001   2.4  300112  299  3  U  10140624  4  18  2  2  3.0
1500  1522   4.2  455020  299  3  U  15211872  4  18  2  2  3.0
2000  2038   4.6  609447  299  3  U  20281248  4  18  2  2  3.0
2500  2532   5.0  757196  299  3  U  25350624  4  18  2  2  3.0
...test continues fine...
linux-2.4.19-rc1-xfs_070902
 500   496   0.7  148537  299  3  U   5071248  4  18  2  2  3.0
1000  1001   1.6  300145  299  3  U  10140624  4  18  2  2  3.0
1500  1524   7.3  455664  299  3  U  15211872  4  18  2  2  3.0
2000  2037  10.0  609526  299  3  U  20281248  4  18  2  2  3.0
2500  2523  10.8  756978  300  3  U  25350624  4  18  2  2  3.0
...test crashes with kernel BUG
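To put a number on the latency difference, here is a rough sketch that compares the third column (average response time in ms) row by row for the two runs above. The struct and function names are hypothetical, and I am assuming the first column is the load point of each row. The response times are copied from the two tables.

```c
#include <stddef.h>

/* One load point: first column of the row (assumed to be the load),
 * plus the average response time (ms) from each run. */
struct sfs_row {
    int load;
    double resp_feb7_ms;   /* linux-2.4.19-rc1-xfs_020702 */
    double resp_jul9_ms;   /* linux-2.4.19-rc1-xfs_070902 */
};

static const struct sfs_row sfs_rows[] = {
    {  500, 0.8,  0.7 },
    { 1000, 2.4,  1.6 },
    { 1500, 4.2,  7.3 },
    { 2000, 4.6, 10.0 },
    { 2500, 5.0, 10.8 },
};

/* Ratio of July 9 response time to Feb 7 response time.
 * A value above 1.0 means the newer XFS is slower at that load. */
double latency_ratio(const struct sfs_row *r)
{
    return r->resp_jul9_ms / r->resp_feb7_ms;
}

/* Index of the first row where the newer XFS is slower,
 * or n if there is no such row. */
size_t first_regressed_row(const struct sfs_row *rows, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (latency_ratio(&rows[i]) > 1.0)
            return i;
    }
    return n;
}
```

With these numbers the newer XFS is actually a little faster at 500 and 1000, the first slower row is the 1500 one, and by 2000 and 2500 the response time is a bit more than twice the Feb 7 figure.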
The only parameter changed between these two tests is the version of XFS.
Any ideas?
Erik