XFS Developers,
OK, here is what I have done:
I have a tree that already had XFS 1.1 (with a bunch of other stuff)
included. I hand-applied the XFS 1.2 patch on top of XFS 1.1, and
everything seems to work fine, except that when I run bonnie++ under
load (during the "Writing intelligently" phase), I receive the
following Oops:
Mar 24 15:20:13 192 kernel: invalid operand: 0000
Mar 24 15:20:13 192 kernel: CPU: 0
Mar 24 15:20:13 192 kernel: EIP: 0010:[<c0128331>] Not tainted
Mar 24 15:20:13 192 kernel: EFLAGS: 00010286
Mar 24 15:20:13 192 kernel: eax: 00000037 ebx: 000000f8 ecx:
00000008 edx: 00000000
Mar 24 15:20:13 192 kernel: esi: c30000bc edi: 00000000 ebp:
c2a517c0 esp: f7465e80
Mar 24 15:20:13 192 kernel: ds: 0018 es: 0018 ss: 0018
Mar 24 15:20:13 192 kernel: Process bonnie++ (pid: 114, stackpage=f7465000)
Mar 24 15:20:13 192 kernel: Stack: c0327060 000000f8 00018541 00000000
f748ee80 c012abf6 f7465ec0 00001000
Mar 24 15:20:13 192 kernel: 00000000 00001000 00001000 00001000
1650f000 00000000 f740e320 f740e3d4
Mar 24 15:20:13 192 kernel: 00000000 3e7f849d 000e6b32 3e7f849d
3852bb50 0640d230 f7465f64 1650e000
Mar 24 15:20:13 192 kernel: Call Trace: [<c012abf6>] [<c024ad64>]
[<c02467a6>] [<c0135836>] [<c0116f9b>]
Mar 24 15:20:13 192 kernel: [<c0106d03>]
Mar 24 15:20:13 192 kernel:
Mar 24 15:20:13 192 kernel: Code: 0f 0b 83 c4 0c 8d 46 04 39 46 04 74 12
5b 89 f0 31 c9 ba 03
Using defaults from ksymoops -t elf32-i386 -a i386
>>EIP; c0128331 <unlock_page+61/90> <=====
Trace; c012abf6 <generic_file_write_nolock+556/720>
Trace; c024ad64 <xfs_write+384/580>
Trace; c02467a6 <linvfs_write+e6/120>
Trace; c0135836 <sys_write+96/f0>
Trace; c0116f9b <sys_gettimeofday+1b/90>
Trace; c0106d03 <system_call+33/38>
Code; c0128331 <unlock_page+61/90>
00000000 <_EIP>:
Code; c0128331 <unlock_page+61/90> <=====
0: 0f 0b ud2a <=====
Code; c0128333 <unlock_page+63/90>
2: 83 c4 0c add $0xc,%esp
Code; c0128336 <unlock_page+66/90>
5: 8d 46 04 lea 0x4(%esi),%eax
Code; c0128339 <unlock_page+69/90>
8: 39 46 04 cmp %eax,0x4(%esi)
Code; c012833c <unlock_page+6c/90>
b: 74 12 je 1f <_EIP+0x1f> c0128350
<unlock_page+80/90>
Code; c012833e <unlock_page+6e/90>
d: 5b pop %ebx
Code; c012833f <unlock_page+6f/90>
e: 89 f0 mov %esi,%eax
Code; c0128341 <unlock_page+71/90>
10: 31 c9 xor %ecx,%ecx
Code; c0128343 <unlock_page+73/90>
12: ba 03 00 00 00 mov $0x3,%edx
I tracked the oops down to a BUG() call in unlock_page, which is
triggered when something tries to unlock a page that is already
unlocked (that can never be valid, which is why the BUG() is there).
I read through the code in generic_file_write_nolock, and it looks to me
as if the page is locked and then unlocked consistently. If I add
syslog() calls to bonnie++ to log the chunk count (which slows down the
I/O), there is no oops. Without them, bonnie++ consistently oopses on
write number 45703.
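To make the failing check concrete, here is a minimal user-space sketch of
the kind of test-and-clear check described above. It is a simplified model,
not the kernel source: the names model_page, model_trylock_page,
model_unlock_page and the bit value PG_LOCKED_BIT are made up for
illustration, and C11 atomics stand in for the kernel's bit operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for the kernel's "page locked" bit. */
#define PG_LOCKED_BIT 0x1UL

/* Simplified stand-in for struct page: only the flags word is modelled. */
struct model_page {
    atomic_ulong flags;
};

/*
 * Try to take the page lock: atomically set the bit and look at the old
 * value. Returns true if the bit was clear before (lock acquired).
 */
static bool model_trylock_page(struct model_page *p)
{
    unsigned long old = atomic_fetch_or(&p->flags, PG_LOCKED_BIT);
    return (old & PG_LOCKED_BIT) == 0;
}

/*
 * Release the page lock: atomically clear the bit and look at the old
 * value. Returns false if the bit was already clear, which is the case
 * where the real unlock path would hit its BUG().
 */
static bool model_unlock_page(struct model_page *p)
{
    unsigned long old = atomic_fetch_and(&p->flags, ~PG_LOCKED_BIT);
    return (old & PG_LOCKED_BIT) != 0;
}
```

In this model, a lock followed by one unlock succeeds, and a second unlock
of the same page returns false. That second case is the one the oops above
corresponds to, so the question is which path clears the bit first.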
I checked every function in the Oops call path, and all of them are
identical to what you get by taking Linux 2.4.19 and applying XFS 1.2,
except for 2.4.19-isms. I thought perhaps some changes to the mm layer
in 2.4.19 might be the cause, so I modified my 2.4.18 to match the
2.4.19 implementation + xfs core patch, with the same results. I am
running on Linux 2.4.18 with significant modifications (although most of
the memory manager and filesystem layer are the same as 2.4.18). I am
running UP on a custom Pentium 4 board/processor (well tested before
this and known to be operational). I have run the same bonnie++
benchmark on other filesystems without incident.
I was thinking I could take the xfs_write routine from 1.1 and meld it
into the 1.2 source, but this looks to be very painful, and hey, I
want 1.2 anyway ;)
Anyone know why I am receiving this Oops?
Thanks
-steve