Need some help with cause of Oops in XFS 1.2

To: linux-xfs@xxxxxxxxxxx
Subject: Need some help with cause of Oops in XFS 1.2
From: Steven Dake <sdake@xxxxxxxxxx>
Date: Tue, 25 Mar 2003 11:23:45 -0700
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130

XFS Developers,

Ok here is what I have done:

I have a tree that already included XFS 1.1 (along with a bunch of other stuff). I hand-applied the XFS 1.1-to-1.2 patch on top of it, and everything seems to work fine, except that when I run bonnie++ under load (during the "Writing intelligently" operation), I receive the following Oops:

Mar 24 15:20:13 192 kernel: invalid operand: 0000
Mar 24 15:20:13 192 kernel: CPU: 0
Mar 24 15:20:13 192 kernel: EIP: 0010:[<c0128331>] Not tainted
Mar 24 15:20:13 192 kernel: EFLAGS: 00010286
Mar 24 15:20:13 192 kernel: eax: 00000037 ebx: 000000f8 ecx: 00000008 edx: 00000000
Mar 24 15:20:13 192 kernel: esi: c30000bc edi: 00000000 ebp: c2a517c0 esp: f7465e80
Mar 24 15:20:13 192 kernel: ds: 0018 es: 0018 ss: 0018
Mar 24 15:20:13 192 kernel: Process bonnie++ (pid: 114, stackpage=f7465000)
Mar 24 15:20:13 192 kernel: Stack: c0327060 000000f8 00018541 00000000 f748ee80 c012abf6 f7465ec0 00001000
Mar 24 15:20:13 192 kernel: 00000000 00001000 00001000 00001000 1650f000 00000000 f740e320 f740e3d4
Mar 24 15:20:13 192 kernel: 00000000 3e7f849d 000e6b32 3e7f849d 3852bb50 0640d230 f7465f64 1650e000
Mar 24 15:20:13 192 kernel: Call Trace: [<c012abf6>] [<c024ad64>] [<c02467a6>] [<c0135836>] [<c0116f9b>]
Mar 24 15:20:13 192 kernel: [<c0106d03>]
Mar 24 15:20:13 192 kernel:
Mar 24 15:20:13 192 kernel: Code: 0f 0b 83 c4 0c 8d 46 04 39 46 04 74 12 5b 89 f0 31 c9 ba 03
Mar 24 15:20:13 192 kernel: invalid operand: 0000
Mar 24 15:20:13 192 kernel: CPU: 0
Mar 24 15:20:13 192 kernel: EIP: 0010:[<c0128331>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Mar 24 15:20:13 192 kernel: EFLAGS: 00010286
Mar 24 15:20:13 192 kernel: eax: 00000037 ebx: 000000f8 ecx: 00000008 edx: 00000000
Mar 24 15:20:13 192 kernel: esi: c30000bc edi: 00000000 ebp: c2a517c0 esp: f7465e80
Mar 24 15:20:13 192 kernel: ds: 0018 es: 0018 ss: 0018
Mar 24 15:20:13 192 kernel: Process bonnie++ (pid: 114, stackpage=f7465000)
Mar 24 15:20:13 192 kernel: Stack: c0327060 000000f8 00018541 00000000 f748ee80 c012abf6 f7465ec0 00001000
Mar 24 15:20:13 192 kernel: 00000000 00001000 00001000 00001000 1650f000 00000000 f740e320 f740e3d4
Mar 24 15:20:13 192 kernel: 00000000 3e7f849d 000e6b32 3e7f849d 3852bb50 0640d230 f7465f64 1650e000
Mar 24 15:20:13 192 kernel: Call Trace: [<c012abf6>] [<c024ad64>] [<c02467a6>] [<c0135836>] [<c0116f9b>]
Mar 24 15:20:13 192 kernel: [<c0106d03>]
Mar 24 15:20:13 192 kernel: Code: 0f 0b 83 c4 0c 8d 46 04 39 46 04 74 12 5b 89 f0 31 c9 ba 03


>>EIP; c0128331 <unlock_page+61/90> <=====
Trace; c012abf6 <generic_file_write_nolock+556/720>
Trace; c024ad64 <xfs_write+384/580>
Trace; c02467a6 <linvfs_write+e6/120>
Trace; c0135836 <sys_write+96/f0>
Trace; c0116f9b <sys_gettimeofday+1b/90>
Trace; c0106d03 <system_call+33/38>
Code; c0128331 <unlock_page+61/90>
00000000 <_EIP>:
Code; c0128331 <unlock_page+61/90> <=====
0: 0f 0b ud2a <=====
Code; c0128333 <unlock_page+63/90>
2: 83 c4 0c add $0xc,%esp
Code; c0128336 <unlock_page+66/90>
5: 8d 46 04 lea 0x4(%esi),%eax
Code; c0128339 <unlock_page+69/90>
8: 39 46 04 cmp %eax,0x4(%esi)
Code; c012833c <unlock_page+6c/90>
b: 74 12 je 1f <_EIP+0x1f> c0128350 <unlock_page+80/90>
Code; c012833e <unlock_page+6e/90>
d: 5b pop %ebx
Code; c012833f <unlock_page+6f/90>
e: 89 f0 mov %esi,%eax
Code; c0128341 <unlock_page+71/90>
10: 31 c9 xor %ecx,%ecx
Code; c0128343 <unlock_page+73/90>
12: ba 03 00 00 00 mov $0x3,%edx



I tracked the Oops down to a BUG() call in unlock_page, which fires when something tries to unlock a page that is already unlocked. I read through the code in generic_file_write_nolock, and it looks to me as if the page is locked and then unlocked consistently. If I add syslog() calls to bonnie++ to log the chunk count (which slows down the I/Os), there is no Oops. Without them, bonnie++ consistently Oopses on write number 45703.
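
For anyone not familiar with that check, here is a minimal user-space sketch of the behavior I am describing. It is only a model, not the kernel source: model_page, MODEL_PG_LOCKED, model_lock_page and model_unlock_page are hypothetical names made up for illustration, and the real unlock_page works on the page flags with atomic bit operations and also wakes up waiters. The point is just that unlocking tests and clears the lock bit, and a second unlock finds the bit already clear, which is where the kernel calls BUG().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's page lock bit (not the real PG_locked). */
#define MODEL_PG_LOCKED (1UL << 0)

/* Hypothetical, stripped-down stand-in for struct page. */
struct model_page {
    unsigned long flags;
};

/* Set the lock bit. Returns false if the page was already locked. */
static bool model_lock_page(struct model_page *page)
{
    if (page->flags & MODEL_PG_LOCKED)
        return false;
    page->flags |= MODEL_PG_LOCKED;
    return true;
}

/*
 * Test and clear the lock bit (non-atomically, since this is a
 * single-threaded model). Returns false when the bit was already
 * clear, i.e. the case where the kernel would hit BUG().
 */
static bool model_unlock_page(struct model_page *page)
{
    bool was_locked = (page->flags & MODEL_PG_LOCKED) != 0;
    page->flags &= ~MODEL_PG_LOCKED;
    return was_locked;
}
```

A balanced lock/unlock pair returns true from both calls. A second model_unlock_page on the same page returns false, which corresponds to the Oops above.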


I checked every function in the Oops call path, and they are all identical to what you get by taking Linux 2.4.19 and applying XFS 1.2, except for 2.4.19-isms. I thought perhaps some changes to the mm layer in 2.4.19 were the cause, so I modified my 2.4.18 tree to match the 2.4.19 implementation plus the XFS core patch, with the same results. I am running Linux 2.4.18 with significant modifications (although most of the memory manager and filesystem layer are the same as 2.4.18). I am running UP on a custom Pentium 4 board/processor (well tested before this and known to be operational). I have run the same bonnie++ benchmark on other filesystems without incident.

I was thinking I could take the xfs_write routine from 1.1 and meld it into the 1.2 source, but this looks very painful, and hey, I want 1.2 anyway ;)

Anyone know why I am receiving this Oops?

Thanks
-steve


