Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
From: hank peng <pengxihan@xxxxxxxxx>
Date: Tue, 15 Dec 2009 08:58:15 +0800
Cc: xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <389deec70912141649g767a1540hdeae66707c4c68fd@xxxxxxxxxxxxxx>
References: <389deec70912081758x5af751b8pe3189aee6cb98e97@xxxxxxxxxxxxxx> <389deec70912081918v24ccc5abi90c8fc7546c741d7@xxxxxxxxxxxxxx> <4B1F18C4.3060704@xxxxxxxxxxx> <389deec70912082053v4310057dg479f6d4b6c4b46f7@xxxxxxxxxxxxxx> <4B1F31FD.3020705@xxxxxxxxxxx> <389deec70912082220pcb3b5d1q516ac197d31502c5@xxxxxxxxxxxxxx> <389deec70912082230g38987576pc48d7699f23844c5@xxxxxxxxxxxxxx> <389deec70912140119q40ed91cao62fe9c9ebdf13601@xxxxxxxxxxxxxx> <4B26604B.3060901@xxxxxxxxxxx> <389deec70912141649g767a1540hdeae66707c4c68fd@xxxxxxxxxxxxxx>
2009/12/15 hank peng <pengxihan@xxxxxxxxx>:
> Hi, Eric:
> I added some debugging code like this:
> if (*stat) {
>                printk("*stat = 0x%08x, oindex = %p, index = %p\n",
>                                *stat, oindex, index);
>                if (oindex == NULL || index == NULL) {
>                        printk("BUG occured!\n");
>                        printk("oindex = %p, index = %p\n", oindex, index);
>                        BUG();
>                }
>                *oindex = *index = cur->bc_ptrs[level];
>                return 0;
>        }
>
> The same OOPS happened again, but slightly differently. The kernel messages are:
>
> <snip>
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
> *stat = 0x00000001, oindex = 00000501, index = 22008424
> Unable to handle kernel paging request for data at address 0x22008424
> Faulting instruction address: 0xc019f568
> Oops: Kernel access of bad area, sig: 11 [#1]
> MPC85xx CDS
> Modules linked in:
> NIP: c019f568 LR: c019f54c CTR: c023f9f4
> REGS: e87d7af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
> MSR: 00029000 <EE,ME,CE>  CR: 22008424  XER: 20000000
> DEAR: 22008424, ESR: 00800000
> TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000
> GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91
> GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000
> GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0
> GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8
> NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4
> LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4
> Call Trace:
> [e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 (unreliable)
> [e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
> [e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
> [e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
> [e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
> [e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
> [e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
> [e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
> [e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
> [e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
> [e87d7f10] [c00920d8] filp_close+0x70/0xb0
> [e87d7f30] [c00921ac] sys_close+0x94/0xc0
> [e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
> Instruction dump:
> 7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
> 419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
> ---[ end trace f245b6a670339d8f ]---
> </snip>
>
> As you can see, the OOPS happened right after printing
> "*stat = 0x00000001, oindex = 00000501, index = 22008424".
> My BUG() was not invoked because neither pointer was NULL, but the
> code still accessed a bad area.
>
This is what gdb shows:

(gdb) list *xfs_btree_make_block_unfull+0xe4
0xc019f568 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2650).
2645                    if (oindex == NULL || index == NULL) {
2646                            printk("BUG occured!\n");
2647                            printk("oindex = %p, index = %p\n", oindex, index);
2648                            BUG();
2649                    }
2650                    *oindex = *index = cur->bc_ptrs[level]; /* why always here????? */
2651                    return 0;
2652            }
2653
2654            /*
(gdb)
Why do the pointers suddenly become abnormal? Is it memory corruption?
If so, why does this OOPS always occur at the same place?

>
>
> 2009/12/14 Eric Sandeen <sandeen@xxxxxxxxxxx>:
>> hank peng wrote:
>>> Hi,Eric:
>>> I think I have found the reason for this problem, but I need a little
>>> help from you.
>>> We have tested it again, and the same OOPS occurred again:
>>
>> Ok, let's keep this on the list please ...
>>
>>> Unable to handle kernel paging request for data at address 0x00000000
>>> Faulting instruction address: 0xc019f4b8
>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>> MPC85xx CDS
>>> Modules linked in:
>>> NIP: c019f4b8 LR: c019f490 CTR: 00000000
>>> REGS: ef965af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
>>> MSR: 00029000 <EE,ME,CE>  CR: 22008284  XER: 00000000
>>> DEAR: 00000000, ESR: 00800000
>>> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
>>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
>>> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
>>> Call Trace:
>>> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>>> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
>>> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
>>> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
>>> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
>>> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
>>> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
>>> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
>>> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
>>> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
>>> [ef965f10] [c0092048] filp_close+0x70/0xb0
>>> [ef965f30] [c009211c] sys_close+0x94/0xc0
>>> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
>>> Instruction dump:
>>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>>> ---[ end trace 356726176eeecd9c ]---
>>> Oops: Exception in kernel mode, sig: 4 [#2]
>>> MPC85xx CDS
>>> Modules linked in:
>>> NIP: c0187660 LR: c019b26c CTR: c0187660
>>> REGS: d42076a0 TRAP: 0700   Tainted: G      D     (2.6.31.6-svn40)
>>> MSR: 00029000 <EE,ME,CE>  CR: 22222082  XER: 00000000
>>> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
>>> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
>>> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
>>> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
>>> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
>>> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
>>> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
>>> Call Trace:
>>> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
>>> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
>>> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
>>> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
>>> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
>>> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
>>> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
>>> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
>>> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
>>> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
>>> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
>>> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
>>> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
>>> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
>>> [d4207d90] [c006d740] __writepage+0x24/0x80
>>> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
>>> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
>>> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
>>> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
>>> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
>>> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
>>> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
>>> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
>>> [d4207fc0] [c004c750] kthread+0x78/0x7c
>>> <...>
>>>
>>>
>>> There was another OOPS which followed the first one.
>>
>> After the first oops I think the rest is not interesting, things
>> are in bad shape by now.
>>
>>> Please note that
>>> in the second OOPS, a SIGILL was raised and the address of the illegal
>>> instruction is 0xc0187660.
>>> In the first OOPS, look at the following registers:
>>>
>>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>>
>>> I noticed that the value of r23 is also 0xc0187660. I have a little
>>> knowledge of powerpc assembly; if I am not wrong,
>>> '*oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
>>> compiled into the following asm code, which I sent to you earlier:
>>> 80 09 00 50     lwz     r0,80(r9)
>>> 90 17 00 00     stw     r0,0(r23)
>>> 90 19 00 00     stw     r0,0(r25)              <OOPS occurred here>
>>>
>>> So r23 should have held the address of index and should never have had a
>>> chance to point to a code address, but it did. What's worse, the code at
>>> 0xc0187660 was overwritten, and the second OOPS happened immediately.
>>>
>>> Could you correct my analysis if I am wrong?
>>> In addition, I think the problem may be caused by a stack overflow. What
>>> are your comments?
>>>
>>>
>> Perhaps, but if this is the 2nd oops I think it is not worth investigating;
>> we need to figure out why the first one happened, and from that stack trace
>> I don't think you are close to overflowing...
>>
>> -eric
>>
>>>
>>> 2009/12/9 hank peng <pengxihan@xxxxxxxxx>:
>>>> 2009/12/9 hank peng <pengxihan@xxxxxxxxx>:
>>>>> 2009/12/9 Eric Sandeen <sandeen@xxxxxxxxxxx>:
>>>>>> hank peng wrote:
>>>>>>> 2009/12/9 Eric Sandeen <sandeen@xxxxxxxxxxx>:
>>>>>>>> hank peng wrote:
>>>>>>>>
>>>>>>>>> Thanks for your reply.
>>>>>>>>>
>>>>>>>>> I made this conclusion from assembly code, correct me if I am wrong.
>>>>>>>>> #powerpc-linux-gnuspe-objdump vmlinux | less
>>>>>>>>> <snip>
>>>>>>>> (off list; if this works maybe you can reply on-list?)
>>>>>>>>
>>>>>>>> Could you use gdb to look?  Maybe:
>>>>>>>>
>>>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>>>
>>>>>>> I use gdb on my PC and get this:
>>>>>>>
>>>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux
>>>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh)
>>>>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>>>>> Type "show copying" to see the conditions.
>>>>>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
>>>>>>>
>>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>> No source file for address 0xc019ea28.
>>>>>>> (gdb)
>>>>>>>
>>>>>>>> -Eric
>>>>>> so I guess it is not built with debugging symbols perhaps?
>>>>>>
>>>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe?
>>>>>>
>>>>> yes, you are right, now I get the result:
>>>>> (gdb) l *xfs_btree_make_block_unfull+0xc4
>>>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643).
>>>>> 2638            error = xfs_btree_lshift(cur, level, stat);
>>>>> 2639            if (error)
>>>>> 2640                    return error;
>>>>> 2641
>>>>> 2642            if (*stat) {
>>>>> 2643                    *oindex = *index = cur->bc_ptrs[level];
>>>>> 2644                    return 0;
>>>>> 2645            }
>>>>> 2646
>>>>> 2647            /*
>>>>>
>>>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];"
>>>>>
>>>> Very strange. As you said, xfs_btree_insrec passes the address of a local
>>>> variable to xfs_btree_make_block_unfull, so it should be impossible for
>>>> oindex to be NULL.
>>>> Do you think it may be memory corruption?
>>>>>> -Eric
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> The simplest is not all best but the best is surely the simplest!
>>>>>
>>>>
>>>>
>>>> --
>>>> The simplest is not all best but the best is surely the simplest!
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
> --
> The simplest is not all best but the best is surely the simplest!
>



-- 
The simplest is not all best but the best is surely the simplest!
