
Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: [BUG report]xfs_btree_make_block_unfull generated an OOPS
From: hank peng <pengxihan@xxxxxxxxx>
Date: Tue, 15 Dec 2009 08:49:37 +0800
Cc: xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <4B26604B.3060901@xxxxxxxxxxx>
References: <389deec70912081758x5af751b8pe3189aee6cb98e97@xxxxxxxxxxxxxx> <4B1F1211.90607@xxxxxxxxxxx> <389deec70912081918v24ccc5abi90c8fc7546c741d7@xxxxxxxxxxxxxx> <4B1F18C4.3060704@xxxxxxxxxxx> <389deec70912082053v4310057dg479f6d4b6c4b46f7@xxxxxxxxxxxxxx> <4B1F31FD.3020705@xxxxxxxxxxx> <389deec70912082220pcb3b5d1q516ac197d31502c5@xxxxxxxxxxxxxx> <389deec70912082230g38987576pc48d7699f23844c5@xxxxxxxxxxxxxx> <389deec70912140119q40ed91cao62fe9c9ebdf13601@xxxxxxxxxxxxxx> <4B26604B.3060901@xxxxxxxxxxx>
Hi, Eric:
I added some debugging code, like this:
        if (*stat) {
                printk("*stat = 0x%08x, oindex = %p, index = %p\n",
                                *stat, oindex, index);
                if (oindex == NULL || index == NULL) {
                        printk("BUG occured!\n");
                        printk("oindex = %p, index = %p\n", oindex, index);
                        BUG();
                }
                *oindex = *index = cur->bc_ptrs[level];
                return 0;
        }

The same OOPS happened again, but with slightly different details. The kernel messages are:

<snip>
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = e87d7bf8, index = e87d7bfc
*stat = 0x00000001, oindex = 00000501, index = 22008424
Unable to handle kernel paging request for data at address 0x22008424
Faulting instruction address: 0xc019f568
Oops: Kernel access of bad area, sig: 11 [#1]
MPC85xx CDS
Modules linked in:
NIP: c019f568 LR: c019f54c CTR: c023f9f4
REGS: e87d7af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
MSR: 00029000 <EE,ME,CE>  CR: 22008424  XER: 20000000
DEAR: 22008424, ESR: 00800000
TASK = efb03390[17279] 'SS_Server' THREAD: e87d6000
GPR00: 000001fd e87d7ba0 efb03390 0000003b 00031d91 ffffffff c023cfa4 00031d91
GPR08: c04a7c40 e84511c8 00031d91 00004000 20008482 1016d410 3fff5400 100a0000
GPR16: 100d0408 00000000 00000000 e8fa3558 c019d0ac 00029000 e87d7c5c c01876f0
GPR24: c019d088 00000000 22008424 00000000 00000501 e87d7c58 00000000 e84511c8
NIP [c019f568] xfs_btree_make_block_unfull+0xe4/0x1f4
LR [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4
Call Trace:
[e87d7ba0] [c019f54c] xfs_btree_make_block_unfull+0xc8/0x1f4 (unreliable)
[e87d7be0] [c019f9ec] xfs_btree_insrec+0x374/0x4b0
[e87d7c50] [c019fba4] xfs_btree_insert+0x7c/0x1c0
[e87d7cb0] [c01866ac] xfs_free_ag_extent+0x408/0x810
[e87d7d20] [c0187188] xfs_free_extent+0xdc/0x104
[e87d7db0] [c018fe70] xfs_bmap_finish+0x154/0x1a0
[e87d7de0] [c01b6998] xfs_itruncate_finish+0x254/0x3b8
[e87d7e60] [c01d0ea0] xfs_free_eofblocks+0x254/0x29c
[e87d7ee0] [c01da70c] xfs_file_release+0x14/0x28
[e87d7ef0] [c00957dc] __fput+0xe8/0x1dc
[e87d7f10] [c00920d8] filp_close+0x70/0xb0
[e87d7f30] [c00921ac] sys_close+0x94/0xc0
[e87d7f40] [c000f7cc] ret_from_syscall+0x0/0x3c
Instruction dump:
7f85e378 3863ed7c 7f46d378 4cc63182 4be97ea1 2f9c0000 419e00f8 2f9a0000
419e00f0 57c9103a 7d29fa14 80090050 <901a0000> 901c0000 4bffff88 3b810010
---[ end trace f245b6a670339d8f ]---
</snip>
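
For reference, the bracketed word in the instruction dump above can be decoded
by hand. The following is a minimal sketch, not kernel code: a userspace C
helper that splits a PowerPC D-form instruction word into its fields and
computes the address it would access. The names dform, decode_dform, mnemonic
and dform_ea are made up for illustration, and only the two primary opcodes
that matter here, 32 (lwz) and 36 (stw), are recognised.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC D-form instruction such as lwz or stw. */
struct dform {
	unsigned opcode;	/* primary opcode, top 6 bits */
	unsigned rt;		/* source/target register, next 5 bits */
	unsigned ra;		/* base register, next 5 bits */
	int16_t disp;		/* signed 16-bit displacement */
};

/* Split a 32-bit instruction word into its D-form fields. */
static struct dform decode_dform(uint32_t insn)
{
	struct dform d;

	d.opcode = insn >> 26;
	d.rt = (insn >> 21) & 0x1f;
	d.ra = (insn >> 16) & 0x1f;
	d.disp = (int16_t)(insn & 0xffff);
	return d;
}

/* Only the two opcodes seen around the faulting instruction are named. */
static const char *mnemonic(unsigned opcode)
{
	switch (opcode) {
	case 32: return "lwz";
	case 36: return "stw";
	default: return "?";
	}
}

/*
 * Effective address of a D-form load/store, given a copy of the GPRs
 * from the oops. A base register field of 0 means "no base", not r0.
 */
static uint32_t dform_ea(struct dform d, const uint32_t gpr[32])
{
	assert(d.ra < 32);
	uint32_t base = d.ra ? gpr[d.ra] : 0;

	return base + (uint32_t)(int32_t)d.disp;
}
```

Applied to the words around the fault, 0x80090050 decodes as lwz r0,80(r9),
0x901a0000 as stw r0,0(r26) and 0x901c0000 as stw r0,0(r28). In the register
dump r26 holds 22008424 and r28 holds 00000501, which are the index and oindex
values printed just before the OOPS, and r26 also matches DEAR.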

As you can see, the OOPS happened right after printing "*stat = 0x00000001,
oindex = 00000501, index = 22008424".
My BUG() was not triggered because neither pointer was NULL, but the store
still accessed a bad address.
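
To illustrate why the NULL test above misses this case, here is a minimal
userspace sketch, not a kernel patch. It compares the printed pointer values
against the task's stack range instead of against NULL. The helper names and
THREAD_SIZE_ASSUMED are hypothetical; the 8 KiB stack size is an assumption
about this ppc32 configuration, and the stack base is taken from the
"THREAD: e87d6000" line of the oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 8 KiB kernel stacks on this ppc32 build. */
#define THREAD_SIZE_ASSUMED 0x2000u

/* True if a 4-byte store at addr stays inside [base, base + size). */
static bool addr_on_stack(uint32_t addr, uint32_t base, uint32_t size)
{
	assert(size >= sizeof(uint32_t));
	return addr >= base && addr - base <= size - sizeof(uint32_t);
}

/* The test used in the debugging code: it only rejects NULL pointers. */
static bool passes_null_check(uint32_t oindex, uint32_t index)
{
	return oindex != 0 && index != 0;
}
```

With base 0xe87d6000, the values from the good iterations (e87d7bf8 and
e87d7bfc) are on the stack, while 00000501 and 22008424 pass the NULL test
but fall outside the stack range.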



2009/12/14 Eric Sandeen <sandeen@xxxxxxxxxxx>:
> hank peng wrote:
>> Hi,Eric:
>> I think I have found the reason for this problem, but I need a little help
>> from you.
>> We tested it again, and the same OOPS occurred:
>
> Ok, let's keep this on the list please ...
>
>> Unable to handle kernel paging request for data at address 0x00000000
>> Faulting instruction address: 0xc019f4b8
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c019f4b8 LR: c019f490 CTR: 00000000
>> REGS: ef965af0 TRAP: 0300   Not tainted  (2.6.31.6-svn40)
>> MSR: 00029000 <EE,ME,CE>  CR: 22008284  XER: 00000000
>> DEAR: 00000000, ESR: 00800000
>> TASK = e8a56580[3450] 'SS_Server' THREAD: ef964000
>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>> NIP [c019f4b8] xfs_btree_make_block_unfull+0xc4/0x1b0
>> LR [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0
>> Call Trace:
>> [ef965ba0] [c019f490] xfs_btree_make_block_unfull+0x9c/0x1b0 (unreliable)
>> [ef965be0] [c019f918] xfs_btree_insrec+0x374/0x4b0
>> [ef965c50] [c019fad0] xfs_btree_insert+0x7c/0x1c0
>> [ef965cb0] [c018661c] xfs_free_ag_extent+0x408/0x810
>> [ef965d20] [c01870f8] xfs_free_extent+0xdc/0x104
>> [ef965db0] [c018fde0] xfs_bmap_finish+0x154/0x1a0
>> [ef965de0] [c01b68c4] xfs_itruncate_finish+0x254/0x3b8
>> [ef965e60] [c01d0dcc] xfs_free_eofblocks+0x254/0x29c
>> [ef965ee0] [c01da638] xfs_file_release+0x14/0x28
>> [ef965ef0] [c009574c] __fput+0xe8/0x1dc
>> [ef965f10] [c0092048] filp_close+0x70/0xb0
>> [ef965f30] [c009211c] sys_close+0x94/0xc0
>> [ef965f40] [c000f784] ret_from_syscall+0x0/0x3c
>> Instruction dump:
>> 7fa5eb78 4bffdf59 7c7c1b79 40a2ffd4 801d0000 2f800000 419e0064 57c9103a
>> 7f83e378 7d29fa14 80090050 90170000 <90190000> 80010044 bae1001c 38210040
>> ---[ end trace 356726176eeecd9c ]---
>> Oops: Exception in kernel mode, sig: 4 [#2]
>> MPC85xx CDS
>> Modules linked in:
>> NIP: c0187660 LR: c019b26c CTR: c0187660
>> REGS: d42076a0 TRAP: 0700   Tainted: G      D     (2.6.31.6-svn40)
>> MSR: 00029000 <EE,ME,CE>  CR: 22222082  XER: 00000000
>> TASK = e08a6ee0[8533] 'pdflush' THREAD: d4206000
>> GPR00: 00000004 d4207750 e08a6ee0 d42b1098 00000001 00000001 e8e97d80 00000003
>> GPR08: c2c65300 c0187660 41425443 41425443 00001000 1001a1c4 c01842f8 00000001
>> GPR16: d4207880 d42077e0 00000002 d42077d8 d42077e0 d42077e8 d42b10ec 00000001
>> GPR24: c0486be0 00000000 d42b1098 09c40000 c019b4f0 00000011 e88bf000 d4207750
>> NIP [c0187660] xfs_allocbt_get_maxrecs+0x0/0x20
>> LR [c019b26c] xfs_btree_check_sblock+0xb0/0xf8
>> Call Trace:
>> [d4207770] [c019b4f0] xfs_btree_read_buf_block+0x8c/0xb8
>> [d42077a0] [c019b5a8] xfs_btree_lookup_get_block+0x8c/0xfc
>> [d42077d0] [c019c638] xfs_btree_lookup+0x124/0x3fc
>> [d4207850] [c01842f8] xfs_alloc_lookup_ge+0x20/0x30
>> [d4207860] [c0185828] xfs_alloc_ag_vextent_near+0x60/0xa4c
>> [d42078e0] [c0186af4] xfs_alloc_ag_vextent+0xd0/0x168
>> [d4207900] [c01873f0] xfs_alloc_vextent+0x2d0/0x524
>> [d4207940] [c01940fc] xfs_bmap_btalloc+0x274/0xa60
>> [d4207a00] [c01988bc] xfs_bmapi+0xb30/0x10dc
>> [d4207b40] [c01bb190] xfs_iomap_write_allocate+0x11c/0x450
>> [d4207c00] [c01bc2e8] xfs_iomap+0x320/0x35c
>> [d4207c80] [c01d5d5c] xfs_map_blocks+0x2c/0x40
>> [d4207ca0] [c01d6dc0] xfs_page_state_convert+0x2e8/0x744
>> [d4207d60] [c01d7384] xfs_vm_writepage+0x7c/0x128
>> [d4207d90] [c006d740] __writepage+0x24/0x80
>> [d4207da0] [c006db44] write_cache_pages+0x1e4/0x3a0
>> [d4207e50] [c01d5e14] xfs_vm_writepages+0x24/0x34
>> [d4207e60] [c006dd70] do_writepages+0x48/0x7c
>> [d4207e70] [c00b2120] writeback_single_inode+0xf8/0x2e4
>> [d4207ec0] [c00b2788] generic_sync_sb_inodes+0x280/0x398
>> [d4207ef0] [c00b295c] writeback_inodes+0xb8/0xd4
>> [d4207f10] [c006ece0] wb_kupdate+0xd4/0x154
>> [d4207f70] [c006f3bc] pdflush+0xd4/0x1c4
>> [d4207fc0] [c004c750] kthread+0x78/0x7c
>> <...>
>>
>>
>> There was another OOPS, which followed the first one.
>
> After the first oops I think the rest is not interesting, things
> are in bad shape by now.
>
>> Please note that in the second OOPS a SIGILL was raised, and the address
>> of the illegal instruction is 0xc0187660.
>> In the first OOPS, look at the following registers:
>>
>> GPR00: 000001fd ef965ba0 e8a56580 00000000 00000000 00000001 00000001 00000001
>> GPR08: e8fa10e8 e8fa1a18 e8fa10f0 000001fd 22008222 1016d410 3fff5400 100a0000
>> GPR16: 100d2408 00000000 00000000 d42b12f8 c019d01c 00029000 ef965c5c c0187660
>> GPR24: c019cff8 00000000 22008224 ef965c08 00000000 ef965c58 00000000 e8fa1a18
>>
>> I noticed that the value of r23 is also 0xc0187660. I have a little
>> knowledge of PowerPC assembly; if I am not wrong,
>> '*oindex = *index = cur->bc_ptrs[level];' in fs/xfs/xfs_btree.c was
>> compiled into the following asm code, which I sent to you earlier:
>> 80 09 00 50     lwz     r0,80(r9)
>> 90 17 00 00     stw     r0,0(r23)
>> 90 19 00 00     stw     r0,0(r25)              <OOPS occurred here>
>>
>> So r23 should have pointed to the address of index and should never have
>> had a chance to point to a code address, but it did. What's worse, the code
>> at 0xc0187660 had been overwritten, and the second OOPS happened immediately.
>>
>> Could you correct my analysis if I am wrong?
>> In addition, I think the problem may be caused by a stack overflow. What
>> are your thoughts?
>>
>>
> Perhaps, but if this is the 2nd oops I think it is not worth investigating;
> we need to figure out why the first one happened, and from that stack trace
> I don't think you are close to overflowing...
>
> -eric
>
>>
>> 2009/12/9 hank peng <pengxihan@xxxxxxxxx>:
>>> 2009/12/9 hank peng <pengxihan@xxxxxxxxx>:
>>>> 2009/12/9 Eric Sandeen <sandeen@xxxxxxxxxxx>:
>>>>> hank peng wrote:
>>>>>> 2009/12/9 Eric Sandeen <sandeen@xxxxxxxxxxx>:
>>>>>>> hank peng wrote:
>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> I made this conclusion from assembly code, correct me if I am wrong.
>>>>>>>> #powerpc-linux-gnuspe-objdump vmlinux | less
>>>>>>>> <snip>
>>>>>>> (off list; if this works maybe you can reply on-list?)
>>>>>>>
>>>>>>> Could you use gdb to look?  Maybe:
>>>>>>>
>>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>>>
>>>>>> I use gdb on my PC and get this:
>>>>>>
>>>>>> [root@localhost linux-2.6.31.6]# gdb vmlinux
>>>>>> GNU gdb Red Hat Linux (6.5-37.el5rh)
>>>>>> Copyright (C) 2006 Free Software Foundation, Inc.
>>>>>> GDB is free software, covered by the GNU General Public License, and you are
>>>>>> welcome to change it and/or distribute copies of it under certain conditions.
>>>>>> Type "show copying" to see the conditions.
>>>>>> There is absolutely no warranty for GDB.  Type "show warranty" for details.
>>>>>> This GDB was configured as "i386-redhat-linux-gnu"...Using host
>>>>>> libthread_db library "/lib/libthread_db.so.1".
>>>>>>
>>>>>> (gdb) list *xfs_btree_make_block_unfull+0xc4
>>>>>> No source file for address 0xc019ea28.
>>>>>> (gdb)
>>>>>>
>>>>>>> -Eric
>>>>> so I guess it is not built with debugging symbols perhaps?
>>>>>
>>>>> Try rebuilding it with CONFIG_DEBUG_INFO on maybe?
>>>>>
>>>> yes, you are right, now I get the result:
>>>> (gdb) l *xfs_btree_make_block_unfull+0xc4
>>>> 0xc019ea30 is in xfs_btree_make_block_unfull (fs/xfs/xfs_btree.c:2643).
>>>> 2638            error = xfs_btree_lshift(cur, level, stat);
>>>> 2639            if (error)
>>>> 2640                    return error;
>>>> 2641
>>>> 2642            if (*stat) {
>>>> 2643                    *oindex = *index = cur->bc_ptrs[level];
>>>> 2644                    return 0;
>>>> 2645            }
>>>> 2646
>>>> 2647            /*
>>>>
>>>> It indeed points to "*oindex = *index = cur->bc_ptrs[level];"
>>>>
>>> Very strange. As you said, xfs_btree_insrec passes the address of a local
>>> variable to xfs_btree_make_block_unfull, so it should be impossible for
>>> oindex to be NULL.
>>> Do you think it may be memory corruption?
>>>>> -Eric
>>>>>
>>>>
>>>>
>>>> --
>>>> The simplest is not all best but the best is surely the simplest!
>>>>
>>>
>>>
>>> --
>>> The simplest is not all best but the best is surely the simplest!
>>>
>>
>>
>>
>
>



-- 
The simplest is not all best but the best is surely the simplest!
