Re: Kernel crash with 2.6.29 + nfs + xfs (radix-tree)

To: Alex Samad <alex@xxxxxxxxxxxx>
Subject: Re: Kernel crash with 2.6.29 + nfs + xfs (radix-tree)
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 20 May 2009 19:05:58 +1000
Cc: linux-kernel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: <20090520003745.GA27491@xxxxxxxxxxxx>
References: <20090520003745.GA27491@xxxxxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Wed, May 20, 2009 at 10:37:45AM +1000, Alex Samad wrote:
> Hi
> 
> I have been getting quite a lot of crashes on my Debian amd64 box with
> the 2.6.29 series of kernels. For me it seems to happen when the system
> is under load and there is network activity -> nfsd -> xfs.

Perhaps a use-after-free or a reference counting problem. Thanks for
reporting it.
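
To illustrate why a use-after-free or refcount bug could surface as this
particular BUG, here is a toy model in Python. It is only a sketch, not the
kernel's radix tree: the names ToyRadixTree, ToyBug, insert, delete and
tag_set are hypothetical, the tree has a fixed height, and it assumes that
the check at lib/radix-tree.c:485 is the one that requires every slot on the
path to the index to be non-NULL. That is an assumption here, not something
confirmed against the 2.6.29.2 source.

```python
MAP_SHIFT = 6                      # 64 slots per node in this toy model
MAP_SIZE = 1 << MAP_SHIFT
MAP_MASK = MAP_SIZE - 1


class ToyBug(Exception):
    """Stands in for the kernel BUG(); purely illustrative."""


class Node:
    def __init__(self):
        self.slots = [None] * MAP_SIZE
        self.tags = [False] * MAP_SIZE


class ToyRadixTree:
    """Fixed-height toy tree: height 2 covers indices 0..4095."""

    def __init__(self, height=2):
        self.height = height
        self.root = Node()

    def _leaf(self, index, create):
        # Walk the interior levels, optionally creating missing nodes.
        node = self.root
        shift = (self.height - 1) * MAP_SHIFT
        while shift > 0:
            offset = (index >> shift) & MAP_MASK
            if node.slots[offset] is None:
                if not create:
                    return None
                node.slots[offset] = Node()
            node = node.slots[offset]
            shift -= MAP_SHIFT
        return node

    def insert(self, index, item):
        self._leaf(index, create=True).slots[index & MAP_MASK] = item

    def delete(self, index):
        leaf = self._leaf(index, create=False)
        if leaf is not None:
            leaf.slots[index & MAP_MASK] = None

    def tag_set(self, index):
        # Tag every level on the way down; the caller promises that the
        # item is still in the tree, so an empty slot means a broken promise.
        slot = self.root
        shift = (self.height - 1) * MAP_SHIFT
        for _ in range(self.height):
            offset = (index >> shift) & MAP_MASK
            slot.tags[offset] = True
            slot = slot.slots[offset]
            if slot is None:
                raise ToyBug("tag_set on index %d that is not in the tree" % index)
            shift -= MAP_SHIFT
        return slot
```

In this model, tagging an index that is present succeeds, while tagging it
after something else has already deleted it raises ToyBug. That is the shape
of failure you would expect if an inode were removed from the per-AG tree
(or freed) while another path still believed it was there.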

> May  5 19:45:38 x kernel: ------------[ cut here ]------------
> May  5 19:45:39 x kernel: kernel BUG at lib/radix-tree.c:485!
> May  5 19:45:39 x kernel: invalid opcode: 0000 [#1] SMP
> May  5 19:45:39 x kernel: last sysfs file: /sys/block/sdc/queue/nr_requests
> May  5 19:45:39 x kernel: CPU 0
> May  5 19:45:39 x kernel: Pid: 335, comm: kswapd0 Not tainted 2.6.29.2 #1 S2895
> May  5 19:45:39 x kernel: RIP: 0010:[<ffffffff803916e0>] [<ffffffff803916e0>] radix_tree_tag_set+0x86/0xc6
> May  5 19:45:39 x kernel: RSP: 0018:ffff88016e2d1c88  EFLAGS: 00010246
> May  5 19:45:39 x kernel: RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000000
> May  5 19:45:39 x kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88016a822b58
> May  5 19:45:39 x kernel: RBP: 0000000000000004 R08: 0000000000000000 R09: 8000000000000000
> May  5 19:45:39 x kernel: R10: ffffa5a5a5a5a5a5 R11: ffffffff8037541d R12: 0000000000000001
> May  5 19:45:39 x kernel: R13: 0000000000000000 R14: ffff88016d1bc310 R15: 0000000000000000
> May  5 19:45:39 x kernel: FS:  00007fea1903f6e0(0000) GS:ffffffff80759040(0000) knlGS:0000000000000000
> May  5 19:45:39 x kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> May  5 19:45:39 x kernel: CR2: 00007fd2df5ae8e0 CR3: 000000016bad0000 CR4: 00000000000006e0
> May  5 19:45:39 x kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> May  5 19:45:39 x kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May  5 19:45:39 x kernel: Process kswapd0 (pid: 335, threadinfo ffff88016e2d0000, task ffff88016f23eac0)
> May  5 19:45:39 x kernel: Stack:
> May  5 19:45:39 x kernel:  000000000069d804 0000000000000000 ffff88016d1bc2d0 ffff88000a8b7400
> May  5 19:45:39 x kernel:  ffff88000a8b7400 ffff88016df30000 ffff88000a8b74f8 ffff88016d1bc30c
> May  5 19:45:39 x kernel:  ffffffff80376b02 ffff88000a8b7580 0000000000000024 ffff88016e2d1d60
> May  5 19:45:39 x kernel: Call Trace:
> May  5 19:45:39 x kernel:  [<ffffffff80376b02>] ? xfs_inode_set_reclaim_tag+0x69/0x89
> May  5 19:45:39 x kernel:  [<ffffffff8036972f>] ? xfs_reclaim+0x99/0x9f
> May  5 19:45:39 x kernel:  [<ffffffff80375453>] ? xfs_fs_destroy_inode+0x36/0x54
> May  5 19:45:39 x kernel:  [<ffffffff80290304>] ? dispose_list+0xcd/0xfb
> May  5 19:45:39 x kernel:  [<ffffffff80290526>] ? shrink_icache_memory+0x1f4/0x22a
> May  5 19:45:39 x kernel:  [<ffffffff8026242a>] ? shrink_slab+0xe4/0x157
> May  5 19:45:39 x kernel:  [<ffffffff80262b53>] ? kswapd+0x44f/0x5c9
> May  5 19:45:39 x kernel:  [<ffffffff8026063e>] ? isolate_pages_global+0x0/0x231
> May  5 19:45:39 x kernel:  [<ffffffff8024458a>] ? autoremove_wake_function+0x0/0x2e
> May  5 19:45:39 x kernel:  [<ffffffff8022a80e>] ? __wake_up_common+0x44/0x73
> May  5 19:45:39 x kernel:  [<ffffffff80262704>] ? kswapd+0x0/0x5c9
> May  5 19:45:39 x kernel:  [<ffffffff80244266>] ? kthread+0x47/0x73
> May  5 19:45:39 x kernel:  [<ffffffff8020c4ba>] ? child_rip+0xa/0x20
> May  5 19:45:39 x kernel:  [<ffffffff8024421f>] ? kthread+0x0/0x73
> May  5 19:45:39 x kernel:  [<ffffffff8020c4b0>] ? child_rip+0x0/0x20
> May  5 19:45:39 x kernel: Code: 83 e5 3f 89 ea e8 04 fc ff ff 85 c0 75 10 48 8b 54 24 08 48 8d 84 13 18 02 00 00 0f ab 28 48 63 c5 48 8b 5c c3 18 48 85 db 75 04 <0f> 0b eb fe 41 83 ed 06 41 ff cc 45$
> May  5 19:45:39 x kernel: RIP  [<ffffffff803916e0>] radix_tree_tag_set+0x86/0xc6
> May  5 19:45:39 x kernel:  RSP <ffff88016e2d1c88>
> May  5 19:45:39 x kernel: ---[ end trace aed81d6fef80e624 ]---
> 
> 
> I have logged a bug with Debian
> (more info: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=526406),
> and one other person has reported the same problem there.
> 
> We believe somebody has already reported a similar problem here:
> http://groups.google.com/group/linux.kernel/browse_thread/thread/dd00f52e93397c9e/6b6814dab9b41a05?pli=1

Nobody noticed that report was related to XFS (XFS was not in the
subject line), so most people (like me) would have simply deleted it
without reading it....

> Has anyone else seen this problem? Who do I need to raise this with?

I've cc'd the XFS list.

> I am able to reproduce this problem on my machine (amd64 Phenom II, 8G
> RAM) running VirtualBox. I have a VM access the local filesystem via
> NFS (UDP), and when I do an rm -fr <some directory ~200M> I see the bug.

I run Debian, XFS and 2.6.29 on all my machines but I haven't
tripped over the problem. It all appears to be related to calling
dispose_list() during or just after removing a lot of files. If you
have a simple method of reproducing the problem (e.g. a simple shell
script) it would help track down the problem much faster....
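
Something along these lines would do as a starting point for a reproducer.
It is a minimal sketch in Python rather than shell, and everything specific
in it is an assumption: the function names populate and churn, the 4k file
size, the 1000 files per directory and the example mount path are all
hypothetical. Only the ~200M total and the "create a tree, then remove it
recursively" pattern come from the report above. It should be run from the
NFS client against the XFS-backed export, and it does not generate memory
pressure itself, so the server may need to be under load for kswapd to start
shrinking the inode cache.

```python
import os
import shutil


def populate(root, total_bytes=200 * 1024 * 1024, file_size=4096,
             files_per_dir=1000):
    """Create many small files under root; return how many were created."""
    os.makedirs(root, exist_ok=True)
    payload = b"x" * file_size
    count = total_bytes // file_size
    for i in range(count):
        subdir = os.path.join(root, "d%05d" % (i // files_per_dir))
        if i % files_per_dir == 0:
            # First file of each bucket: make sure the directory exists.
            os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, "f%07d" % i), "wb") as f:
            f.write(payload)
    return count


def churn(root, rounds=10, **populate_args):
    """Repeatedly build the tree and remove it, like 'rm -fr <dir>'."""
    for n in range(rounds):
        created = populate(root, **populate_args)
        shutil.rmtree(root)
        print("round %d: created and removed %d files" % (n + 1, created))


# Example call (the path is hypothetical, adjust to the NFS mount):
#   churn("/mnt/nfs/xfs-repro")
```

Many small files are used on purpose: the trace goes through inode reclaim,
so the number of inodes destroyed is likely to matter more than the number
of bytes removed.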

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
