Re: Kernel crash with 2.6.29 + nfs + xfs (radix-tree)

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: Kernel crash with 2.6.29 + nfs + xfs (radix-tree)
From: Alex Samad <alex@xxxxxxxxxxxx>
Date: Wed, 27 May 2009 12:54:57 +1000
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, randrik@xxxxxxx
In-reply-to: <20090526090916.GA17194@xxxxxxxxxxxxx>
References: <20090520003745.GA27491@xxxxxxxxxxxx> <20090520090558.GQ16929@xxxxxxxxxxxxxxxx> <20090520095639.GA27496@xxxxxxxxxxxx> <20090526090916.GA17194@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)

I had an xfs partition of about 400G (lvm on a raid6 device) holding a
source tree of openwrt trunk (~200-300M of data in lots of files; the
current tree has ~40000).

I have a VM (VirtualBox 2.2 - debian debi386 2.6.29-2) on the same
machine that had mounted the partition via nfs.

The problems showed up when I attempted an rm -fr of trunk, or during
builds (especially with a make clean first).
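
A rough way to approximate this workload without the openwrt tree is
sketched below in Python. It is not the original reproduction: the
function names, the directory layout and the defaults (200 x 200 files
of 6 KiB each, i.e. about 40000 files and roughly 240M) are assumptions
chosen only to land near the numbers above. It has not been shown to
trigger the crash.

```python
import os
import shutil
import tempfile


def make_tree(root, n_dirs=200, files_per_dir=200, size=6144):
    """Create n_dirs * files_per_dir small files under root; return the count."""
    payload = b"x" * size
    count = 0
    for d in range(n_dirs):
        dirpath = os.path.join(root, "dir%04d" % d)
        os.makedirs(dirpath)
        for f in range(files_per_dir):
            with open(os.path.join(dirpath, "file%04d" % f), "wb") as fh:
                fh.write(payload)
            count += 1
    return count


def stress(mount_point, **kwargs):
    """Build a tree under mount_point, then delete it recursively.

    The delete stands in for the 'rm -fr trunk' step. Returns the number
    of files created and whether the tree still exists afterwards.
    """
    root = tempfile.mkdtemp(prefix="trunk-", dir=mount_point)
    created = make_tree(root, **kwargs)
    shutil.rmtree(root)
    return created, os.path.exists(root)
```

Run on the nfs client, stress("/exports/shared") would exercise the
mount from the fstab entry below; any other writable directory works
for a dry run.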

From /etc/exports:
-no_root_squash,insecure,wdelay,no_subtree_check,async,mp=/exports/shared mmac(all_squash,anonuid=1025,anongid=1029)

From the VM's fstab:

nfs.hme1.samad.com.au:/exports/shared   /exports/shared nfs rw,exec,auto,async,_netdev,proto=udp 0 0

/exports/shared/src/openwrt/kamikaze/src/svn/trunk/tmp       none rw,bind,exec,auto,async,_netdev 0 0

none    rw,bind,exec,auto,async,_netdev 0 0

/exports/shared/src/openwrt/kamikaze/src/svn/trunk/build_dir   none rw,bind,exec,auto,async,_netdev 0 0

Note that I bind-mount a local filesystem over the work directories.

I have attached the .config.

I have since changed the partition to ext3 and I have not seen any
problems since.

I don't have any of the messages left in my syslog, but I still have the
one on the debian bug report.



On Tue, May 26, 2009 at 05:09:16AM -0400, Christoph Hellwig wrote:
> So you're having primarily an NFS workload, right?  Andrew had some
> dmesg output in bugzilla (please send this stuff to the list instead
> of hiding it in bugzilla if possible, BTW) that looks quite interesting:
> May 24 08:48:00 (none) last message repeated 61 times                         
> May 24 08:48:47 (none) last message repeated 760 times                        
> May 24 08:50:55 (none) kernel: reconnect_path: npd != pd                      
> May 24 08:50:55 (none) last message repeated 9 times                          
> May 24 08:55:04 (none) kernel: reconnect_path: npd != pd                      
> May 24 08:56:05 (none) last message repeated 47 times                         
> May 24 08:56:49 (none) last message repeated 419 times
> which means we are in deep trouble with the dcache coherency.  Also
> the only way the bug you two report could happen from my audit is
> if we get ->destroy_inode called twice for the same inode.
> So definitely some deep problems here.  Alex and Andrew, can you send
> me your .config, and a description of the workload you're seeing this on?
> Also the /etc/exports file would be interesting.

Attachment: config-2.6.29-2-amd64
Description: Text document


