Date: Wed, 4 Aug 2010 16:53:47 +1000
From: Dave Chinner
To: Ilia Mirkin
Cc: xfs@oss.sgi.com
Subject: Re: XFS hung on 2.6.33.3 kernel
Message-ID: <20100804065347.GZ7362@dastard>
References: <20100718012033.GA18888@dastard> <20100718235036.GC32635@dastard> <20100804004746.GT7362@dastard> <20100804042725.GX7362@dastard>

On Wed, Aug 04, 2010 at 12:39:08AM -0400, Ilia Mirkin wrote:
> On Wed, Aug 4, 2010 at 12:27 AM, Dave Chinner wrote:
> > On Tue, Aug 03, 2010 at 09:15:53PM -0400, Ilia Mirkin wrote:
> >> On Tue, Aug 3, 2010 at 8:47 PM, Dave Chinner wrote:
> >> > Ilia,
> >> >
> >> > Can you send me the output of this for your kernel that the
> >> > traces came from:
> >> >
> >> > $ gdb
> >> > (gdb) l *( xfs_write+0x2cc)
> >> >
> >> > You can run it against the vmlinux file in the kernel build
> >> > directory.  Basically I need to know which xfs_ilock() call in
> >> > xfs_write() one of the mysqld-test processes is stuck on.
> >>
> >> No problem - BTW, I'm running this on a 2.6.33.3 kernel (same as the
> >> one before, although diff hardware). If you want (and are fine with me
> >> "destroying" the current state), I can upgrade it to a kernel of your
> >> choice and repeat the test overnight.
> >>
> >> Naturally I didn't have CONFIG_DEBUG_INFO in there... just changed
> >> that to Y and recompiled. I'm not entirely sure that this preserves
> >> all the offsets, but at least the BUG-HUNTING doc makes allusions that
> >> it would.
> >>
> >> (gdb) l *( xfs_write+0x2cc)
> >> 0xffffffff8124342d is in xfs_write (fs/xfs/linux-2.6/xfs_lrw.c:597).
> >> 592                     if (!need_i_mutex && (mapping->nrpages || pos > xip->i_size)) {
> >> 593                             xfs_iunlock(xip, XFS_ILOCK_EXCL|iolock);
> >> 594                             iolock = XFS_IOLOCK_EXCL;
> >> 595                             need_i_mutex = 1;
> >> 596                             mutex_lock(&inode->i_mutex);
> >> 597                             xfs_ilock(xip, XFS_ILOCK_EXCL|iolock);
> >
> > Make sense. Can you run 'l *(xfs_ilock+0x2c)' as well? I just need to
> > confirm which lock it has blocked on.
>
> (gdb) l *(xfs_ilock+0x2c)
> 0xffffffff81221001 is in xfs_ilock (fs/xfs/linux-2.6/mrlock.h:48).
> 43              down_read_nested(&mrp->mr_lock, subclass);
> 44      }
> 45
> 46      static inline void mrupdate_nested(mrlock_t *mrp, int subclass)
> 47      {
> 48              down_write_nested(&mrp->mr_lock, subclass);
> 49      #ifdef DEBUG
> 50              mrp->mr_writer = 1;
> 51      #endif
> 52      }

OK, that doesn't help - it followed into the inline function rather
than telling me which of the two calls in the function it was. I
guess I'll need the disassembly output to work it out. Can you send
the output of "disass xfs_ilock" instead?

Thanks.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com