Date: Fri, 5 Nov 2010 10:00:37 +1100
From: Dave Chinner
To: xfs@oss.sgi.com
Subject: Re: [bug, 2.6.37-current] Assertion failed: atomic_read(&pag->pag_ref) == 0
Message-ID: <20101104230037.GD13830@dastard>
In-Reply-To: <20101026071356.GY32255@dastard>

On Tue, Oct 26, 2010 at 06:13:56PM +1100, Dave Chinner wrote:
> Folks,
>
> Since the mainline merge, I've been getting unmount failures during
> shutdown that look like:
>
> Unmounting local filesystems...done.
> Shutting down LVM Volume Groups[ 7088.820123] Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 259
> [ 7088.821811] ------------[ cut here ]------------
> [ 7088.822594] kernel BUG at fs/xfs/support/debug.c:108!
> [ 7088.823383] invalid opcode: 0000 [#1] SMP
> [ 7088.824019] last sysfs file: /sys/devices/system/node/node0/cpumap
> [ 7088.824045] CPU 1
> [ 7088.824045] Modules linked in:
> [ 7088.824045]
> [ 7088.824045] Pid: 0, comm: kworker/0:0 Not tainted 2.6.36-dgc+ #587 /Bochs
> [ 7088.824045] RIP: 0010:[] [] assfail+0x1f/0x30
> [ 7088.824045] RSP: 0018:ffff8800df003e50 EFLAGS: 00010286
> [ 7088.824045] RAX: 0000000000000069 RBX: ffff88011760a400 RCX: 0000000000000001
> [ 7088.824045] RDX: ffff88011b7742c0 RSI: 0000000000000001 RDI: 0000000000000246
> [ 7088.824045] RBP: ffff8800df003e50 R08: 0000000000000001 R09: 0000000000000001
> [ 7088.824045] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff81ef8f00
> [ 7088.824045] R13: ffff880117118df8 R14: ffff8800df1cecf0 R15: ffff880116ebf6e8
> [ 7088.824045] FS: 0000000000000000(0000) GS:ffff8800df000000(0000) knlGS:0000000000000000
> [ 7088.824045] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 7088.824045] CR2: 00007ffd8c8b6990 CR3: 0000000001edb000 CR4: 00000000000006e0
> [ 7088.824045] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7088.824045] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 7088.824045] Process kworker/0:0 (pid: 0, threadinfo ffff88011b776000, task ffff88011b7742c0)
> [ 7088.824045] Stack:
> [ 7088.824045] ffff8800df003e70 ffffffff81499007 ffff8800df003e70 ffff8800df1cecc0
> [ 7088.824045] <0> ffff8800df003ed0 ffffffff810e900a 0000000000000001 000000000000000a
> [ 7088.824045] <0> ffff880100000006 0000000000000202 0000000000000100 0000000000000048
> [ 7088.824045] Call Trace:
> [ 7088.824045]
> [ 7088.824045] [] __xfs_free_perag+0x37/0x50
> [ 7088.824045] [] __rcu_process_callbacks+0x13a/0x3e0
> [ 7088.824045] [] rcu_process_callbacks+0x28/0x50
> [ 7088.824045] [] __do_softirq+0xcd/0x290
> [ 7088.824045] [] ? hrtimer_interrupt+0x138/0x250
> [ 7088.824045] [] call_softirq+0x1c/0x50
> [ 7088.824045] [] do_softirq+0x9d/0xd0
> [ 7088.824045] [] irq_exit+0x95/0xa0
> [ 7088.824045] [] smp_apic_timer_interrupt+0x70/0x9b
> [ 7088.824045] [] apic_timer_interrupt+0x13/0x20
> [ 7088.824045]
> [ 7088.824045] [] ? native_safe_halt+0xb/0x10
> [ 7088.824045] [] ? trace_hardirqs_on+0xd/0x10
> [ 7088.824045] [] default_idle+0x50/0xb0
> [ 7088.824045] [] cpu_idle+0x78/0x100
> [ 7088.824045] [] start_secondary+0x1ac/0x1b1
> [ 7088.824045] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 44 00 00 31 c0 89 d1 48 89 f2 48 89 fe 48 c7 c7 08 38 df 81 e8 7b 34 64 00 <0f> 0b eb fe 66 66 66 66 2e
> [ 7088.824045] RIP [] assfail+0x1f/0x30
> [ 7088.824045] RSP
> [ 7088.863091] ---[ end trace ec76f8135c3adba9 ]---
>
> I'm not seeing failures during xfstests runs; it seems that dbench may be the
> trigger. Is anyone else seeing reference counting problems like this on the
> current linus tree?

OK, found the bug. It's in the reclaim scalability patchset that was merged
into 2.6.37-rc1: when the shrinker skips a locked AG, it misses an
xfs_perag_put() call. I'll push out a patch soon.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com