Date: Mon, 2 May 2011 22:19:58 +1000
From: Dave Chinner
To: Christian Kujau
Cc: Markus Trippelsdorf, LKML, xfs@oss.sgi.com, minchan.kim@gmail.com
Subject: Re: 2.6.39-rc4+: oom-killer busy killing tasks
Message-ID: <20110502121958.GA2978@dastard>
References: <20110427022655.GE12436@dastard> <20110427102824.GI12436@dastard> <20110428233751.GR12436@dastard> <20110429201701.GA13166@x4.trippels.de> <20110501080149.GD13542@dastard>

On Sun, May 01, 2011 at 09:59:35PM -0700, Christian Kujau wrote:
> On Sun, 1 May 2011 at 18:01, Dave Chinner wrote:
> > I really don't know why the xfs inode cache is not being trimmed. I
> > really, really need to know if the XFS inode cache shrinker is
> > getting blocked or not running - do you have those sysrq-w traces
> > when near OOM I asked for a while back?
>
> I tried to generate those via /proc/sysrq-trigger (don't have a F13/Print
> Screen key), but the OOM killer kicks in pretty fast - so fast that my
> debug script, trying to generate sysrq-w every second, was too late and the
> machine was already dead:
>
>    http://nerdbynature.de/bits/2.6.39-rc4/oom/
>    * messages-10.txt.gz
>    * slabinfo-10.txt.bz2
>
> Timeline:
>   - du(1) started at 12:25:16 (and immediately listed
>     as "blocked" task)
>   - the last sysrq-w succeeded at 12:38:05, listing kswapd0
>   - du invoked oom-killer at 12:38:06
>
> I'll keep trying...
>
> > scan only scanned 516 pages. I can't see it freeing many inodes
> > (there's >600,000 of them in memory) based on such a low page scan
> > number.
>
> Not sure if this is related...this XFS filesystem I'm running du(1) on is
> ~1 TB in size, with 918K allocated inodes, if df(1) is correct:
>
>   # df -hi /mnt/backup/
>   Filesystem            Inodes   IUsed   IFree IUse% Mounted on
>   /dev/mapper/wdc1         37M    918K     36M    3% /mnt/backup
>
> > Maybe you should tweak /proc/sys/vm/vfs_cache_pressure to make it
> > reclaim vfs structures more rapidly.
> > It might help
>
> /proc/sys/vm/vfs_cache_pressure is currently set to '100'. You mean I
> should increase it? To..150? 200? 1000?

Yes. Try 2 orders of magnitude as a start, i.e. change it to
10000...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
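
For anyone who wants to script the change suggested above, here is a minimal sketch in Python. It assumes Python 3 and root privileges for the real procfs file. The helper names and the `factor` parameter are hypothetical and only for illustration; the path and the jump from 100 to 10000 are the ones discussed in this thread.

```python
from pathlib import Path
from typing import Tuple

# Real procfs path named in the thread; writing to it requires root.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")


def read_pressure(path: Path = VFS_CACHE_PRESSURE) -> int:
    """Return the current vfs_cache_pressure value as an integer."""
    return int(path.read_text().strip())


def scale_pressure(path: Path = VFS_CACHE_PRESSURE,
                   factor: int = 100) -> Tuple[int, int]:
    """Multiply the current value by `factor` and write it back.

    Returns (old, new). The helper name and `factor` are illustrative
    assumptions, not part of any kernel interface; factor=100 mirrors
    the "2 orders of magnitude" suggestion.
    """
    old = read_pressure(path)
    new = old * factor
    path.write_text(f"{new}\n")
    return old, new
```

Called as root with no arguments, `scale_pressure()` would take the default of 100 reported above to 10000. The setting does not persist across reboots, and passing a different `path` lets the helpers be tried against an ordinary file first.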