
Re: 2.4.13 Mem Related Hangs

To: Jim Eshleman <jce0@xxxxxxxxxx>
Subject: Re: 2.4.13 Mem Related Hangs
From: Jim Eshleman <jce0@xxxxxxxxxx>
Date: Wed, 19 Dec 2001 10:04:44 -0500
Cc: Steve Lord <lord@xxxxxxx>, Jason Allen <jallen@xxxxxxxx>, linux-xfs@xxxxxxxxxxx
References: <3BE6C909.6070308@xxxxxxxxxx> <3BF14253.1060008@xxxxxxxxxx> <1008256442.14210.0.camel@xxxxxxxxxxxxxxxxxxxx> <3C1E1538.4000609@xxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2.1) Gecko/20010901
Jim Eshleman wrote:

Steve Lord wrote:

On Tue, 2001-11-13 at 09:54, Jim Eshleman wrote:

FWIW me too, on an 8-way 8.5GB (64GB HIGHMEM enabled) IBM Netfinity x370 (8500R) which functions as a production mail server. I currently run 2.4.9 with XFS and it stays up for about a week under heavy load. 2.4.13 lasted about 4 hours under light load, until all memory was consumed by cache; then it became unresponsive.

2.4.13 on a 2-way 1GB (64GB HIGHMEM enabled) Netfinity x350 test box with the same kernel config and XFS works fine even under stress, so perhaps our problem is similar to the discussion on l-k "Google's mm problems"...


Update: I'm unable to make 2.4.14 fail on the test box (running Cerberus, bonnie++ against two XFS volumes, and LTP simultaneously), but it melts down just as 2.4.13 does on the big production box. A short time after all memory is eaten by file cache, and under light load, the machine becomes unresponsive. It took about five minutes to log in at the console. There are no error messages on the console or in syslog. Here's some info; the point where the meltdown occurs is obvious in the vmstat output:

  kernel config: http://www.lehigh.edu/~jce0/2.4.14-config
  bootup messages: http://www.lehigh.edu/~jce0/2.4.14-messages
  vmstat 60 output: http://www.lehigh.edu/~jce0/2.4.14-vmstat
  ver_linux output: http://www.lehigh.edu/~jce0/ver_linux.out
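
For anyone who wants to find the meltdown point in a long vmstat log without reading it by eye, here is a minimal sketch. It assumes the log has the usual vmstat column-name header containing `free` and `cache`; the sample data, the function names, and the 5000 KB threshold are made up for illustration and are not taken from the linked files.

```python
def parse_vmstat(text):
    """Parse vmstat text into a list of dicts keyed by column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "free" in fields and "cache" in fields:
            columns = fields  # column-name header (vmstat repeats it periodically)
            continue
        if columns is None or len(fields) != len(columns):
            continue  # group header ("procs memory ...") or a malformed line
        try:
            rows.append(dict(zip(columns, map(int, fields))))
        except ValueError:
            continue  # non-numeric line, skip it
    return rows


def first_low_free(rows, min_free_kb):
    """Return the index of the first sample whose free memory is below the threshold."""
    for i, row in enumerate(rows):
        if row["free"] < min_free_kb:
            return i
    return None


# Synthetic sample, NOT real data from the machines discussed here.
SAMPLE = """\
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 900000  20000 100000   0   0    10    20  100   200   5   2  93
 1  0  0      0 400000  20000 600000   0   0    10    20  100   200   5   2  93
 2  1  0      0   3000  20000 990000   0   0    10    20  100   200   5  60  35
"""

rows = parse_vmstat(SAMPLE)
index = first_low_free(rows, min_free_kb=5000)
if index is not None:
    print("free memory first drops below threshold at sample", index)
```

With `vmstat 60`, the sample index times 60 gives roughly how many seconds into the log the drop happened. Low free memory on its own is normal on Linux, since the page cache is supposed to fill RAM, so this only marks where to start looking.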

This is Linus's 2.4.14 patched with linux-2.4.14-xfs-2001-11-06.patch and LVM 0.9.1_beta6, compiled with egcs-2.91.66. It's a RH 7.1 system.

I know Andrea and Marcelo? were testing and fixing some HIGHMEM things. Were there any patches and did they make it into the Linus tree?

  Any assistance greatly appreciated.

Jim



Going through my old email - I think I just fixed this - there was a bug
in the delayed allocation handling in XFS which caused a memory leak
due to a buffer_head reference count leak. The latest cvs tree (2.4.16
based) has the fix in it.

This bug was introduced around the time the new VM showed up in 2.4.10.

Steve
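
To illustrate the general failure mode described above (this is a toy model, not the XFS code or the actual fix), the sketch below shows how one missing release on a reference-counted object keeps it alive forever. The `Buffer` and `Pool` classes and both path functions are hypothetical names invented for this example.

```python
class Buffer:
    """Toy stand-in for a reference-counted buffer (hypothetical)."""

    def __init__(self):
        self.refcount = 0


class Pool:
    """Tracks buffers that still have outstanding references."""

    def __init__(self):
        self.live = []

    def get(self):
        buf = Buffer()
        buf.refcount = 1  # the caller owns the initial reference
        self.live.append(buf)
        return buf

    def hold(self, buf):
        buf.refcount += 1

    def release(self, buf):
        buf.refcount -= 1
        if buf.refcount == 0:
            self.live.remove(buf)  # only now can the memory be reclaimed


def balanced_path(pool):
    buf = pool.get()
    pool.hold(buf)     # take a temporary extra reference
    pool.release(buf)  # drop the temporary reference
    pool.release(buf)  # drop the initial reference, buffer is freed


def leaky_path(pool):
    buf = pool.get()
    pool.hold(buf)
    pool.release(buf)
    # Bug: the release matching get() is missing, so refcount stays at 1.


pool = Pool()
for _ in range(100):
    balanced_path(pool)
print("live after balanced path:", len(pool.live))

for _ in range(100):
    leaky_path(pool)
print("live after leaky path:", len(pool.live))
```

Each pass through the leaky path pins one more buffer, so memory use grows with the amount of I/O rather than with the working set, which would fit a machine that degrades only after running under load for a while.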



This would make my millennium. I shall test as soon as a new 2.4.16 patch set is available. Or 2.4.17, whichever comes first :-)

  Thanks Steve.

Jim


Of course I noticed linux-2.4.16-xfs-2001-12-16.cvs-patch.bz2 right after I sent this. Ran fine on the test box (which now has 5G RAM) and has been running over 24 hours on the production box with moderate load and no problems so far. Time will tell.

  Thanks again Steve.

Jim



