xfs
[Top] [All Lists]

Re: 2.4.13 Mem Related Hangs

To: Jim Eshleman <jce0@xxxxxxxxxx>
Subject: Re: 2.4.13 Mem Related Hangs
From: Steve Lord <lord@xxxxxxx>
Date: 13 Dec 2001 09:14:02 -0600
Cc: Jason Allen <jallen@xxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <3BF14253.1060008@xxxxxxxxxx>
References: <3BE6C909.6070308@xxxxxxxxxx> <3BF14253.1060008@xxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
On Tue, 2001-11-13 at 09:54, Jim Eshleman wrote:
> >   FWIW me too, on an 8-way 8.5GB (64GB HIGHMEM enabled) IBM Netfinity 
> > x370 (8500R) which functions as a production mail server.  I currently 
> > run 2.4.9 with XFS and it stays up for about a week under heavy load. 
> > 2.4.13 lasted about 4 hours under light load until all memory was 
> > consumed by cache then it became unresponsive.
> > 
> >   2.4.13 on a 2-way 1GB (64GB HIGHMEM enabled) Netfinity x350 test box 
> > with the same kernel config and XFS works fine even under stress, so 
> > perhaps our problem is similar to the discussion on l-k "Google's mm 
> > problems"...
> 
> 
>    Update: I'm unable to make 2.4.14 fail on the test box (running 
> Cerberus, bonnie++ against two XFS volumes, and LTP simultaneously) but 
> it melts-down just as 2.4.13 does on the big production box.  A short 
> time after all memory is eaten by file cache, and under light load, the 
> machine becomes unresponsive.  It took about five minutes to login at 
> the console.  No error messages on the console or in syslog.  Here's 
> some info, it's obvious in the vmstat output where the melt-down occurs:
> 
>    kernel config: http://www.lehigh.edu/~jce0/2.4.14-config
>    bootup messages: http://www.lehigh.edu/~jce0/2.4.14-messages
>    vmstat 60 output: http://www.lehigh.edu/~jce0/2.4.14-vmstat
>    ver_linux output: http://www.lehigh.edu/~jce0/ver_linux.out
> 
>    This is linus 2.4.14 patched with linux-2.4.14-xfs-2001-11-06.patch 
> and LVM 0.9.1_beta6, compiled with egcs-2.91.66.  It's a RH 7.1 system.
> 
>    I know Andrea and Marcelo? were testing and fixing some HIGHMEM 
> things.  Were there any patches and did they make it into the Linus tree?
> 
>    Any assistance greatly appreciated.
> 
> Jim
> 

Going through my old email - I think I just fixed this - there was a bug
in the delayed allocation handling in XFS which caused a memory leak
due to a buffer_head reference count leak. The latest cvs tree (2.4.16
based) has the fix in it.

This bug was introduced around the time the new VM showed up in 2.4.10.

Steve

-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@xxxxxxx


<Prev in Thread] Current Thread [Next in Thread>