Please try the current CVS tree; the 2.4.17 split patches may be close
enough. Anything prior to that has problems with a memory leak under
pressure. Let's then take a closer look at the oops output from that.
On Wed, 2002-01-16 at 14:27, Ian D. Hardy wrote:
> I've been looking at this further today.
> If Steve Lord is correct in his assessment that my Oops was due to an 'out of
> memory condition' (I'm sure he is, looks sensible) and I've correctly
> interpreted memory usage (via 'vmstat') then it would appear that the
> kernel is running out of available memory, with resultant problems for
> XFS because all of the memory is being used, most of it to cache filesystem
> data. Currently 'top' is showing:
> 7:03pm up 7:19, 2 users, load average: 0.00, 0.00, 0.00
> 108 processes: 107 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states: 0.1% user, 0.5% system, 0.0% nice, 99.2% idle
> Mem: 898848K av, 895792K used, 3056K free, 0K shrd, 13392K buff
> Swap: 1028120K av, 4780K used, 1023340K free 820396K cached
> Confirming that the majority of the memory is being used for cache and that
> there is currently just under 3Mbytes of RAM free, so I can see how a sudden
> kernel demand for memory may result in a failure before 'kswapd' has a chance
> to free some memory up.
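As a rough cross-check of the 'top' figures quoted above, the arithmetic can be sketched as below. This is only an illustration: it assumes the trailing 820396K on the Swap line is top's "cached" figure, and the variable names are made up for the example.

```python
# Figures copied from the 'top' output quoted above, in Kbytes.
total_kb = 898848
used_kb = 895792
free_kb = 3056
buffers_kb = 13392
cached_kb = 820396  # assumption: the trailing value on the Swap line is "cached"

# Memory held outside the buffer and page caches (processes plus kernel).
non_cache_kb = used_kb - buffers_kb - cached_kb

# Shares of physical RAM taken by the page cache and left free.
cached_share = cached_kb / total_kb
free_share = free_kb / total_kb

print(f"non-cache usage: {non_cache_kb}K")
print(f"cached: {cached_share:.1%} of RAM")
print(f"free:   {free_share:.2%} of RAM")
```

Under that assumption, only about 60 Mbytes is in use outside the caches, which is consistent with the reading that the cache, not application demand, accounts for nearly all of the "used" figure.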
> Isn't this a common problem? Wouldn't this be the normal state for
> a fileserver with more 'active' data than memory - filesystem cache will
> (under Linux) grow to use as much memory as possible, hence 'free' memory
> will be at a minimum? From this reasoning adding more memory is unlikely
> to help - as the system will just use more cache until there is the same
> minimum amount of memory?
> It would appear that XFS has the potential to make heavier demands
> on kernel memory and/or does not check that memory was allocated (I
> assume that its memory allocation calls are such that it expects memory
> to be allocated from immediately available physical memory?), hence
> this problem (I believe a number of people have reported problems that
> have been attributed to memory allocation errors?).
> ..... have I missed something?
> I then looked to try to find out how to decrease the maximum amount of
> memory that the kernel would use as FS cache (or increase the minimum
> amount of memory that the kernel would try to keep free). Thanks to
> harri.haataja@xxxxxxxxxxxxxx for pointing me in the direction of
> 'Documentation/sysctl/vm.txt' in the kernel. Though as was noted this
> document is out of date referring to the 2.2 series kernels. It does
> however point to '/proc/sys/vm/freepages' as giving the number of pages
> at which 'kswapd' will start to free pages. This seems to exist in the
> 2.4 series kernels up to 2.4.9 (it may be 10 or 11?); after that the VM
> changed and this parameter went away (though the documentation hasn't
> changed...... So I'm completely lost with the later kernels!). Anyway
> back to 2.4.9:
> # cat /proc/sys/vm/freepages
> 383 766 1149
> Which means that the kernel will try to maintain 1149 pages (~4.5Mbytes)
> free memory, will try even harder to free memory at 766 pages and will
> stop allocating memory other than to root if free memory falls below
> 383 pages (~1.5Mbytes). This would seem to agree with 'vmstat'/'top'
> which tend to show ~3Mbytes free memory. I then tried to increase
> these values, however these appear to be read only values (tried by
> writing directly to the file and using 'sysctl'). Indeed a comment
> in the kernel file 'mm/page_alloc.c' confirms that the 'freepages'
> array is not writable due to potential conflicts with different memory
> zones. So I'm stuck again. I've not been able to find any obvious
> place in the kernel source to change these values at kernel compilation
> time either.
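The page-to-Mbyte conversions quoted above can be checked with a short sketch. It assumes 4 Kbyte pages (as on i386), and the min/low/high labels are taken from the naming in 'Documentation/sysctl/vm.txt' as best I recall it; the helper name is hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 Kbyte pages, as on i386


def pages_to_mbytes(pages, page_size=PAGE_SIZE_BYTES):
    """Convert a page count to Mbytes (1 Mbyte = 1024 * 1024 bytes)."""
    return pages * page_size / (1024 * 1024)


# The three values read from /proc/sys/vm/freepages on 2.4.9 above.
# The min/low/high labels are an assumption about their meaning.
freepages = {"min": 383, "low": 766, "high": 1149}

for name, pages in freepages.items():
    print(f"{name:>4}: {pages:5d} pages = {pages_to_mbytes(pages):.2f} Mbytes")
```

With 4 Kbyte pages the middle threshold (766 pages) comes out at just under 3 Mbytes, which lines up with the ~3 Mbytes free that 'top' and 'vmstat' report.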
> Any ideas? (either for 2.4.9 or ideally in later/current
> kernels. I have reproduced what looked like a similar failure with
> 2.4.16 and 2.4.17 but unfortunately did not get the Oops details to
> confirm that it was the same problem. I'll try to set up a test to
> get this info; is there any more info that would help?). FYI: the test
> environment is a server and a number (~6) of client machines running
> a mixture of 'bonnie' runs and back-to-back tars copying the local
> /usr to the shared filesystem from the server (last time I looked
> at this it lasted ~24 hours).
> Many thanks for your time.
> "Ian D. Hardy" wrote:
> > >
> > > On Tue, 2002-01-15 at 13:33, Ian D. Hardy wrote:
> > > > Hi,
> > > >
> > > > For some time we've been having problems with a server, which is acting
> > > > as a master/control node and NFS server for a computational cluster
> > > > (~180 client nodes). The server will crash after anywhere between
> > > > a few hours and 10 days operation. We've tried various kernels and
> > > > XFS patch versions from 2.4.9 kernel with XFS patch-2.4.9-xfs-2001-08-17
> > > > up to and including 2.4.16 kernel with the xfs-2.4.16-all-i386 patch,
> > > > if anything the 2.4.9 kernel has proved the most reliable (it normally
> > > > lasts between 4 and 10 days! - 2.4.16 lasted less than 24hrs).
> > .... more details deleted
> > > >
> > >
> > > Almost certainly this is an out of memory condition, just from looking
> > > at the code in the function you oopsed in. Would you say your system is
> > > stressed when it comes to memory?
> > >
> > > Steve
> > >
> > > Steve Lord voice: +1-651-683-3511
> > > Principal Engineer, Filesystem Software email: lord@xxxxxxx
> > >
> > I'd not regard the server as short of memory as it's using ~660Mbytes as
> > file system cache, though interestingly it does appear to be using some
> > swap space. Is it possible that XFS is having problems when there is no
> > memory immediately available? I've included some 'vmstat' output:
> > vmstat 10 10
> >    procs                      memory    swap          io     system         cpu
> >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> >  0  1 28  11116   3784  42300 674656   0   0    37    35   19     3   0   2  23
> >  0  1 28  11116   3480  42300 674960   0   0     0     0  167   194   0   0 100
> >  0  1 28  11116   3176  42300 675264   0   0     0     0  139   107   0   0 100
> >  0  1 28  11116   3056  42300 675380   0   0     0     2  152   142   0   0 100
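To make the trend in the 'vmstat' samples above easier to see, the change in the free and cache columns between consecutive samples can be computed as in this sketch. The numbers are copied from the quoted output; the variable names are illustrative only.

```python
# (free, cache) in Kbytes from the four vmstat samples quoted above.
samples = [(3784, 674656), (3480, 674960), (3176, 675264), (3056, 675380)]

# Change between each pair of consecutive samples.
deltas = [
    (free1 - free0, cache1 - cache0)
    for (free0, cache0), (free1, cache1) in zip(samples, samples[1:])
]

for free_delta, cache_delta in deltas:
    print(f"free {free_delta:+5d}K   cache {cache_delta:+5d}K")

# Net movement over the whole run of samples.
total_free_drop = samples[0][0] - samples[-1][0]
total_cache_growth = samples[-1][1] - samples[0][1]
print(f"free fell by {total_free_drop}K, cache grew by {total_cache_growth}K")
```

Each drop in free memory is matched almost exactly by growth in the cache, which fits the suggestion that the page cache is what is consuming the last few Mbytes of free RAM.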
> > In IRIX I'd tune the kernel parameters 'min_free_pages'... to ensure that
> > there was always physical memory available. Is there any equivalent in
> > Linux? (Sorry if this is a silly/obvious question.)
> > Many thanks.
> > Ian
> > --
> > ////////////////////////////////////////////////////////////////////////////
> > Ian Hardy Tel: 023 80593577
> > Research Services Fax: 023 80593131
> > Computing Services email: idh@xxxxxxxxxxx
> > Southampton University i.d.hardy@xxxxxxxxxxx
> > Southampton S017 1BJ, UK.
> > \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
> /////////////Technical Coordination, Research Services////////////////////
> Ian Hardy Tel: 023 80 593577
> Computing Services Mobile: 0709 2127503
> Southampton University email: idh@xxxxxxxxxxx
> Southampton S017 1BJ, UK. i.d.hardy@xxxxxxxxxxx
> \\'BUGS: The notion of errors is ill-defined' (IRIX man page for netstat)\
Steve Lord voice: +1-651-683-3511
Principal Engineer, Filesystem Software email: lord@xxxxxxx