
To: "Michael E. Thomadakis" <miket@xxxxxxxxxxxxxxx>
Subject: Re: IA64 Linux VM performance woes.
From: Chris Wedgwood <cw@xxxxxxxx>
Date: Wed, 14 Apr 2004 04:34:20 -0700
Cc: linux-xfs@xxxxxxxxxxx, andeen@xxxxxxx, lord@xxxxxxx
In-reply-to: <Pine.SGI.4.56.0404131254090.207155@xxxxxxxxxxxxxxx>
References: <Pine.SGI.4.56.0404131254090.207155@xxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Tue, Apr 13, 2004 at 01:32:36PM -0500, Michael E. Thomadakis wrote:

> I've also noticed that the FC adapter driver threads are running at
> 100% CPU utilization, when they are pumping data to the RAID for
> long time. Is there any data copy taking place at the drivers? The
> HBAs are from QLogic.

I would bitch to your SGI support channel about this.  Off the top of
your head, do you have any idea where in the driver the cycles are
being spent?

> A more disturbing issue is that the system does NOT clean up the
> file cache and eventually all memory gets occupied by FS pages. Then
> the system simply hangs.

Are you sure about this?  How can you tell?  I guess it's redundant to
mention that Linux will cache all the fs pages it can and release them
as required, but not before then.  A umount will force them to be
released, however.

Are you perhaps seeing bad slab behaviour (unbounded growth with weak
pressure to shrink it) instead, I wonder?  Looking at /proc/slabinfo
will give you some idea of how much slab is being used and by what.
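
If you want a rough breakdown, something like the sketch below will do.
It's only a sketch; the column layout differs between the 2.4.x and
2.6.x slabinfo formats, but the first four fields are the same in both,
which is all it reads.

  /* slabtop-lite: rough per-cache memory use from /proc/slabinfo.
   * Only the first four columns (name, active objs, total objs,
   * object size) are parsed, which are common to the 2.4.x and
   * 2.6.x formats; header lines fail the sscanf and are skipped. */
  #include <stdio.h>

  int main(void)
  {
      FILE *f = fopen("/proc/slabinfo", "r");
      char line[512];

      if (!f) {
          perror("/proc/slabinfo");
          return 1;
      }
      while (fgets(line, sizeof(line), f)) {
          char name[64];
          unsigned long active, total, objsize;

          if (sscanf(line, "%63s %lu %lu %lu",
                     name, &active, &total, &objsize) != 4)
              continue;
          if (total * objsize >= 1024 * 1024)   /* only caches >= 1MB */
              printf("%-24s %8lu KB  (%lu objects)\n",
                     name, total * objsize / 1024, total);
      }
      fclose(f);
      return 0;
  }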

> One of our main objectives at our center is to maximize file
> throughput for our systems. We are a medium size Supercomputing
> Center where compute and I/O intensive numerical computation code
> runs in batch sub-systems. Several programs expect and generate
> often very large files, in the order of 10-70GBs.

Are these C or Fortran programs?  SGI has a Fortran library that's
supposed to do good stuff for file I/O using O_DIRECT and other smarts.
I forget what it's called, but the SGI support people should know.

> Another common problem is the competition between file cache and
> computation pages. We definitely do NOT want file cache pages being
> cached, while computation pages are reclaimed.

Known problem that's especially bad with 2.4.x --- it's even apparent
when doing a backup on a live system, as that will cause swapping.
It's actually a VM balancing problem and not specific to XFS.

There are a couple of ways you can 'hack' around this that I can think
of right now.  Either mlock your application's pages or use O_DIRECT
I/O in your applications.  For the latter you want to be a little
clever and do write-behind and read-ahead sort of stuff in another
thread, or just use really large I/O sizes and assume that's good
enough.
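
Very roughly, the O_DIRECT side looks something like the sketch below.
It's only a sketch; buffer, offset and length all need to be aligned,
and the exact alignment requirement depends on the kernel and
filesystem (the 4K and 4MB numbers here are just plausible guesses).

  /* Sketch: write out a big chunk with O_DIRECT so the data bypasses
   * the page cache.  Buffer address, file offset and I/O size must
   * all be suitably aligned; 4K is usually safe but check your setup. */
  #define _GNU_SOURCE            /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  #define IOSIZE (4 * 1024 * 1024)  /* big I/Os, poor man's read-ahead */
  #define ALIGN  4096

  int main(int argc, char **argv)
  {
      void *buf;
      int fd;

      if (argc != 2) {
          fprintf(stderr, "usage: %s <file>\n", argv[0]);
          return 1;
      }
      fd = open(argv[1], O_WRONLY | O_CREAT | O_DIRECT, 0644);
      if (fd < 0) { perror("open"); return 1; }

      if (posix_memalign(&buf, ALIGN, IOSIZE)) {
          fprintf(stderr, "posix_memalign failed\n");
          return 1;
      }
      memset(buf, 0, IOSIZE);

      /* ... fill buf with real data; a second thread filling the next
       * buffer while this one is in flight gives you write-behind ... */
      if (write(fd, buf, IOSIZE) != IOSIZE)
          perror("write");

      close(fd);
      free(buf);
      return 0;
  }

(For the mlock route, mlockall(MCL_CURRENT | MCL_FUTURE) is the blunt
instrument that pins all of the application's pages.)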

Also, do current ProPack 2.4.x kernels use rmap?

> 1. Set an upper bound on the number of memory pages ever caching FS
>    blocks.

Presently not possible.  Discussed a few times with various vm people
but nothing ever came of it as far as I know.

> 2. Control the amount of data flushed out to disk in set time
> periods; that is we need to be able to match the long term flushing
> rate with the service rate that the I/O subsystem is capable of
> delivering, tolerating possible transient spikes. We also need to be
> able to control the amount of read-ahead, write behind or even hint
> that data are only being streamed through, never to be reused again.

I think with some care you should be able to tune that a little
better.
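
For the 'streamed through, never to be reused' part specifically,
posix_fadvise() does roughly what you want, though as far as I know the
syscall behind it only exists in 2.6.x, not 2.4.x.  A sketch:

  /* Sketch: stream through a large file once and tell the kernel we
   * won't reuse the data, so its pages don't crowd everything else
   * out of the cache.  Needs a 2.6.x kernel for the fadvise syscall. */
  #define _XOPEN_SOURCE 600
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      static char buf[1 << 20];
      off_t done = 0;
      ssize_t n;
      int fd;

      if (argc != 2) {
          fprintf(stderr, "usage: %s <file>\n", argv[0]);
          return 1;
      }
      fd = open(argv[1], O_RDONLY);
      if (fd < 0) { perror("open"); return 1; }

      /* purely sequential access: let the kernel read ahead harder */
      posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

      while ((n = read(fd, buf, sizeof(buf))) > 0) {
          /* ... consume the data ... */
          done += n;
          /* drop what we've already consumed from the page cache */
          posix_fadvise(fd, 0, done, POSIX_FADV_DONTNEED);
      }
      close(fd);
      return 0;
  }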

> 3. Specify different parameters for 2., above, per file system: we
> have file systems that are meant to transfer wide stripes of
> sequential data, vs. file systems that need to perform well with
> smaller block, random I/O, vs. ones that need to provide access to
> numerous smaller files.

You can tune the fs parameters to some extent, which may help here.  You
might also want to look at using a real-time subvolume if you have
lots of streaming data (again, this implies O_DIRECT).
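
If you do go the real-time subvolume route: a file has to be flagged
for the rt section while it's still empty, roughly as in the sketch
below.  This assumes the xfsprogs headers are installed and the
filesystem was made with a realtime section (mkfs.xfs -r rtdev=...).

  /* Sketch: mark a new, still-empty file as realtime so its data goes
   * to the XFS realtime subvolume.  The flag must be set before any
   * data is written.  Needs the xfsprogs headers and a filesystem
   * created (and mounted) with a realtime section. */
  #include <xfs/xfs.h>           /* struct fsxattr, XFS_IOC_FS*XATTR */
  #include <sys/ioctl.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      struct fsxattr fsx;
      int fd;

      if (argc != 2) {
          fprintf(stderr, "usage: %s <file>\n", argv[0]);
          return 1;
      }
      fd = open(argv[1], O_RDWR | O_CREAT | O_EXCL, 0644);
      if (fd < 0) { perror("open"); return 1; }

      memset(&fsx, 0, sizeof(fsx));
      if (ioctl(fd, XFS_IOC_FSGETXATTR, &fsx) < 0) {
          perror("XFS_IOC_FSGETXATTR");
          return 1;
      }
      fsx.fsx_xflags |= XFS_XFLAG_REALTIME;
      if (ioctl(fd, XFS_IOC_FSSETXATTR, &fsx) < 0) {
          perror("XFS_IOC_FSSETXATTR");
          return 1;
      }

      /* ... now stream the data, typically with O_DIRECT ... */
      close(fd);
      return 0;
  }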

> Also, cache percentages per file system would be useful.

That's starting to sound pretty complex to manage and tune.

> 4. Specify, if all else fails, what parts of the FS cache should be
> flushed in the near future.

Does madvise suffice?  Actually, I'm not sure that it will; I'd have
to check to see how much of it is actually implemented in a useful way,
but I recall noise about it not being very useful at one point.
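
For data you mmap() yourself it's at least cheap to try, something
like:

  /* Sketch: map a file, use the data once, then tell the VM we're
   * done with those pages.  How usefully 2.4.x honours MADV_DONTNEED
   * for file-backed mappings is exactly the open question above. */
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      struct stat st;
      void *p;
      int fd;

      if (argc != 2) {
          fprintf(stderr, "usage: %s <file>\n", argv[0]);
          return 1;
      }
      fd = open(argv[1], O_RDONLY);
      if (fd < 0 || fstat(fd, &st) < 0) { perror(argv[1]); return 1; }

      p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
      if (p == MAP_FAILED) { perror("mmap"); return 1; }

      madvise(p, st.st_size, MADV_SEQUENTIAL);  /* we'll walk it once */
      /* ... process the mapping ... */
      madvise(p, st.st_size, MADV_DONTNEED);    /* then drop the pages */

      munmap(p, st.st_size);
      close(fd);
      return 0;
  }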

> 5. Provide in-depth technical documentation on the internal workings
> of the file system cache, its interaction with the VM and the
> interaction of XFS/LVM with the VM.

This is starting to sound really complicated.  The page-cache
semantics are pretty clear, but when it comes to interactions with slab
and slab pressure it gets a little more muddy and I'm not sure.  There
is also an XFS-specific buffer layer for metadata.

> 6. We do operate IRIX Origins and IBM Regatta SMPs where all these
> issues have been addressed to a far more satisfying degree than on
> Linux. Is the IRIX file system cache going to be ported to ALTIX
> Linux?

I seriously doubt such a thing is possible in any reasonable time
frame.  Or desirable.


I'm almost tempted to suggest you try mainline 2.6.x and see if that
behaves any better.  Normally I would expect ProPack's XFS performance
to be much better than 2.6.x, but I wonder if you're not hitting 2.4.x
VM suckage.


   --cw

