On Wed, Aug 15, 2001 at 05:24:48AM +0200, Steve Lord wrote:
> > Hi Steve,
> > In reply to your reply to my XFS memory hardening patch. Work email is
> > unfortunately down, so this way.
> hey, we had no phones today at work, just a hydrogen sulphide smell
> which they assured us was at an extremely low and safe concentration!
Ugh. Doesn't sound good.
> > > The problem with the code which does some
> > > xfs memory allocation failure detection is that you can never get to
> > > all of them, this is why I have never checked in the stuff about
> > > seeing a NULL and doing an error return. There are also places in xfs
> > > where failure is not an option - once a transaction has dirtied
> > > metadata there is no turning back. So really the only option which will
> > > fly long term is making sure memory allocations do not return failure
> > > when they get back up to xfs proper. I do have some other ideas; it is
> > > just a matter of finding the time.
> > Most of my patch satisfies that. The kmem changes especially do not return
> > until success, even though that could mean deadlock on OOM.
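To illustrate what I mean by that, here is a rough userspace sketch of the
retry-until-success idea (the name and backoff are made up for the sketch;
the real patch wraps the kernel allocator, not malloc()):

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Sketch only: keep retrying the allocation with a short backoff so that
 * callers further up in xfs never see NULL.  As noted above, under real
 * OOM this can deadlock if we hold locks that memory reclaim needs.
 */
void *kmem_alloc_retry(size_t size)
{
	void *p;

	while ((p = malloc(size)) == NULL)
		usleep(1000);	/* back off, let reclaim make progress */
	return p;
}
```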
> > The only place in the patch where a transaction could be potentially
> > dirtied and a new ENOMEM is added is in xfs_trans_read_buf(). There I still
> > think it's better to return the error than to oops in the caller on a NULL
> > pointer. Also, this function can definitely return other errors already
> > (e.g. EIO), so the callers should handle failure anyway (or they have a
> > different bug).
> Yes the readbuf case is a little different, I think the issue here is
> that pagebuf cannot get any memory - either pages for the data, or
> buffer heads to read them with. An emergency pool of buffer heads in
> the pagebuf code may help with one part of this - a similar approach
> to md and friends.
Just needs to be done very carefully: all other users must block, and they
must not hold any locks that would prevent the blocked ones from succeeding.
Frankly, I don't know enough about XFS lock hierarchies to attempt that.
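For reference, the md-style emergency pool boils down to something like the
following (userspace sketch with pthreads; all names invented, and the real
pagebuf version would have to respect the XFS lock ordering caveat above):

```c
#include <pthread.h>
#include <stdlib.h>

/*
 * Minimal emergency-pool sketch: a few preallocated objects; a caller
 * that finds the pool empty blocks until another caller returns one.
 * Safe only if no blocker holds a lock that a pool_put() caller needs.
 */
#define POOL_SIZE 4

static void *pool[POOL_SIZE];
static int pool_count;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_avail = PTHREAD_COND_INITIALIZER;

void pool_init(size_t objsize)
{
	for (pool_count = 0; pool_count < POOL_SIZE; pool_count++)
		pool[pool_count] = malloc(objsize);
}

void *pool_get(void)
{
	void *obj;

	pthread_mutex_lock(&pool_lock);
	while (pool_count == 0)
		/* blocks here until someone calls pool_put() */
		pthread_cond_wait(&pool_avail, &pool_lock);
	obj = pool[--pool_count];
	pthread_mutex_unlock(&pool_lock);
	return obj;
}

void pool_put(void *obj)
{
	pthread_mutex_lock(&pool_lock);
	pool[pool_count++] = obj;
	pthread_cond_signal(&pool_avail);
	pthread_mutex_unlock(&pool_lock);
}
```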
> The odd thing is I can run without your code under some fairly high
> stress loads without hitting these end cases, how much memory do you
> test with in general?
I mostly hit these because I had a page cache leak, and until that one was
found I had plenty of opportunity to exercise OOM and develop these patches.
Now, on a 512MB machine, I don't get any OOM with "normal" load either.