> Hi Steve,
> In reply to your reply to my XFS memory hardening patch. Work email is
> unfortunately down, so I'm replying this way.
Hey, we had no phones at work today, just a hydrogen sulphide smell
which they assured us was at an extremely low and safe concentration!
> > The problem with code that does some xfs memory allocation failure
> > detection is that you can never get to all of the allocation sites;
> > this is why I have never checked in the stuff about seeing a NULL
> > and doing an error return. There are also places in xfs where
> > failure is not an option - once a transaction has dirtied metadata
> > there is no turning back. So really the only option that will fly
> > long term is making sure memory allocations do not return failure
> > by the time they get back up to xfs proper. I do have some other
> > ideas; it is just a matter of finding the time.
> Most of my patch satisfies that. The kmem changes especially do not return
> until success, even though that could mean deadlock on OOM.
> The only place in the patch where a transaction could potentially be
> dirtied and a new ENOMEM is added is in xfs_trans_read_buf(). There I
> still think it's better to return the error than to oops in the caller
> on a NULL pointer. Also, this function can already return other errors
> (e.g. EIO), so the callers should handle failure anyway (or they have
> a different bug).
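The caller-side argument can be sketched roughly like this (a userspace
sketch with made-up names - read_buf_sketch() and caller_sketch() stand
in for xfs_trans_read_buf() and its callers, not the real code): any
nonzero return already has to be handled for EIO, so a new ENOMEM rides
the same error path instead of producing a NULL dereference.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for xfs_trans_read_buf(): returns 0 and sets
 * *bpp on success, or a positive errno (EIO, ENOMEM, ...) on failure. */
static int read_buf_sketch(int simulate_error, void **bpp)
{
    if (simulate_error)
        return simulate_error;      /* e.g. EIO or ENOMEM */
    *bpp = malloc(64);
    return *bpp ? 0 : ENOMEM;
}

/* Caller pattern: any nonzero return takes the error path, so a new
 * ENOMEM is handled by the same code that already handles EIO. */
static int caller_sketch(int simulate_error)
{
    void *bp = NULL;
    int error = read_buf_sketch(simulate_error, &bp);

    if (error)
        return error;               /* no NULL dereference here */
    free(bp);
    return 0;
}
```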
Yes, the readbuf case is a little different. I think the issue here is
that pagebuf cannot get any memory - either pages for the data, or
buffer heads to read them with. An emergency pool of buffer heads in
the pagebuf code may help with one part of this - a similar approach to
the one md and friends take.
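A rough userspace sketch of that reserve-pool idea (hypothetical names
and sizes, nothing like the actual pagebuf or md code): keep a few
objects set aside at init time, dip into the reserve only when the
normal allocator fails, and top the reserve back up on free so forward
progress stays possible under memory pressure.

```c
#include <stdlib.h>

#define POOL_MIN 4  /* assumed reserve size, purely illustrative */

/* A tiny emergency pool: POOL_MIN objects held back for when the
 * normal allocator cannot deliver. */
static void *pool[POOL_MIN];
static int pool_count;

static void pool_init(size_t size)
{
    while (pool_count < POOL_MIN)
        pool[pool_count++] = malloc(size);
}

/* Try the normal allocator first; fall back to the reserve.
 * allocator_failed simulates malloc() returning NULL under pressure. */
static void *pool_alloc(size_t size, int allocator_failed)
{
    if (!allocator_failed)
        return malloc(size);
    return pool_count ? pool[--pool_count] : NULL;
}

/* Refill the reserve before giving memory back to the system. */
static void pool_free(void *obj)
{
    if (pool_count < POOL_MIN)
        pool[pool_count++] = obj;
    else
        free(obj);
}
```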
The odd thing is that I can run without your code under some fairly
high stress loads without hitting these end cases. How much memory do
you test with in general?