On Sat, Feb 21, 2015 at 08:12:08PM +0900, Tetsuo Handa wrote:
> My main issue is
> c) whether to oom-kill more processes when the OOM victim cannot be
> terminated, presumably due to the OOM killer deadlock.
> Dave Chinner wrote:
> > On Fri, Feb 20, 2015 at 07:36:33PM +0900, Tetsuo Handa wrote:
> > > Dave Chinner wrote:
> > > > I really don't care about the OOM Killer corner cases - it's
> > > > completely the wrong line of development to be spending time on
> > > > and you aren't going to convince me otherwise. The OOM killer is a
> > > > crutch used to justify having a memory allocation subsystem that
> > > > can't provide forward progress guarantee mechanisms to callers that
> > > > need them.
> > >
> > > I really care about the OOM Killer corner cases, because I'm
> > >
> > > (1) seeing problems that occurred in enterprise systems
> > > under OOM conditions
> > You reach OOM, then your SLAs are dead and buried. Reboot the
> > box - it's a much more reliable way of returning to a working system
> > than playing Russian Roulette with the OOM killer.
> What Service Level Agreements? These problems are occurring on RHEL systems
> where no user is sitting in front of the console. Unless somebody is
> sitting at the console to press SysRq-b when trouble occurs, the
> system's downtime becomes significantly longer.
> What mechanisms are available for minimizing system downtime
> when trouble occurs under OOM conditions? A software/hardware watchdog?
> Indeed it may help, but it may be triggered prematurely when the
> system has not entered the OOM condition. Only the OOM killer knows.
# echo 1 > /proc/sys/vm/panic_on_oom
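That one-liner only lasts until the next boot, and with the default kernel.panic = 0 a panic leaves the box sitting there until someone resets it. A minimal sketch of making it persistent and pairing it with an automatic reboot, assuming a distro that reads /etc/sysctl.d/ at boot (the file name below is hypothetical) and that a 10 second reboot delay is acceptable:

```
# /etc/sysctl.d/90-oom-panic.conf (hypothetical file name)
# Panic instead of running the OOM killer on a system-wide OOM.
# (1 still lets cpuset/mempolicy/memcg-constrained OOMs kill a task;
#  2 panics on those as well.)
vm.panic_on_oom = 1
# Reboot 10 seconds after any panic (0, the default, waits forever).
kernel.panic = 10
```

It can be applied without a reboot with `sysctl -p /etc/sysctl.d/90-oom-panic.conf`.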
> We have memory cgroups to reduce the possibility of triggering the OOM
> killer, though several bugs likely remain in RHEL kernels
> that make administrators hesitate to use memory cgroups.
Fix upstream first, then worry about vendor kernels.
> Not only can we not expect the OOM killer messages to be saved to
> /var/log/messages under the OOM killer deadlock condition, but also
CONFIG_PSTORE=y and configure appropriately from there.
> we do not emit the OOM killer messages if we hit
So add a warning.
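For the CONFIG_PSTORE route, a minimal sketch, assuming a platform where the RAM-backed ramoops backend works and a memory region can be reserved for it. The address and size are placeholders you have to pick for your hardware; EFI machines can use the EFI variable backend instead. pstore saves the kernel log when the kernel panics or oopses, so it pairs naturally with panic_on_oom:

```
# Kernel .config fragment
CONFIG_PSTORE=y
# RAM-backed pstore backend (ramoops); contents survive a warm reboot
CONFIG_PSTORE_RAM=y

# Kernel command line: tell ramoops which RAM region to use.
# <phys_addr> and <size> are placeholders, not recommendations.
ramoops.mem_address=<phys_addr> ramoops.mem_size=<size>
```

After the panic and reboot, the saved log should typically show up as dmesg-ramoops-* files under /sys/fs/pstore.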
> If you want to stop people from playing Russian Roulette with the OOM
> killer, please remove the OOM killer code entirely from RHEL kernels so that
> people must run their systems with a hardcoded /proc/sys/vm/panic_on_oom == 1
> setting. Can you do that?
No. You need to go through vendor channels to get a vendor kernel
config change made.