xfs
[Top] [All Lists]

Re: How to handle TIF_MEMDIE stalls?

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: How to handle TIF_MEMDIE stalls?
From: Theodore Ts'o <tytso@xxxxxxx>
Date: Fri, 20 Feb 2015 22:20:00 -0500
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>, hannes@xxxxxxxxxxx, mhocko@xxxxxxx, dchinner@xxxxxxxxxx, linux-mm@xxxxxxxxx, rientjes@xxxxxxxxxx, oleg@xxxxxxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx, mgorman@xxxxxxx, torvalds@xxxxxxxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=thunk.org; s=ef5046eb; h=In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date; bh=GTsRvxHuT3C3/mCWzfje0PHMJi+qCfQCut2hf2tED6s=; b=KSNY3gLIxRQXAGHgqcaQ+KMSv8Klbyw6hg5E7RVt0r7FQtnAKjoR1tcan4imzUwjKBg/VnYvZxSJVkxP8v2dgA8LtUnx4N9PwmlK3RkBnFqFFfZsrNHeD79ZGUpA2ghKawmcjHjppn7AMnf1cP7ykCiLWaDRXsC4OQBNGa37o9w=;
In-reply-to: <20150220231511.GH12722@dastard>
References: <201502172123.JIE35470.QOLMVOFJSHOFFt@xxxxxxxxxxxxxxxxxxx> <20150217125315.GA14287@xxxxxxxxxxxxxxxxxxxxxx> <20150217225430.GJ4251@dastard> <20150219102431.GA15569@xxxxxxxxxxxxxxxxxxxxxx> <20150219225217.GY12722@dastard> <201502201936.HBH34799.SOLFFFQtHOMOJV@xxxxxxxxxxxxxxxxxxx> <20150220231511.GH12722@dastard>
User-agent: Mutt/1.5.23 (2014-03-12)
+akpm

So I'm arriving late to this discussion since I've been in conference
mode for the past week, and I'm only now catching up on this thread.

I'll note that this whole question of whether or not file systems
should use GFP_NOFAIL is one where the mm developers are not of one
mind.

In fact, search for the subject line "fs/reiserfs/journal.c: Remove
obsolete __GFP_NOFAIL" where we recapitulated many of these arguments,
Andrew Morton said that it was better to use GFP_NOFAIL over the
alternatives of (a) panic'ing the kernel because the file system has
no way to move forward other than leaving the file system corrupted,
or (b) looping in the file system to retry the memory allocation to
avoid the unfortunate effects of (a).

So based on akpm's sage advise and wisdom, I added back GFP_NOFAIL to
ext4/jbd2.

It sounds like 9879de7373fc is causing massive file system
errors, and it seems **really** unfortunate it was added so late in
the day (between -rc6 and rc7).

So at this point, it seems we have two choices.  We can either revert
9879de7373fc, or I can add a whole lot more GFP_FAIL flags to ext4's
memory allocations and submit them as stable bug fixes.

Linux MM developers, this is your call.  I will liberally be adding
GFP_NOFAIL to ext4 if you won't revert the commit, because that's the
only way I can fix things with minimal risk of adding additional,
potentially more serious regressions.

                                                - Ted

<Prev in Thread] Current Thread [Next in Thread>