[Top] [All Lists]

Re: [PATCH] mm: add gfp_mask parameter to vm_map_ram()

To: Minchan Kim <minchan@xxxxxxxxxx>
Subject: Re: [PATCH] mm: add gfp_mask parameter to vm_map_ram()
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 14 Jun 2012 13:34:29 +1000
Cc: Fengguang Wu <fengguang.wu@xxxxxxxxx>, Dave Chinner <dchinner@xxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, linux-fsdevel@xxxxxxxxxxxxxxx, LKML <linux-kernel@xxxxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Tejun Heo <tj@xxxxxxxxxx>, Linux Memory Management List <linux-mm@xxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <4FD94779.3030108@xxxxxxxxxx>
References: <20120612012134.GA7706@localhost> <20120613123932.GA1445@localhost> <20120614012026.GL3019@xxxxxxxxxxxxxxxx> <20120614014902.GB7289@localhost> <4FD94779.3030108@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Jun 14, 2012 at 11:07:53AM +0900, Minchan Kim wrote:
> Hi Fengguang,
> On 06/14/2012 10:49 AM, Fengguang Wu wrote:
> > On Thu, Jun 14, 2012 at 11:20:26AM +1000, Dave Chinner wrote:
> >> On Wed, Jun 13, 2012 at 08:39:32PM +0800, Fengguang Wu wrote:
> >>> Hi Christoph, Dave,
> >>>
> >>> I got this lockdep warning on XFS when running the xfs tests:

[rant warning]

> >> Bug in vm_map_ram - it does an unconditional GFP_KERNEL allocation
> >> here, and we are in a GFP_NOFS context. We can't pass a gfp_mask to
> >> vm_map_ram(), so until vm_map_ram() grows that we can't fix it...
> > 
> > This trivial patch should fix it.
> It shouldn't work because vmap_page_range still can allocate GFP_KERNEL by 
> pud_alloc in vmap_pud_range.
> For it, I tried [1] but other mm guys want to add WARNING [2] so let's 
> avoiding gfp context passing.

Oh, wonderful, you're pulling the "it's not a MM issue, don't use
vmalloc" card.

> [1] https://lkml.org/lkml/2012/4/23/77


"vmalloc was never supposed to use gfp flags for allocation
"context" restriction. I.e., it was always supposed to have
blocking, fs, and io capable allocation context."

vmalloc always was a badly neglected, ugly step-sister of kmalloc
that was kept in the basement and only brought out when the tax
collector called.  But that inner ugliness doesn't change the fact
that beatiful things have been built around it. XFS has used
vm_map_ram() and it's predecessor since it was first ported to linux
some 13 or 14 years ago, so the above claim is way out of date. i.e.
vmalloc has been used in GFP_NOFS context since before that flag
even existed....


"I would say add a bit of warnings and documentation, and see what
can be done about callers."

Wonderful. Well, there's about 2 years of work queued up for me
before I even get to the do the open heart surgery that would allow
XFS to handle memory allocation failures at this level without
causing the filesystem to shut down.

Andrew Morton's response:


"There are gruesome problems in block/blk-throttle.c (thread
"mempool, percpu, blkcg: fix percpu stat allocation and remove
stats_lock").  It wants to do an alloc_percpu()->vmalloc() from the
IO submission path, under GFP_NOIO.

Changing vmalloc() to take a gfp_t does make lots of sense, although
I worry a bit about making vmalloc() easier to use!"

OK, so according to Andrew there is no technical reason why it can't
be done, it's just handwaving about "vmalloc is bad"....

> [2] https://lkml.org/lkml/2012/5/2/340


"> Where are these offending callsites?


So, we've got a bunch of filesystems that require vmalloc under
GFP_NOFS conditions. Perhaps there's a reason for needing to be able
to do this in filesystem code? Like, perhaps, avoiding memory
reclaim deadlocks?


"Note that in writeback paths, a "good citizen" filesystem should
not require any allocations, or at least it should be able to
tolerate allocation failures.  So fixing that would be a good idea

Oh, please. I have been hearing this for years, and are we any
closer to it? No, we are further away from ever being able to
acheive this than ever. Face it, filesystems require memory
allocation to write dirty data to disk, and the amount is almost
impossible to define. Hence mempools can't be used because we can't
give any guarantees of forward progress. And for vmalloc?

Filesystems widely use vmalloc/vm_map_ram because kmalloc fails on
large contiguous allocations. This renders kmalloc unfit for
purpose, so we have to fall back to single page allocation and
vm_map_ram or vmalloc so that the filesystem can function properly.
And to avoid deadlocks, all memory allocation must be able to
specify GFP_NOFS to prevent the MM subsystem from recursing into the
filesystem. Therefore, vmalloc needs to support GFP_NOFS.

I don't care how you make it happen, just fix it. Trying to place
the blame on the filesystem folk for using vmalloc in GFP_NOFS
contexts is a total and utter cop-out, because mm folk of all people
should know that non-zero order kmalloc is not a reliable

[end rant]


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>