xfs

Re: [PATCH 16/19] VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc.

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 16/19] VFS: use GFP_NOFS rather than GFP_KERNEL in __d_alloc.
From: NeilBrown <neilb@xxxxxxx>
Date: Thu, 17 Apr 2014 10:51:05 +1000
Cc: linux-mm@xxxxxxxxx, linux-nfs@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140416090051.GK15995@dastard>
References: <20140416033623.10604.69237.stgit@xxxxxxxxxxxxxx> <20140416040337.10604.61837.stgit@xxxxxxxxxxxxxx> <20140416062520.GG15995@dastard> <20140416164941.37587da6@xxxxxxxxxxxxxx> <20140416090051.GK15995@dastard>
On Wed, 16 Apr 2014 19:00:51 +1000 Dave Chinner <david@xxxxxxxxxxxxx> wrote:

> On Wed, Apr 16, 2014 at 04:49:41PM +1000, NeilBrown wrote:
> > On Wed, 16 Apr 2014 16:25:20 +1000 Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > 
> > > On Wed, Apr 16, 2014 at 02:03:37PM +1000, NeilBrown wrote:
> > > > __d_alloc can be called with i_mutex held, so it is safer to
> > > > use GFP_NOFS.
> > > > 
> > > > lockdep reports this can deadlock when loop-back NFS is in use,
> > > > as nfsd may be required to write out for reclaim, and nfsd certainly
> > > > takes i_mutex.
> > > 
> > > But not the same i_mutex as is currently held. To me, this seems
> > > like a false positive? If you are holding the i_mutex on an inode,
> > > then you have a reference to the inode and hence memory reclaim
> > > won't ever take the i_mutex on that inode.
> > > 
> > > FWIW, this sort of false positive was a long standing problem for
> > > XFS - we managed to get rid of most of the false positives like this
> > > by ensuring that only the ilock is taken within memory reclaim and
> > > memory reclaim can't be entered while we hold the ilock.
> > > 
> > > You can't do that with the i_mutex, though....
> > > 
> > > Cheers,
> > > 
> > > Dave.
> > 
> > I'm not sure this is a false positive.
> > You can call __d_alloc when creating a file, and so be holding i_mutex on
> > the directory.
> > nfsd might also want to access that directory.
> > 
> > If there were only one nfsd thread, it would need to get i_mutex and do its
> > thing before replying to that request and so before it could handle the
> > COMMIT which __d_alloc is waiting for.
> 
> That seems wrong - the NFS client in __d_alloc holds a mutex on an
> NFS client directory inode. The NFS server can't access that
> specific mutex - it's on the other side of the "network". The NFS
> server accesses mutexes from local filesystems, so __d_alloc would
> have to be blocked on a local filesystem inode i_mutex for the nfsd
> to get hung up behind it...

I'm not thinking of mutexes on the NFS inodes, but of mutexes on the local
filesystem inodes, exactly as you describe below.

> 
> However, my confusion comes from the fact that we do GFP_KERNEL
> memory allocation with the i_mutex held all over the place.

Do we?  Should we?  Isn't the whole point of GFP_NOFS to use it when holding
any filesystem lock?

> If the problem is:
> 
>       local fs access -> i_mutex
> .....
>       nfsd -> i_mutex (blocked)
> .....
>       local fs access -> kmalloc(GFP_KERNEL)
>                       -> direct reclaim
>                       -> nfs_release_page
>                       -> <send write/commit request to blocked nfsds>
>                          <deadlock>
> 
> then why is it just __d_alloc that needs this fix?  Either this is a
> problem *everywhere* or it's not a problem at all.

I think it is a problem everywhere that it is a problem :-)
If you are holding an FS lock, then you should be using GFP_NOFS.
Currently a given filesystem can get away with sometimes using GFP_KERNEL
because that particular lock never causes contention during reclaim for that
particular filesystem.

Adding loop-back NFS into the mix broadens the number of locks which can
cause a problem as it creates interdependencies between different filesystems.

> 
> If it's a problem everywhere it means that we simply can't allow
> reclaim from localhost NFS mounts to run from contexts that could
> block an NFSD. i.e. you cannot run NFS client memory reclaim from
> filesystems that are NFS server exported filesystems.....

Well.. you cannot allow NFS client memory reclaim *while holding locks in*
filesystems that are NFS exported.

I think this is most effectively generalised to:
  you cannot allow FS memory reclaim while holding locks in filesystems which
  can be NFS exported

which I think is largely the case already - and lockdep can help us find
those places where we currently do allow FS reclaim while holding an FS lock.

Thanks,
NeilBrown

