xfs

Re: [PATCH 3/8] xfs: initialise xfssync work before running quotachecks

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 3/8] xfs: initialise xfssync work before running quotachecks
From: Ben Myers <bpm@xxxxxxx>
Date: Wed, 28 Mar 2012 14:40:18 -0500
Cc: Mark Tinguely <tinguely@xxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20120326215709.GM5091@dastard>
References: <1332393313-1955-1-git-send-email-david@xxxxxxxxxxxxx> <1332393313-1955-4-git-send-email-david@xxxxxxxxxxxxx> <20120322151548.GS7762@xxxxxxx> <20120322210723.GC5091@dastard> <4F6C7BE7.3060100@xxxxxxx> <20120325232253.GJ5091@dastard> <4F7086FA.9010003@xxxxxxx> <20120326215709.GM5091@dastard>
User-agent: Mutt/1.5.18 (2008-05-17)
Hi Dave,

On Tue, Mar 27, 2012 at 08:57:09AM +1100, Dave Chinner wrote:
> On Mon, Mar 26, 2012 at 10:10:50AM -0500, Mark Tinguely wrote:
> > On 03/25/12 18:22, Dave Chinner wrote:
> > >On Fri, Mar 23, 2012 at 08:34:31AM -0500, Mark Tinguely wrote:
> > >>>  On 03/22/12 16:07, Dave Chinner wrote:
> > >>>>  >On Thu, Mar 22, 2012 at 10:15:48AM -0500, Ben Myers wrote:
> > >>>>>  >>On Thu, Mar 22, 2012 at 04:15:08PM +1100, Dave Chinner wrote:
> > >>>>>>  >>>From: Dave Chinner<dchinner@xxxxxxxxxx>
> > >>>>>>  >>>
> > >>>>>>  >>>Because the mount process can run a quotacheck and consume lots of
> > >>>>>>  >>>inodes, we need to be able to run periodic inode reclaim during the
> > >>>>>>  >>>mount process. This will prevent running the system out of memory
> > >>>>>>  >>>during quota checks.
> > >>>>>>  >>>
> > >>>>>>  >>>This essentially reverts 2bcf6e97, but that is safe to do now that
> > >>>>>>  >>>the quota sync code that was causing problems during long quotacheck
> > >>>>>>  >>>executions is now gone.
> > >>>>>  >>
> > >>>>>  >>Dave, I've held off on #s 3 and 4 because they appear to be racy.  Being
> > >>>>  >
> > >>>>  >What race?
> > >>>>  >
> > >>>>  >Cheers,
> > >>>>  >
> > >>>>  >Dave
> > >>>
> > >>>
> > >>>  2 of the sync workers use iterators
> > >>>    xfs_inode_ag_iterator()
> > >>>     xfs_perag_get()
> > >>>      radix_tree_lookup(&mp->m_perag_tree, agno)
> > >>>
> > >>>  The race I was worried about was in xfs_mount() to initialize the
> > >>>  mp->m_perag_lock, and the radix tree initialization:
> > >>>    INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC).
> > >>>
> > >>>  There is a lock and 2 or 3 unbuffered I/O are performed in xfs_mountfs()
> > >>>  before the mp->m_perag_tree is initialized.
> > >Yes they are uncached IOs so do not utilise the cache that
> > >requires the mp->m_perag_tree to be initialised.
> > 
> > The point I was trying to make is the sync workers use iterators.
> > The race is to get the mp->m_perag_tree initialized before one of
> > the sync workers tries to do a xfs_perag_get().
> 
> Firstly, xfs_sync_worker does not iterate AGs at all anymore - it
> pushes the log and the AIL, and nothing else. So there are no
> problems there.

xfs_sync_worker forces the log and pushes the AIL, and a sync is queued in
xfs_syncd_init before either the log or the AIL is initialized.  A sync
should not be queued before the log and AIL are initialized, regardless of
the value of xfs_syncd_centisecs.  In the error path of xfs_fs_fill_super,
a sync could still be running after xfs_unmount is called, so there is a
window in which it could dereference m_log after it has been set to NULL.
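To make the ordering concern concrete, here is a minimal single-threaded userspace sketch. All names in it (struct mount_sketch, queue_sync, run_pending, teardown) are hypothetical stand-ins, not XFS code, and it models only ordering, not real workqueue concurrency or locking. It shows two rules: the periodic sync is not queued until both the log and the AIL exist, and teardown cancels pending work before freeing what that work dereferences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the log and AIL structures. */
struct log_sketch { int forced; };
struct ail_sketch { int pushed; };

/* Hypothetical stand-in for the mount structure. */
struct mount_sketch {
	struct log_sketch *m_log;
	struct ail_sketch *m_ail;
	bool sync_queued;	/* models a queued delayed work item */
};

/* Models the sync worker: forces the log and pushes the AIL. */
static void sync_worker(struct mount_sketch *mp)
{
	mp->m_log->forced++;
	mp->m_ail->pushed++;
}

/* Only queue the periodic sync once both structures it touches exist. */
static bool queue_sync(struct mount_sketch *mp)
{
	if (mp->m_log == NULL || mp->m_ail == NULL)
		return false;
	mp->sync_queued = true;
	return true;
}

/* Models the workqueue running whatever work is pending. */
static void run_pending(struct mount_sketch *mp)
{
	if (mp->sync_queued)
		sync_worker(mp);
}

/* Teardown: cancel the work first, then free what it dereferences. */
static void teardown(struct mount_sketch *mp)
{
	/* Plays roughly the role cancel_delayed_work_sync() has in the kernel. */
	mp->sync_queued = false;
	free(mp->m_ail);
	free(mp->m_log);
	mp->m_ail = NULL;
	mp->m_log = NULL;
}
```

Reversing the two steps in teardown() (freeing and NULLing m_log while sync_queued is still set) is the single-threaded analogue of the error-path window described above.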

> Secondly xfs_flush_worker() is only triggered by ENOSPC, and that
> can't happen until the filesystem is mounted and real work starts.

xfs_flush_worker is triggered by xfs_flush_inodes on ENOSPC in xfs_create
and xfs_iomap_write_delay.  I agree that at startup neither of these code
paths can trigger an ENOSPC event until the filesystem has mounted.
Further, if we hit any error in fill_super, ENOSPC could not possibly be
triggered because the root inode has not been allocated yet.

> Finally, the reclaim worker does iterate the perag tree,

xfs_reclaim_worker has a similar issue with the perag tree in this patch.
It could look at the tree before it has been initialized.

> but the next patch in the series ensures that is started on demand, not
> from xfs_syncd_init().  This ensures that iteration does not occur until
> after the first inode is placed into reclaim, and that must happen after
> the perag tree is initialised because otherwise we can't read in inodes,
> let alone put them into a reclaim state....

I suggest combining 3 and 4 into one patch.

You've reordered xfs_syncd_stop with respect to xfs_unmount in the error
path of xfs_fs_fill_super, but not in the regular unmount path,
xfs_fs_put_super.  For consistency, I think they should not be reordered
in the error path of xfs_fs_fill_super either.

As long as workers can run before xfs_mountfs has run, they need to
protect themselves by checking that the structures they use have been
initialized.  It looks like xfs_reclaim_worker would do this in the next
patch by checking MS_ACTIVE, but FWICS xfs_sync_worker still does not
protect itself.
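The self-protection idea can be sketched as a worker that returns early until an "active" flag is set, where the flag is only set once mount has finished initializing everything the worker touches. This is a single-threaded userspace sketch: SKETCH_ACTIVE is a hypothetical stand-in for MS_ACTIVE in the superblock's s_flags, the structs are illustrative, and a real kernel worker would also need the flag check and the initialization to be properly ordered with respect to each other.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MS_ACTIVE. */
#define SKETCH_ACTIVE 0x1UL

/* Hypothetical stand-ins for the superblock and mount structures. */
struct sb_sketch { unsigned long s_flags; };

struct mount_sketch {
	struct sb_sketch *m_super;
	int perag_initialised;	/* models the perag tree being set up */
	int reclaim_passes;	/* counts passes that actually did work */
};

/* Returns true if the worker did any work, false if it bailed out early. */
static bool reclaim_worker(struct mount_sketch *mp)
{
	/* Do nothing until mount has finished and set the active flag. */
	if (!(mp->m_super->s_flags & SKETCH_ACTIVE))
		return false;

	/* The flag is only set after the perag tree has been initialized. */
	assert(mp->perag_initialised);
	mp->reclaim_passes++;
	return true;
}
```

The same early return would be one way for a sync worker to avoid touching the log and AIL before they exist.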

-Ben
