Re: [PATCH] xfs: shutdown xfs_sync_worker before the log

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: [PATCH] xfs: shutdown xfs_sync_worker before the log
From: Ben Myers <bpm@xxxxxxx>
Date: Tue, 29 May 2012 12:04:30 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4FC4ED13.6030904@xxxxxxxxxx>
References: <20120323174327.GU7762@xxxxxxx> <20120514203449.GE16099@xxxxxxx> <20120516015626.GN25351@dastard> <20120516170402.GD3963@xxxxxxx> <20120517071658.GP25351@dastard> <20120524223952.GU16099@xxxxxxx> <20120525204536.GA4721@xxxxxxx> <20120529150715.GB4721@xxxxxxx> <4FC4ED13.6030904@xxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)

Hey Brian,

On Tue, May 29, 2012 at 11:36:51AM -0400, Brian Foster wrote:
> On 05/29/2012 11:07 AM, Ben Myers wrote:
> > On Fri, May 25, 2012 at 03:45:36PM -0500, Ben Myers wrote:
> >> On Thu, May 24, 2012 at 05:39:52PM -0500, Ben Myers wrote:
> >>> Anyway, I'll make some time to work on this tomorrow so I can test it
> >>> over the weekend.
> >>
> >> This is going to spin over the weekend.  See what you think.
> > 
> > I'm reasonably satisfied with the test results over the weekend.  I did end
> > up hitting an unrelated assert:
> 
> I started testing the xfsaild idle patch based against the xfs tree over the
> weekend (after testing successfully against Linus' tree for several days) and
> reproduced the xfs_sync_worker() hang that Mark alerted me to last week.  I
> was considering doing a bisect in that tree since it doesn't occur in Linus'
> tree, but it sounds like I can pull this patch now and shouldn't expect to
> reproduce the sync_worker() hang either, correct? Thanks.

D'oh!  The xfs_sync_worker hang that Mark mentioned last week is a different
problem: the sync worker blocks on log reservation for the dummy transaction
used to cover the log, so it never gets as far as calling xfs_ail_push_all,
a call that might otherwise loosen things up a bit.

This thread is about a crash due to the xfs_sync_worker racing with unmount.  A
fix for this crash is in Linus' tree as of late last week.  Here we're looking
into replacing the existing fix with something that is a bit cleaner.  s_umount
is overkill for this situation, so now we're calling cancel_delayed_work_sync
to shut down the sync_worker before shutting down the log in order to prevent
the crash.
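
For anyone following along, the ordering can be illustrated in userspace.
This is only a rough Python analogy, not the patch: the Log and
PeriodicWorker classes and the unmount() helper are hypothetical stand-ins,
and cancel_sync() merely imitates the "cancel and wait for a running
instance to finish" behaviour of cancel_delayed_work_sync.

```python
import threading


class Log:
    """Hypothetical stand-in for the log; must not be used after shutdown."""

    def __init__(self):
        self.open = True
        self.writes = 0

    def write(self):
        if not self.open:
            raise RuntimeError("log used after shutdown")
        self.writes += 1

    def shutdown(self):
        self.open = False


class PeriodicWorker:
    """Hypothetical stand-in for a periodic worker that touches the log."""

    def __init__(self, log, interval=0.01):
        self.log = log
        self.interval = interval
        self.errors = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait() returns True once the stop flag is set, ending the loop.
        while not self._stop.wait(self.interval):
            try:
                self.log.write()
            except RuntimeError as exc:
                self.errors.append(exc)

    def cancel_sync(self):
        # Ask the worker to stop, then block until any in-flight pass is done.
        self._stop.set()
        self._thread.join()


def unmount(worker, log):
    # Order matters: quiesce the worker first, only then tear down the log.
    worker.cancel_sync()
    log.shutdown()
```

With the two calls in unmount() swapped, the worker could still be in the
middle of a pass when the log goes away, which is the shape of the race
behind the crash.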

Unfortunately this fix won't help you with the hang.  If you're considering
bisecting this, I think that Juerg Haefliger has reproduced a/the log hang all
the way back to 2.6.38.  Also Chris J Arges has reproduced one on 2.6.32.52.

See thread 'Still seeing hangs in xlog_grant_log_space'.  The log hang is a
wily coyote.  ;)

Regards,
Ben
