[PATCH] xfs: flush workers before stopping log

Mark Tinguely tinguely at sgi.com
Fri Aug 31 13:15:01 CDT 2012


On 08/30/12 17:35, Dave Chinner wrote:
> On Thu, Aug 30, 2012 at 12:25:49PM -0500, Ben Myers wrote:
>> On Thu, Aug 30, 2012 at 10:23:35AM +1000, Dave Chinner wrote:
>>> On Wed, Aug 29, 2012 at 08:46:25AM -0500, tinguely at sgi.com wrote:
>>>> Index: b/fs/xfs/xfs_mount.c
>>>> ===================================================================
>>>> --- a/fs/xfs/xfs_mount.c
>>>> +++ b/fs/xfs/xfs_mount.c
>>>> @@ -1490,6 +1490,11 @@ xfs_unmountfs(
>>>>
>>>>   	xfs_qm_unmount(mp);
>>>>
>>>> +	/* flush the worker queues while the log still exists and
>>>> +	 * before the final sync and unmount record.
>>>> +	 */
>>>> +	xfs_syncd_stop(mp);
>>>
>>> xfs_syncd_stop() needs to die rather than being moved from place to
>>> place every time some problem is seemd.  I outlined what we need to
>>> do to solve the problems once and for all a couple of months ago:
>>>
>>> http://oss.sgi.com/archives/xfs/2012-06/msg00064.html
>>
>> Yikes, that patchset you've posted is very nice.  Probably it is appropriate
>> for the 3.7 merge window.  We also need a fix for 3.5-stable and 3.6.  Do you
>> have a suggestion for that?
>
> 3.7 - it's too late for 3.6.
>
>> It looks like you've also moved the final log
>> force to after cancellation of the work queue in patch 6, similar to what Mark
>> has done... so it would seem that Mark has the right idea.  I think Mark's
>> patch is appropriate for 3.6 and 3.5-stable inclusion to fix the regression
>> introduced by me in 8866fc6.
>>
>> What is your opinion?
>
> Don't do unnecessary work. :)
>
> Realistically, the question that I always ask myself in this
> situation is: do we really need to fix it right now?
>
> Backports to stable kernels are typically done in response to user
> reported issues. If users are not hitting the problem, then there's
> no compelling reason for backporting it. i.e. the answer is "no".
> If someone is hitting the problem, then the answer is "yes" and we
> need a temporary fix until the proper fix is done.
>
> To that extent, I've seen one user report of an unmount problem in 3.5
> (reported on #xfs), but it's not clear if the problem they hit was
> this problem or the fact that the superblock IO is not being waited
> for during unmount in 3.5. This was fixed in 3.6 by commit 9a57fa8e
> ("xfs: wait for the write the superblock on unmount"):
>
> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs.git;a=commit;h=9a57fa8ee7c29e11c2a29ce058573ba99157eda7
>
> So, if we take this reported as a "yes, we need to fix it" trigger,
> then we've got a couple of fixes that will need to be backported to
> 3.5.x..
>
> Cheers,
>
> Dave.

You are correct that Ben, Phil White, and myself have seen the problem - 
yes all internal, but then we probably do more filesystem unmounts than 
the ordinary host. Maybe users have good habits such as syncing 3 times 
before umounting the filesystem (keep it up!) and they avoid this nastiness.

I see your point on fixing problems in older branches when/if they are 
reported by an user. I am not glowing with pride with the patch, it is 
something that survived a week of testing that would cause a panic in a 
couple hours without the patch. Since we hit this problem with such 
frequency, that we wanted to push for a little proactive attention to 
prevent future panics.

Thanks,

--Mark Tinguely.



More information about the xfs mailing list