On 6/6/16 12:12 PM, Brian Foster wrote:
> The filesystem quiesce sequence performs the operations necessary to
> drain all background work, push pending transactions through the log
> infrastructure and wait on I/O resulting from the final AIL push. We
> have had reports of remount,ro hangs in xfs_log_quiesce() ->
> xfs_wait_buftarg(), however, and some instrumentation code to detect
> transaction commits at this point in the quiesce sequence has inculpated
> the eofblocks background scanner as a cause.
>
> While higher level remount code generally prevents user modifications by
> the time the filesystem has made it to xfs_log_quiesce(), the background
> scanner may still be alive and can perform pending work at any time. If
> this occurs between the xfs_log_force() and xfs_wait_buftarg() calls
> within xfs_log_quiesce(), this can lead to an indefinite lockup in
> xfs_wait_buftarg().
>
> To prevent this problem, cancel the background eofblocks scan worker
> during the remount read-only quiesce sequence. This suspends background
> trimming when a filesystem is remounted read-only. This is only done in
> the remount path because the freeze codepath has already locked out new
> transactions by the time the filesystem attempts to quiesce (and thus
> waiting on an active work item could deadlock). Kick the eofblocks
> worker to pick up where it left off once an fs is remounted back to
> read-write.
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>
> To confirm the problem, I have managed to manufacture the remount,ro
> hang issue described above by hacking in some delays/coordination
> between the quiesce and eofblocks background worker.
>
> Also, an alternative approach that I was considering is to run a
> synchronous scan around the same place this patch cancels the background
> scan. The idea is that the background scan would then have no work to do
> on the subsequent iteration and then die off naturally (until
> preallocation occurs once again). I suppose the downside is that a
> remount might not necessarily be expected to muck with preallocation
> state of affected files. This patch seemed more simple, but I'm open to
> either. Thoughts?
Your approach makes sense to me - "finishing" the scan prior to quiesce
could take a relatively unknown amount of time as well, right? And I guess
I don't see a real advantage to that; we don't wait for it on unmount
(right?)
Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> Brian
>
> fs/xfs/xfs_icache.c | 2 +-
> fs/xfs/xfs_icache.h | 1 +
> fs/xfs/xfs_super.c | 8 ++++++++
> 3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 99ee6eee..fb39a66 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -765,7 +765,7 @@ restart:
> * Background scanning to trim post-EOF preallocated space. This is queued
> * based on the 'speculative_prealloc_lifetime' tunable (5m by default).
> */
> -STATIC void
> +void
> xfs_queue_eofblocks(
> struct xfs_mount *mp)
> {
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index 62f1f91..05bac99 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -68,6 +68,7 @@ void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
> int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *);
> int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip);
> void xfs_eofblocks_worker(struct work_struct *);
> +void xfs_queue_eofblocks(struct xfs_mount *);
>
> int xfs_inode_ag_iterator(struct xfs_mount *mp,
> int (*execute)(struct xfs_inode *ip, int flags, void *args),
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 4700f09..7965371 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1294,6 +1294,7 @@ xfs_fs_remount(
> */
> xfs_restore_resvblks(mp);
> xfs_log_work_queue(mp);
> + xfs_queue_eofblocks(mp);
> }
>
> /* rw -> ro */
> @@ -1306,6 +1307,13 @@ xfs_fs_remount(
> * return it to the same size.
> */
> xfs_save_resvblks(mp);
> +
> + /*
> + * Cancel background eofb scanning so it cannot race with the
> + * final log force+buftarg wait and deadlock the remount.
> + */
> + cancel_delayed_work_sync(&mp->m_eofblocks_work);
> +
> xfs_quiesce_attr(mp);
> mp->m_flags |= XFS_MOUNT_RDONLY;
> }
>
|