xfs
[Top] [All Lists]

Re: [PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates

To: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Subject: Re: [PATCH v4 3/7] mm: Allow filesystems to defer cmtime updates
From: Jan Kara <jack@xxxxxxx>
Date: Wed, 4 Sep 2013 16:57:40 +0200
Cc: linux-kernel@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>, Theodore Ts'o <tytso@xxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Jan Kara <jack@xxxxxxx>, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <e1620c8e4909a65e270c8e9590e307c22fd96a44.1377193658.git.luto@xxxxxxxxxxxxxx>
References: <cover.1377193658.git.luto@xxxxxxxxxxxxxx> <e1620c8e4909a65e270c8e9590e307c22fd96a44.1377193658.git.luto@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu 22-08-13 17:03:19, Andy Lutomirski wrote:
> Filesystems that defer cmtime updates should update cmtime when any
> of these events happen after a write via a mapping:
> 
>  - The mapping is written back to disk.  This happens from all kinds
>    of places, most of which eventually call ->writepages.  (The
>    exceptions are vmscan and migration.)
> 
>  - munmap is called or the mapping is removed when the process exits
> 
>  - msync(MS_ASYNC) is called.  Linux currently does nothing for
>    msync(MS_ASYNC), but POSIX says that cmtime should be updated some
>    time between an mmaped write and the subsequent msync call.
>    MS_SYNC calls ->writepages, but MS_ASYNC needs special handling.
> 
> Filesystems are responsible for checking for pending deferred cmtime
> updates in .writepages (a helper is provided for this purpose) and
> for doing the actual update in .update_cmtime_deferred.
> 
> These changes have no effect by themselves; filesystems must opt in
> by implementing .update_cmtime_deferred and removing any
> file_update_time call in .page_mkwrite.
> 
> This patch does not implement the MS_ASYNC case; that's in the next
> patch.
> 
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
...
> +/**
> + * generic_update_cmtime_deferred - update cmtime after an mmapped write
> + * @mapping: The mapping
> + *
> + * This library function implements .update_cmtime_deferred.  It is unlikely
> + * that any filesystem will want to do anything here except update the time
> + * (using this helper) or nothing at all (by leaving .update_cmtime_deferred
> + * NULL).
> + */
> +void generic_update_cmtime_deferred(struct address_space *mapping)
> +{
> +     struct blk_plug plug;
> +     blk_start_plug(&plug);
> +     inode_update_time_writable(mapping->host);
> +     blk_finish_plug(&plug);
> +}
> +EXPORT_SYMBOL(generic_update_cmtime_deferred);
> +
  You can remove the pluggin here. Inode update will likely result in a
single write so there's no point.

> @@ -1970,6 +1988,39 @@ int write_one_page(struct page *page, int wait)
>  }
>  EXPORT_SYMBOL(write_one_page);
>  
> +void mapping_flush_cmtime(struct address_space *mapping)
> +{
> +     if (mapping_test_clear_cmtime(mapping) &&
> +         mapping->a_ops->update_cmtime_deferred)
> +             mapping->a_ops->update_cmtime_deferred(mapping);
> +}
> +EXPORT_SYMBOL(mapping_flush_cmtime);
  Hum, is there a reason for update_cmtime_deferred() operation? I can
hardly imagine anyone will want to do anything else than what
inode_update_time_writable() does so why bother? You mention tmpfs & co.
don't fit into your scheme well with which I agree so let's just keep
file_update_time() in their page_mkwrite() operation. But I don't see a
real need for avoiding the deferred cmtime logic...

> +
> +void mapping_flush_cmtime_nowb(struct address_space *mapping)
> +{
> +     /*
> +      * We get called from munmap and msync.  Both calls can race
> +      * with fs freezing.  If the fs is frozen after
> +      * mapping_test_clear_cmtime but before the time update, then
> +      * sync_filesystem will miss the cmtime update (because we
> +      * just cleared it) and we don't be able to write (because the
> +      * fs is frozen).  On the other hand, we can't just return if
> +      * we're in the SB_FREEZE_PAGEFAULT state because our caller
> +      * expects the timestamp to be synchronously updated.  So we
> +      * get write access without blocking, at the SB_FREEZE_FS
> +      * level.  If the fs is already fully frozen, then we already
> +      * know we have nothing to do.
> +      */
> +
> +     if (!mapping_test_cmtime(mapping))
> +             return;  /* Optimization: nothing to do. */
> +
> +     if (__sb_start_write(mapping->host->i_sb, SB_FREEZE_FS, false)) {
> +             mapping_flush_cmtime(mapping);
> +             __sb_end_write(mapping->host->i_sb, SB_FREEZE_FS);
> +     }
> +}
  This is wrong because SB_FREEZE_FS level is targetted for filesystem
internal use. Also it is racy. mapping_flush_cmtime() ends up calling
mark_inode_dirty() and filesystems such as ext4 or xfs will start a
transaction to store inode in the journal. This gets freeze protection at
SB_FREEZE_FS level again. If freeze_super() sets s_writers.frozen to
SB_FREEZE_FS before this second protection, things will deadlock.

Since the callers of this function hold mmap_sem, using SB_FREEZE_PAGEFAULT
protection would be appropriate. Also since there are just two places that
need the freeze protection I'd be inclined to open code the protection in
the two places rather than hiding it in a special function.

                                                                Honza

-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR

<Prev in Thread] Current Thread [Next in Thread>