xfs
[Top] [All Lists]

Re: [PATCH v3 3/5] mm: Notify filesystems when it's time to apply a defe

To: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Subject: Re: [PATCH v3 3/5] mm: Notify filesystems when it's time to apply a deferred cmtime update
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 21 Aug 2013 08:43:49 +1000
Cc: Jan Kara <jack@xxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "linux-ext4@xxxxxxxxxxxxxxx" <linux-ext4@xxxxxxxxxxxxxxx>, Theodore Ts'o <tytso@xxxxxxx>, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CALCETrXU3jm_R2SJbfgsM2COAuX_tRACQmuc7t7z5b2B+=kBgg@xxxxxxxxxxxxxx>
References: <cover.1376679411.git.luto@xxxxxxxxxxxxxx> <ec267e95fd21891986373c7af1c72b4c8b507332.1376679411.git.luto@xxxxxxxxxxxxxx> <20130820023615.GE6023@dastard> <CALCETrV-Toj-NGpmWnmoUbCwrMUXOSbjQdYsSVuTiH+2dEgPTQ@xxxxxxxxxxxxxx> <20130820040814.GH6023@dastard> <CALCETrWreWBwKdKS2w=fS+MwdaZv1eEsKjYo=P9eeXe7fZS6Jw@xxxxxxxxxxxxxx> <20130820160057.GC2862@xxxxxxxxxxxxx> <CALCETrXotDRKV8FtLn3g_9g_yOVceHzKpwXMAH+w7tBZ1P2F7A@xxxxxxxxxxxxxx> <20130820214819.GK6023@dastard> <CALCETrXU3jm_R2SJbfgsM2COAuX_tRACQmuc7t7z5b2B+=kBgg@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Aug 20, 2013 at 02:54:01PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 20, 2013 at 2:48 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Tue, Aug 20, 2013 at 09:42:34AM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 20, 2013 at 9:00 AM, Jan Kara <jack@xxxxxxx> wrote:
> >> > On Mon 19-08-13 21:14:44, Andy Lutomirski wrote:
> >> >> >> I could require ->writepages *and* ->flush_cmtime to handle the time
> >> >> >> update, but that would complicate non-transactional filesystems.
> >> >> >> Those filesystems should just flush cmtime at the end of writepages.
> >> >> >
> >> >> > do_writepages() is the wrong place to do such updates - we can get
> >> >> > writeback directly through .writepage, so the time updates need to
> >> >> > be in .writepage. That first .writepage call will clear the bit on
> >> >> > the mapping, so it's only done on the first call to .writepage on
> >> >> > the given mapping.
> >> >>
> >> >> Last time I checked, all the paths that actually needed the timestamp
> >> >> update went through .writepages.  I'll double-check.
> >> >   kswapd can call just .writepage to do the writeout so timestamp update
> >> > should be handled there as well. Otherwise all pages in a mapping can be
> >> > cleaned without timestamp being updated.
> >>
> >> OK, I'll fix that.
> >>
> >> >
> >> > Which btw made me realize that even your scheme doesn't completely make
> >> > sure timestamp is updated after mmap write - if you have pages 0 and 1, 
> >> > you
> >> > write to both of them - CMTIME flag gets set. Then fsync_range(fd, 0, 
> >> > 4096)
> >> > is called. We write the page 0, writeprotect it, update timestamps. But
> >> > page 1 is still writeable so writes to it won't set CMTIME flag, neither
> >> > update the timestamp... Not that I think this can be reasonably solved 
> >> > but
> >> > it is a food for thought.
> >>
> >> This should already work.  AS_CMTIME is set when the pte goes from
> >> dirty to clean, not when the pte goes from wp to writable.  So
> >> whenever clear_page_dirty_for_io is called on page 1, AS_CMTIME will
> >> be set and a subsequent writepages call will update the timestamp.
> >
> > Oh, I missed that - I thought you were setting AS_CMTIME during
> > .page_mkwrite.
> >
> > Setting it in clear_page_dirty_for_io() is too late for filesystems
> > to include it in their existing transactions during .writepage, (at
> > least for XFs and ext4) because they do their delayed allocation
> > transactions before changing page state....
> 
> Couldn't it go between mpage_map_and_submit_extent and
> ext4_journal_stop in ext4_writepages?

Maybe - I'm not an ext4 expert - but even if you can make it work
for ext4 in some way, that doesn't mean it is possible for any other
filesystem to use the same method. You're adding code to generic,
non-filesystem specific code paths and so the solutions need to be
generic rather not tied to how a specific filesystem is implemented.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>