On Fri 28-11-14 13:14:21, Ted Tso wrote:
> On Fri, Nov 28, 2014 at 06:23:23PM +0100, Jan Kara wrote:
> > Hum, when someone calls fsync() for an inode, you likely want to sync
> > timestamps to disk even if everything else is clean. I think that doing
> > what you did in last version:
> >     dirty = inode->i_state & I_DIRTY_INODE;
> >     inode->i_state &= ~I_DIRTY_INODE;
> >     spin_unlock(&inode->i_lock);
> >     if (dirty & I_DIRTY_TIME)
> >             mark_inode_dirty_sync(inode);
> > looks better to me. IMO when someone calls __writeback_single_inode() we
> > should write whatever we have...
> Yes, but we also have to distinguish between what happens on an
> fsync() versus what happens on a periodic writeback if I_DIRTY_PAGES
> (but not I_DIRTY_SYNC or I_DIRTY_DATASYNC) is set. So there is a
> check in the fsync() code path to handle the concern you raised above.
Ah, this is the thing you have likely been talking about but which I kept
missing in my thoughts. You don't want to write times when the inode
has only dirty pages and timestamps - I was always thinking about a
situation where the inode has only dirty timestamps and not pages. This
situation also complicates the writeback logic because when inode has dirty
pages, you need to track it as normal dirty inode for page writeback (with
dirtied_when corresponding to time when pages were dirtied) but in
parallel you now need to track the information that inode has timestamps
that weren't written for X long. And even if we stored how old the
timestamps are, it isn't easily possible to keep the list of inodes with just
dirty timestamps sorted by dirty time. So now I finally understand why you
did things the way you did them... Sorry for misleading you.
So let's restart the design so that things are clear:
1) We have new inode bit I_DIRTY_TIME. This means that only timestamps in
the inode have changed. The desired behavior is that an inode with
I_DIRTY_TIME and without I_DIRTY_SYNC | I_DIRTY_DATASYNC is written by
background writeback only once per 24 hours. Such inodes do get written by
sync(2) and fsync(2) calls.
2) Inodes with only I_DIRTY_TIME are tracked in a new dirty list
b_dirty_time. We use i_wb_list list head for this. Unlike b_dirty list,
this list isn't kept sorted by dirtied_when. If queue_io() sees for_sync
bit set in the work item, it will call mark_inode_dirty_sync() for all
inodes in b_dirty_time before queuing io from b_dirty list. Once per hour
(or something like that) the flusher thread scans the whole b_dirty_time list
and calls mark_inode_dirty_sync() for all inodes whose dirty timestamps are
too old (to detect this we need a new time stamp in the inode).
3) When fsync() sees inode with I_DIRTY_TIME set, it calls
4) When we are dropping last inode reference and inode has I_DIRTY_TIME
set, we call mark_inode_dirty_sync().
And that should be it, right?
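
To make points 1, 2 and 4 a bit more concrete, here is a rough userspace
sketch. It is only a toy model, not kernel code: the toy_* names, the flag
values, the time_dirtied_when field and the singly linked list standing in
for i_wb_list are all made up for illustration, and there is no locking.
The fsync() path from point 3 is left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values are illustrative only, not the kernel's. */
enum {
    I_DIRTY_SYNC     = 1 << 0,
    I_DIRTY_DATASYNC = 1 << 1,
    I_DIRTY_PAGES    = 1 << 2,
    I_DIRTY_TIME     = 1 << 3,
};

#define TIME_EXPIRE_SECS (24L * 60 * 60)   /* point 1: once per 24 hours */

struct toy_inode {
    unsigned state;
    long time_dirtied_when;    /* hypothetical "new time stamp" from point 2 */
    struct toy_inode *next;    /* stand-in for the i_wb_list linkage */
};

struct toy_wb {
    struct toy_inode *b_dirty_time;   /* unsorted, unlike b_dirty */
};

/* Unlink an inode from b_dirty_time if it is on the list. */
static void toy_list_del(struct toy_wb *wb, struct toy_inode *inode)
{
    for (struct toy_inode **pp = &wb->b_dirty_time; *pp; pp = &(*pp)->next) {
        if (*pp == inode) {
            *pp = inode->next;
            inode->next = NULL;
            return;
        }
    }
}

/* A timestamp-only update: track the inode on b_dirty_time. */
void toy_mark_dirty_time(struct toy_wb *wb, struct toy_inode *inode, long now)
{
    assert(wb != NULL && inode != NULL);
    /* Already due to be written, or already tracked with an older time. */
    if (inode->state & (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_TIME))
        return;
    inode->state |= I_DIRTY_TIME;
    inode->time_dirtied_when = now;
    inode->next = wb->b_dirty_time;
    wb->b_dirty_time = inode;
}

/* Toy stand-in for mark_inode_dirty_sync(): promote to a normal dirty inode. */
void toy_mark_inode_dirty_sync(struct toy_wb *wb, struct toy_inode *inode)
{
    if (inode->state & I_DIRTY_TIME) {
        toy_list_del(wb, inode);
        inode->state &= ~I_DIRTY_TIME;
    }
    inode->state |= I_DIRTY_SYNC;
}

/*
 * Point 2: with for_sync set, promote everything on b_dirty_time; otherwise
 * (the periodic scan) promote only inodes whose timestamps are too old.
 * Returns the number of inodes promoted.
 */
int toy_scan_dirty_time(struct toy_wb *wb, long now, bool for_sync)
{
    int promoted = 0;
    struct toy_inode *inode = wb->b_dirty_time;

    while (inode) {
        struct toy_inode *next = inode->next;   /* promotion unlinks inode */
        if (for_sync || now - inode->time_dirtied_when >= TIME_EXPIRE_SECS) {
            toy_mark_inode_dirty_sync(wb, inode);
            promoted++;
        }
        inode = next;
    }
    return promoted;
}

/* Point 4: dropping the last reference must not lose the timestamps. */
void toy_iput_final(struct toy_wb *wb, struct toy_inode *inode)
{
    if (inode->state & I_DIRTY_TIME)
        toy_mark_inode_dirty_sync(wb, inode);
}
```

Since the list is unsorted, the periodic scan has to walk all of it, which
is why it would run only about once per hour rather than on every flusher
wakeup.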
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR