On Thu, Mar 03, 2011 at 10:48:19AM -0500, Christoph Hellwig wrote:
> I don't think we'll be able to get around changing the dirty_inode
> callback. We need a way to prevent the VFS from marking the inode
> dirty, otherwise we have no chance of reclaiming it.
>
> Except for that I think it's really simple:
>
> 1) we need to reintroduce the i_update_size flag or an equivalent to
> distinguish unlogged timestamp from unlogged size updates for fsync
> vs fdatasync. At that point we can stop looking at the VFS dirty
> bits in fsync.
> 2) ->dirty_inode needs to tag the inode as dirty in the inode radix
> tree
>
> With those minimal changes we should be set - we already
> call xfs_sync_attr from the sync_fs path, and xfs_sync_inode_attr
> properly picks up inodes with unlogged changes.

Actually xfs_sync_attr does not get called from the sync path right now,
which is a bit odd. But once we add it, possibly with an earlier
trylock pass and/or an inode cluster read-ahead, the above plan still
stands.
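
To make points 1) and 2) of the quoted plan a bit more concrete, here is a
rough user-space sketch of the bookkeeping. It is only a model: the struct,
the flag names and the helpers are all made up for illustration and are not
existing XFS or VFS code. The assumption is that fdatasync may skip an inode
whose only unlogged change is a timestamp, while fsync has to log anything
that is unlogged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; illustrative names, not actual XFS fields. */
#define MODEL_UNLOGGED_TIMESTAMP (1u << 0)
#define MODEL_UNLOGGED_SIZE      (1u << 1)

/* Toy stand-in for an inode; not the real struct xfs_inode. */
struct model_inode {
	unsigned int unlogged;    /* changes that have not been logged yet */
	bool         radix_dirty; /* models the dirty tag in the inode radix tree */
};

/* Models ->dirty_inode: remember what changed and tag the inode. */
static void model_dirty_inode(struct model_inode *ip, unsigned int what)
{
	assert(ip != NULL);
	ip->unlogged |= what;
	ip->radix_dirty = true;
}

/*
 * Models the decision in fsync/fdatasync without looking at VFS dirty bits:
 * fdatasync only cares about unlogged size updates, fsync about any
 * unlogged change.
 */
static bool model_needs_log(const struct model_inode *ip, bool datasync)
{
	assert(ip != NULL);
	if (datasync)
		return (ip->unlogged & MODEL_UNLOGGED_SIZE) != 0;
	return ip->unlogged != 0;
}
```

With this model an inode that only saw a timestamp update needs logging on
fsync but not on fdatasync, and a size update needs it on both.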

What's also rather odd is how much we use xfs_sync_data. Unlike the
inodes, where our own code doing writeback based on disk order makes
a lot of sense, data is actually handled very well by the core writeback
code. The two remaining callers of xfs_sync_data are
xfs_flush_inodes_work and xfs_quiesce_data. The former really
belongs in this patchset - can you try what only calling
writeback_inodes* from the ENOSPC handler instead of doing our own stuff
does? It should give you the avoidance of double writeout for free, and
get rid of one of the xfs_sync_data callers.

After that we just need to look into xfs_quiesce_data. The core
writeback code now reliably does writeback before calling into
->sync_fs, so the actual writeback should be superfluous. We will still
need a log force after it, and we might need an iteration through all
inodes to do an xfs_ioend_wait, but this area can be simplified a lot.