
Re: [PATCH 5/5] xfs: kick inode writeback when low on memory

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 5/5] xfs: kick inode writeback when low on memory
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Thu, 3 Mar 2011 11:19:29 -0500
Cc: chris.mason@xxxxxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: <20110303154819.GA3945@xxxxxxxxxxxxx>
References: <1298412969-14389-1-git-send-email-david@xxxxxxxxxxxxx> <1298412969-14389-6-git-send-email-david@xxxxxxxxxxxxx> <20110302030602.GD4905@dastard> <20110302141220.GA4363@xxxxxxxxxxxxx> <20110303024228.GB15097@dastard> <20110303154819.GA3945@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Mar 03, 2011 at 10:48:19AM -0500, Christoph Hellwig wrote:
> I don't think we'll be able to get around changing the dirty_inode
> callback.  We need a way to prevent the VFS from marking the inode
> dirty, otherwise we have no chance of reclaiming it.
> Except for that I think it's really simple:
>  1) we need to reintroduce the i_update_size flag or an equivalent to
>     distinguish unlogged timestamp from unlogged size updates for fsync
>     vs fdatasync.  At that point we can stop looking at the VFS dirty
>     bits in fsync.
>  2) ->dirty_inode needs to tag the inode as dirty in the inode radix
>     tree
> With those minimal changes we should be set - we already
> call xfs_sync_attr from the sync_fs path, and xfs_sync_inode_attr
> properly picks up inodes with unlogged changes.

Actually, xfs_sync_attr does not get called from the sync path right now,
which is a bit odd.  But once we add it, possibly with an earlier
trylock pass and/or an inode cluster read-ahead, the above plan still stands.

What's also rather odd is how much we use xfs_sync_data.  Unlike
inodes, where our own code doing writeback in disk order makes a lot
of sense, data is actually handled very well by the core writeback
code.  The two remaining callers of xfs_sync_data are
xfs_flush_inodes_work and xfs_quiesce_data.  The former really
belongs in this patchset - can you check what happens if the ENOSPC
handler only calls writeback_inodes* instead of doing our own thing?
It should give you the avoidance of double writeout for free, and
get rid of one of the xfs_sync_data callers.

After that, we just need to look into xfs_quiesce_data.  The core
writeback code now reliably does writeback before calling into
->sync_fs, so the actual writeback there should be superfluous.  We will
still need a log force after it, and we might need an iteration through
all inodes to do an xfs_ioend_wait, but this area can be simplified a lot.
