On Mon, May 05, 2014 at 03:19:42PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> truncate_setsize() removes pages from the page cache, and hence
> requires page locks to be held. It is not valid to lock a page cache
> page inside a transaction context as we can hold page locks when we
> we reserve space for a transaction. If we do, then we expose an ABBA
> deadlock between log space reservation and page locks.
> That is, both the write path and writeback lock a page, then start a
> transaction for block allocation, which means they can block waiting
> for a log reservation with the page lock held. If we hold a log
> reservation and then do something that locks a page (e.g.
> truncate_setsize in xfs_setattr_size) then that page lock can block
> on the page locked and waiting for a log reservation. If the
> transaction that is waiting for the page lock is the only active
> transaction in the system that can free log space via a commit,
> then writeback will never make progress and so log space will never
> free up.
> This issue with xfs_setattr_size() was introduced back in 2010 by
> commit fa9b227 ("xfs: new truncate sequence") which moved the page
> cache truncate from outside the transaction context (what was
> xfs_itruncate_data()) to inside the transaction context as a call to
> The reason truncate_setsize() was located where in this place was
> that we can't shouldn't change the file size until after we are in
> the transaction context and the operation will either succeed or
> shut down the filesystem on failure. However, block_truncate_page()
> already modifies the file contents before we enter the transaction
> context, so we can't really fulfill this guarantee in any way. Hence
> we may as well ensure that on success or failure, the in-memory
> inode and data is truncated away and that the application cleans up
> the mess appropriately.
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>