
Re: [PATCH V2] xfs: truncate_setsize should be outside transactions

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH V2] xfs: truncate_setsize should be outside transactions
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sat, 3 May 2014 09:23:39 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140502100802.GB14028@xxxxxxxxxxxxx>
References: <1398983979-23696-1-git-send-email-david@xxxxxxxxxxxxx> <20140502045443.GA8867@xxxxxxxxxxxxx> <20140502050053.GA17578@xxxxxxxxxxxxx> <20140502064700.GB26353@dastard> <20140502070054.GC26353@dastard> <20140502100802.GB14028@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, May 02, 2014 at 03:08:02AM -0700, Christoph Hellwig wrote:
> On Fri, May 02, 2014 at 05:00:54PM +1000, Dave Chinner wrote:
> > The reason truncate_setsize() was located in this place was
> > that we can't change the file size until after we are in the
> > transaction context and the operation will either succeed or shut
> > down the filesystem on failure. Hence we have to split
> > truncate_setsize() back into a pagecache operation that occurs
> > before the transaction context, and an i_size_write() call that
> > happens within the transaction context.
> Following up on my earlier reply: the comment next to
> truncate_pagecache claims that the file size must have been updated
> before, but I can't see a reason for that.

Oh, I can, and that reminds me of why - racing with mmap page
faults, which aren't serialised against truncate except by an
indirect combination of the page locks and i_size updates. Hence if
we remove the pages before updating the inode size, then a page
fault can re-instantiate a page after the truncation beyond the new
EOF when, in fact, it should SEGV.

So, no, we can't split truncate_setsize() like this.
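
To make the ordering argument concrete, here is a toy userspace model
of the race. It is not kernel code: struct toy_file and every toy_*
function are made up for illustration, and the "racing" fault is simply
run between the two truncate steps instead of on another CPU. It only
assumes what is described above, i.e. that a fault is refused when the
page lies beyond the current i_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL
#define TOY_NR_PAGES  8

/* Toy stand-in for an inode and its page cache (hypothetical). */
struct toy_file {
	unsigned long i_size;        /* current EOF in bytes */
	bool cached[TOY_NR_PAGES];   /* true if that page index is instantiated */
};

/* Model of a page fault: refuse pages at or beyond EOF, else instantiate. */
bool toy_fault(struct toy_file *f, unsigned long index)
{
	assert(f != NULL);
	if (index >= TOY_NR_PAGES || index * TOY_PAGE_SIZE >= f->i_size)
		return false;        /* the real fault would fail here */
	f->cached[index] = true;
	return true;
}

/* Model of the pagecache half of truncate: drop whole pages past new_size. */
void toy_drop_pages(struct toy_file *f, unsigned long new_size)
{
	unsigned long i = (new_size + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

	assert(f != NULL);
	for (; i < TOY_NR_PAGES; i++)
		f->cached[i] = false;
}

/* Count cached pages that lie entirely beyond the current EOF. */
unsigned int toy_stale_pages(const struct toy_file *f)
{
	unsigned int n = 0;

	assert(f != NULL);
	for (unsigned long i = 0; i < TOY_NR_PAGES; i++)
		if (f->cached[i] && i * TOY_PAGE_SIZE >= f->i_size)
			n++;
	return n;
}

/*
 * Run a truncate with a fault on racing_index landing between the two
 * steps. size_first selects "update i_size, then drop pages"; otherwise
 * the pages are dropped before i_size changes. Returns the number of
 * pages left cached beyond the new EOF.
 */
unsigned int toy_truncate(struct toy_file *f, unsigned long new_size,
			  bool size_first, unsigned long racing_index)
{
	assert(f != NULL);
	if (size_first) {
		f->i_size = new_size;
		toy_fault(f, racing_index);   /* sees new EOF, is refused */
		toy_drop_pages(f, new_size);
	} else {
		toy_drop_pages(f, new_size);
		toy_fault(f, racing_index);   /* sees old EOF, re-instantiates */
		f->i_size = new_size;
	}
	return toy_stale_pages(f);
}
```

In this model, truncating an 8-page file to one page with a fault on
page 5 in the middle leaves no stale pages when the size is updated
first, and one stale page beyond EOF when the pages are dropped first.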

As it is, we've already made a user-visible data change in the
truncate process before we get to the transaction that can fail:
block_truncate_page() zeroes the tail of the page cache page. Hence
if the transaction reservation fails, we've already trashed the file
data - we may as well finish off the job and at least make it look
like the truncate succeeded from a user point of view. They then get
an ENOMEM error (the only non-fatal error that can come from
xfs_trans_reserve()) and try the truncate again....
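
The "already trashed" point can be sketched the same way. Again this is
a toy model rather than the XFS code path: the toy_* names, the 16-byte
block size and the reserve_fails flag are all assumptions for
illustration. The tail is zeroed first, and only then can the stand-in
for the reservation fail.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16

/* Zero from new_size to the end of the single block (toy tail zeroing). */
void toy_zero_tail(unsigned char *block, size_t new_size)
{
	size_t offset = new_size % TOY_BLOCK_SIZE;

	assert(block != NULL);
	if (offset)
		memset(block + offset, 0, TOY_BLOCK_SIZE - offset);
}

/*
 * Hypothetical truncate of a one-block file: zero the tail, then "reserve".
 * On a failed reservation the recorded size is left alone, but the data
 * past new_size has already been zeroed.
 */
int toy_truncate_tail(unsigned char *block, size_t *size, size_t new_size,
		      int reserve_fails)
{
	assert(block != NULL && size != NULL);
	toy_zero_tail(block, new_size);   /* user-visible change happens here */
	if (reserve_fails)
		return -ENOMEM;
	*size = new_size;
	return 0;
}
```

With reserve_fails set, the call returns -ENOMEM and the size is
unchanged, yet every byte from new_size onwards reads back as zero.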

So I now think the first version of the patch is better than this one.


Dave Chinner
