On Wed, Apr 30, 2008 at 01:17:38PM +0530, Aneesh Kumar K.V wrote:
> On Tue, Apr 29, 2008 at 08:06:01PM +1000, David Chinner wrote:
> > Folks,
> > It appears to me that vmtruncate() is not used correctly in
> > block_write_begin() and friends. The short summary is that it
> > appears that the usage in these functions implies that vmtruncate()
> > should cause truncation of blocks on disk but no filesystem
> > appears to do this, nor does the documentation imply they should.
> Looking at ext*_truncate, I see we are freeing blocks as a part of vmtruncate.
> Or did I miss something ?
No, I missed something. I was looking at block_truncate_page() which is
called by various truncate methods but does not do truncation itself.
Still doesn't help XFS, though, as updating different parts of the
inode in different transactions will result in non-atomic ->setattr
updates. Which, given that XFS tends to excel at exposing non-atomic
modifications in crash recovery, is a really bad thing.
Looking further, doing the truncate operation in ->truncate is
probably really stupid simply because the interface does not allow
errors to be returned to the caller. e.g. ufs_setattr() has this
comment:

	/*
	 * We don't define our `inode->i_op->truncate', and call it here,
	 * because of:
	 * - there is no way to know old size
	 * - there is no way inform user about error, if it happens in `truncate'
	 */

and I've just added a WARN_ON(error) to xfs_vn_truncate() so that
errors don't get lost silently.
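
To make the interface problem concrete, here is a minimal userspace sketch.
It is not kernel code: struct model_inode, model_truncate_void() and
model_setattr_size() are hypothetical names invented for illustration, and
the fail_free flag is just a way to inject a failure. The only things taken
from the real interface are the shapes: ->truncate returns void, while
->setattr returns an int.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model, not the kernel's struct inode. */
struct model_inode {
	long long i_size;      /* file size in bytes */
	long long disk_bytes;  /* bytes covered by allocated blocks */
	int fail_free;         /* inject a failure while freeing blocks */
	int last_error;        /* side channel; nothing ever reads it */
};

/*
 * Same shape as a ->truncate callback: it returns void. By the time it
 * runs, i_size already holds the new size, so the old size is unknown.
 */
static void model_truncate_void(struct model_inode *inode)
{
	if (inode->fail_free) {
		/* The error exists here but cannot reach the caller. */
		inode->last_error = -EIO;
		return;
	}
	inode->disk_bytes = inode->i_size;
}

/*
 * Same shape as a ->setattr-driven truncate: it returns int, and it
 * sees both the old size (inode->i_size) and the requested new size.
 */
static int model_setattr_size(struct model_inode *inode, long long new_size)
{
	assert(new_size >= 0);
	if (inode->fail_free)
		return -EIO;   /* the caller can see and handle this */
	inode->i_size = new_size;
	inode->disk_bytes = new_size;
	return 0;
}
```

With fail_free set, model_truncate_void() leaves disk_bytes untouched and the
caller has no way to notice, whereas model_setattr_size() hands back -EIO.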
UFS also uses block_write_begin(), so it will have exactly the same
problem as XFS - blocks beyond EOF don't get truncated away by
vmtruncate if an error occurs in block_write_begin().
AFAICT, gfs2 is another filesystem that does not have a ->truncate
callback - truncation is driven through the ->setattr interface.
However, gfs2_write_begin() calls vmtruncate() on error from
block_prepare_write(), just as block_write_begin() does, and hence
also has this bug.
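
The failure mode can be sketched in userspace as below. This is a simplified
model under stated assumptions, not the kernel source: every model_* name,
ext_like_truncate() and the two ops tables are hypothetical, page cache
handling is omitted, and prepare_err stands in for an error from the prepare
step after it has already instantiated blocks past EOF.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode;

/* Hypothetical stand-in for the ops table; only ->truncate is modelled. */
struct model_inode_ops {
	void (*truncate)(struct model_inode *inode);
};

struct model_inode {
	long long i_size;         /* file size in bytes */
	long long allocated_end;  /* end of the last allocated block */
	const struct model_inode_ops *i_op;
};

/* Model of vmtruncate(): set i_size, then call ->truncate if it exists. */
static int model_vmtruncate(struct model_inode *inode, long long offset)
{
	inode->i_size = offset;
	if (inode->i_op && inode->i_op->truncate)
		inode->i_op->truncate(inode);
	return 0;
}

/* A filesystem that frees blocks past i_size from its ->truncate. */
static void ext_like_truncate(struct model_inode *inode)
{
	if (inode->allocated_end > inode->i_size)
		inode->allocated_end = inode->i_size;
}

static const struct model_inode_ops ext_like_ops = {
	.truncate = ext_like_truncate,
};

/* A filesystem that only truncates blocks via ->setattr. */
static const struct model_inode_ops setattr_only_ops = {
	.truncate = NULL,
};

/* Model of the write_begin error path that trims with vmtruncate(). */
static int model_write_begin(struct model_inode *inode, long long pos,
			     long long len, int prepare_err)
{
	assert(len > 0);

	/* The prepare step may instantiate blocks beyond i_size. */
	if (pos + len > inode->allocated_end)
		inode->allocated_end = pos + len;

	if (prepare_err) {
		/* Try to trim the blocks that were just instantiated. */
		if (pos + len > inode->i_size)
			model_vmtruncate(inode, inode->i_size);
		return prepare_err;
	}
	return 0;
}
```

With ext_like_ops the failed write leaves allocated_end back at i_size. With
setattr_only_ops the same call returns the same error, but allocated_end
stays past EOF because nothing in the vmtruncate path frees the blocks.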
I'm sure there are other filesystems that, like XFS, UFS and GFS2,
don't do block truncation in ->truncate. Hence it really does seem
that calling vmtruncate() from anything other than a ->setattr method
is a bug, because doing so makes a false assumption about how
filesystems are implemented.
SGI Australian Software Group