xfs_vm_writepage() walks each buffer_head on the page, maps to the block
on disk and attaches to a running ioend structure that represents the
I/O submission. A new ioend is created when the type of I/O (unwritten,
delayed allocation or overwrite) required for a particular buffer_head
differs from the previous. If a buffer_head is a delalloc or unwritten
buffer, the associated bits are cleared by xfs_map_at_offset() once the
buffer_head is added to the ioend.
The process of mapping each buffer_head occurs in xfs_map_blocks() and
acquires the ilock in blocking or non-blocking mode, depending on the
type of writeback in progress. If the lock cannot be acquired for
non-blocking writeback, we cancel the ioend, redirty the page and
return. Writeback will revisit the page at some later point.
Note that we acquire the ilock for each buffer on the page. Therefore
during non-blocking writeback, it is possible to add an unwritten buffer
to the ioend, clear the unwritten state, fail to acquire the ilock when
mapping a subsequent buffer and cancel the ioend. If this occurs, the
unwritten status of the buffer sitting in the ioend has been lost. The
page will eventually hit writeback again, but xfs_vm_writepage() submits
overwrite I/O instead of unwritten I/O and does not perform unwritten
extent conversion at I/O completion. This leads to data corruption
because unwritten extents are treated as holes on reads and zeroes are
returned instead of reading from disk.
Modify xfs_cancel_ioend() to restore the buffer unwritten bit for ioends
of type XFS_IO_UNWRITTEN. This ensures that unwritten extent conversion
occurs once the page is eventually written back.
Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
This fixes the problem I've been chasing the past few days with regard
to data corruption after hole punch. It turns out the problem isn't
directly related to the hole punch after all. I found that the problem
still occurs when I remove the truncate_pagecache_range() rounding from
xfs_free_file_space(). The difference is data is still sitting in cache
and the pages aren't thrown away until reclaimed. E.g., the file
corruption is reproducible on subsequent remount. The current method of
rounding the truncate throws the pages away such that the corruption is
detectable immediately after the hole punch.
This is only lightly tested so far with my hacky reproducer. The next
step is to turn it into a regression test. xfstests test to follow...
fs/xfs/xfs_aops.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2f50253..f5b2453 100644
@@ -560,6 +560,13 @@ xfs_cancel_ioend(
next_bh = bh->b_private;
+ * The unwritten flag is cleared when added to the
+ * ioend. We're not submitting for I/O so mark the
+ * buffer unwritten again for next time around.
+ if (ioend->io_type == XFS_IO_UNWRITTEN)
} while ((bh = next_bh) != NULL);