Ping?
On Wed, May 23, 2007 at 07:21:03PM +1000, David Chinner wrote:
> [Nathan - probably another one for you]
>
> This test run on ia64 (16k page size) on a 4k block size filesystem:
>
> #!/bin/bash
>
> file=$1
> rm -f $file
>
> xfs_io -f \
> -c "truncate 1048576" \
> -c "resvsp 1032192 16384" \
> -c "pwrite 1033216 2560" \
> -c "pwrite 1040384 8192" \
> -c "bmap -vvp" \
> -c "fsync" \
> -c "bmap -vvp" \
> -c "close" \
> $file
>
> Which writes 3 unwritten blocks in a page (first block and last 2)
> results in a corrupted write.
>
> The problem is that the second block on teh page is uninitialised
> and so is skipped by xfs_page_state_convert. The problem is that the
> xfs_ioend structures are not getting created correctly.
>
> When we skip the uninitialised block, we add the second unwritten block
> we are writing to into the original ioend. While this results in
> the correct I/O being sent to disk, it results in a ioend with a
> start offset of 0 and a length of 3 blocks. When we do unwritten
> extent conversion based on this range, we convert the wrong blocks.
>
> What we need to be doing is creating two xfs_ioend structures, one
> for the first block and one for the second set of blocks in the page.
> That way we get two separate I/O completion events and convert the
> ranges separately and correctly.
>
> I've checked xfs_convert_page(), and I don't think it needs any
> fix here - it already appears to force multiple ioends to be used in this
> case...
>
> Thoughts?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>
> ---
> fs/xfs/linux-2.6/xfs_aops.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c
> ===================================================================
> --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_aops.c 2007-05-23
> 16:33:04.000000000 +1000
> +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c 2007-05-23 17:52:15.540456674
> +1000
> @@ -1008,6 +1008,8 @@ xfs_page_state_convert(
> if (buffer_unwritten(bh) || buffer_delay(bh) ||
> ((buffer_uptodate(bh) || PageUptodate(page)) &&
> !buffer_mapped(bh) && (unmapped || startio))) {
> + int new_ioend = 0;
> +
> /*
> * Make sure we don't use a read-only iomap
> */
> @@ -1026,6 +1028,15 @@ xfs_page_state_convert(
> }
>
> if (!iomap_valid) {
> + /*
> + * if we didn't have a valid mapping then we
> + * need to ensure that we put the new mapping
> + * in a new ioend structure. This needs to be
> + * done to ensure that the ioends correctly
> + * reflect the block mappings at io completion
> + * for unwritten extent conversion.
> + */
> + new_ioend = 1;
> if (type == IOMAP_NEW) {
> size = xfs_probe_cluster(inode,
> page, bh, head, 0);
> @@ -1045,7 +1056,7 @@ xfs_page_state_convert(
> if (startio) {
> xfs_add_to_ioend(inode, bh, offset,
> type, &ioend,
> - !iomap_valid);
> + new_ioend);
> } else {
> set_buffer_dirty(bh);
> unlock_buffer(bh);
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
|