[PATCH] fix corruption case for block size < page size

Dave Chinner david at fromorbit.com
Wed Jan 7 15:42:44 CST 2009


On Wed, Jan 07, 2009 at 05:32:09PM +1100, Lachlan McIlroy wrote:
> Eric Sandeen wrote:
> > Lachlan McIlroy wrote:
> >> Eric Sandeen wrote:
> >>> Eric Sandeen wrote:
> >>>
> >>>> Gah; or not.  what is going on here...  Doing just steps 1, 2, 3, 4
> >>>> (ending on the extending truncate):
> >>>>
> >>>> # xfs_io -c "pwrite -S 0x11 -b 4096 0 4096" -c "mmap -r 0 512" -c "mread
> >>>> 0 512" -c "munmap" -c "truncate 256" -c "truncate 514" -t -d -f
> >>>> /mnt/scratch/testfile
> >>>>
> >>>> # xfs_bmap -v /mnt/scratch/testfile
> >>>> /mnt/scratch/testfile:
> >>>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
> >>>>    0: [0..0]:          63..63            0 (63..63)             1
> >>>>    1: [1..1]:          hole                                     1
> >>>>
> >>>> It looks like what I expect, at this point.  But then:
> >>>>
> >>>> # sync
> >>>> # xfs_bmap -v /mnt/scratch/testfile
> >>>> /mnt/scratch/testfile:
> >>>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
> >>>>    0: [0..1]:          63..64            0 (63..64)             2
> >>>>
> >>>> Um, why'd that last block get mapped in?  mmap vs. direct IO I'm
> >>>> guessing... w/o the mmap read this does not happen.
> >>> Replying to myself twice?  I really need to go to bed.
> >>>
> >>> So this all does seem to come back to page_state_convert.
> >>>
> >>> Both the extending write in the original case and the sync above find
> >>> their way there; but esp. in the sync test above, why do we have *any*
> >>> work to do?
> >> Eric, did you find out why sync was allocating that second block?
> > 
> > I'm afraid this has been on the back burner (or maybe further back) for
> > a while... so... either "no" or "I don't remember" :)
> 
> Just trying your test case.  It's not related to direct I/O or mmap I/O
> since I can reproduce it without those.
> 
> # xfs_io -f -c "pwrite -S 0x11 -b 513 0 513" -c "truncate 1" -c "truncate 513" file
> wrote 513/513 bytes at offset 0
> 513.000000 bytes, 1 ops; 0.0000 sec (8.895 MiB/sec and 18181.8182 ops/sec)
> # xfs_bmap -vvp file; sync; xfs_bmap -vvp file
> file:
>   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>     0: [0..0]:          48..48            0 (48..48)             1 00000
>     1: [1..1]:          hole                                     1
....
> file:
>   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>     0: [0..1]:          48..49            0 (48..49)             2 00000
....
> xfs_bmap will cause the file to be flushed so there should be no dirty
> data to be flushed during the sync.  Strange.

Yeah, I noticed that as well. AFAICT it is because the state on
the second buffer has been trashed by the discard but it has been
left uptodate. The result is that xfs_flushpages() doesn't think
that there is anything to write out and the bmap code finds no
extent on that range so it is considered a hole.

However, when the VFS comes along and writes the dirty inode it
thinks that the page is dirty and writes it.
xfs_page_state_convert() then sees the second buffer as uptodate but
unmapped and within EOF so it allocates the block and writes it out.
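
To make that sequence concrete, here is a toy model of the per-buffer
decision described above. It is only a sketch, not the actual
xfs_page_state_convert() code: struct toy_buffer, enum toy_action and
toy_writeback_action() are hypothetical names invented for illustration,
and the real function handles many more states (delalloc, unwritten,
and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-block buffer state. */
struct toy_buffer {
    bool uptodate;  /* in-memory contents are considered valid */
    bool mapped;    /* a disk block is associated with the buffer */
    bool dirty;     /* buffer itself is flagged as needing writeback */
    long offset;    /* byte offset of the buffer within the file */
};

enum toy_action {
    TOY_SKIP,               /* nothing to do for this buffer */
    TOY_WRITE,              /* block exists, just write it */
    TOY_ALLOCATE_AND_WRITE  /* no block yet: allocate one, then write */
};

/*
 * Assumed decision logic, following the description in this mail:
 * a buffer that is uptodate, unmapped and inside EOF gets a block
 * allocated and is written out, even if nobody dirtied it.
 */
static enum toy_action
toy_writeback_action(const struct toy_buffer *b, long eof)
{
    assert(b != NULL);

    if (b->offset >= eof)
        return TOY_SKIP;                /* entirely beyond end of file */
    if (b->uptodate && !b->mapped)
        return TOY_ALLOCATE_AND_WRITE;  /* the path that fills the hole */
    if (b->mapped && b->dirty)
        return TOY_WRITE;
    return TOY_SKIP;
}
```

In Lachlan's test (512 byte blocks, file size 513), the second buffer
would sit at offset 512 with uptodate set and mapped clear, so this
model returns TOY_ALLOCATE_AND_WRITE for it, which matches the hole
turning into block 49 after the sync.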

IIRC, the code in xfs_page_state_convert() did this allocation to
catch mmap() writes into holes. We have ->page_mkwrite to catch these
now and turn them into delalloc writes, so perhaps this code path is
no longer needed?

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com



