[PATCH] fix corruption case for block size < page size

Lachlan McIlroy lachlan at sgi.com
Thu Jan 8 18:18:37 CST 2009


Dave Chinner wrote:
> On Wed, Jan 07, 2009 at 05:32:09PM +1100, Lachlan McIlroy wrote:
>> Eric Sandeen wrote:
>>> Lachlan McIlroy wrote:
>>>> Eric Sandeen wrote:
>>>>> Eric Sandeen wrote:
>>>>>
>>>>>> Gah; or not.  what is going on here...  Doing just steps 1, 2, 3, 4
>>>>>> (ending on the extending truncate):
>>>>>>
>>>>>> # xfs_io -c "pwrite -S 0x11 -b 4096 0 4096" -c "mmap -r 0 512" -c "mread 0 512" -c "munmap" -c "truncate 256" -c "truncate 514" -t -d -f /mnt/scratch/testfile
>>>>>>
>>>>>> # xfs_bmap -v /mnt/scratch/testfile
>>>>>> /mnt/scratch/testfile:
>>>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>>>>>    0: [0..0]:          63..63            0 (63..63)             1
>>>>>>    1: [1..1]:          hole                                     1
>>>>>>
>>>>>> It looks like what I expect, at this point.  But then:
>>>>>>
>>>>>> # sync
>>>>>> # xfs_bmap -v /mnt/scratch/testfile
>>>>>> /mnt/scratch/testfile:
>>>>>>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
>>>>>>    0: [0..1]:          63..64            0 (63..64)             2
>>>>>>
>>>>>> Um, why'd that last block get mapped in?  mmap vs. direct IO I'm
>>>>>> guessing... w/o the mmap read this does not happen.
>>>>> Replying to myself twice?  I really need to go to bed.
>>>>>
>>>>> So this all does seem to come back to page_state_convert.
>>>>>
>>>>> Both the extending write in the original case and the sync above find
>>>>> their way there; but esp. in the sync test above, why do we have *any*
>>>>> work to do?
>>>> Eric, did you find out why sync was allocating that second block?
>>> I'm afraid this has been on the back burner (or maybe further back) for
>>> a while... so... either "no" or "I don't remember" :)
>> Just trying your test case.  It's not related to direct I/O or mmap I/O
>> since I can reproduce it without those.
>>
>> # xfs_io -f -c "pwrite -S 0x11 -b 513 0 513" -c "truncate 1" -c "truncate 513" file
>> wrote 513/513 bytes at offset 0
>> 513.000000 bytes, 1 ops; 0.0000 sec (8.895 MiB/sec and 18181.8182 ops/sec)
>> # xfs_bmap -vvp file; sync; xfs_bmap -vvp file
>> file:
>>   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>>     0: [0..0]:          48..48            0 (48..48)             1 00000
>>     1: [1..1]:          hole                                     1
> ....
>> file:
>>   EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>>     0: [0..1]:          48..49            0 (48..49)             2 00000
> ....
>> xfs_bmap will cause the file to be flushed so there should be no dirty
>> data to be flushed during the sync.  Strange.
> 
> Yeah, I noticed that as well. AFAICT it is because the state on
> the second buffer has been trashed by the discard but it has been
> left uptodate. The result is that xfs_flushpages() doesn't think
> that there is anything to write out and the bmap code finds no
> extent on that range so it is considered a hole.
> 
> However, when the VFS comes along and writes the dirty inode it
> thinks that the page is dirty and writes it.
> xfs_page_state_convert() then sees the second buffer as uptodate but
> unmapped and within EOF so it allocates the block and writes it out.
> 
> IIRC, the code in xfs_page_state_convert() did this allocation to
> catch mmap() writes into holes. We have ->page_mkwrite to catch this
> now and turn them into delalloc writes, so perhaps this code path is
> no longer needed?

Thanks Dave.

Eric, I tried your patch again and it fixed this problem but from the
sound of Dave's description it may be a case of removing code rather
than adding another check.
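
To make the condition Dave describes concrete, here is a toy model in Python. It is not the kernel code: the Buffer class, the would_allocate() helper and the 512-byte block size are all hypothetical names and values chosen for illustration. It only encodes the rule as stated above, that a buffer which is uptodate, unmapped and within EOF gets a block allocated at writeback.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed filesystem block size (smaller than the page size)


@dataclass
class Buffer:
    """Toy stand-in for one block-sized buffer attached to a page."""
    offset: int      # byte offset of the buffer within the file
    uptodate: bool   # buffer contents are valid in memory
    mapped: bool     # buffer has a disk block behind it


def would_allocate(buf: Buffer, eof: int) -> bool:
    """Hypothetical helper: True when the described writeback rule
    (uptodate, unmapped, within EOF) would allocate a block."""
    return buf.uptodate and not buf.mapped and buf.offset < eof


# Second buffer of the page after "truncate 1": the mapping has been
# discarded, but the buffer was left uptodate.
second = Buffer(offset=BLOCK_SIZE, uptodate=True, mapped=False)

print(would_allocate(second, eof=1))    # file size 1: buffer is beyond EOF
print(would_allocate(second, eof=513))  # file size 513: buffer is inside EOF again
```

With the file size at 1 the second buffer is past EOF and nothing happens; once "truncate 513" moves EOF back over it, the same stale buffer satisfies the rule, which matches the extra block that shows up in xfs_bmap after sync.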



