On Wed, Jan 07, 2009 at 05:32:09PM +1100, Lachlan McIlroy wrote:
Eric Sandeen wrote:
Lachlan McIlroy wrote:
Eric Sandeen wrote:
Eric Sandeen wrote:
Gah; or not. What is going on here... Doing just steps 1, 2, 3, 4
(ending on the extending truncate):
# xfs_io -c "pwrite -S 0x11 -b 4096 0 4096" -c "mmap -r 0 512" \
    -c "mread 0 512" -c "munmap" -c "truncate 256" -c "truncate 514" \
    -t -d -f /mnt/scratch/testfile
# xfs_bmap -v /mnt/scratch/testfile
/mnt/scratch/testfile:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..0]: 63..63 0 (63..63) 1
1: [1..1]: hole 1
It looks like what I expect, at this point. But then:
# sync
# xfs_bmap -v /mnt/scratch/testfile
/mnt/scratch/testfile:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..1]: 63..64 0 (63..64) 2
Um, why'd that last block get mapped in? mmap vs. direct IO I'm
guessing... w/o the mmap read this does not happen.
Replying to myself twice? I really need to go to bed.
So this all does seem to come back to page_state_convert.
Both the extending write in the original case and the sync above find
their way there; but esp. in the sync test above, why do we have *any*
work to do?
Eric, did you find out why sync was allocating that second block?
I'm afraid this has been on the back burner (or maybe further back) for
a while... so... either "no" or "I don't remember" :)
Just trying your test case. It's not related to direct I/O or mmap I/O
since I can reproduce it without those.
# xfs_io -f -c "pwrite -S 0x11 -b 513 0 513" -c "truncate 1" -c "truncate 513" file
wrote 513/513 bytes at offset 0
513.000000 bytes, 1 ops; 0.0000 sec (8.895 MiB/sec and 18181.8182 ops/sec)
# xfs_bmap -vvp file; sync; xfs_bmap -vvp file
file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..0]: 48..48 0 (48..48) 1 00000
1: [1..1]: hole 1
....
file:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
0: [0..1]: 48..49 0 (48..49) 2 00000
....
xfs_bmap will cause the file to be flushed so there should be no dirty
data to be flushed during the sync. Strange.
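For anyone without xfs_io to hand, the same write/shrink/extend sequence can be
sketched with plain system calls. This is only a rough equivalent: the function
name and the temporary path are made up for illustration, and it shows the
file contents only, not the extent map, so xfs_bmap on an XFS mount is still
needed to see the extra block appear after sync.

```python
import os
import tempfile


def truncate_cycle(path, size=513, shrink_to=1, fill=0x11):
    """Write `size` fill bytes, shrink the file, extend it back, read it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.pwrite(fd, bytes([fill]) * size, 0)  # pwrite -S 0x11 -b 513 0 513
        os.ftruncate(fd, shrink_to)             # truncate 1
        os.ftruncate(fd, size)                  # truncate 513
        return os.pread(fd, size, 0)            # read the whole file back
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    data = truncate_cycle(os.path.join(d, "file"))
    # Length, first byte in hex, and whether any later byte is non-zero.
    print(len(data), data[:1].hex(), any(data[1:]))
```

Everything past the first byte should read back as zeroes, since the shrinking
truncate discarded it and the extending truncate only moved EOF.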
Yeah, I noticed that as well. AFAICT it is because the state on
the second buffer has been trashed by the discard but it has been
left uptodate. The result is that xfs_flushpages() doesn't think
that there is anything to write out and the bmap code finds no
extent on that range so it is considered a hole.
However, when the VFS comes along and writes the dirty inode it
thinks that the page is dirty and writes it.
xfs_page_state_convert() then sees the second buffer as uptodate but
unmapped and within EOF so it allocates the block and writes it out.
IIRC, the code in xfs_page_state_convert() did this allocation to
catch mmap() writes into holes. We have ->page_mkwrite to catch this
now and turn them into delalloc writes, so perhaps this code path is
no longer needed?
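
The decision described above can be put as a toy model. This is not the
kernel code: Buffer, needs_allocation() and writeback_page() are hypothetical
names, the 512-byte block size is assumed from the test case, and only the
"uptodate but unmapped and within EOF" rule is modelled.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    offset: int      # byte offset of this buffer within the file
    uptodate: bool   # contents are considered valid
    mapped: bool     # a disk block backs this buffer


def needs_allocation(buf, eof):
    """Toy rule: uptodate, unmapped, and starting before EOF."""
    return buf.uptodate and not buf.mapped and buf.offset < eof


def writeback_page(buffers, eof):
    """Map every buffer the rule selects; return the offsets allocated."""
    allocated = []
    for buf in buffers:
        if needs_allocation(buf, eof):
            buf.mapped = True
            allocated.append(buf.offset)
    return allocated


# State after "truncate 1; truncate 513": the second buffer lost its
# mapping in the discard but was left uptodate.
page = [Buffer(0, True, True), Buffer(512, True, False)]
print(writeback_page(page, eof=513))
```

With EOF at 513 the second buffer falls inside the file and gets a block,
which matches the bmap output after sync; with EOF at 512 or below the
model leaves it as a hole.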