http://oss.sgi.com/bugzilla/show_bug.cgi?id=418
------- Additional Comments From dgc@xxxxxxx 2007-02-06 01:35 CST -------
Ok, so running the test program properly (silly cut'n'paste problem),
I see the problem. Sorry, my fault.
I modified the test case to a 128k file (smaller, simpler), then
ran it once to get an unwritten extent on disk. I then wrote
from /dev/urandom into the file and sync'd that to disk. Then,
using direct I/O from the block device, I read the data that was
written to disk to make sure it was there. It was.
Taking advantage of the fact that if you truncate a file and then
write back to it, you typically get the same extent, I modified the
test program to use O_TRUNC on open() rather than needing to rm it
every time. This puts the unwritten extent straight back over the
top of the blocks that were freed at open and that we wrote random
data to.
Then I ran the test program again and read the data back off disk
from the block device to see whether the changes made by mmap actually
hit the disk. The first 16k (this is an Altix with 16k page size I'm
testing on) of the blocks on disk had 0xc3 as the first byte and zeros
for the rest. The last byte of the blocks on disk was 0xa5 and the
rest of the last 16k of the blocks on disk was zero. The modification
of 0xb4 in the middle was there, surrounded by zeros as well.
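For anyone repeating this check, here is a minimal sketch of one way to
list the non-zero bytes in a single page of a raw dump (such as the t.t
file produced in the history below). The function name nonzero_in_page
and its arguments are made up for illustration; it only relies on dd,
cmp -l and awk, and the 16k default page size is an assumption taken
from the Altix mentioned above.

```shell
# nonzero_in_page FILE PAGE_INDEX [PAGE_SIZE]
# Hypothetical helper: prints "offset 0xVALUE" for every non-zero byte in
# one page of FILE. Offsets are decimal and relative to the start of the page.
nonzero_in_page() {
    file=$1
    pg=$2
    page=${3:-16384}    # assumed default: 16k pages
    dd if="$file" bs="$page" skip="$pg" count=1 2>/dev/null |
        cmp -l - /dev/zero 2>/dev/null |
        awk '
            # cmp -l prints a 1-based byte number, then both byte values in octal
            function oct(s,   i, n) {
                n = 0
                for (i = 1; i <= length(s); i++)
                    n = n * 8 + substr(s, i, 1)
                return n
            }
            { printf "%d 0x%02x\n", $1 - 1, oct($2) }'
}
```

If the first page of the dump looks the way it is described above,
"nonzero_in_page t.t 0" would print just one line, "0 0xc3".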
IOWs, the data written by mmap hit the disk but we did not do unwritten
extent conversion when we wrote the data out.
Clearly, this is because the page is initially read from disk by the
page fault code, and then later it marks the page dirty. The problem here
is the read from disk does not mark the buffers on the page unwritten
if we are reading from an unwritten extent. Hence when it gets dirtied
and written out, all we do is allocate the underlying disk space; we
don't actually do unwritten extent conversion on it because, as far as
the buffers are concerned, it is not an unwritten extent.
Given this, I think that if I read() from an unwritten extent, then
write() to that same region, the write will not cause unwritten extent
conversion as there will already be mapped buffers on the page and
so we won't remap the page and hence can't get the unwritten state set.
From my history:
1112 do_trunc= do_mmap= ./resv 131072 blaat
1113 dd if=/dev/urandom of=blaat bs=128k count=1 conv=notrunc
1114 do_trunc= do_mmap= ./resv 131072 blaat
1115 ls -lh blaat && du -sh blaat && xfs_bmap -v blaat
[get block number from xfs_bmap -v output]
[single unwritten extent]
1116 dd if=/dev/mapper/test_vg-fred of=t.t bs=512 skip=770776 count=256 iflag=direct;
1117 od -x -A x t.t > t.tt
[check t.tt for zero regions in start middle and end]
[still a single unwritten extent]
1119 dd if=blaat of=/dev/null bs=16384 skip=2 count=1; dd if=/dev/zero of=blaat bs=16384 seek=2 count=1 conv=notrunc; sync
1120 dd if=/dev/mapper/test_vg-fred of=t.t bs=512 skip=770776 count=256 iflag=direct; od -x -A x t.t > t.tt
1121 ls -lh blaat && du -sh blaat && xfs_bmap -v blaat
[ still a single unwritten extent]
[ check t.tt - new zero region at offset 32k for 16k ]
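The "check t.tt" steps above were done by eye on the od output. As a
rough sketch of an alternative, the raw dump (t.t) can be scanned for
pages that are entirely zero. The function name zero_pages is made up
for illustration, and the 16k default is again an assumption based on
the page size of the test machine.

```shell
# zero_pages FILE [PAGE_SIZE]
# Hypothetical helper: prints the index of every page in FILE that contains
# only zero bytes. A short trailing page is checked over the bytes it has.
zero_pages() {
    file=$1
    page=${2:-16384}    # assumed default: 16k pages
    size=$(wc -c < "$file")
    i=0
    while [ $((i * page)) -lt "$size" ]; do
        # Delete all NUL bytes from the page; anything left over is non-zero data.
        nonzero=$(dd if="$file" bs="$page" skip="$i" count=1 2>/dev/null |
                  tr -d '\000' | wc -c)
        [ "$nonzero" -eq 0 ] && echo "$i"
        i=$((i + 1))
    done
}
```

The 256-sector dump is 128k, so with 16k pages there are 8 pages to
check. After step 1120, "zero_pages t.t" would be expected to report
page 2, which is the 16k region at offset 32k noted above.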
So, I can reproduce the mmap behaviour with read() and write().
Hmmmm - I think that delalloc has the same problem as well, only
that will cause unreserved allocation during writeback - that could
cause problems near ENOSPC, I think.
So I know what the problem is, I'll have to think about how to fix it
now. More later.