First, we (Russell and I) have fixed a data corruption bug that has
been plaguing us for a long time. The problem was that prepare_write
would sometimes not read a page. There were two coding errors.
First, the valid bits are always true when we allocate
a new page. __pb_block_prepare_write sets the buffer's up-to-date state
if the valid bits are set, so the buffer was always marked up to date
even when it really isn't.
Second, when at_eof is true during a write, __pb_block_prepare_write()
would never read a page. A page must be read first (before modifying) if
the user's write does not start on the boundary of the page containing EOF.
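To make the read-before-modify rule concrete, here is a minimal sketch of the decision prepare_write has to make. It is not the pagebuf code: sketch_need_read, its parameters, and the 4096-byte page size are all made up for illustration, and valid_bytes just stands for how much of the page (counted from its start) lies below EOF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, illustration only */

/*
 * Hypothetical helper, not pagebuf code: decide whether the existing
 * contents of a page must be read in before a write covering the byte
 * range [from, to) of that page may modify it.
 *
 * valid_bytes: bytes at the start of the page that hold file data
 *              (0 if the page lies entirely beyond EOF).
 * uptodate:    true if the page cache copy is already current.
 */
static bool sketch_need_read(size_t from, size_t to,
                             size_t valid_bytes, bool uptodate)
{
    assert(from <= to && to <= SKETCH_PAGE_SIZE);
    assert(valid_bytes <= SKETCH_PAGE_SIZE);

    if (uptodate)
        return false;   /* nothing to fetch, the page is current */
    if (valid_bytes == 0)
        return false;   /* no file data in this page to preserve */
    if (from > 0)
        return true;    /* file data before the write start would be lost */
    return to < valid_bytes;  /* file data after the write end would be lost */
}
```

For example, sketch_need_read(100, 200, 150, false) is true: the write starts in the middle of the page containing EOF, so the first 100 bytes have to come from disk before the page is modified.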
Russell just checked this in.
The delay_alloc path is broken in a similar way (I think).
Try running the test:
bonnie.engr:~mostek/mmap_l.c
bonnie.engr:~mostek/mmap_l
Run it like:
mmap_l /tmp/a
mmap_l /xfs/a
then cmp the two.
With delay_alloc set, we are not correctly zeroing the parts of
pages which are not written by a user's write.
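Roughly, the zeroing we would expect for a brand-new page with no data on disk looks like the sketch below. The names (sketch_zero_unwritten, SKETCH_PAGE_SIZE) and the 4096-byte page size are assumptions for illustration; this is not what the delay_alloc path actually contains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, illustration only */

/*
 * Hypothetical helper, not pagebuf code: for a new page that has no
 * data on disk, clear everything outside the byte range [from, to)
 * that the user's write is about to fill in.
 */
static void sketch_zero_unwritten(unsigned char *page, size_t from, size_t to)
{
    assert(page != NULL);
    assert(from <= to && to <= SKETCH_PAGE_SIZE);

    memset(page, 0, from);                        /* head: [0, from) */
    memset(page + to, 0, SKETCH_PAGE_SIZE - to);  /* tail: [to, page end) */
}
```

If either memset is skipped, whatever happened to be in the page before shows up in the file, which is the kind of difference cmp would report.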
The problem is in the same area where Russell and I found a
data corruption bug (in the delwri path).
The bug is that the block_map invalid bits are never set when we call
grab_cache_page. It returns a page with the block_map all zeros (i.e.
valid).
__pb_block_prepare_write_async checks early on:
    if (PageBlockAllValid(page)) {
        dprintk(pbpw_debug, ("pbpw: page all valid\n"));
        goto out;
    }
so for a freshly grabbed page the check always succeeds and we always
goto out.
We tried fixing this by calling:
    PageBlockSetAllInvalid(page);
right after grab_cache_page, but this breaks doio with 5 threads.
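In case the bit sense is not obvious, here is a toy model of it. Everything in it is a stand-in: struct sketch_page and the sketch_* functions are invented for illustration and only assume what is said above, namely that a set bit in block_map means "invalid" and that a newly grabbed page comes back with the map all zeros. It says nothing about why doio with 5 threads breaks.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a page; the real struct and macros differ. */
struct sketch_page {
    unsigned long block_map;  /* assumed: one bit per block, set = invalid */
};

/* Models a freshly grabbed page: the map comes back all zeros. */
static void sketch_grab_page(struct sketch_page *p)
{
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Models the "all valid" test: no invalid bit is set. */
static bool sketch_all_valid(const struct sketch_page *p)
{
    assert(p != NULL);
    return p->block_map == 0;
}

/* Models marking every block invalid. */
static void sketch_set_all_invalid(struct sketch_page *p)
{
    assert(p != NULL);
    p->block_map = ~0UL;
}
```

In this model a page straight out of sketch_grab_page passes sketch_all_valid even though nothing has been read into it, and only stops passing once sketch_set_all_invalid has been called.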
Do you want to pick this up?
Jim