delay alloc broken?

To: ananth@xxxxxxx
Subject: delay alloc broken?
From: Jim Mostek <mostek@xxxxxxx>
Date: Fri, 12 May 2000 14:03:29 -0500 (CDT)
Cc: linux-xfs@xxxxxxxxxxx
Sender: owner-linux-xfs@xxxxxxxxxxx
First, we (Russell and I) have fixed a data corruption bug that has
been plaguing us for a long time. The problem was that prepare_write
would sometimes fail to read a page before modifying it. There were
two coding errors:
        First, the valid bits are always true when we allocate
a new page. __pb_block_prepare_write() marks the buffer up to date
if the valid bits are set, so it always did this, even when the buffer
really wasn't up to date.
        Second, when at_eof was true during a write, __pb_block_prepare_write()
would never read a page. The page containing EOF must be read first (before
it is modified) if the user's write does not start on that page's boundary.
Russell just checked this in.
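
To make the second error concrete, here is a minimal user-space sketch of
the read-before-write decision. It is not the code that was checked in:
sketch_must_read_page(), SKETCH_PAGE_SIZE and all parameter names are
hypothetical, and the real __pb_block_prepare_write() works on page and
buffer state rather than plain offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/*
 * Hypothetical helper, not the real __pb_block_prepare_write().
 * page_off: file offset of the first byte of the page
 * isize:    current file size (EOF)
 * from, to: byte range [from, to) inside the page that the write fills
 */
static bool sketch_must_read_page(unsigned long long page_off,
                                  unsigned long long isize,
                                  size_t from, size_t to)
{
    assert(from <= to && to <= SKETCH_PAGE_SIZE);

    /* Number of bytes in this page that already hold file data. */
    size_t on_disk = 0;
    if (isize > page_off) {
        unsigned long long left = isize - page_off;
        on_disk = left < SKETCH_PAGE_SIZE ? (size_t)left : SKETCH_PAGE_SIZE;
    }

    if (on_disk == 0)
        return false;  /* page lies entirely past EOF: nothing to read */

    /* Existing data before or after the written range must be preserved. */
    return from > 0 || to < on_disk;
}
```

With a 6000 byte file, a write to bytes 100..200 of the second page (the
page containing EOF) needs the read; treating at_eof as "never read" is
exactly the case this gets wrong.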

The delay_alloc path is broken in a similar way (I think).

Try running the test:

        bonnie.engr:~mostek/mmap_l.c
        bonnie.engr:~mostek/mmap_l

Run it like:

        mmap_l /tmp/a
        mmap_l /xfs/a

then cmp the two.

With delay_alloc set, we are not correctly zeroing the parts of
pages that a user's write does not cover.
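
As a rough illustration of the zeroing that is expected for a page with
no data behind it yet, consider the sketch below. The helper name
sketch_zero_around_write() and its parameters are made up for this
example; it only models the byte ranges, not the page cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: for a freshly allocated page with no backing
 * data, zero every byte that the write [from, to) will not fill.
 */
static void sketch_zero_around_write(unsigned char *page, size_t page_size,
                                     size_t from, size_t to)
{
    assert(from <= to && to <= page_size);

    memset(page, 0, from);                 /* head: [0, from) */
    memset(page + to, 0, page_size - to);  /* tail: [to, page_size) */
}
```

If either memset is skipped, whatever was in the page before shows up in
the file, which is the kind of difference cmp would report.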

The problem comes in the same area where Russell and I found a
data corruption bug (in the delwri path).

The bug is that the block_map invalid bits are never set when we call
grab_cache_page(). It returns a page with the block_map all zeros (i.e.
valid).

__pb_block_prepare_write_async checks early on:

        if (PageBlockAllValid(page)) {
                dprintk(pbpw_debug, ("pbpw: page all valid\n"));
                goto out;
        }

which always succeeds for a freshly grabbed page, so we always goto out.
We tried fixing this by calling:

        PageBlockSetAllInvalid(page);

right after grab_cache_page(), but this breaks doio with 5 threads.
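
For anyone not familiar with the valid bits, here is a small model of why
a zeroed block_map looks "all valid". The struct, the function names and
the 8 blocks per page are assumptions made for this sketch; they only
mimic what PageBlockAllValid() and PageBlockSetAllInvalid() are described
as doing above.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_BLOCKS_PER_PAGE 8u  /* assumed, for illustration only */

/* Hypothetical model: a set bit means the block is INVALID. */
struct sketch_page {
    unsigned int block_map;
};

/* Models PageBlockAllValid(): true when no invalid bit is set. */
static bool sketch_all_valid(const struct sketch_page *p)
{
    return p->block_map == 0;
}

/* Models PageBlockSetAllInvalid(): mark every block in the page invalid. */
static void sketch_set_all_invalid(struct sketch_page *p)
{
    p->block_map = (1u << SKETCH_BLOCKS_PER_PAGE) - 1u;
}

/* Clear the invalid bit for one block once its data has been filled in. */
static void sketch_set_block_valid(struct sketch_page *p, unsigned int blk)
{
    assert(blk < SKETCH_BLOCKS_PER_PAGE);
    p->block_map &= ~(1u << blk);
}
```

A page that comes back zero-initialized passes sketch_all_valid()
immediately, which is the early "goto out" case. This model says nothing
about why setting all bits invalid breaks doio with 5 threads.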

Do you want to pick this up?

Jim
