xfs
[Top] [All Lists]

Re: REVIEW: Fix for incore extent corruption.

To: "Eric Sandeen" <sandeen@xxxxxxxxxxx>
Subject: Re: REVIEW: Fix for incore extent corruption.
From: "Eric Sandeen" <sandeen@xxxxxxxxxxx>
Date: Thu, 18 Sep 2008 14:28:58 -0500 (CDT)
Cc: lachlan@xxxxxxx, "Russell Cattelan" <cattelan@xxxxxxxxxxx>, xfs@xxxxxxxxxxx
Importance: Normal
In-reply-to: <59243.131.252.241.230.1221762601.squirrel@xxxxxxxxxxx>
References: <48D19A83.4040608@xxxxxxxxxxx> <48D1CD46.4010104@xxxxxxx> <48D1DCD5.7040502@xxxxxxxxxxx> <48D218AE.9090400@xxxxxxx> <59243.131.252.241.230.1221762601.squirrel@xxxxxxxxxxx>
User-agent: SquirrelMail/1.4.8-4.0.1.el5.centos.2
Eric Sandeen wrote:
> Lachlan McIlroy wrote:
>> Russell, this fixes xfs_iext_irec_compact_full().  If we don't move
>> all the records from the next page into the current page then we need
>> to update the er_extoff of the modified page as we move the remaining
>> extents up.  Would you mind giving it a go?
>>
>> --- a/fs/xfs/xfs_inode.c     2008-09-18 18:48:46.000000000 +1000
>> +++ b/fs/xfs/xfs_inode.c     2008-09-18 18:57:18.000000000 +1000
>> @@ -4623,6 +4623,7 @@ xfs_iext_irec_compact_full(
>>                                      (XFS_LINEAR_EXTS -
>>                                              erp_next->er_extcount) *
>>                                      sizeof(xfs_bmbt_rec_t));
>> +                            erp_next->er_extoff += ext_diff;
>>                      }
>>              }
>
> Lachlan, I concur.  I spent way too long last night looking at this and
> arrived at the same conclusion about the root cause of the problem, but
> didn't hae *quite* the right solution.  I blame it on 2am ;)  Your fix
> looks right.

FWIW; some supporting information from debugging etc.

xfs_iext_irec_compact_full:

Move 1 item from BUF2 into BUF1, and compact BUF2

                         copy                memmove/zero
         BUF1    BUF2    --->   BUF1     BUF2    --->    BUF1    BUF2
        +-----+ +-----+         +-----+ +-----+         +-----+ +-----+
        |  0  | |  3  |         |  0  | |     |         |  0  | |  4  |
        +-----+ +-----+         +-----+ +-----+         +-----+ +-----+
        |  1  | |  4  |         |  1  | |  4  |         |  1  | |     |
        +-----+ +-----+         +-----+ +-----+         +-----+ +-----+
        |  2  | |     |         |  2  | |     |         |  2  | |     |
        +-----+ +-----+         +-----+ +-----+         +-----+ +-----+
        |     | |     |         |  3  | |     |         |  3  | |     |
        +-----+ +-----+         +-----+ +-----+         +-----+ +-----+
er_count   3       2                                       4       1
er_offset  0       3                                       0       4

If we don't update er_offset properly in BUF2, then a lookup for extent
index 3 may find the first one in BUF2, not the last one in BUF1 (both
claim to be "extent index 3"

>From some tracing when I hit this path:

...
250: ffff810065c61fa0 startoff 251 startblock 263 blockcount 1 flag 1
251: ffff810065c61fb0 startoff 252 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
252: ffff810065c61fc0 startoff 253 startblock 265 blockcount 1 flag 1
253: ffff810065c61fd0 startoff 254 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
254: ffff810065c90000 startoff 255 startblock 267 blockcount 1 flag 1
255: ffff810065c90010 startoff 256 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
256: ffff810065c90020 startoff 257 startblock 269 blockcount 1 flag 1
257: ffff810065c90030 startoff 258 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
258: ffff810065c90040 startoff 259 startblock 271 blockcount 1 flag 1
259: ffff810065c90050 startoff 260 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
...

move enough to fill the previous page:
copy 2 (32) from ffff810065c90000 to ffff810065c61fe0

next page is not empty, so shift up:

move 254 (4064) from ffff810065c90020 to ffff810065c90000

But then I ran through the entire extent list for all indexes in order, and:

250: ffff810065c61fa0 startoff 251 startblock 263 blockcount 1 flag 1
251: ffff810065c61fb0 startoff 252 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
252: ffff810065c61fc0 startoff 253 startblock 265 blockcount 1 flag 1
253: ffff810065c61fd0 startoff 254 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
--- XXX where are starting offsets 255, 256 XXX ---
254: ffff810065c90000 startoff 257 startblock 269 blockcount 1 flag 1
255: ffff810065c90010 startoff 258 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
256: ffff810065c90020 startoff 259 startblock 271 blockcount 1 flag 1

starting *offsets* 255, 256 are lost because the next buffer was still
claiming to start at extent index 254 so it essentially jumped there,
missing the 2 extents we added to the previous buffer.

in addition, since the er_startoff for this last buffer was wrong, so was
the last extent record - off by one, and looked at uninit'd memory:

507: ffff810065c90fd0 startoff 510 startblock NULLSTARTBLOCK(5) blockcount
1 flag 0
508: ffff810065c92fc0 startoff 483406127300608 startblock 2014118168
blockcount 196608 flag 0
wtf, ext 509 out of order (1888313573376 < 483406127300608)?

hope that's more useful than confusing :)

Anyway I really looked closely at this and I think Lachlan is spot-on.

I might even suggest proposing this and the previous fix for -stable....

-Eric

<Prev in Thread] Current Thread [Next in Thread>