[PATCH] xfs: di_flushiter considered harmful

Mark Tinguely tinguely at sgi.com
Mon Jul 22 09:40:49 CDT 2013


On 07/22/13 06:07, Markus Trippelsdorf wrote:
> On 2013.07.22 at 20:18 +1000, Dave Chinner wrote:
>> From: Dave Chinner<dchinner at redhat.com>
>>
>> When we made all inode updates transactional, we no longer needed
>> the log recovery detection for inodes being newer on disk than the
>> transaction being replayed - it was redundant as replay of the log
>> would always result in the latest version of the inode woul dbe on
>> disk. It was redundant, but left in place because it wasn't
>> considered to be a problem.
>>
>> However, with the new "don't read inodes on create" optimisation,
>> flushiter has come back to bite us. Essentially, the optimisation
>> made always initialises flushiter to zero in the create transaction,
>> and so if we then crash and run recovery and the inode already on
>> disk has a non-zero flushiter it will skip recovery of that inode.
>> As a result, log recovery does the wrong thing and we end up with a
>> corrupt filesystem.
>>
>> Because we have to support old kernel to new kernl upgrades, we
>> can't just get rid of the flushiter support in log recovery as we
>> might be upgrading from a kernel that doesn't have fully transaction
>> inode updates.  Unfortunately, for v4 superblocks there is no way to
>> guarantee that log recovery knows about this fact.
>>
>> We cannot add a new inode format flag to say it's a "special inode
>> create" because it won't be understood by older kernels and so
>> recovery could do the wrong thing on downgrade. We cannot specially
>> detect the combination of zero mode/non-zero flushiter on disk to
>> non-zero mode, zero flushiter in the log item during recovery
>> because wrapping of the flushiter can result in false detection.
>>
>> Hence that makes this "don't use flushiter" optimisation limited to
>> a disk format that guarantees that we don't need it. And that means
>> the only fix here is to limit the "no read IO on create"
>> optimisation to version 5 superblocks....
>
> I think your patch misses the following part:
>

Hi Markus.

Dave's patch is limited to the new v5 (crc) superblock. The constraints 
that has to be dealt with are in the commit message as to why it is 
limited to the new v5 superblock.

Going back to your 07/10/2013 message, your filesystem is:

/dev/root on / type xfs  (rw,relatime,attr2,inode64,logbsize=256k,noquota)

or the non-crc v4 superblock with inode 2 that is probably why it is 
still failing for you.

It seems to me that since we cannot fix this for inode 1/2, then besides 
this patch we have to revert patch cca9f93a52d and make it inode 3+ / 
superblock 5+ (crc) dependent.

--Mark.



More information about the xfs mailing list