[PATCH, RFC] xfs: don't break from growfs ag update loop on error
Mark Tinguely
tinguely at sgi.com
Tue Sep 10 10:21:49 CDT 2013
On 09/09/13 15:36, Eric Sandeen wrote:
> On 8/15/13 1:15 PM, Eric Sandeen wrote:
>> When xfs_growfs_data_private() is updating backup superblocks,
>> it bails out on the first error encountered, whether reading or
>> writing:
>
> Any thoughts on this one? W/ the verifiers, we have a higher
> chance of encountering an error, and leaving the rest of the
> supers un-updated. Repair will then possibly revert the fs to
> it's pre-growfs state, and data loss will ensue...
>
> Thanks,
> -Eric
>
>> * If we get an error writing out the alternate superblocks,
>> * just issue a warning and continue. The real work is
>> * already done and committed.
>>
>> This can cause a problem later during repair, because repair
>> looks at all superblocks, and picks the most prevalent one
>> as correct. If we bail out early in the backup superblock
>> loop, we can end up with more "bad" matching superblocks than
>> good, and a post-growfs repair may revert the filesystem to
>> the old geometry.
>>
>> With the combination of superblock verifiers and old bugs,
>> we're more likely to encounter read errors due to verification.
>>
>> And perhaps even worse, we don't even properly write any of the
>> newly-added superblocks in the new AGs.
>>
>> Even with this change, growfs will still say:
>>
>> xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
>> data blocks changed from 319815680 to 335216640
>>
>> which might be confusing to the user, but it at least communicates
>> that something has gone wrong, and dmesg will probably highlight
>> the need for an xfs_repair.
>>
>> And this is still best-effort; if verifiers fail on more than
>> half the backup supers, they may still "win" - but that's probably
>> best left to repair to more gracefully handle by doing its own
>> strict verification as part of the backup super "voting."
>>
>> Signed-off-by: Eric Sandeen<sandeen at redhat.com>
>> ---
Make sense to me - it could have been any kind of error including not
being able to get a xfs_buf for the new secondary (a temp ENOMEM).
I wonder if it could be possible to fix corrupt entries rather than just
skip them...
Probably could test this patch by corrupting a v5 secondary superblock
and verifying with xfs_db.
Reviewed-by: Mark Tinguely <tinguely at sgi.com>
More information about the xfs
mailing list