xfs
[Top] [All Lists]

Re: [PATCH, RFC] xfs: don't break from growfs ag update loop on error

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: [PATCH, RFC] xfs: don't break from growfs ag update loop on error
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Tue, 10 Sep 2013 10:21:49 -0500
Cc: Eric Sandeen <sandeen@xxxxxxxxxx>, "'linux-xfs@xxxxxxxxxxx'" <linux-xfs@xxxxxxxxxxx>
Delivered-to: linux-xfs@xxxxxxxxxxx
In-reply-to: <522E3142.7090501@xxxxxxxxxxx>
References: <520D1AAC.8090701@xxxxxxxxxx> <522E3142.7090501@xxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 09/09/13 15:36, Eric Sandeen wrote:
On 8/15/13 1:15 PM, Eric Sandeen wrote:
When xfs_growfs_data_private() is updating backup superblocks,
it bails out on the first error encountered, whether reading or
writing:

Any thoughts on this one?  W/ the verifiers, we have a higher
chance of encountering an error, and leaving the rest of the
supers un-updated.  Repair will then possibly revert the fs to
it's pre-growfs state, and data loss will ensue...

Thanks,
-Eric

* If we get an error writing out the alternate superblocks,
* just issue a warning and continue.  The real work is
* already done and committed.

This can cause a problem later during repair, because repair
looks at all superblocks, and picks the most prevalent one
as correct.  If we bail out early in the backup superblock
loop, we can end up with more "bad" matching superblocks than
good, and a post-growfs repair may revert the filesystem to
the old geometry.

With the combination of superblock verifiers and old bugs,
we're more likely to encounter read errors due to verification.

And perhaps even worse, we don't even properly write any of the
newly-added superblocks in the new AGs.

Even with this change, growfs will still say:

   xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
   data blocks changed from 319815680 to 335216640

which might be confusing to the user, but it at least communicates
that something has gone wrong, and dmesg will probably highlight
the need for an xfs_repair.

And this is still best-effort; if verifiers fail on more than
half the backup supers, they may still "win" - but that's probably
best left to repair to more gracefully handle by doing its own
strict verification as part of the backup super "voting."

Signed-off-by: Eric Sandeen<sandeen@xxxxxxxxxx>
---

Make sense to me - it could have been any kind of error including not being able to get a xfs_buf for the new secondary (a temp ENOMEM).

I wonder if it could be possible to fix corrupt entries rather than just skip them...

Probably could test this patch by corrupting a v5 secondary superblock and verifying with xfs_db.

Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>

<Prev in Thread] Current Thread [Next in Thread>