xfsrestore report SUCCESS but not restore all files in kernel 3.17

Tommy Wu wu.tommy at gmail.com
Thu Oct 30 20:51:27 CDT 2014


After apply this patch, xfsdump will hang again.

the log for xfsdump:
https://mega.co.nz/#!tUgz3QrC!q7Ix53sl9lYCg0iKRHjgOqF0tZJAVKHBNz0OHWof-ds

But 'echo w > /proc/sysrq-trigger' no any blocked task in dmesg.


2014-10-31 8:09 GMT+08:00 Dave Chinner <david at fromorbit.com>:

> On Fri, Oct 31, 2014 at 10:04:57AM +1100, Dave Chinner wrote:
> > On Thu, Oct 30, 2014 at 04:39:01PM +0800, Tommy Wu wrote:
> > > I'm confusing now...
> > >
> > > in the test VM and one of my machine, rebuild xfsdump/xfsprogs will
> make
> > > the test work.
> >
> > I know why, too.
> >
> > > the xfsdump command I use:
> > > xfsdump -v debug -l 0 -o -p 300 -J -F -M test -L test -f test.xfsdump
> > > /mnt/backup > xfsdump.log
> > > the log:
> > >
> https://mega.co.nz/#!xIhTjD6Y!Gwp_ilW5UyRiqn2DSxPTHM0TH5qBghMlKXAmHoWk2iQ
> >
> > This indicates the bulkstat on the dump side was terminated
> > prematurely.
> >
> > > the xfsrestore command I use:
> > > xfsrestore -v debug,tree=nitty -p 300 -J -f test.xfsdump
> /vol/backup/x/ >
> > > xfsrestore.log
> > > the log:
> > >
> https://mega.co.nz/#!oZIFWQST!VUY90EZ8XvZDYy9GwF5mqtQVtWNW90Sar5MZhNzGbYI
> >
> > And the errors in the restore side are because there are missing
> > directory inodes in the dump, hence orphaning occurs.
> >
> > It might take me a little while to come up with a fix - I've noticed
> > a few things that I need to check out in more detail in tracking
> > this down....
>
> Can you try this patch?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
>
> xfs: handle munged bulkstat cookies correctly
>
> From: Dave Chinner <dchinner at redhat.com>
>
> xfsdump munges the last inode cookie that bulkstat passes to
> userspace so subsequent calls can restart at the last inode that
> bulkstat parsed. xfsdump thinks that it needs to decrement the
> cookie so that it catches the last inode, and this affects the
> tree search initialisation code.
>
> By decrementing the lastino cookie, the restart inode number can
> point to non-existent inode number. In 3.16, bulkstat would simply
> increment the cookie until it either found the next valid inode or
> the end of the AG, at which point it would just continue on as
> though nothing had happened. This is a relatively difficult
> situation to contrive so is extremely hard to test for, but is
> likely to occur fairly frequently in the real world.
>
> The current code, however, treats a cookie that points off into the
> weeds as invalid and returns a error. This has the effect of causing
> xfsdump to think it has reached the end of the bulkstat scan, and so
> if it's cookie munging happens just wrong it will cause incomplete
> dumps to occur. These dumps still have all the file data in them,
> just he directory structure is broken and so restore will end up
> with lots and lots of files in the orphanage (i.e. lost+found).
>
> IOWs, we broke xfsdump in a nasty, subtle way that users don't
> notice until they restore a dump.
>
> To fix this, restore the old behaviour of incrementing the inode
> number derived from the cookie when we fail to find a matching inode
> record.
>
> cc: stable at vger.kernel.org
> Reported-by: Tommy Wu <wu.tommy at gmail.com>
> Signed-off-by: Dave Chinner <dchinner at redhat.com>
> ---
>  fs/xfs/xfs_itable.c | 34 ++++++++++++++++++++++++++++++----
>  1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> index 7765ff7..dcc0c99 100644
> --- a/fs/xfs/xfs_itable.c
> +++ b/fs/xfs/xfs_itable.c
> @@ -208,6 +208,15 @@ xfs_bulkstat_ichunk_ra(
>   * if we found the chunk.  If the inode was not the last in the chunk and
> there
>   * are some left allocated, update the data for the pointed-to record as
> well as
>   * return the count of grabbed inodes.
> + *
> + * If this is the first lookup in a bulkstat call, we have to be careful
> about
> + * the agino that is passed - it may not point to a valid inode record.
> xfsdump,
> + * in particular, screws with the lastino cookie between bulkstat calls
> and so
> + * can point it off into the weeds. Hence we can't return an error here if
> + * the agino doesn't point to the correct inode chunk. The previous code
> simply
> + * incremented agino and tried again. This will work for xfsdump as the
> next
> + * lookup will either find the correct chunk or the end of the AG, so
> let's just
> + * keep doing that.
>   */
>  STATIC int
>  xfs_bulkstat_grab_ichunk(
> @@ -226,7 +235,7 @@ xfs_bulkstat_grab_ichunk(
>                 return error;
>         if (!stat) {
>                 *icount = 0;
> -               return error;
> +               return 0;
>         }
>
>         /* Get the record, should always work */
> @@ -236,8 +245,11 @@ xfs_bulkstat_grab_ichunk(
>         XFS_WANT_CORRUPTED_RETURN(stat == 1);
>
>         /* Check if the record contains the inode in request */
> -       if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino)
> -               return -EINVAL;
> +       if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) {
> +               /* oops, someone is feeding us garbage! */
> +               *icount = -1;
> +               return 0;
> +       }
>
>         idx = agino - irec->ir_startino + 1;
>         if (idx < XFS_INODES_PER_CHUNK &&
> @@ -428,6 +440,17 @@ xfs_bulkstat(
>                         error = xfs_bulkstat_grab_ichunk(cur, agino,
> &icount, &r);
>                         if (error)
>                                 goto del_cursor;
> +                       if (icount < 0) {
> +                               /*
> +                                * A pox be cast over xfsdump!
> +                                *
> +                                * No record containing agino was found.
> Bump
> +                                * it, try again until we find a chunk or
> fall
> +                                * off the end of the AG.
> +                                */
> +                               agino++;
> +                               goto del_cursor;
> +                       }
>                         if (icount) {
>                                 irbp->ir_startino = r.ir_startino;
>                                 irbp->ir_freecount = r.ir_freecount;
> @@ -435,14 +458,17 @@ xfs_bulkstat(
>                                 irbp++;
>                                 agino = r.ir_startino +
> XFS_INODES_PER_CHUNK;
>                         }
> +
>                         /* Increment to the next record */
>                         error = xfs_btree_increment(cur, 0, &tmp);
>                 } else {
>                         /* Start of ag.  Lookup the first inode chunk */
>                         error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE,
> &tmp);
>                 }
> -               if (error)
> +               if (error || tmp == 0) {
> +                       end_of_ag = 1;
>                         goto del_cursor;
> +               }
>
>                 /*
>                  * Loop through inode btree records in this ag,
>



-- 

Tommy Wu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20141031/e357557c/attachment.html>


More information about the xfs mailing list