<div dir="ltr"><div><div>After apply this patch, xfsdump will hang again.<br><br></div>the log for xfsdump:<br><a href="https://mega.co.nz/#!tUgz3QrC!q7Ix53sl9lYCg0iKRHjgOqF0tZJAVKHBNz0OHWof-ds">https://mega.co.nz/#!tUgz3QrC!q7Ix53sl9lYCg0iKRHjgOqF0tZJAVKHBNz0OHWof-ds</a><br><br></div>But 'echo w > /proc/sysrq-trigger' no any blocked task in dmesg.<br><br></div><div class="gmail_extra"><br><div class="gmail_quote">2014-10-31 8:09 GMT+08:00 Dave Chinner <span dir="ltr"><<a href="mailto:david@fromorbit.com" target="_blank">david@fromorbit.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Fri, Oct 31, 2014 at 10:04:57AM +1100, Dave Chinner wrote:<br>
> On Thu, Oct 30, 2014 at 04:39:01PM +0800, Tommy Wu wrote:
> > I'm confused now...
> >
> > In the test VM and on one of my machines, rebuilding xfsdump/xfsprogs
> > makes the test work.
>
> I know why, too.
>
> > The xfsdump command I use:
> > xfsdump -v debug -l 0 -o -p 300 -J -F -M test -L test -f test.xfsdump
> > /mnt/backup > xfsdump.log
> > the log:
> > https://mega.co.nz/#!xIhTjD6Y!Gwp_ilW5UyRiqn2DSxPTHM0TH5qBghMlKXAmHoWk2iQ
>
> This indicates the bulkstat on the dump side was terminated
> prematurely.
>
> > The xfsrestore command I use:
> > xfsrestore -v debug,tree=nitty -p 300 -J -f test.xfsdump /vol/backup/x/ >
> > xfsrestore.log
> > the log:
> > https://mega.co.nz/#!oZIFWQST!VUY90EZ8XvZDYy9GwF5mqtQVtWNW90Sar5MZhNzGbYI
>
> And the errors on the restore side are because there are missing
> directory inodes in the dump, hence orphaning occurs.
>
> It might take me a little while to come up with a fix - I've noticed
> a few things that I need to check out in more detail while tracking
> this down....

Can you try this patch?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

xfs: handle munged bulkstat cookies correctly

From: Dave Chinner <dchinner@redhat.com>

xfsdump munges the last inode cookie that bulkstat passes to
userspace so subsequent calls can restart at the last inode that
bulkstat parsed. xfsdump thinks that it needs to decrement the
cookie so that it catches the last inode, and this affects the
tree search initialisation code.
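
For anyone unfamiliar with the interface, the pattern in question
looks roughly like this - a minimal sketch, not xfsdump's actual
code, assuming the xfsprogs development headers (<xfs/xfs.h>) that
provide struct xfs_fsop_bulkreq, struct xfs_bstat and
XFS_IOC_FSBULKSTAT:

#include <sys/ioctl.h>
#include <xfs/xfs.h>

#define BATCH   64

static int walk_inodes(int fsfd)
{
        struct xfs_bstat        buf[BATCH];
        __u64                   lastino = 0;    /* resume cookie */
        __s32                   ocount = 0;
        struct xfs_fsop_bulkreq req = {
                .lastip  = &lastino,
                .count   = BATCH,
                .ubuffer = buf,
                .ocount  = &ocount,
        };

        for (;;) {
                if (ioctl(fsfd, XFS_IOC_FSBULKSTAT, &req) < 0)
                        return -1;
                if (ocount == 0)
                        break;          /* ran off the end of the fs */

                /* ... stat/dump buf[0 .. ocount - 1] here ... */

                /*
                 * The munging in question: rewind the cookie so the
                 * next call re-returns the last inode parsed. The
                 * decremented value may not be an allocated inode
                 * number at all.
                 */
                lastino = buf[ocount - 1].bs_ino - 1;
        }
        return 0;
}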

By decrementing the lastino cookie, the restart inode number can
point to a non-existent inode number. In 3.16, bulkstat would simply
increment the cookie until it either found the next valid inode or
the end of the AG, at which point it would just continue on as
though nothing had happened. This situation is relatively difficult
to contrive and so is extremely hard to test for, but it is likely
to occur fairly frequently in the real world.
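
As a toy model of those 3.16 semantics (hypothetical code, not the
kernel implementation), treat the AG as a sorted list of chunk start
inodes; a cookie pointing into a gap simply resumes at the next
chunk:

#include <stdint.h>
#include <stdio.h>

#define INODES_PER_CHUNK 64     /* stands in for XFS_INODES_PER_CHUNK */

/*
 * Keep moving agino forward until it lands inside an allocated
 * chunk, or we fall off the end of the AG. Returns the index of the
 * chunk to resume from, or -1 at end of AG.
 */
static int resume_chunk(const uint64_t *chunks, int nchunks, uint64_t agino)
{
        for (int i = 0; i < nchunks; i++)
                if (agino < chunks[i] + INODES_PER_CHUNK)
                        return i;
        return -1;      /* clean end of AG, not an error */
}

int main(void)
{
        /* No chunk covers inode numbers 64..127. */
        const uint64_t chunks[] = { 0, 128, 192 };

        /* A munged cookie pointing into the gap still resumes at 128. */
        printf("resume from chunk %d\n", resume_chunk(chunks, 3, 100));
        return 0;
}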

The current code, however, treats a cookie that points off into the
weeds as invalid and returns an error. This has the effect of causing
xfsdump to think it has reached the end of the bulkstat scan, and so
if its cookie munging happens to go just wrong it will cause
incomplete dumps to occur. These dumps still have all the file data
in them; just the directory structure is broken, and so restore will
end up with lots and lots of files in the orphanage (i.e.
lost+found).

IOWs, we broke xfsdump in a nasty, subtle way that users don't
notice until they restore a dump.

To fix this, restore the old behaviour of incrementing the inode
number derived from the cookie when we fail to find a matching inode
record.

cc: stable@vger.kernel.org
Reported-by: Tommy Wu <wu.tommy@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_itable.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 7765ff7..dcc0c99 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -208,6 +208,15 @@ xfs_bulkstat_ichunk_ra(
  * if we found the chunk. If the inode was not the last in the chunk and there
  * are some left allocated, update the data for the pointed-to record as well as
  * return the count of grabbed inodes.
+ *
+ * If this is the first lookup in a bulkstat call, we have to be careful about
+ * the agino that is passed - it may not point to a valid inode record. xfsdump,
+ * in particular, screws with the lastino cookie between bulkstat calls and so
+ * can point it off into the weeds. Hence we can't return an error here if
+ * the agino doesn't point to the correct inode chunk. The previous code simply
+ * incremented agino and tried again. This will work for xfsdump as the next
+ * lookup will either find the correct chunk or the end of the AG, so let's just
+ * keep doing that.
  */
 STATIC int
 xfs_bulkstat_grab_ichunk(
@@ -226,7 +235,7 @@ xfs_bulkstat_grab_ichunk(
                 return error;
         if (!stat) {
                 *icount = 0;
-                return error;
+                return 0;
         }
 
         /* Get the record, should always work */
@@ -236,8 +245,11 @@ xfs_bulkstat_grab_ichunk(
         XFS_WANT_CORRUPTED_RETURN(stat == 1);
 
         /* Check if the record contains the inode in request */
-        if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino)
-                return -EINVAL;
+        if (irec->ir_startino + XFS_INODES_PER_CHUNK <= agino) {
+                /* oops, someone is feeding us garbage! */
+                *icount = -1;
+                return 0;
+        }
 
         idx = agino - irec->ir_startino + 1;
         if (idx < XFS_INODES_PER_CHUNK &&
@@ -428,6 +440,17 @@ xfs_bulkstat(
                         error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, &r);
                         if (error)
                                 goto del_cursor;
+                        if (icount < 0) {
+                                /*
+                                 * A pox be cast over xfsdump!
+                                 *
+                                 * No record containing agino was found. Bump
+                                 * it, try again until we find a chunk or fall
+                                 * off the end of the AG.
+                                 */
+                                agino++;
+                                goto del_cursor;
+                        }
                         if (icount) {
                                 irbp->ir_startino = r.ir_startino;
                                 irbp->ir_freecount = r.ir_freecount;
@@ -435,14 +458,17 @@ xfs_bulkstat(
                                 irbp++;
                                 agino = r.ir_startino + XFS_INODES_PER_CHUNK;
                         }
+
                         /* Increment to the next record */
                         error = xfs_btree_increment(cur, 0, &tmp);
                 } else {
                         /* Start of ag. Lookup the first inode chunk */
                         error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &tmp);
                 }
-                if (error)
+                if (error || tmp == 0) {
+                        end_of_ag = 1;
                         goto del_cursor;
+                }
 
                 /*
                  * Loop through inode btree records in this ag,

--
Tommy Wu