
Re: [PATCH 0/6 v2] xfs: fix the bulkstat mess

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 0/6 v2] xfs: fix the bulkstat mess
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Thu, 6 Nov 2014 07:49:08 -0500
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20141106065359.GD23575@dastard>
References: <1415145921-31507-1-git-send-email-david@xxxxxxxxxxxxx> <20141105060728.GE28565@dastard> <20141105063226.GF28565@dastard> <20141105131706.GB26724@xxxxxxxxxxxxxxx> <20141105212100.GG28565@dastard> <20141106065359.GD23575@dastard>
User-agent: Mutt/1.5.23 (2014-03-12)
On Thu, Nov 06, 2014 at 05:53:59PM +1100, Dave Chinner wrote:
> On Thu, Nov 06, 2014 at 08:21:00AM +1100, Dave Chinner wrote:
> > On Wed, Nov 05, 2014 at 08:17:06AM -0500, Brian Foster wrote:
> > > On Wed, Nov 05, 2014 at 05:32:26PM +1100, Dave Chinner wrote:
> > > > OK, this looks like a problem with handling the last record in the
> > > > AGI btree:
> > > > 
> > > > $ for i in `cat s.diff | grep "^+/" | sed -e 's/^+//'`; do ls -i $i; done | sort -n
> > > > 163209114099 /mnt/scratch/2/dbc/5459605f~~~~~~~~RDJX8QBHPPMCGMD7YJQGYPD2
> > > > ....
> > > > 163209114129 /mnt/scratch/2/dbc/5459605f~~~~~~~~U820IYQFKS8A6QYCC8HU3ZBX
> > > > 292057960758 /mnt/scratch/0/dcc/54596070~~~~~~~~9BUH5D5PZTGAC8BT1YL77OZ0
> > > > ...
> > > > 292057960769 /mnt/scratch/0/dcc/54596070~~~~~~~~DAO78GAAFNUZU8PH7Q0UZNRH
> > > > 1395864555809 /mnt/scratch/1/e60/54596103~~~~~~~~GEMXGHYNREW409N7W9INBMVA
> > > > .....
> > > > 1395864555841 /mnt/scratch/1/e60/54596103~~~~~~~~9XPK9FWHCE21AJ3EN023DU47
> > > > 1653562593576 /mnt/scratch/5/e79/5459611c~~~~~~~~BSBZ6EUCT9HOIRQPMFZDVPQ5
> > > > .....
> > > > 1653562593601 /mnt/scratch/5/e79/5459611c~~~~~~~~6QY1SO8ZGGNQESAGXSB3G3DH
> > > > $
> > > > 
> > > > xfs_db> convert inode 163209114099 agno
> > > > 0x26 (38)
> > > > xfs_db> convert inode 163209114099 agino
> > > > 0x571f3 (356851)
> > > > xfs_db> convert inode 163209114129 agino
> > > > 0x57211 (356881)
> > > > xfs_db> agi 38
> > > > xfs_db> a root
> > > > xfs_db> a ptrs[2]
> > > > xfs_db> p
> > > > ....
> > > > recs[1-234] = [startino,freecount,free]
> > > > ......
> > > > 228:[356352,0,0] 229:[356416,0,0] 230:[356512,0,0] 231:[356576,0,0]
> > > > 232:[356672,0,0] 233:[356736,0,0] 234:[356832,14,0xfffc000000000000]
> > > > 
> > > > So for the first contiguous inode range, they all fall into the
> > > > partial final record in the AG.
> > > > 
> > > > xfs_db> convert inode 292057960758 agino
> > > > 0x2d136 (184630)
> > > > .....
> > > > 155:[184544,0,0] 156:[184608,30,0xfffffffc00000000]
> > > > 
> > > > Same.
> > > > 
> > > > xfs_db> convert inode 1395864555809 agino
> > > > 0x2d121 (184609)
> > > > .....
> > > > 155:[184544,0,0] 156:[184608,30,0xfffffffc00000000]
> > > > 
> > > > Same.
> > > > 
> > > > xfs_db> convert inode 1653562593576 agino
> > > > 0x2d128 (184616)
> > > > ....
> > > > 155:[184544,0,0] 156:[184608,30,0xfffffffc00000000]
> > > > 
> > > > Same.
> > > > 
> > > > So they are all falling into the last btree record in the AG, and so
> > > > appear to have been skipped as a result of the same issue. At least
> > > > that gives me something to look at.
> > > > 
> > > 
> > > Interesting, though just to note... is it possible this is related to
> > > records with free inodes?
> > 
> > Oh, definitely possible - I haven't ruled that out yet. However, I
> > would have expected such an issue to manifest itself during xfstests
> > (e.g. xfs/068), where random files are removed from the filesystem
> > and so leave free inodes in the inobt....
> 
> It's not.
> 
> > > It also might be interesting to know whether this repeats without the
> > > last patch in the series. IIRC that one seemed to have the most
> > > potential impact on the overall algorithm (by changing loop iteration
> > > logic, etc.). Just a thought.
> > 
> > I'll try that, but the missing files don't seem aligned to the
> 
> And it's not that.
> 
> The issue, as I suspected, was to do with end-of-AG handling, in a
> special corner case: the user buffer ran out of space partway
> through processing the last records in the AG.
> 
> That is, we've scanned all the way to the end of the AG and recorded
> the inobt records into the temp buffer. As this runs off the end of
> the AG, it sets the end_of_ag flag. We then walk the temp buffer to
> format the inodes, and while doing so we run out of space before all
> the inodes are formatted. We then hit the loop termination, which
> does agno++; agino = 0; and then we hit the "user buffer full"
> check.
> 

Ok, so this is pretty much what I noted in the comments to the v2 6/6
(though not directly related to the changes in 6/6 I suppose). ;)

> And so the cookie we pass back to userspace doesn't match the last
> inode we actually formatted, and so it skips a bunch of files in the
> dump. So, we have missing inodes in the dump. THe scary part about
> this is that the inodes are actually in the directory tree that
> xfsdump builds but it does not give any warnings that
> inodes weren't found during scan. And xfsrestore doesn't warn that
> entries in the directory tree didn't have any corresponding inode in
> the dump....
> 

Yeah, that sounds scary. :/

> Hence I could look at the dump and find the path and inode number
> corresponding to one of the missing files, mark it for extraction
> and then extract it *successfully* but the file was not restored -
> it is actually missing from the dump....
> 
> Anyway, the bug was in patch 3 - we have to check whether the user
> buffer has space left before processing the end_of_ag state. I
> removed the buffer space check from the for() loop logic because
> it's just plain misleading, and with the check in the formatter and
> at the end of the main loop it is entirely redundant. I'll post the
> series again once I've confirmed a good dump and restore.
> 

There's a simple test for that condition, as noted in my previous mail
as well, in case you missed it. Again, that probably calls out that we
could be doing better unit testing of bulkstat in xfstests. At the very
least we probably need some bulkstat inode count validation against a
known data set. xfsdump testing is obviously important, but if bulkstat
is broken then we clearly can't expect xfsdump to work (and debugging
the former via the latter appears to be quite painful).

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
