On 1/29/15 3:59 PM, Gerard Beekmans wrote:
> That gives me this:
> xfs_db> agf 5
> xfs_db> daddr
> current daddr is 5120001
> xfs_db> print
> magicnum = 0
> versionnum = 0
so it is completely zeroed out.
>>> xfs_db: cannot init perag data (117). Continuing anyway.
>>> xfs_db> sb 0
>>> xfs_db> p
>>> magicnum = 0x58465342
>> this must not be the one that repair failed on with:
>>> couldn't verify primary superblock - bad magic number !!!
>> because that magicnum is valid. Did this one also fail to repair?
> How do I know/check/test if "this one" failed to repair? I'm not sure
> what you're referring to (or what to do with it).
I'm sorry, I meant: did this filesystem fail to repair with a bad
magic number error?
>>> agcount = 25
>> 25 ags, presumably the fs was grown in the past, but ok...
> Yes, it was. Ran out of space so I increased the size of the logical
> volume then used xfs_grow to increase the filesystem itself. That was
> the whole reason behind using LVM so this growth can be done on a
> live system without requiring repartitioning and such.
> I did read today that growing an XFS is not necessarily something we
> should be doing? Some posts even suggest that LVM and XFS shouldn't
> be mixed together. Not sure how to separate truth from fiction.
It's fine; the downside is for people who think they can start with 1G
and grow to 10T; that's pretty suboptimal. XFS over LVM is fine.
I'm sure it's not related to this issue (unless it was grown shortly
before the failures?)
Hm, it would have started at 4 AGs by default, and it's the 5th one that
looks bad; maybe that's a clue. Are agf 6, 7, 8 etc also full of 0s?
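For what it's worth, the daddr that xfs_db reports for each AGF is predictable: the AGF sits in the 512-byte sector right after each AG's superblock sector. Here's a quick sketch of the arithmetic; note I'm back-computing the AG size from your agf 5 daddr (5120001) rather than reading it from your actual geometry, so treat that number as an assumption:

```python
# Where xfs_db's "daddr" for "agf N" comes from: each AG is a fixed run
# of 512-byte sectors, and the AGF occupies sector offset 1 within the
# AG (offset 0 is the AG's superblock copy).

SECTORS_PER_AG = 1_024_000   # inferred from (5120001 - 1) / 5, not read from the fs
AGF_SECTOR_OFFSET = 1        # AGF is the sector after the AG's SB sector

def agf_daddr(agno: int) -> int:
    """512-byte disk address xfs_db should report for 'agf <agno>'."""
    return agno * SECTORS_PER_AG + AGF_SECTOR_OFFSET

print(agf_daddr(5))                 # matches the 5120001 seen above
for agno in (6, 7, 8):
    print(agno, agf_daddr(agno))    # where the next AGFs should live
```

So if agf 6, 7, 8 are also zeroed, the zeroed region starts somewhere around daddr 5120001 and you can bracket its extent by checking those addresses too.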
>> The only thing I can say is that xfs is going to depend on the storage
>> telling the truth about completed IOs... If the storage told XFS an IO
>> was complete but it wasn't, and the storage went poof, bad things can
>> happen. I don't
>> know the details of your setup, or TBH much about vmware over nfs ... you
>> weren't mounted with -o nobarrier were you?
> No, I wasn't mounted with nobarrier, unless it is done by default. I
> never specified the option on the command line or in /etc/fstab at any
> rate, for what that is worth.
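For completeness, you can confirm from the running system; barriers are on by default, so "nobarrier" only shows up in the mount options if someone asked for it explicitly. Something like (harmless, it just reads /proc/mounts):

```shell
# Look for any xfs mount carrying the nobarrier option.
grep xfs /proc/mounts | grep nobarrier || echo "no xfs mount has nobarrier set"
```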
ok. I'm not sure what to tell you at this point; you have at least one
swath of your storage which looks totally zeroed out. That's not a
failure mode we usually see, and makes me think it's more storage related,
although the "how long ago did you grow this fs?" question might be
related, because the first visible corruption is in the first "new"
part of the fs.