Michael Monnerie wrote:
> Tonight our server rebooted, and I found in /var/log/warn that it had been
> complaining a lot about xfs since June 7 already:
...
> But XFS didn't go offline, so nobody noticed these messages. There are a lot
> of them. They are obviously generated by the nightly "xfs_fsr -v -t 7200",
> which we have been running since then. It would have been nice if xfs_fsr had
> displayed a message, so we would have received the cron mail. (But it got
> killed by the kernel, which is a good excuse.)
OK, yeah, we should see why fsr didn't print anything...
> Anyway, so I went to xfs_repair (3.01) and got this:
>
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> [snip]
> - agno = 14
> local inode 3857051697 attr too small (size = 3, min size = 4)
> bad attribute fork in inode 3857051697, clearing attr fork
> clearing inode 3857051697 attributes
> cleared inode 3857051697
> [snip]
> Phase 4 - check for duplicate blocks...
> [snip]
> - agno = 15
> data fork in regular inode 3857051697 claims used block 537147998
> xfs_repair: dinode.c:2108: process_inode_data_fork: Assertion `err == 0' failed.
OK, so this code first does a scan-only pass; if that finds an error, it
bails out and clears the inode. If not, it calls essentially the same
function again (the comment says "set bitmaps this time"), but here the
2nd call finds an error, which isn't handled well.
The ASSERT(err == 0) is presumably there because if the first scan didn't
find anything, the 2nd call shouldn't either, but ... that's not the case
here :( There are more checks that can fail -after- the scan-only portion.
So either the caller needs to cope with the error at this point, or the
scan-only pass needs to do all the checks, I think.
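A rough Python model of that control flow, as a sketch only:
`process_data_fork`, `check_inode`, `check_only`, and the `used` block map
are hypothetical stand-ins, not the real dinode.c code. The scan-only pass
just range-checks the blocks; the second pass marks them and is where a
block already claimed by another inode shows up, so it can fail even after
a clean scan. The caller copes with that by backing out its marks and
clearing the inode, rather than asserting err == 0.

```python
# Hypothetical model of a two-pass "scan, then set bitmaps" check.
# Names and data structures are illustrative, not xfs_repair's.


class RepairError(Exception):
    pass


def process_data_fork(ino, blocks, used, fs_size, check_only):
    """Validate the blocks an inode's data fork claims.

    The scan-only pass (check_only=True) runs just the range check.
    The second pass also marks blocks in `used`, and that is where a
    block already owned by another inode is detected.
    """
    for blk in blocks:
        if not 0 <= blk < fs_size:
            raise RepairError(f"inode {ino}: block {blk} out of range")
    if check_only:
        return
    for blk in blocks:
        owner = used.get(blk)
        if owner is not None and owner != ino:
            raise RepairError(f"inode {ino} claims used block {blk}")
        used[blk] = ino


def check_inode(ino, blocks, used, fs_size):
    """Return 'ok' or 'cleared'; never asserts on the second pass."""
    try:
        process_data_fork(ino, blocks, used, fs_size, check_only=True)
    except RepairError:
        return "cleared"
    try:
        # Second call "sets bitmaps this time" and may still fail.
        process_data_fork(ino, blocks, used, fs_size, check_only=False)
    except RepairError:
        # Instead of assert(err == 0): cope with the error here by
        # undoing any marks this inode made, then clearing it.
        for blk in [b for b, owner in used.items() if owner == ino]:
            del used[blk]
        return "cleared"
    return "ok"
```

Because the second-pass error is handled explicitly rather than asserted,
this behaves the same whether or not assertions are compiled in. The other
option above, doing every check in the scan-only pass, would mean running
the duplicate-block test there against `used` without marking anything.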
Where's Barry when you need him ....
Also, I need to look at when the ASSERTs are active and when they should
be; the Fedora-packaged xfsprogs doesn't have the ASSERT active, so this
doesn't trip. On Fedora, w/o the ASSERTs active, xfs_repair checks clean
only on the 3rd run (!), after 2 repair runs. Not great. I'm not sure how
much was cleared out in the process, either...
-Eric