Hi guys,
I just managed to get this error and severe fs corruption without RAID,
mongo, huge filesystem, or anything weird. (BTW, I'm a bleeding-edge
kind of guy, so the fs wasn't critical and I've got backups :-) This
was with kernel-2.4.9-6SGI_XFS_PR1, and I'm not using any of the
modules that had symbol problems.
Initially, I was trying to xfsdump and gzip a whole filesystem to an XFS
filesystem on another disk, and I got "file size limit exceeded" and
"core dumped." So I said, "OK, so what's the max file size?" I thought it
was pretty high for XFS, but apparently not for Linux XFS. I was backing
up a ~6 GB partition, so the dump file had to be smaller than that. I
didn't find any clues in a cursory glance at xfs.h, so I decided to test
it in a not-so-nice way:
dd if=/dev/zero of=test_size.img bs=10240k
The process choked and this is what turned up in the system log:
Oct 24 07:47:08 localhost kernel: xfs_force_shutdown(ide1(22,65),0x8)
called from line 1120 of file xfs_trans.c. Return address = 0xc01ca409
Oct 24 07:47:08 localhost kernel: Corruption of in-memory data
detected. Shutting down filesystem: ide1(22,65)
Oct 24 07:47:08 localhost kernel: Please umount the filesystem, and
rectify the problem(s)
When I tried to mount the disk again, I got this error:
Oct 24 07:47:30 localhost kernel: XFS: bad magic number
Oct 24 07:47:30 localhost kernel: XFS: SB validate failed
xfs_repair had to search for quite a while to find a good alternate SB.
I attached the log of xfs_repair so you can see I did a capital job of
trashing the FS :-) Later, I'll try it again to see if I can reproduce
the problem, then again with the newer 2.4.9 kernel.
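
For a retry, a gentler probe than filling the disk with zeros might look like the sketch below. It is only an illustration, not something from the original test: the function name probe_file_size, the probe file name, and the list of offsets are all made up here. It prints the shell's file size limit (exceeding that limit is one thing that raises SIGXFSZ, which the shell reports as "File size limit exceeded") and then writes a single 1 KiB block at increasing offsets, so the file stays sparse and very little data actually hits the disk. It makes no claim to avoid the shutdown shown above.

```shell
# Hypothetical helper: probe how far into a file a write still succeeds.
# Assumes a POSIX shell and a dd that supports seek= (GNU dd does).
probe_file_size() {
    dir=${1:-.}                      # directory on the filesystem to test
    f="$dir/probe_size.$$"           # scratch file, removed at the end

    # Per-process file size limit, in blocks, or "unlimited".
    echo "ulimit -f: $(ulimit -f)"

    for gib in 1 2 4 8 16; do
        # seek= counts bs-sized blocks, so this lands at an offset of $gib GiB.
        if dd if=/dev/zero of="$f" bs=1024 count=1 \
              seek=$((gib * 1024 * 1024)) 2>/dev/null; then
            echo "ok: wrote a block at ${gib} GiB"
        else
            echo "failed at ${gib} GiB"
            break
        fi
    done

    rm -f "$f"
}
```

Calling probe_file_size /trans would run it against the filesystem mounted there; the first "failed at" line, if any, brackets the limit between that offset and the previous one.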
--
"Jonathan F. Dill" (dill@xxxxxxxxxxxx)

[root@localhost ~]# mount /trans
mount: wrong fs type, bad option, bad superblock on /dev/hdd1,
or too many mounted file systems
[root@localhost ~]# xfs_repair /dev/hdd1
xfs_repair: warning - cannot set blocksize on block device /dev/hdd1:
Input/output error
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...
found
candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 inconsistent with calculated value
13835049396628095104
resetting superblock root inode pointer to 18446744069414584448
sb realtime bitmap inode 18446744073709551615 inconsistent with calculated
value 13835049396628095105
resetting superblock realtime bitmap ino pointer to 18446744069414584449
sb realtime summary inode 18446744073709551615 inconsistent with calculated
value 13835049396628095106
resetting superblock realtime summary ino pointer to 18446744069414584450
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
bad magic # 0x0 for agf 0
bad version # 0 for agf 0
bad length 0 for agf 0, should be 262144
bad magic # 0x0 for agi 0
bad version # 0 for agi 0
bad length # 0 for agi 0, should be 262144
reset bad agf for ag 0
reset bad agi for ag 0
bad agbno 0 for btbno root, agno 0
bad agbno 0 for btbcnt root, agno 0
bad agbno 0 for inobt root, agno 0
root inode chunk not found
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
error following ag 0 unlinked list
- process known inodes and perform inode discovery...
- agno = 0
imap claims in-use inode 131 is free, correcting imap
imap claims in-use inode 132 is free, correcting imap
imap claims in-use inode 133 is free, correcting imap
imap claims in-use inode 134 is free, correcting imap
imap claims in-use inode 135 is free, correcting imap
imap claims in-use inode 136 is free, correcting imap
imap claims in-use inode 137 is free, correcting imap
imap claims in-use inode 141 is free, correcting imap
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- agno = 37
- process newly discovered inodes...
imap claims in-use inode 929 is free, correcting imap
...snip...
imap claims in-use inode 991 is free, correcting imap
imap claims in-use inode 4176897 is free, correcting imap
...snip...
imap claims in-use inode 4176959 is free, correcting imap
found inodes not in the inode allocation tree
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- check for inodes claiming duplicate blocks...
- agno = 0
entry "test_size.img" at block 0 offset 1024 in directory inode 136 references
free inode 138
clearing inode number in entry at offset 1024...
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- agno = 16
- agno = 17
- agno = 18
- agno = 19
- agno = 20
- agno = 21
- agno = 22
- agno = 23
- agno = 24
- agno = 25
- agno = 26
- agno = 27
- agno = 28
- agno = 29
- agno = 30
- agno = 31
- agno = 32
- agno = 33
- agno = 34
- agno = 35
- agno = 36
- agno = 37
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
rebuilding directory inode 136
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - stripe unit (0) and width (0) fields have been reset.
Please set with mount -o sunit=<value>,swidth=<value>
done