Submitter : nathans *Status : closed
Assigned Engineer : nathans *Fixed By : nathans
*Fixed By Domain : engr *Closed Date : 08/28/00
Priority : 3 *Modified Date : 08/28/00
*Modified User : nathans *Modified User Domain : engr
*Fix Description :
From: nathan scott <nathans@xxxxxxxxxxxxxxxxxxxxxxx> (TAKE)
Date: Aug 28 2000 05:50:03PM
[pvnews version: 1.71]
----------------------------
Modid: 2.4.0-test1-xfs:slinx:73216a
Date: Mon Aug 28 17:41:05 PDT 2000
Workarea: snort:/build4/nathans/base-linux-xfs
Author: nathans
The following file(s) were checked into:
bonnie.engr.sgi.com:/isms/slinx/2.4.0-test1-xfs
cmd/xfs/repair/sb.c - 1.33
- fix endian issue when searching for secondary superblocks.
cmd/xfs/stress/src/devzero.c - 1.4
- correct handling of single block writes, tidy final summary printf.
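For reference, a minimal sketch of the kind of byte-order-safe check the sb.c fix is about. This is not the actual xfs_repair code: the helper names be32_decode and looks_like_xfs_sb are hypothetical. It relies only on the XFS on-disk format being big-endian and on the superblock magic 0x58465342 ('XFSB'), which is the decimal 1481003842 shown for sb_magicnum in the gdb dump further down.

```c
#include <stdint.h>

/* 'XFSB'; the same value as sb_magicnum = 1481003842 in the gdb dump. */
#define XFS_SB_MAGIC 0x58465342u

/*
 * Decode a 32-bit big-endian field from raw disk bytes.
 * Works the same on little-endian and big-endian hosts.
 * (Hypothetical helper, not taken from sb.c.)
 */
static uint32_t be32_decode(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical candidate test for a secondary superblock search:
 * does this sector begin with the XFS superblock magic number?
 */
static int looks_like_xfs_sb(const unsigned char *sector)
{
    return be32_decode(sector) == XFS_SB_MAGIC;
}
```

Comparing the raw on-disk word directly against XFS_SB_MAGIC would only match on a big-endian host, which is one plausible way a search like this could fail on Linux/x86.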
Description :
In writing some verification tests for xfs_repair, I've found that
a corrupted primary superblock is not currently recoverable on
Linux.
e.g.
sim/mkfs/mkfs_xfs /dev/foo
stress/src/devzero -b 1 -n 1 /dev/foo
sim/repair/xfs_repair /dev/foo
Phase 1 - find and verify superblock...
.....
==========================
ADDITIONAL INFORMATION (ADD)
From: nathans@engr (BugWorks)
Date: Aug 28 2000 05:14:00PM
==========================
OK, I'm close to having this sorted out, just need some input from
the gurus...
The situation at the moment is:
- libsim mkfs writes bad secondary superblocks
- libxfs mkfs writes good secondary superblocks
(as to why, I don't know. I can only guess that the bflush
at the end of the old mkfs finds the buffers marked as dirty but
not endian converted, and flushes them out, overwriting the
good stuff ... seems very odd though).
- repair does have an endian issue here after all... with a fix,
I get a nicely recovered fs with xfs_repair output like this...
(in gdb...)
run /dev/hda8
Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...
...found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
Breakpoint 1, write_primary_sb (sbp=0x40209080, size=512) at sb.c:481
481 if (no_modify)
(gdb) p *sbp
$1 = {sb_magicnum = 1481003842, sb_blocksize = 4096, sb_dblocks = 38146,
sb_rblocks = 0, sb_rextents = 0, sb_uuid = {
__u_bits = "x]o·´\205AÄ\231âK]ùòÀw"}, sb_logstart = 32772,
sb_rootino = 18446744073709551615, sb_rbmino = 18446744073709551615,
sb_rsumino = 18446744073709551615, sb_rextsize = 16, sb_agblocks = 4769,
sb_agcount = 8, sb_rbmblocks = 0, sb_logblocks = 1200,
sb_versionnum = 8324, sb_sectsize = 512, sb_inodesize = 256,
sb_inopblock = 16, sb_fname = "\000\000\000\000\000",
sb_fpack = "\000\000\000\000\000", sb_blocklog = 12 '\f',
sb_sectlog = 9 '\t', sb_inodelog = 8 '\b', sb_inopblog = 4 '\004',
sb_agblklog = 13 '\r', sb_rextslog = 0 '\000', sb_inprogress = 0 '\000',
sb_imax_pct = 25 '\031', sb_icount = 0, sb_ifree = 0, sb_fdblocks = 36914,
sb_frextents = 0, sb_uquotino = 0, sb_pquotino = 0, sb_qflags = 0,
sb_flags = 0 '\000', sb_shared_vn = 0 '\000', sb_inoalignmt = 2,
sb_unit = 0, sb_width = 0, sb_dirblklog = 0 '\000',
sb_dummy = "\000\000\000\000\000\000"}
(gdb) c
Continuing.
sb root inode value 18446744073709551615 inconsistent with calculated value
137438953600
resetting superblock root inode pointer to 137438953600
sb realtime bitmap inode 18446744073709551615 inconsistent with calculated
value 137438953601
resetting superblock realtime bitmap ino pointer to 137438953601
sb realtime summary inode 18446744073709551615 inconsistent with calculated
value 137438953602
resetting superblock realtime summary ino pointer to 137438953602
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - stripe unit (0) and width (0) fields have been reset.
Please set with mount -o sunit=<value>,swidth=<value>
done
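A note on the values in the output above: 18446744073709551615 is the all-ones 64-bit value, which XFS uses as its "no inode" marker (NULLFSINO). The sketch below only illustrates that reading; the names NULLFSINO_SKETCH and ino_is_unset are made up for this note and are not from the repair source.

```c
#include <stdint.h>

/* All 64 bits set: 18446744073709551615, as printed for sb_rootino above. */
#define NULLFSINO_SKETCH ((uint64_t)-1)

/*
 * Hypothetical check: a superblock copy whose inode field still holds
 * the null value never had the real inode number written into it.
 */
static int ino_is_unset(uint64_t ino)
{
    return ino == NULLFSINO_SKETCH;
}
```

On that reading, the secondary superblock that repair picked never had its root and realtime inode numbers filled in, which would explain the three "resetting" messages.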
So, my question is - I know there's code in mkfs to go through and
sprinkle the known-good root inode into some AGs (looks like we use
the last AG and one in the middle - below the comment "write out
multiple copies of superblocks with the rootinode field set" in mkfs).
At this point we know what the root inode (+rt inodes) are, and we
have all the AGs set up, so why do we not write these inode numbers
in _all_ of the AG superblocks rather than just a couple? (would it
be worthwhile changing mkfs to do this?)
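To make the question concrete, here is a rough sketch of where the superblock copies live and which ones would carry the root inode under the behaviour described above. Each AG starts with a superblock copy, so its byte offset is agno * sb_agblocks * sb_blocksize. The choice of (agcount - 1) / 2 as "the one in the middle" and the assumption that the primary in AG 0 is also updated are my guesses, not checked against mkfs; the function names are hypothetical.

```c
#include <stdint.h>

/* Byte offset of the superblock copy at the start of allocation group agno. */
static uint64_t ag_sb_offset(uint32_t agno, uint32_t agblocks, uint32_t blocksize)
{
    return (uint64_t)agno * agblocks * blocksize;
}

/*
 * Would the superblock in this AG carry the real root inode number?
 * ASSUMPTION: primary (AG 0), the last AG, and the middle AG taken
 * as (agcount - 1) / 2. Hypothetical helper, not mkfs code.
 */
static int sb_has_rootino(uint32_t agno, uint32_t agcount)
{
    if (agno == 0)
        return 1;                       /* primary superblock */
    if (agcount > 1 && agno == agcount - 1)
        return 1;                       /* last AG */
    if (agcount > 2 && agno == (agcount - 1) / 2)
        return 1;                       /* one in the middle */
    return 0;
}
```

With the geometry from the gdb dump (sb_agcount = 8, sb_agblocks = 4769, sb_blocksize = 4096), this gives copies with the root inode set in AGs 0, 3 and 7 only; the AG 1 copy at byte offset 19533824 would not have it.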
Looks like repair doesn't find the good one at the moment, or
doesn't keep looking for long enough (I suspect it's picked the SB
in AG 1), so we get those "resetting" messages at the end of phase 1.
many thanks.