Bugzilla – Bug 355
unreplayable log after crash
Last modified: 2008-12-25 03:40:23 CST
I have a ~700G filesystem that experienced a hard crash (controller failure), but the subsequent mount fails when the log would be replayed. xfs_check tells me that the filesystem is funky:

  bad magic # 0x421cd4bf in inode 11281887 bmbt block 66/1011263
  expected level 0 got 6369 in inode 11281887 bmbt block 66/1011263
  bad btree nrecs (13911, min=127, max=254) in inode 11281887 bmap block 70217279
  extent count for ino 11281887 data fork too low (0) for file format
  bad nblocks 7169 for inode 11281887, counted 1
  bad nextents 28 for inode 11281887, counted 0
  bad nblocks 67585 for inode 11317715, counted 67073
  bad nextents 155 for inode 11317715, counted 154
  bad nblocks 98817 for inode 11343754, counted 99329
  bad nextents 242 for inode 11343754, counted 244
  bad nblocks 52737 for inode 11343803, counted 53761
  bad nextents 185 for inode 11343803, counted 189
  bad nblocks 24577 for inode 11358383, counted 24065
  bad nextents 71 for inode 11358383, counted 68
  ...

mount says

  mount: wrong fs type, bad option, bad superblock on /dev/md0,
         or too many mounted file systems

...and returns with exit code 32. xfs_repair instructs me to zero the log before attempting a repair.

Device is software striped array across two 3ware hardware RAID-5s. Filesystem geometry (info from xfs_db, superblock 0):

  blocksize = 4096
  dblocks = 175829568
  agblocks = 1048512
  agcount = 168
  logblocks = 32768
  versionnum = 0x3584
  sectsize = 512
  inodesize = 512
  logsunit = 262144

Compressed xfs_logprint output is available at
http://cern.ch/fuji/cruft/md0.logprint.bz2 [1.7 MiB, 57 MiB uncompressed]

I'm currently dumping the md0 device to another machine for further analysis.

Kernel is 2.4.20-31.7.2.cernsmp which is basically RedHat plus some local patches to the SCSI tape layer.

Hardware is dual Xeon 2.4GHz, 1G RAM, 3ware 7850-8/7500-4, WD 120G disks.

Peter
*** Bug 354 has been marked as a duplicate of this bug. ***
I have another machine showing the same symptoms, with the additional difficulty that it refuses to mount even with the -o norecovery,ro option. The machine had to go back to production, so I was left with no choice but to zero the log and live with some corrupted files. However, this time I made sure to gather as much information about the fs as possible:

  http://cern.ch/fuji/xfs/lxfs5046.xfs_info
  http://cern.ch/fuji/xfs/lxfs5046.xfs_check.bz2
  http://cern.ch/fuji/xfs/lxfs5046.xfs_logprint.bz2
  http://cern.ch/fuji/xfs/lxfs5046.xfs_repair-L.bz2
  http://cern.ch/fuji/xfs/lxfs5046.xfs_repair-n.bz2
  http://cern.ch/fuji/xfs/lxfs5046.xfs_repair.bz2

Could we at least theorize what causes this? The probability of a hardware error is diminishing rather fast as more cases turn up.

Same hardware and same kernel as in the previous entry.

Thanks,
Peter
hi Peter,

> but the subsequent mount fails when the log would be replayed. xfs_check tells
> me that the filesystem is funky:

Most likely because the log was not replayed, so this isn't useful information unless it persists after a mount/umount (more recent xfsprogs will warn), or repair in this case.

> Kernel is 2.4.20-31.7.2.cernsmp which is basically RedHat plus some local
> patches to the SCSI tape layer.

Where did the XFS code come from (which version? cvs?)

> sure to gather as much information about the fs as possible:

Thanks! (fyi - one other useful piece - the -C option to xfs_logprint can capture the log off to a file for later post-processing).

> Could we at least theorize what causes this, because the probability of
> hardware error is diminishing rather fast as more cases are developed?

Lots of variables here - if we can eliminate some, we could come up with a theory. Things to exclude if possible - software RAID, large inodes vs default, v2 log vs defaults, stripe alignment vs none, etc. Any additional info will help. Any mount options in use?

thanks.
Thanks for the tip. XFS code in that kernel was 1.3.1. But now I reproduced the bug with RHES3 U2 kernel and XFS from SGI RHES patch.

One of the links was describing the filesystem layout:

  meta-data=/shift/lxfs5046/data01 isize=512    agcount=168, agsize=1048512 blks
           =                       sectsz=512
  data     =                       bsize=4096   blocks=175829568, imaxpct=25
           =                       sunit=64     swidth=128 blks, unwritten=1
  naming   =version 2              bsize=4096
  log      =internal               bsize=4096   blocks=32768, version=2
           =                       sectsz=512   sunit=64 blks
  realtime =none                   extsz=524288 blocks=0, rtextents=0

Mount options were "logbufs=8,logbsize=262144". Filesystem is sitting on top of software RAID0 array with 256k chunks.

I've seen the problem with 2.6.5 vanilla as well, I will see if I can get a kgdb-enabled kernel up.
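For readers comparing the numbers in this comment, the stripe values can be converted to bytes. This is a minimal sketch, assuming that the sunit and swidth figures in the xfs_info output are counts of 4096-byte filesystem blocks (as the "blks" suffix suggests). The constant and function names are illustrative only.

```python
# Values copied from the xfs_info output in the comment above.
# Assumption: sunit/swidth are expressed in filesystem blocks ("blks").
BLOCK_SIZE = 4096        # bsize
DATA_SUNIT_BLKS = 64     # data sunit
DATA_SWIDTH_BLKS = 128   # data swidth
LOG_SUNIT_BLKS = 64      # log sunit


def blocks_to_bytes(blocks, block_size=BLOCK_SIZE):
    """Convert a count of filesystem blocks to bytes."""
    return blocks * block_size


data_sunit = blocks_to_bytes(DATA_SUNIT_BLKS)
data_swidth = blocks_to_bytes(DATA_SWIDTH_BLKS)
log_sunit = blocks_to_bytes(LOG_SUNIT_BLKS)

print(f"data sunit  = {data_sunit} bytes ({data_sunit // 1024}k)")
print(f"data swidth = {data_swidth} bytes ({data_swidth // 1024}k)")
print(f"log sunit   = {log_sunit} bytes ({log_sunit // 1024}k)")
print(f"stripe units per stripe width = {data_swidth // data_sunit}")
```

Under that assumption the log stripe unit is 262144 bytes, which agrees with the logsunit value from xfs_db in the original report, the 256k RAID0 chunk size, and the logbsize=262144 mount option. The stripe width of two units is consistent with a stripe across two RAID-5 devices.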
Linux 2.6.8.1-mm2 correctly replayed the log and mounted the filesystem. I will try with vanilla 2.4.x next. Peter
Linux 2.4.27 succeeds replaying the log and mounting the filesystem. Linux 2.4.21-15.EL.sgi3 fails. Peter
Hi Peter,

This is likely to be attributed to the incident:

  sgi_pv#913531 - recovery of v2 logs of log record size of 256K will fail on Linux

The fix for this was checked in on June 15 2004. There was a problem with a memory allocation function which had an artificial limit of 128K. (I had an obvious hole in my v2 log qa tests up until that time ;-( )

--Tim
Tim, Thanks for the update. Investigating the 15.EL.sgi3 source, the fix of June 15th would not apply since the tree has been reshuffled. I also tried to see if the problem exists if I explicitly specify -l su=128k, and it does. Does su= have any effect on the log record size? Thanks, Peter
Hi Peter,

You wrote: "I also tried to see if the problem exists if I explicitly specify -l su=128k, and it does."

I'm not sure I am following you here. The fix/bug I was describing was for using a log record size of 256k, nothing to do with the log stripe size. So trying it with a log record size of 128k, i.e. mounting with "-o logbsize=128k", is worth doing if you want to check that the problem goes away.

You also wrote: "Does su= have any effect on the log record size?"

Not really, except that the log record size must be at least as big as the log stripe size, and thus the stripe size can't be any bigger than the maximum log record size. i.e.

  log-su <= MAX-log-record-size(256k)
  logbsize >= log-su

So if you wanted to mount with 128k log buffers, then it would only succeed if the log stripe was <= 128k.

(Sorry for not replying sooner - been away a lot lately:)

--Tim
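The two inequalities in the comment above can be written out as a small check. This is only a sketch of the rule as stated in this thread, not the kernel's actual mount-time validation. The function name and the message strings are hypothetical.

```python
KIB = 1024
# Maximum v2 log record size as quoted in the comment above.
MAX_LOG_RECORD_SIZE = 256 * KIB


def check_log_options(log_su, logbsize):
    """Return the constraints from the comment that a given
    log stripe unit / logbsize pair (both in bytes) would violate."""
    problems = []
    if log_su > MAX_LOG_RECORD_SIZE:
        problems.append("log-su exceeds the maximum log record size")
    if logbsize < log_su:
        problems.append("logbsize is smaller than log-su")
    return problems


# Log stripe unit of 256k mounted with logbsize=128k: violates the second rule.
print(check_log_options(256 * KIB, 128 * KIB))
# Log stripe unit of 128k mounted with logbsize=128k: no violation.
print(check_log_options(128 * KIB, 128 * KIB))
# Log stripe unit of 256k mounted with logbsize=256k: no violation.
print(check_log_options(256 * KIB, 256 * KIB))
```

By this rule alone, a filesystem with a 256k log stripe unit cannot be mounted with logbsize=128k, while a 128k log stripe unit with logbsize=128k satisfies both constraints.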
Tim,

I wanted to check whether the 256k log record fix applies to my situation. Your response confirmed my line of thinking when I tried the following and failed:

  * mount -o logbsize=128k (failed with su=256k, as expected)
  * mount -o logbsize=128k (failed with su=128k, NOT EXPECTED)

...which means the problem is there even if I restrict the filesystem to 128k log buffers (logbsize=128k) => the 256k fix does not solve it.

Peter
Closing all 2.4 kernel bugs with WONTFIX as XFS in Linux 2.4 hasn't been maintained for a long time. Please open a new bug if you see something similar with a recent Linux 2.6 kernel.