Bug 355 - unreplayable log after crash
Status: RESOLVED WONTFIX
Product: XFS
Classification: Unclassified
Component: XFS kernel code
Version: unspecified
Hardware: All Linux
Importance: P1 normal
Assigned To: XFS power people
Reported: 2004-08-10 21:07 CDT by KELEMEN Peter
Modified: 2008-12-25 03:40 CST (History)

Description KELEMEN Peter 2004-08-10 21:07:52 CDT
I have a ~700G filesystem that experienced a hard crash (controller failure),
but the subsequent mount fails when the log would be replayed.  xfs_check tells
me that the filesystem is funky:

bad magic # 0x421cd4bf in inode 11281887 bmbt block 66/1011263
expected level 0 got 6369 in inode 11281887 bmbt block 66/1011263
bad btree nrecs (13911, min=127, max=254) in inode 11281887 bmap block 70217279
extent count for ino 11281887 data fork too low (0) for file format
bad nblocks 7169 for inode 11281887, counted 1
bad nextents 28 for inode 11281887, counted 0
bad nblocks 67585 for inode 11317715, counted 67073
bad nextents 155 for inode 11317715, counted 154
bad nblocks 98817 for inode 11343754, counted 99329
bad nextents 242 for inode 11343754, counted 244
bad nblocks 52737 for inode 11343803, counted 53761
bad nextents 185 for inode 11343803, counted 189
bad nblocks 24577 for inode 11358383, counted 24065
bad nextents 71 for inode 11358383, counted 68
...

mount says
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       or too many mounted file systems
...and returns with exit code 32.

xfs_repair instructs me to zero the log before attempting a repair.

Device is software striped array across two 3ware hardware RAID-5s.
Filesystem geometry (info from xfs_db, superblock 0):
blocksize = 4096
dblocks = 175829568
agblocks = 1048512
agcount = 168
logblocks = 32768
versionnum = 0x3584
sectsize = 512
inodesize = 512
logsunit = 262144
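
As a quick sanity check on the geometry above, the superblock fields can be turned into sizes with plain arithmetic.  This is only a sketch: the variable names mirror the xfs_db fields, and nothing here is part of any XFS tool.

```python
# Superblock values as reported by xfs_db above.
blocksize = 4096        # bytes per filesystem block
dblocks = 175829568     # data blocks in the filesystem
agblocks = 1048512      # blocks per allocation group
agcount = 168           # number of allocation groups
logblocks = 32768       # blocks in the internal log
logsunit = 262144       # log stripe unit, in bytes

MIB = 1024 ** 2
GIB = 1024 ** 3

fs_bytes = dblocks * blocksize
log_bytes = logblocks * blocksize
logsunit_blocks = logsunit // blocksize

# The data blocks must fit in agcount allocation groups, and need more
# than agcount - 1 of them (the last group may be shorter than the rest).
ags_consistent = (agcount - 1) * agblocks < dblocks <= agcount * agblocks

print(f"filesystem size: {fs_bytes / GIB:.1f} GiB")
print(f"log size:        {log_bytes // MIB} MiB")
print(f"log stripe unit: {logsunit_blocks} blocks")
print(f"AG count consistent with dblocks: {ags_consistent}")
```

The filesystem works out to about 670.7 GiB (roughly 720 GB in decimal units), in line with the "~700G" above, with a 128 MiB log and a log stripe unit of 64 blocks.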

Compressed xfs_logprint output is available at
http://cern.ch/fuji/cruft/md0.logprint.bz2
[1.7 MiB, 57 MiB uncompressed]

I'm currently dumping the md0 device to another machine for further analysis.

Kernel is 2.4.20-31.7.2.cernsmp which is basically RedHat plus some local
patches to the SCSI tape layer.

Hardware is dual Xeon 2.4GHz 1G RAM 3ware 7850-8/7500-4, WD 120G disks.

Peter
Comment 1 KELEMEN Peter 2004-08-10 21:10:01 CDT
*** Bug 354 has been marked as a duplicate of this bug. ***
Comment 2 KELEMEN Peter 2004-08-12 02:58:13 CDT
I have another machine showing the same symptoms, with the additional
difficulty that it refuses to mount even with the -o norecovery,ro option.
The machine had to go back to production, so I was left with no choice but to
zero the log and live with some corrupted files.  However, this time I made
sure to gather as much information about the fs as possible:

http://cern.ch/fuji/xfs/lxfs5046.xfs_info
http://cern.ch/fuji/xfs/lxfs5046.xfs_check.bz2
http://cern.ch/fuji/xfs/lxfs5046.xfs_logprint.bz2
http://cern.ch/fuji/xfs/lxfs5046.xfs_repair-L.bz2
http://cern.ch/fuji/xfs/lxfs5046.xfs_repair-n.bz2
http://cern.ch/fuji/xfs/lxfs5046.xfs_repair.bz2

Could we at least theorize what causes this, because the probability of
hardware error is diminishing rather fast as more cases are developed?

Same hardware and same kernel as in the previous entry.

Thanks,
Peter
Comment 3 Nathan Scott 2004-08-15 17:59:07 CDT
hi Peter,

> but the subsequent mount fails when the log would be replayed.  xfs_check tells
> me that the filesystem is funky:

Most likely because the log was not replayed, so this isn't useful information
unless it persists after a mount/umount (more recent xfsprogs will warn),
or repair in this case.

> Kernel is 2.4.20-31.7.2.cernsmp which is basically RedHat plus some local
> patches to the SCSI tape layer.

Where did the XFS code come from (which version?  cvs?)

> sure to gather as much information about the fs as possible:

Thanks!  (fyi - one other useful piece - the -C option to xfs_logprint
can capture the log off to a file for later post-processing).

> Could we at least theorize what causes this, because the probability of
> hardware error is diminishing rather fast as more cases are developed?

Lots of variables here - if we can eliminate some, we could come up
with a theory.  Things to exclude if possible - software RAID, large
inodes vs default, v2 log vs defaults, stripe alignment vs none, etc.
Any additional info will help.  Any mount options in use?

thanks.
Comment 4 KELEMEN Peter 2004-08-20 12:49:15 CDT
Thanks for the tip.  XFS code in that kernel was 1.3.1.  But now
I reproduced the bug with RHES3 U2 kernel and XFS from SGI RHES
patch.  One of the links describes the filesystem layout:

meta-data=/shift/lxfs5046/data01 isize=512    agcount=168, agsize=1048512 blks
         =                       sectsz=512  
data     =                       bsize=4096   blocks=175829568, imaxpct=25
         =                       sunit=64     swidth=128 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=64 blks
realtime =none                   extsz=524288 blocks=0, rtextents=0

Mount options were "logbufs=8,logbsize=262144".  Filesystem is sitting on
top of software RAID0 array with 256k chunks.  I've seen the problem with
2.6.5 vanilla as well, I will see if I can get a kgdb-enabled kernel up.
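
The numbers above line up with each other, which a few lines of arithmetic make explicit.  This is a sketch that assumes the sunit/swidth values in the xfs_info output are counted in 4096-byte filesystem blocks (as the "blks" suffix suggests); the variable names are made up for illustration.

```python
BLOCK_SIZE = 4096              # bsize from the xfs_info output, in bytes
data_sunit_blocks = 64         # data section sunit
data_swidth_blocks = 128       # data section swidth
log_sunit_blocks = 64          # log section sunit
logbsize = 262144              # mount option, in bytes
raid0_chunk = 256 * 1024       # software RAID0 chunk size, in bytes

data_sunit_bytes = data_sunit_blocks * BLOCK_SIZE
log_sunit_bytes = log_sunit_blocks * BLOCK_SIZE
stripe_members = data_swidth_blocks // data_sunit_blocks

# Stripe unit equals the RAID0 chunk size.
print(data_sunit_bytes == raid0_chunk)
# Stripe width covers two members (the two hardware RAID-5 arrays).
print(stripe_members)
# The mount option logbsize equals the log stripe unit.
print(logbsize == log_sunit_bytes)
```

So the log stripe unit and the logbsize mount option are both 256k here, which is the largest log record size discussed in this bug.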
Comment 5 KELEMEN Peter 2004-08-22 07:42:11 CDT
Linux 2.6.8.1-mm2 correctly replayed the log and mounted the filesystem.
I will try with vanilla 2.4.x next.

Peter
Comment 6 KELEMEN Peter 2004-08-23 11:13:12 CDT
Linux 2.4.27 succeeds replaying the log and mounting the filesystem.
Linux 2.4.21-15.EL.sgi3 fails.

Peter
Comment 7 Tim Shimmin 2004-08-24 21:11:57 CDT
Hi Peter,

This is likely attributable to the incident:
  sgi_pv#913531 - 
      recovery of v2 logs of log record size of 256K will fail on Linux
The fix for this was checked in on June 15 2004.
There was a problem with a memory allocation function which had an artificial
limit of 128K. 
(I had an obvious hole in my v2 log qa tests up until that time ;-(

--Tim
Comment 8 KELEMEN Peter 2004-09-02 08:20:00 CDT
Tim,

Thanks for the update.  Investigating the 15.EL.sgi3 source, the fix of June 15th
would not apply since the tree has been reshuffled.  I also tried to see if the
problem exists if I explicitly specify -l su=128k, and it does.  Does su= have
any effect on the log record size?

Thanks,
Peter
Comment 9 Tim Shimmin 2004-09-09 01:49:34 CDT
Hi Peter,

You wrote:
  "I also tried to see if the problem exists if I explicitly specify -l su=128k,
   and it does."

I'm not sure I am following you here.
The fix/bug I was describing was for using a log record size of 256k; it has
nothing to do with the log stripe size.
So trying it with a log record size of 128k, i.e. mounting with "-o logbsize=128k",
is worth doing if you want to check that the problem goes away.

You also wrote:
"Does su= have any effect on the log record size?"
Not really, except that the log record size must be at least as big as the
log stripe size, and thus the stripe size can't be any bigger than
the maximum log record size.
i.e. 
  log-su <= MAX-log-record-size(256k)
  logbsize >= log-su
So if you wanted to mount with 128k log buffers (logbsize=128k), then it would
only succeed if the log stripe was <= 128k.
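
The two inequalities above can be written down as a small check.  This is a minimal sketch that models only those two constraints; the function name and messages are hypothetical, and the real mount code may apply further checks that are not described here.

```python
from typing import List

K = 1024
MAX_LOG_RECORD_SIZE = 256 * K  # bytes, the maximum quoted above


def check_log_geometry(log_su: int, logbsize: int) -> List[str]:
    """Return the violated constraints (empty list if none are violated)."""
    problems = []
    # log-su <= MAX-log-record-size(256k)
    if log_su > MAX_LOG_RECORD_SIZE:
        problems.append("log-su exceeds the maximum log record size")
    # logbsize >= log-su
    if logbsize < log_su:
        problems.append("logbsize is smaller than log-su")
    return problems


# log stripe 256k, logbsize 256k: both constraints hold
print(check_log_geometry(256 * K, 256 * K))
# log stripe 256k, logbsize 128k: violates logbsize >= log-su
print(check_log_geometry(256 * K, 128 * K))
# log stripe 128k, logbsize 128k: both constraints hold
print(check_log_geometry(128 * K, 128 * K))
```

Under these two rules alone, a 128k logbsize is rejected on a filesystem with a 256k log stripe and accepted once the log stripe is 128k or smaller.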

(Sorry for not replying sooner - been away a lot lately:)

--Tim
Comment 10 KELEMEN Peter 2004-09-09 03:12:12 CDT
Tim,

I wanted to check whether the 256k logrecord fix applies to my
situation.  Your response confirmed my line of thinking when I
tried the following and failed:

* mount -o logbsize=128k (failed with su=256k, as expected)
* mount -o logbsize=128k (failed with su=128k, NOT EXPECTED)

...which means the problem is there even if I restrict the
filesystem to 128k log buffers (logbsize=128k) => the 256k fix does not
solve it.

Peter
Comment 11 Christoph Hellwig 2008-12-25 03:40:23 CST
Closing all 2.4 kernel bugs with WONTFIX as XFS in Linux 2.4 hasn't been
maintained for a long time.  Please open a new bug if you see something similar
with a recent Linux 2.6 kernel.