http://oss.sgi.com/bugzilla/show_bug.cgi?id=272
------- Additional Comments From sandeen@xxxxxxx 2004-04-08 07:24 PDT -------
Talked with Peter; he confirmed there were no other messages prior to the shutdown:
Aug 4 10:37:01 oplapro97 kernel: XFS mounting filesystem md0
Aug 4 10:39:50 oplapro97 kernel: xfs_force_shutdown(md0,0x8) called from line 1088 of file fs/xfs_trans.c. Return address = 0xa0000002002b4dd0
If anyone who's hitting this reliably can get kdb compiled in, please
set the xfs_panic_mask to BUG on a shutdown (see xfs.txt in the kernel tree),
load xfsidbg, and then when the fs shuts down & bugs, try dumping the
transaction with the "xtp" command in kdb - find the transaction pointer in
one of the arguments on the stack.
-Eric
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From owner-linux-xfs Wed Aug 4 08:34:13 2004
Date: Wed, 4 Aug 2004 23:33:56 +0800
From: Federico Sevilla III <jijo@xxxxxxxxxxx>
To: Linux-XFS Mailing List <linux-xfs@xxxxxxxxxxx>
Subject: Irreparable 'corrupt dinode ... error 990' Revisited
Message-ID: <20040804153356.GD26826@xxxxxxxxxxxxxxxxxxxx>
Mail-Followup-To: Linux-XFS Mailing List <linux-xfs@xxxxxxxxxxx>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="9zSXsLTf0vkW971A"
Content-Disposition: inline
X-Organization: The Leather Collection, Inc.
X-Organization-URL: http://www.leathercollection.ph
X-Personal-URL: http://jijo.free.net.ph
User-Agent: Mutt/1.5.6+20040523i
--9zSXsLTf0vkW971A
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hi everyone,
This isn't the first time I've bumped into this problem, although it is
the first time I'm reporting it. I'm using Debian Sid and Linux kernel
2.4.26 with the following patches from
<http://www.plumlocosoft.com/kernel/>:
- 010-lckbase.diff.bz2
- 011-readlatency2.diff.bz2
- 012-rl2dt.diff.bz2
- 013-j64.diff.bz2
- 014-vhz.diff.bz2
- 032-nfsacl-0.8.71.diff.bz2
The problem was triggered, I believe, by a hard system lockup caused
by some problem with my external USB hard drive enclosure (accessed via
usb-storage; it's essentially a USB-to-IDE controller/converter with a
case). I don't know how to reproduce the lockup itself. However, both
times it happened I ended up with filesystem corruption that xfs_repair
could not repair.
I would have understood filesystem corruption in the filesystem stored
on the external storage. Unfortunately, I had filesystem corruption in
my root filesystem, which is /dev/hda1.
Right after the lockup I did a hard reset (the system would not respond
to anything: no network, no keyboard, no mouse) and the system booted up
fine. I had a hunch there was corruption, though, so I rebooted using
Knoppix 3.4 2004-05-17 (which has Linux kernel 2.4.26 and xfsprogs
2.6.11) and used xfs_repair to try and fix /dev/hda1.
The first pass put a number of files in lost+found, which on inspection
were files from tmp that were probably in the middle of being saved or
modified during the lockup. It reported a lot of other errors, but I
wasn't able to take note of these. It ended with a report about an
unrepairable error 990 on a corrupted dinode.
I was able to capture the output of the second pass of xfs_repair. The
same dinode corruption prevented it from finishing cleanly. The output
of the second pass of xfs_repair is attached as xfs_repair_verbose.log.
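The attached log repeats one "claims free block" line per block, which
makes it hard to see how much of the disk is involved. Below is a
minimal sketch, assuming the line format quoted in the attachment, that
collapses those lines into contiguous block runs per inode. The names
CLAIM_RE, collapse and summarize are made up for this illustration and
are not part of xfsprogs.

```python
import re
from collections import defaultdict

# Matches xfs_repair lines of the form seen in the attached log:
#   data fork in ino 50343204 claims free block 3157855
CLAIM_RE = re.compile(r"data fork in ino (\d+) claims free block (\d+)")


def collapse(blocks):
    """Collapse block numbers into sorted, inclusive (first, last) runs."""
    runs = []
    for b in sorted(set(blocks)):
        if runs and b == runs[-1][1] + 1:
            # Extend the current run by one block.
            runs[-1] = (runs[-1][0], b)
        else:
            # Start a new run.
            runs.append((b, b))
    return runs


def summarize(log_text):
    """Map each inode number to the runs of free blocks it claims."""
    claims = defaultdict(list)
    for m in CLAIM_RE.finditer(log_text):
        claims[int(m.group(1))].append(int(m.group(2)))
    return {ino: collapse(blks) for ino, blks in claims.items()}


if __name__ == "__main__":
    sample = (
        "data fork in ino 50343204 claims free block 3157855\n"
        "data fork in ino 50343204 claims free block 3157856\n"
        "data fork in ino 50343204 claims free block 3158364\n"
        "data fork in ino 50343204 claims free block 3158363\n"
    )
    for ino, runs in summarize(sample).items():
        print(ino, runs)
```

Fed the whole of xfs_repair_verbose.log, this should reduce the 53
lines for inode 50343204 to two runs, 3157855-3157861 and
3158330-3158375.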
Having had "experience" with this, I used the du utility to find out
which particular file had a problem. I isolated it to the file
</usr/share/gimp/2.0/gfig/curves>, which I had not touched since I
upgraded the gimp-data package at least a few weeks ago. I can't
understand why this inode was corrupt.
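For anyone who wants to map an inode number back to a path without
relying on du failing, here is a rough sketch of an alternative (not
what I actually ran). It assumes Python 3 and compares the inode number
stored in each directory entry, which normally does not require a
stat() of the damaged inode itself. find_by_inode is a hypothetical
helper name.

```python
import os


def find_by_inode(root, target_ino):
    """Yield paths under root whose directory entry carries target_ino.

    The inode number comes from the directory entry (DirEntry.inode()),
    so the comparison does not need to stat() the entry.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.inode() == target_ino:
                        yield entry.path
                    try:
                        if entry.is_dir(follow_symlinks=False):
                            stack.append(entry.path)
                    except OSError:
                        # The type check may fall back to stat() and fail
                        # on a damaged inode; skip descending in that case.
                        pass
        except OSError as exc:
            print(f"cannot read {current}: {exc}")


if __name__ == "__main__":
    # Inode number taken from the xfs_repair output; adjust the root.
    for path in find_by_inode("/usr/share", 244162409):
        print(path)
```

Note that this walks into other mounted filesystems as well, where the
same inode number may belong to an unrelated file, so it is best pointed
at a directory that lives entirely on the affected filesystem.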
I didn't need the file, though, so I just gathered some more information
on the corrupt dinode, following instructions given to someone else on
the list, then removed it by zeroing out core.size and core.nextents
before running a third pass of xfs_repair, which completed successfully.
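To make the zeroing step concrete, here is a sketch that only builds and
prints an xfs_db command line; it does not run anything. It assumes the
extent-count field is the one shown as core.nextents in the attached
dump, and that xfs_db is run in expert mode (-x) with one -c option per
command on an unmounted filesystem. The exact commands I typed are not
recorded in this mail, so treat this as a reconstruction.
xfs_db_zero_inode_cmd is a made-up helper name.

```python
import shlex


def xfs_db_zero_inode_cmd(device, ino):
    """Build (but do not run) an xfs_db command line that selects an
    inode and zeroes its size and data-fork extent count."""
    return [
        "xfs_db",
        "-x",                          # expert mode, needed for "write"
        "-c", f"inode {ino}",          # select the inode by number
        "-c", "write core.size 0",
        "-c", "write core.nextents 0",
        device,
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(xfs_db_zero_inode_cmd("/dev/hda1", 244162409)))
```

Printing rather than executing is deliberate: writing to a filesystem
with xfs_db is destructive and should only be done by hand, after the
inode has been dumped and saved.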
The output of xfs_info for the filesystem is attached as xfs_info.log.
The output of xfs_db querying the corrupt inode is attached as
xfs_db_244162409.log. The output of the third pass of xfs_repair is
attached as xfs_repair_verbose_afterxfsdbpurge244162409.log.
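As a cross-check between the attachments, the inode and block numbers
can be split into allocation group and AG-relative parts using the
geometry from xfs_info.log (agsize=580097, bsize=4096, isize=256). This
is a sketch based on my understanding of how XFS packs these numbers
(AG number in the high bits, with the AG block field rounded up to a
power of two); the function names are illustrative only.

```python
def log2_ceil(n):
    """Smallest k such that 2**k >= n (for n >= 1)."""
    return (n - 1).bit_length()


def decode_ino(ino, agsize, bsize, isize):
    """Split an inode number into (AG number, AG block, slot in block)."""
    inopblog = (bsize // isize).bit_length() - 1  # log2(inodes per block)
    agblklog = log2_ceil(agsize)                  # bits for the AG block
    agno = ino >> (agblklog + inopblog)
    agino = ino & ((1 << (agblklog + inopblog)) - 1)
    return agno, agino >> inopblog, agino & ((1 << inopblog) - 1)


def decode_fsblock(fsb, agsize):
    """Split a filesystem block number into (AG number, AG block)."""
    agblklog = log2_ceil(agsize)
    return fsb >> agblklog, fsb & ((1 << agblklog) - 1)


if __name__ == "__main__":
    AGSIZE, BSIZE, ISIZE = 580097, 4096, 256      # from xfs_info.log
    print(decode_ino(244162409, AGSIZE, BSIZE, ISIZE))
    print(decode_ino(50343204, AGSIZE, BSIZE, ISIZE))
    print(decode_fsblock(15260160, AGSIZE))
    print(decode_fsblock(7920128, AGSIZE))
```

With these values, inode 244162409 decodes to AG 14 and inode 50343204
to AG 3, which agrees with where xfs_repair reports them. Both "claims
used block" numbers (15260160 and 7920128) decode to AG-relative block
580096, in AG 14 and AG 7 respectively.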
No other inodes seem to have been corrupted. xfs_check gave the
filesystem a clean bill of health. I haven't tried a fourth pass with
xfs_repair, though. Should I do it again, just to be sure?
Some other files were corrupted as far as the applications that use them
are concerned, though, so I had to work things out to correct these at
the application level (e.g. do a full reimport of Lurker's archives due
to a database corruption there, remove Cyrus user.seen files due to
corruption there). Why these files were corrupted is beyond me. I figure
it's the data vs. metadata journalling story, once again. Oh well. :(
I hope the information I've provided here can help in any way. The last
time it happened I also "fixed" things by removing the files using the
xfs_db hack. They couldn't be removed any other way (eg: any normal
filesystem access to the corrupt inode barfs with an error 990).
--> Jijo
--
Federico Sevilla III : jijo.free.net.ph : When we speak of free software
GNU/Linux Specialist : GnuPG 0x93B746BE : we refer to freedom, not price.
--9zSXsLTf0vkW971A
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="xfs_repair_verbose.log"
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
zero_log: head block 799 tail block 799
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
data fork in ino 50343204 claims free block 3157855
data fork in ino 50343204 claims free block 3157856
data fork in ino 50343204 claims free block 3157857
data fork in ino 50343204 claims free block 3157858
data fork in ino 50343204 claims free block 3157859
data fork in ino 50343204 claims free block 3157860
data fork in ino 50343204 claims free block 3157861
data fork in ino 50343204 claims free block 3158364
data fork in ino 50343204 claims free block 3158365
data fork in ino 50343204 claims free block 3158366
data fork in ino 50343204 claims free block 3158367
data fork in ino 50343204 claims free block 3158368
data fork in ino 50343204 claims free block 3158369
data fork in ino 50343204 claims free block 3158370
data fork in ino 50343204 claims free block 3158371
data fork in ino 50343204 claims free block 3158372
data fork in ino 50343204 claims free block 3158373
data fork in ino 50343204 claims free block 3158374
data fork in ino 50343204 claims free block 3158375
data fork in ino 50343204 claims free block 3158363
data fork in ino 50343204 claims free block 3158362
data fork in ino 50343204 claims free block 3158360
data fork in ino 50343204 claims free block 3158361
data fork in ino 50343204 claims free block 3158359
data fork in ino 50343204 claims free block 3158358
data fork in ino 50343204 claims free block 3158357
data fork in ino 50343204 claims free block 3158356
data fork in ino 50343204 claims free block 3158355
data fork in ino 50343204 claims free block 3158353
data fork in ino 50343204 claims free block 3158354
data fork in ino 50343204 claims free block 3158352
data fork in ino 50343204 claims free block 3158351
data fork in ino 50343204 claims free block 3158350
data fork in ino 50343204 claims free block 3158349
data fork in ino 50343204 claims free block 3158348
data fork in ino 50343204 claims free block 3158347
data fork in ino 50343204 claims free block 3158346
data fork in ino 50343204 claims free block 3158344
data fork in ino 50343204 claims free block 3158345
data fork in ino 50343204 claims free block 3158343
data fork in ino 50343204 claims free block 3158342
data fork in ino 50343204 claims free block 3158341
data fork in ino 50343204 claims free block 3158340
data fork in ino 50343204 claims free block 3158338
data fork in ino 50343204 claims free block 3158339
data fork in ino 50343204 claims free block 3158337
data fork in ino 50343204 claims free block 3158336
data fork in ino 50343204 claims free block 3158335
data fork in ino 50343204 claims free block 3158334
data fork in ino 50343204 claims free block 3158333
data fork in ino 50343204 claims free block 3158332
data fork in ino 50343204 claims free block 3158331
data fork in ino 50343204 claims free block 3158330
correcting nblocks for inode 50343204, was 80620 - counted 81072
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
correcting nblocks for inode 244162409, was 0 - counted 1
- agno = 15
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- clearing existing "lost+found" inode
- marking entry "lost+found" to be deleted
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
data fork in regular inode 50343204 claims used block 7920128
correcting nblocks for inode 50343204, was 81072 - counted 80620
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
data fork in regular inode 244162409 claims used block 15260160
correcting nblocks for inode 244162409, was 1 - counted 0
- agno = 15
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
rebuilding directory inode 128
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
corrupt dinode 244162409, extent total = 1, nblocks = 0. Unmount and run xfs_repair.
fatal error -- couldn't map inode 244162409, err = 990
--9zSXsLTf0vkW971A
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="xfs_info.log"
meta-data=/mnt/hda1 isize=256 agcount=16, agsize=580097 blks
= sectsz=512
data = bsize=4096 blocks=9281552, imaxpct=25
= sunit=0 swidth=0 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=1
= sectsz=512 sunit=0 blks
realtime =none extsz=65536 blocks=0, rtextents=0
--9zSXsLTf0vkW971A
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="xfs_db_244162409.log"
core.magic = 0x494e
core.mode = 0100644
core.version = 1
core.format = 2 (extents)
core.nlinkv1 = 1
core.uid = 0
core.gid = 0
core.flushiter = 9
core.atime.sec = Sun Jun 27 23:59:12 2004
core.atime.nsec = 000000000
core.mtime.sec = Thu Jun 24 21:13:39 2004
core.mtime.nsec = 000000000
core.ctime.sec = Sun Jun 27 23:59:33 2004
core.ctime.nsec = 566118000
core.size = 503
core.nblocks = 0
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.gen = 3
next_unlinked = null
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,15260160,1,0]
--9zSXsLTf0vkW971A
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment;
filename="xfs_repair_verbose_afterxfsdbpurge244162409.log"
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
zero_log: head block 2 tail block 2
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
data fork in ino 50343204 claims free block 3157855
data fork in ino 50343204 claims free block 3157856
data fork in ino 50343204 claims free block 3157857
data fork in ino 50343204 claims free block 3157858
data fork in ino 50343204 claims free block 3157859
data fork in ino 50343204 claims free block 3157860
data fork in ino 50343204 claims free block 3157861
data fork in ino 50343204 claims free block 3158364
data fork in ino 50343204 claims free block 3158365
data fork in ino 50343204 claims free block 3158366
data fork in ino 50343204 claims free block 3158367
data fork in ino 50343204 claims free block 3158368
data fork in ino 50343204 claims free block 3158369
data fork in ino 50343204 claims free block 3158370
data fork in ino 50343204 claims free block 3158371
data fork in ino 50343204 claims free block 3158372
data fork in ino 50343204 claims free block 3158373
data fork in ino 50343204 claims free block 3158374
data fork in ino 50343204 claims free block 3158375
data fork in ino 50343204 claims free block 3158363
data fork in ino 50343204 claims free block 3158362
data fork in ino 50343204 claims free block 3158360
data fork in ino 50343204 claims free block 3158361
data fork in ino 50343204 claims free block 3158359
data fork in ino 50343204 claims free block 3158358
data fork in ino 50343204 claims free block 3158357
data fork in ino 50343204 claims free block 3158356
data fork in ino 50343204 claims free block 3158355
data fork in ino 50343204 claims free block 3158353
data fork in ino 50343204 claims free block 3158354
data fork in ino 50343204 claims free block 3158352
data fork in ino 50343204 claims free block 3158351
data fork in ino 50343204 claims free block 3158350
data fork in ino 50343204 claims free block 3158349
data fork in ino 50343204 claims free block 3158348
data fork in ino 50343204 claims free block 3158347
data fork in ino 50343204 claims free block 3158346
data fork in ino 50343204 claims free block 3158344
data fork in ino 50343204 claims free block 3158345
data fork in ino 50343204 claims free block 3158343
data fork in ino 50343204 claims free block 3158342
data fork in ino 50343204 claims free block 3158341
data fork in ino 50343204 claims free block 3158340
data fork in ino 50343204 claims free block 3158338
data fork in ino 50343204 claims free block 3158339
data fork in ino 50343204 claims free block 3158337
data fork in ino 50343204 claims free block 3158336
data fork in ino 50343204 claims free block 3158335
data fork in ino 50343204 claims free block 3158334
data fork in ino 50343204 claims free block 3158333
data fork in ino 50343204 claims free block 3158332
data fork in ino 50343204 claims free block 3158331
data fork in ino 50343204 claims free block 3158330
correcting nblocks for inode 50343204, was 80620 - counted 81072
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- clear lost+found (if it exists) ...
- clearing existing "lost+found" inode
- marking entry "lost+found" to be deleted
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
data fork in regular inode 50343204 claims used block 7920128
correcting nblocks for inode 50343204, was 81072 - counted 80620
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- ensuring existence of lost+found directory
- traversing filesystem starting at / ...
rebuilding directory inode 128
- traversal finished ...
- traversing all unattached subtrees ...
- traversals finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
--9zSXsLTf0vkW971A--