Bug 274 - xfs_iunlink_remove: xfs_inotobp() returned error 22
Status: RESOLVED INVALID
Product: XFS
Classification: Unclassified
Component: XFS kernel code
Version: Current
Hardware: All Linux
Importance: P1 critical
Target Milestone: ---
Assigned To: XFS power people
Depends on:
Blocks: 326
Reported: 2003-08-14 18:41 CDT by Walt Holman
Modified: 2011-03-03 08:00 CST

Description Walt Holman 2003-08-14 18:41:21 CDT
One of our servers recently reported the above error in its logs and shut down
the running filesystem. The server is an AMD 2x2200 SMP machine with software
RAID 5 consisting of 6 x U160 SCSI drives connected via the onboard Adaptec
AIC7902 controller. The partition in question is used as a user-specific
file server store. It is shared via Samba and netatalk. It is running linux-xfs
pulled from CVS on 7/19/2003 with the latest aic79xx code taken from Justin
Gibbs' download area. The server didn't oops, and on reboot, an xfs_check showed
the filesystem in question to be clean.

Any input on the stability of current CVS is welcome, as I plan on updating it
this weekend in hopes of pre-empting a recurrence.

The hardware is:
Tyan K7X Pro 2469-u MB w/ integrated AIC7902
2 x AMD 2200MP
1 GB Reg. ECC Ram
6 x 36 GB U160 SCSI Disks
Software Raid5 - multiple md's.
Comment 1 Nathan Scott 2003-08-14 19:36:04 CDT
Can you transcribe the (exact) error message?  From looking at the code,
you should have at least two lines in your log (can you send both please).
The xfs_info output from the mounted filesystem may prove useful as well.

thanks.
Comment 2 Walt Holman 2003-08-14 19:55:39 CDT
Sorry about that. Here's the relevant info taken from: 

/var/log/kernel/info
Aug 13 12:21:04 goliath kernel: xfs_inactive:^Ixfs_ifree() returned an error =
22 on md(9,2)
Aug 13 12:21:04 goliath kernel: xfs_force_shutdown(md(9,2),0x1) called from line
1846 of file xfs_vnodeops.c.  Return address = 0xc01eaa5b

/var/log/kernel/warnings
Aug 13 12:21:04 goliath kernel: xfs_inotobp: xfs_imap()  returned an error 22 on
md(9,2).  Returning error.
Aug 13 12:21:04 goliath kernel: xfs_iunlink_remove: xfs_inotobp()  returned an
error 22 on md(9,2).  Returning error.

[root@goliath kernel]# xfs_info /home
meta-data=/home                  isize=256    agcount=12, agsize=262144 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=3076352, imaxpct=25
         =                       sunit=16     swidth=80 blks, unwritten=0
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=8192, version=2
         =                       sectsz=512   sunit=16 blks
realtime =none                   extsz=327680 blocks=0, rtextents=0

I forgot to mention it earlier, but kernel is 2.4.21 from CVS.
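
For anyone triaging similar reports, here is a rough Python sketch (not part of the original report) that pulls the failing call chain out of wrapped log text like the above and names the error code. The regular expression and the function name `parse_xfs_errors` are illustrative assumptions based only on the message shapes quoted in this bug; other kernel versions may word these messages differently.

```python
import errno
import re

# Matches e.g. "xfs_inotobp: xfs_imap()  returned an error 22 on md(9,2)".
# \s+ also matches newlines, so entries wrapped across lines still match.
# The optional "^I" is the literal tab marker seen in the pasted logs.
PATTERN = re.compile(
    r"(?P<caller>\w+):(?:\s|\^I)*(?P<callee>\w+)\(\)\s+returned\s+an\s+error\s+"
    r"(?:=\s+)?(?P<code>\d+)\s+on\s+(?P<device>\w+\(\d+,\d+\))"
)


def parse_xfs_errors(log_text):
    """Return one dict per 'X: Y() returned an error N on dev' message."""
    events = []
    for match in PATTERN.finditer(log_text):
        code = int(match.group("code"))
        events.append({
            "caller": match.group("caller"),
            "callee": match.group("callee"),
            "code": code,
            # Symbolic errno name on the machine running this script.
            "errno_name": errno.errorcode.get(code, "unknown"),
            "device": match.group("device"),
        })
    return events
```

On Linux, error 22 maps to EINVAL ("Invalid argument"), so the chain in these logs reads as xfs_imap() rejecting its input and the callers passing that error upward.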
Comment 3 Walt Holman 2003-10-14 11:57:40 CDT
Just happened again, unfortunately. Log info (apologies for the mess):


Oct 14 08:18:10 goliath kernel: xfs_inactive:^Ixfs_ifree() returned an error =
22 on md(9,3)
Oct 14 08:18:10 goliath kernel: xfs_force_shutdown(md(9,3),0x1) called from line
1845 of file xfs_vnodeops.c.  Return address = 0xc01f423c
Oct 14 08:18:10 goliath kernel: Filesystem "md(9,3)": I/O Error Detected. 
Shutting down filesystem: md(9,3)
Oct 14 08:18:10 goliath kernel: Please umount the filesystem, and rectify the
problem(s)
Oct 14 08:18:10 goliath kernel: xfs_inotobp: xfs_imap()  returned an error 22 on
md(9,3).  Returning error.
Oct 14 08:18:10 goliath kernel: xfs_iunlink_remove: xfs_inotobp()  returned an
error 22 on md(9,3).  Returning error.


Since my original report, the kernel was upgraded and is currently running
2.4.22 from a CVS pull of ~Sep. 14, 2003.

This system was originally a Mandrake install, and prior kernels were compiled
using gcc 2.91.66 as well as their 2.96-versioned gcc. The current kernel was
compiled on my development workstation using gcc-3.2.3.
I boot the system passing it acpi=off and mem=nopentium, and have tried booting
without them with similar results. Uptimes now vary from 1 to 3 weeks. Getting
heat about this box :(

Any other info you need?
Comment 4 Christoph Hellwig 2004-01-02 05:30:08 CST
Does this still happen?  And if yes can you run xfs_check on the filesystem to
check whether there's any corruption?
Comment 5 Walt Holman 2004-01-02 07:32:16 CST
Sorry, can't test this server anymore. I've since converted its filesystems to
ext3. That seems to be more reliable on this hardware. When this problem last
occurred, the system had been running approximately 3 weeks since the previous
occurrence. In an attempt to catch corruption earlier, I had been unmounting
its filesystems weekly and running xfs_check on them. No errors were ever
reported. This particular problem would surface out of the blue with no
irregular behaviour apparent.

I've got a DHCP, DNS, and squid proxy server still using XFS and it's rock
solid. It's a P3-based setup, however. Perhaps there's a strange interaction
taking place on Athlon-based setups?

Since converting the filesystems to ext3, the box has been up with no problems
to report. When I was experiencing the problems on this box, an xfs_check run
after a shutdown always showed minimal corruption. Most (>90%) of the time, the
xfs_repair process would junk the root inode, but recovery was easy. A bit
scary, to be sure.
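
The weekly unmount-and-check routine described above could be scripted roughly as follows. This is only a sketch, not what was actually run on the server: the function name `offline_check`, the device and mount point in the usage note, and the rule that any output from xfs_check means an inconsistency are assumptions (the last one follows the xfs_check man page wording for non-verbose runs).

```python
import subprocess


def offline_check(device, mountpoint, run=subprocess.run):
    """Unmount, run xfs_check on the device, then remount.

    Returns True when xfs_check exits 0 and prints nothing.
    `run` is injectable so the sequence can be exercised without root.
    """
    run(["umount", mountpoint], check=True)
    try:
        result = run(["xfs_check", device],
                     capture_output=True, text=True, check=False)
        # Assumption: in non-verbose mode, any output signals a problem.
        clean = (result.returncode == 0
                 and not (result.stdout or "").strip()
                 and not (result.stderr or "").strip())
    finally:
        # Always try to bring the filesystem back, even if the check failed.
        run(["mount", device, mountpoint], check=True)
    return clean
```

A cron job could call something like `offline_check("/dev/md2", "/home")` (hypothetical device name) and mail the admin whenever it returns False.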
Comment 6 Christoph Hellwig 2004-01-02 08:44:42 CST
Okay, marking Worksforme because it's not reproducible here.
Comment 7 Glen Overby 2004-01-02 13:07:40 CST
This report is a symptom of another problem.  The error came back from inotobp
because XFS read an inode off disk that did not look like an inode (bad magic,
etc.).

See case 2491479, and PVs 906636 and 905898 for the customer reported details
we have on this bug.

Glen Overby
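
To illustrate the "did not look like an inode" test mentioned above: an XFS on-disk inode starts with the two-byte magic "IN" (0x494e, big-endian). The sketch below checks only that field; the real kernel code validates more than the magic, and the helper name `looks_like_xfs_inode` is made up for this example.

```python
import struct

XFS_DINODE_MAGIC = 0x494E  # ASCII "IN", stored big-endian at offset 0


def looks_like_xfs_inode(raw):
    """Return True if the buffer starts with the XFS inode magic.

    Only the magic is checked here; this is a minimal illustration,
    not a full validity test of the on-disk inode.
    """
    if len(raw) < 2:
        return False
    (magic,) = struct.unpack(">H", raw[:2])
    return magic == XFS_DINODE_MAGIC
```

A buffer read from the wrong place on disk, or one that has been overwritten, would normally fail this check.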
Comment 8 Bryan Whitehead 2004-02-11 15:56:04 CST
I got this error as well on a Mandrake kernel. The problem started when we
switched to 2.4.19-37mdkenterprise. Since this is a production server we
reverted to vmlinuz-2.4.19-36mdkenterprise. If we get the error again I'll try
to send in more data....

Feb  8 21:02:06 micro kernel: xfs_inotobp: xfs_imap()  returned an error 22 on
sd(8,5).  Returning error.
Feb  8 21:02:06 micro kernel: xfs_iunlink_remove: xfs_inotobp()  returned an
error 22 on sd(8,5).  Returning error.
Feb  8 21:02:06 micro kernel: xfs_inactive:  xfs_ifree() returned an error = 22
on sd(8,5)
Feb  8 21:02:06 micro kernel: xfs_force_shutdown(sd(8,5),0x1) called from line
1952 of file xfs_vnodeops.c.  Return address = 0xf88bacb3
Feb  8 21:02:06 micro kernel: I/O Error Detected.  Shutting down filesystem: sd(8,5)
Feb  8 21:02:06 micro kernel: Please umount the filesystem, and rectify the
problem(s)
Comment 9 Eric Sandeen 2004-02-11 20:15:39 CST
If you are fairly certain that you see it in one kernel and not the other,
and you have the source for both handy, could you put the diff of
fs/xfs between the two somewhere?  Not sure how big it might be...
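
If a full diff turns out to be large, a quick first pass is to list which files under fs/xfs differ at all. A rough sketch under stated assumptions: `compare_trees` and `relative_files` are illustrative names, and the two arguments are whatever paths the unpacked kernel sources live at. For the actual patch, plain `diff -urN` on the two directories is still the usual tool.

```python
import filecmp
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_trees(old_root, new_root):
    """Summarise which files were added, removed, or changed."""
    old_files = relative_files(old_root)
    new_files = relative_files(new_root)
    changed = sorted(
        rel for rel in old_files & new_files
        # shallow=False forces a byte-for-byte content comparison.
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    )
    return {
        "only_old": sorted(old_files - new_files),
        "only_new": sorted(new_files - old_files),
        "changed": changed,
    }
```

Calling it on the fs/xfs directories of the two source trees gives a short list that can be posted before the full diff.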
Comment 10 Bryan Whitehead 2004-02-12 13:19:21 CST
I got the error on the old kernel as well. I can reproduce easily with xfsdump.
Here is the output of xfsdump:

xfsdump: using file dump (drive_simple) strategy
xfsdump: version 3.0 - Running single-threaded
xfsdump: WARNING: no session label specified
xfsdump: level 0 dump of micro.jpl.nasa.gov:/
xfsdump: dump date: Thu Feb 12 11:18:21 2004
xfsdump: session id: d0d6ffd2-60bd-4c96-9a69-6ae71f242f00
xfsdump: session label: ""
xfsdump: ino map phase 1: skipping (no subtrees specified)
xfsdump: ino map phase 2: constructing initial dump list
xfsdump: WARNING: failed to get bulkstat information for inode 29360279
xfsdump: WARNING: failed to get bulkstat information for inode 29360289
xfsdump: WARNING: failed to get bulkstat information for inode 29360295
xfsdump: syssgi( SGI_FS_BULKSTAT ) on fsroot failed: Input/output error
xfsdump: Dump Status: ERROR

I ran badblocks on this disk (read-only) and it found no errors with the disk. I
didn't have time to do a read-write test, nor can I destroy data on the
partition.

I ran xfs_repair and got this:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
zero_log: head block 3029 tail block 3029
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
data fork in regular inode 3523176 claims used block 223904
xfs_repair: dinode.c:2429: process_dinode_int: Assertion `err == 0' failed.


Let me know what else I can do.
Comment 11 Bryan Whitehead 2004-02-12 13:21:42 CST
BTW, this is the log of what happens when xfsdump is run:
Feb 12 11:18:36 micro kernel: xfs_inotobp: xfs_imap()  returned an error 22 on
sd(8,5).  Returning error.
Feb 12 11:18:36 micro kernel: xfs_iunlink_remove: xfs_inotobp()  returned an
error 22 on sd(8,5).  Returning error.
Feb 12 11:18:36 micro kernel: xfs_inactive:  xfs_ifree() returned an error = 22
on sd(8,5)
Feb 12 11:18:36 micro kernel: xfs_force_shutdown(sd(8,5),0x1) called from line
1952 of file xfs_vnodeops.c.  Return address = 0xfc8bbcb3
Feb 12 11:18:36 micro kernel: I/O Error Detected.  Shutting down filesystem: sd(8,5)
Feb 12 11:18:36 micro kernel: Please umount the filesystem, and rectify the
problem(s)
Comment 12 Bryan Whitehead 2004-02-12 13:27:12 CST
more info:
[root@micro log]# xfs_info /
meta-data=/                      isize=256    agcount=8, agsize=223905 blks
data     =                       bsize=4096   blocks=1791239, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=0
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=1200
realtime =none                   extsz=65536  blocks=0, rtextents=0

The machine is a dual Pentium 4 Xeon 1.7GHz w/ 1GB RAM. This error is
reproducible with both highmem and no-highmem kernels. Have not tried a
uni-proc kernel.

We never had this error until we upgraded the kernel from
2.4.19-36mdkenterprise to 2.4.19-37mdkenterprise. But now all kernels have this
problem. I'll try to get a diff between the 2 kernel sources and post it.
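
As an aid for reading the inode numbers in the xfsdump and xfs_repair output against this geometry, here is a sketch that splits an XFS inode number into AG number, block within the AG, and slot within the block. It assumes the usual layout (AG number in the high bits, then the AG-relative block, then the inode's index inside the block); the function names are made up for the example, and the range check ignores the fact that the last AG can be shorter than agsize.

```python
def decode_inode(ino, agsize, bsize, isize):
    """Split an inode number into (agno, agbno, offset_in_block)."""
    agblklog = (agsize - 1).bit_length()          # bits for a block number in an AG
    inopblog = (bsize // isize).bit_length() - 1  # log2(inodes per block)
    agino_bits = agblklog + inopblog
    agno = ino >> agino_bits
    agino = ino & ((1 << agino_bits) - 1)
    agbno = agino >> inopblog
    offset = agino & ((1 << inopblog) - 1)
    return agno, agbno, offset


def geometry_allows(ino, agcount, agsize, bsize, isize):
    """Rough bounds check: does the inode number fit the reported geometry?"""
    agno, agbno, _ = decode_inode(ino, agsize, bsize, isize)
    return agno < agcount and agbno < agsize
```

With the values above (agcount=8, agsize=223905, bsize=4096, isize=256), `decode_inode(29360279, 223905, 4096, 256)` gives AG 7, block 9, slot 7, which is inside the reported geometry. That says nothing yet about why the lookup fails; it only rules out an obviously out-of-range inode number.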
Comment 13 Bryan Whitehead 2004-02-24 10:23:22 CST
Has anyone found a resolution to this? Even if it is only running a "better"
version of xfs_repair on my corrupt filesystem?

Is there any more data I can send to help out?
Comment 14 Eric Sandeen 2004-02-24 10:45:39 CST
Bryan, can you post the xfs diff between 2.4.19-36mdkenterprise and
2.4.19-37mdkenterprise somewhere?  If you really saw it appear only after that
upgrade, that'd be helpful.
Comment 15 derry 2004-05-06 11:37:13 CDT
This also happened to me.

The system is a dual Xeon PC, with a raid1 (/dev/md0) 20GB in size.

I needed to do xfs_repair, and only then I could remount the XFS filesystem.

There were no errors in the hard drives or the MD device.
Comment 16 derry 2004-05-06 11:38:41 CDT
.. Forgot to mention: Kernel is 2.4.22 with XFS patch (1.3)
Comment 17 Bernd Zeimetz 2004-09-01 08:17:36 CDT
see also bug #326
Comment 18 Christoph Hellwig 2008-12-25 03:33:52 CST
Closing all 2.4 kernel bugs with WONTFIX as XFS in Linux 2.4 hasn't been
maintained for a long time.  Please open a new bug if you see something similar
with a recent Linux 2.6 kernel.
Comment 19 fl 2011-03-03 06:31:05 CST
Good morning!

We've got the same error on a SLES 11 with kernel 2.6.27.54.

:(
Comment 20 Christoph Hellwig 2011-03-03 08:00:39 CST
In which case you need to report it to SuSE, not us.  It's a very backlevel kernel release, and SuSE kernels have, at least traditionally, contained tons of non-upstream XFS modifications.