Re: XFS filesystem shutting down on linux (xfs_rename)

To: Gabriel Barazer <gabriel@xxxxxxxx>
Subject: Re: XFS filesystem shutting down on linux (xfs_rename)
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Mon, 27 Jul 2009 12:40:18 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4A6D9221.5080603@xxxxxxxx>
References: <000c01ca0ae0$e85420a0$b8fc61e0$@fr> <4A67E2F5.2030400@xxxxxxxxxxx> <4A6D9221.5080603@xxxxxxxx>
User-agent: Thunderbird (X11/20090320)
Gabriel Barazer wrote:
> Eric Sandeen wrote:
>> Gabriel Barazer wrote:
>>> Hi,
>>> I recently put an NFS file server into production, with mostly XFS volumes 
>>> on LVM. The server was quite low on traffic until this morning and one of 
>>> the filesystems crashed twice since this morning with the following 
>>> backtrace:
>>> Filesystem "dm-24": XFS internal error xfs_trans_cancel at line 1164 of 
>>> file fs/xfs/xfs_trans.c.  Caller 0xffffffff811b09a7
>>> Pid: 2053, comm: nfsd Not tainted #1
>>> Call Trace:
>>>  [<ffffffff811b09a7>] xfs_rename+0x4a1/0x4f6
>>>  [<ffffffff811b1806>] xfs_trans_cancel+0x56/0xed
>>>  [<ffffffff811b09a7>] xfs_rename+0x4a1/0x4f6
>> ...
>>> xfs_force_shutdown(dm-24,0x8) called from line 1165 of file 
>>> fs/xfs/xfs_trans.c.  Return address = 0xffffffff811b181f
>>> Filesystem "dm-24": Corruption of in-memory data detected.  Shutting down 
>>> filesystem: dm-24
>>> The two crashes are related to the same function: xfs_rename.
>> Can you do objdump -d xfs.ko | grep "xfs_rename\|xfs_trans_cancel" and
>> maybe we can see which call to xfs_trans_cancel in xfs_rename this was.
>> The problem relates to canceling a dirty transaction on an error path.
> Hi,
> sorry for the late reply
> I don't have any xfs.ko as my kernel is compiled without CONFIG_MODULES. 
> However I objdump'd the vmlinux uncompressed kernel, and here are the 
> results:

OK, that was an overeager grep command, my apologies to the mail
archives ;)

The relevant stuff:

ffffffff811b0506 <xfs_rename>:
ffffffff811b06c1:       e8 ea 10 00 00          callq  ffffffff811b17b0
ffffffff811b09a2:       e8 09 0e 00 00          callq  ffffffff811b17b0

Hmm, but there are only 2 obvious calls to xfs_trans_cancel in the
disassembly, and there are 4 calls in the function... and neither one
seems to line up with your stated offset in the oops.  :(  I was hoping
to sort out which xfs_trans_cancel call in xfs_rename it was.
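
One thing that may be worth double-checking here, as a rough sketch rather
than a definitive analysis: the "Caller" value in an oops is normally a
return address, i.e. the address of the instruction after the call, not of
the call itself. Assuming that, and assuming both callq instructions are the
5-byte e8 form shown in the excerpt, the arithmetic can be redone with the
numbers from this thread. The helper names below are made up for
illustration; only the addresses come from the oops and the objdump output.

```python
# Addresses copied from the oops and the objdump excerpt in this thread.
XFS_RENAME_START = 0xffffffff811b0506   # ffffffff811b0506 <xfs_rename>:
OOPS_CALLER = 0xffffffff811b09a7        # "Caller 0xffffffff811b09a7"
CALL_SITES = [0xffffffff811b06c1, 0xffffffff811b09a2]

# Assumption: each callq is opcode e8 plus a 4-byte displacement,
# as in the two lines quoted from the disassembly.
CALL_LEN = 5


def return_address(call_addr, length=CALL_LEN):
    """Address of the instruction following a call at call_addr."""
    return call_addr + length


def matching_call_sites(caller, call_sites):
    """Call sites whose return address equals the caller from the oops."""
    return [site for site in call_sites if return_address(site) == caller]


if __name__ == "__main__":
    # Offset of the oops caller inside xfs_rename (the trace says +0x4a1).
    print("caller offset:", hex(OOPS_CALLER - XFS_RENAME_START))
    for site in CALL_SITES:
        ret = return_address(site)
        print(hex(site), "returns to xfs_rename+" + hex(ret - XFS_RENAME_START))
    print("matches:", [hex(s) for s in matching_call_sites(OOPS_CALLER, CALL_SITES)])
```

If those assumptions hold, the callq at ffffffff811b09a2 returns to
ffffffff811b09a7, which is xfs_rename+0x4a1, so that call site may well be
the one in the trace. It still would not say which of the xfs_trans_cancel
calls in the C code it corresponds to, since the compiler may have merged
several error paths into one call instruction.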

Any chance you could add a couple of printk's to xfs_rename in the cases
where it calls xfs_trans_cancel so we can see which one it was?

