
Re: XFS hang during xfs_fsr run

To: Michael Weissenbacher <mw@xxxxxxxxxxxx>
Subject: Re: XFS hang during xfs_fsr run
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 8 Mar 2010 11:06:01 +1100
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <4B92C71C.5010003@xxxxxxxxxxxx>
References: <4B8F871C.60802@xxxxxxxxxxxx> <20100304112018.GG14317@xxxxxxxxxxxxxxxx> <4B8FA2CD.6010904@xxxxxxxxxxxx> <20100304131511.GH14317@xxxxxxxxxxxxxxxx> <20100304134641.GA26871@xxxxxxxxxxxxx> <4B8FC1B7.3070505@xxxxxxxxxxxx> <20100304222611.GK14317@xxxxxxxxxxxxxxxx> <4B92C71C.5010003@xxxxxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Sat, Mar 06, 2010 at 10:20:28PM +0100, Michael Weissenbacher wrote:
> > If xfs_fsr hung before it checked the nodefrag flag, then there's
> > only a few things it could get stuck on:
> > 
> >     1. fsync() of the file
> >     2. file lock checks
> >     3. statvfs64()
> >     4. ioctl(XFS_IOC_FSGETXATTR)
> > 
> > A trace would tell us which one it was....
> > 
> Got another one, this time with ksyms enabled:
> [192115.749003] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> [192115.749197] IP: [<ffffffff811ad69d>] xfs_trans_find_item+0x1/0xa
...
> [192115.749332] Call Trace:
> [192115.749338]  [<ffffffff811ad5c0>] ? xfs_trans_log_inode+0x22/0x4c
> [192115.749344]  [<ffffffff81180554>] xfs_bunmapi+0x9ec/0xa36
> [192115.749350]  [<ffffffff8119a452>] xfs_itruncate_finish+0x188/0x2db
> [192115.749355]  [<ffffffff811b0d41>] xfs_inactive+0x218/0x435
> [192115.749360]  [<ffffffff81364338>] ? __mutex_lock_slowpath+0x22d/0x23c
> [192115.749365]  [<ffffffff811bb096>] xfs_fs_clear_inode+0xb3/0xb8
> [192115.749371]  [<ffffffff810df28a>] clear_inode+0x78/0xd1
> [192115.749375]  [<ffffffff810df9f8>] generic_delete_inode+0xf6/0x16b
> [192115.749379]  [<ffffffff810dfa84>] generic_drop_inode+0x17/0x62
> [192115.749382]  [<ffffffff810deace>] iput+0x61/0x65
> [192115.749386]  [<ffffffff810dbf25>] dentry_iput+0xb5/0xc5
> [192115.749389]  [<ffffffff810dc00f>] d_kill+0x43/0x63
> [192115.749393]  [<ffffffff810dc75f>] dput+0x148/0x155
> [192115.749398]  [<ffffffff810cda56>] __fput+0x196/0x1bb
> [192115.749401]  [<ffffffff810cda93>] fput+0x18/0x1a
> [192115.749405]  [<ffffffff810cac36>] filp_close+0x67/0x72
> [192115.749409]  [<ffffffff810cacda>] sys_close+0x99/0xd2
> [192115.749415]  [<ffffffff810029ab>] system_call_fastpath+0x16/0x1b

That's ... unexpected. That implies that ip->i_itemp == NULL after it
has been joined to a transaction. I can't see how that could occur
there.  Can you recompile the kernel with CONFIG_XFS_DEBUG enabled
and re-run the test? That option includes all sorts of sanity checks
for ip->i_itemp.
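
For anyone wondering how a fault address of 0x18 points at a NULL log
item pointer: reading a struct member through a NULL pointer faults at
the member's offset. A rough sketch of that arithmetic, assuming an
x86_64 layout. The LogItem structure below is a hypothetical stand-in,
not the kernel's actual definition, and its field names and order are
assumptions made for illustration only:

```python
import ctypes

# Hypothetical stand-in for a log item header on x86_64. Field names and
# order are assumptions; pointers are modelled as 64-bit integers so the
# layout does not depend on the machine running this script.
class ListHead(ctypes.Structure):
    _fields_ = [("next", ctypes.c_uint64), ("prev", ctypes.c_uint64)]

class LogItem(ctypes.Structure):
    _fields_ = [
        ("li_ail", ListHead),          # two pointers, 16 bytes
        ("li_lsn", ctypes.c_int64),    # 8 bytes
        ("li_desc", ctypes.c_uint64),  # pointer-sized member
    ]

def fault_address(base, field):
    """Address touched when reading `field` through a pointer equal to `base`."""
    return base + getattr(LogItem, field).offset

if __name__ == "__main__":
    # With a NULL base pointer the access lands at the member's offset.
    print(hex(fault_address(0, "li_desc")))
```

Under those assumptions a NULL base pointer plus the offset of the
fourth pointer-sized slot gives 0x18, which matches the address in the
oops.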

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
