xfs
[Top] [All Lists]

Re: Hang in XFS reclaim on 3.7.0-rc3

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Hang in XFS reclaim on 3.7.0-rc3
From: Torsten Kaiser <just.for.lkml@xxxxxxxxxxxxxx>
Date: Tue, 20 Nov 2012 08:09:02 +0100
Cc: xfs@xxxxxxxxxxx, Linux Kernel <linux-kernel@xxxxxxxxxxxxxxx>
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=irtQSi6RdVQFKq0iEF+laU9m58APzK89UoXt7w8KG70=; b=mdvH1bUhCXf3tIbXAII/AxapnGYq9CoPQxdDH243IUGrDb4J5HwYneTnw+4vrVEg7K Y2kaGL0duy4mvnd/WOrFNiLw77Vl1B19iwcqlIF8QQw+M8RDvMWRa5QpSFKtShDiErCI FE5MA3K3qvQfJMT/4UTwRV3mhjSyDg52wZw07XMrLO80ZV1NfkTARItLnZi7oIfvtMbk nJhoCFw04arxp5jaZgCcMQkT33Ujh6z0gO11oYC7Gq2XCc8FDjurUz8IY7DYpls8O+SQ e+akXCZ/LgBuudh9+WiJRL6hscOZkVy7JN+bcdLTj/FuF3617IpycC0mhUtheuEUvH0q Unmg==
In-reply-to: <20121119235306.GX14281@dastard>
References: <CAPVoSvSM9=hictqwT2rzZA-fU_XSwd-_FRzW_J+HQYj7iohTWQ@xxxxxxxxxxxxxx> <20121029222613.GU29378@dastard> <CAPVoSvQATjAVu-D477mrr6K9d0FeY7sH27cH-zNBYMJcRoUY0Q@xxxxxxxxxxxxxx> <CAPVoSvRks32ZM7n6U8but-vEj622TEFqAFdxXqS_6mRxyv=0Ew@xxxxxxxxxxxxxx> <CAPVoSvSKn2FuBhMF+3U1ueuEzBqL4CFTYFGXqGczTa42PgMjRw@xxxxxxxxxxxxxx> <20121118235105.GT14281@dastard> <CAPVoSvR+Gk6ggSZ6=ZMpyvwhosjd4BSGsUqRT=txkzGDGLMTPw@xxxxxxxxxxxxxx> <20121119235306.GX14281@dastard>
On Tue, Nov 20, 2012 at 12:53 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Mon, Nov 19, 2012 at 07:50:06AM +0100, Torsten Kaiser wrote:
> So, both lockdep thingy's are the same:

I suspected this, but as the reports where slightly different I
attached bith of them, as I couldn't decide witch one was the
better/simpler report to debug this.

>> [110926.972482] =========================================================
>> [110926.972484] [ INFO: possible irq lock inversion dependency detected ]
>> [110926.972486] 3.7.0-rc4 #1 Not tainted
>> [110926.972487] ---------------------------------------------------------
>> [110926.972489] kswapd0/725 just changed the state of lock:
>> [110926.972490]  (sb_internal){.+.+.?}, at: [<ffffffff8122b268>] 
>> xfs_trans_alloc+0x28/0x50
>> [110926.972499] but this lock took another, RECLAIM_FS-unsafe lock in the 
>> past:
>> [110926.972500]  (&(&ip->i_lock)->mr_lock/1){+.+.+.}
>
> Ah, what? Since when has the ilock been reclaim unsafe?
>
>> [110926.972500] and interrupts could create inverse lock ordering between 
>> them.
>> [110926.972500]
>> [110926.972503]
>> [110926.972503] other info that might help us debug this:
>> [110926.972504]  Possible interrupt unsafe locking scenario:
>> [110926.972504]
>> [110926.972505]        CPU0                    CPU1
>> [110926.972506]        ----                    ----
>> [110926.972507]   lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972509]                                local_irq_disable();
>> [110926.972509]                                lock(sb_internal);
>> [110926.972511]                                
>> lock(&(&ip->i_lock)->mr_lock/1);
>> [110926.972512]   <Interrupt>
>> [110926.972513]     lock(sb_internal);
>
> Um, that's just bizzare. No XFS code runs with interrupts disabled,
> so I cannot see how this possible.
>
> .....
>
>
>        [<ffffffff8108137e>] mark_held_locks+0x7e/0x130
>        [<ffffffff81081a63>] lockdep_trace_alloc+0x63/0xc0
>        [<ffffffff810e9dd5>] kmem_cache_alloc+0x35/0xe0
>        [<ffffffff810dba31>] vm_map_ram+0x271/0x770
>        [<ffffffff811e1316>] _xfs_buf_map_pages+0x46/0xe0
>        [<ffffffff811e222a>] xfs_buf_get_map+0x8a/0x130
>        [<ffffffff81233ab9>] xfs_trans_get_buf_map+0xa9/0xd0
>        [<ffffffff8121bced>] xfs_ialloc_inode_init+0xcd/0x1d0
>
> We shouldn't be mapping buffers there, there's a patch below to fix
> this. It's probably the source of this report, even though I cannot
> lockdep seems to be off with the fairies...

I also tried to understand what lockdep was saying, but
Documentation/lockdep-design.txt is not too helpful.
I think 'CLASS'-ON-R / -ON-W means that this lock was 'ON' / held
while 'CLASS' (HARDIRQ, SOFTIRQ, RECLAIM_FS) happend and that makes
this lock unsafe for these contexts. IN-'CLASS'-R / -W seems to be
'lock taken in context 'CLASS'.
A note that 'CLASS'-ON-? means 'CLASS'-unsafe in there would be helpful to me...

Wrt. above interrupt output: I think lockdep doesn't really know about
RECLAIM_FS and threats it as another interrupt. I think that output
should have been something like this:
        CPU0                    CPU1
        ----                    ----
   lock(&(&ip->i_lock)->mr_lock/1);
                                <Allocation enters reclaim>
                                lock(sb_internal);
                                lock(&(&ip->i_lock)->mr_lock/1);
   <Allocation enters reclaim>
     lock(sb_internal);

Entering reclaim on CPU1 would mean that CPU1 would not enter reclaim
again, so the reclaim-'interrupt' would be disabled.
And instead of interrupts disrupting the normal codeflow on CPU0 it
would be 'interrupted' be instead of doing a normal allocation, it
would 'interrupt' the allocation to reclaim memory.
print_irq_lock_scenario() would need to be taught to print a slightly
different message for reclaim-'interrupts'.

I will try your patch, but as I do not have a reliable reproducer to
create this lockdep report, I can't really verify if this fixes it.
But I will definitely mail you, if it happens again with this patch.

Thanks, Torsten

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
> xfs: inode allocation should use unmapped buffers.
>
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> Inode buffers do not need to be mapped as inodes are read or written
> directly from/to the pages underlying the buffer. This fixes a
> regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
> default behaviour").
>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/xfs_ialloc.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
> index 2d6495e..a815412 100644
> --- a/fs/xfs/xfs_ialloc.c
> +++ b/fs/xfs/xfs_ialloc.c
> @@ -200,7 +200,8 @@ xfs_ialloc_inode_init(
>                  */
>                 d = XFS_AGB_TO_DADDR(mp, agno, agbno + (j * 
> blks_per_cluster));
>                 fbuf = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
> -                                        mp->m_bsize * blks_per_cluster, 0);
> +                                        mp->m_bsize * blks_per_cluster,
> +                                        XBF_UNMAPPED);
>                 if (!fbuf)
>                         return ENOMEM;
>                 /*

<Prev in Thread] Current Thread [Next in Thread>