To: "Mel Gorman" <mel@xxxxxxxxx>
Subject: Re: [2.6.26-rc7] shrink_icache from pagefault locking (nee: nfsd hangs for a few sec)...
From: "Daniel J Blueman" <daniel.blueman@xxxxxxxxx>
Date: Sun, 22 Jun 2008 19:54:59 +0100
Cc: "Christoph Lameter" <clameter@xxxxxxx>, "Linus Torvalds" <torvalds@xxxxxxxxxxxxxxxxxxxx>, "Alexander Beregalov" <a.beregalov@xxxxxxxxx>, "Linux Kernel" <linux-kernel@xxxxxxxxxxxxxxx>, david@xxxxxxxxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: <20080622181449.GD625@xxxxxxxxx>
References: <6278d2220806220256g674304ectb945c14e7e09fede@xxxxxxxxxxxxxx> <6278d2220806220258p28de00c1x615ad7b2f708e3f8@xxxxxxxxxxxxxx> <20080622181011.GC625@xxxxxxxxx> <20080622181449.GD625@xxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
On Sun, Jun 22, 2008 at 7:14 PM, Mel Gorman <mel@xxxxxxxxx> wrote:
> On (22/06/08 10:58), Daniel J Blueman didst pronounce:
>> I'm seeing a similar issue [2] to what was recently reported [1] by
>> Alexander, but with another workload involving XFS and memory
>> pressure.
>>
>
> Is NFS involved or is this XFS only? It looks like XFS-only but no harm in
> being sure.

The application reads a 10MB file from NFS every few seconds and writes
back ~30MB every few seconds to local XFS; another thread in the same
application consumes that data and writes ~2MB out to NFS after roughly
every 10 input files, so NFS isn't dominant but is involved.

> I'm beginning to wonder if this is a problem where a lot of dirty inodes are
> being written back in this path and we stall while that happens. I'm still
> not getting why we trigger this now but did not before 2.6.26-rc1, or why
> it bisects to the zonelist modifications. Diffing the reclaim and allocation
> paths between 2.6.25 and 2.6.26-rc1 has not yet yielded any candidates for
> me that would explain this.
>
>> SLUB allocator is in use and config is at 
>> http://quora.org/config-client-debug .
>>
>> Let me know if you'd like more details/vmlinux objdump etc.
>>
>> Thanks,
>>  Daniel
>>
>> --- [1]
>>
>> http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/e673c9173d45a735/db9213ef39e4e11c
>>
>> --- [2]
>>
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.26-rc7-210c #2
>> -------------------------------------------------------
>> AutopanoPro/4470 is trying to acquire lock:
>>  (iprune_mutex){--..}, at: [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>>
>> but task is already holding lock:
>>  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 (&mm->mmap_sem){----}:
>>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>       [<ffffffff805df5ab>] down_read+0x3b/0x70
>>       [<ffffffff805e3e3c>] do_page_fault+0x27c/0x890
>>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>>       [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> -> #1 (&(&ip->i_iolock)->mr_lock){----}:
>>       [<ffffffff80278f4d>] __lock_acquire+0xbdd/0x1020
>>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>       [<ffffffff8026d746>] down_write_nested+0x46/0x80
>>       [<ffffffff8039df29>] xfs_ilock+0x99/0xa0
>>       [<ffffffff8039e0cf>] xfs_ireclaim+0x3f/0x90
>>       [<ffffffff803ba889>] xfs_finish_reclaim+0x59/0x1a0
>>       [<ffffffff803bc199>] xfs_reclaim+0x109/0x110
>>       [<ffffffff803c9541>] xfs_fs_clear_inode+0xe1/0x110
>>       [<ffffffff802d906d>] clear_inode+0x7d/0x110
>>       [<ffffffff802d93aa>] dispose_list+0x2a/0x100
>>       [<ffffffff802d96af>] shrink_icache_memory+0x22f/0x290
>>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>>       [<ffffffff8029e0b6>] kswapd+0x3b6/0x560
>>       [<ffffffff8026921d>] kthread+0x4d/0x80
>>       [<ffffffff80227428>] child_rip+0xa/0x12
>>       [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> -> #0 (iprune_mutex){--..}:
>>       [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>>       [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>       [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>>       [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>>       [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>>       [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>>       [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>>       [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>>       [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>>       [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>>       [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>>       [<ffffffff805e16cd>] error_exit+0x0/0xa9
>>       [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> other info that might help us debug this:
>>
>> 2 locks held by AutopanoPro/4470:
>>  #0:  (&mm->mmap_sem){----}, at: [<ffffffff805e3e15>] do_page_fault+0x255/0x890
>>  #1:  (shrinker_rwsem){----}, at: [<ffffffff8029d732>] shrink_slab+0x32/0x1d0
>>
>> stack backtrace:
>> Pid: 4470, comm: AutopanoPro Not tainted 2.6.26-rc7-210c #2
>>
>> Call Trace:
>>  [<ffffffff80276823>] print_circular_bug_tail+0x83/0x90
>>  [<ffffffff80275e09>] ? print_circular_bug_entry+0x49/0x60
>>  [<ffffffff80278db7>] __lock_acquire+0xa47/0x1020
>>  [<ffffffff802793f5>] lock_acquire+0x65/0x90
>>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>>  [<ffffffff805dedd5>] mutex_lock_nested+0xb5/0x300
>>  [<ffffffff802d94fd>] ? shrink_icache_memory+0x7d/0x290
>>  [<ffffffff802d94fd>] shrink_icache_memory+0x7d/0x290
>>  [<ffffffff8029d732>] ? shrink_slab+0x32/0x1d0
>>  [<ffffffff8029d868>] shrink_slab+0x168/0x1d0
>>  [<ffffffff8029db38>] try_to_free_pages+0x268/0x3a0
>>  [<ffffffff8029c240>] ? isolate_pages_global+0x0/0x40
>>  [<ffffffff802979d6>] __alloc_pages_internal+0x206/0x4b0
>>  [<ffffffff80297c89>] __alloc_pages_nodemask+0x9/0x10
>>  [<ffffffff802b2bc2>] alloc_page_vma+0x72/0x1b0
>>  [<ffffffff802a3642>] handle_mm_fault+0x462/0x7b0
>>  [<ffffffff80277e2f>] ? trace_hardirqs_on+0xbf/0x150
>>  [<ffffffff805e3e15>] ? do_page_fault+0x255/0x890
>>  [<ffffffff805e3ecc>] do_page_fault+0x30c/0x890
>>  [<ffffffff805e16cd>] error_exit+0x0/0xa9
>> --
>> Daniel J Blueman
>>
>
> --
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab
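
For what it's worth, the cycle lockdep reports reduces to a classic
two-path ordering inversion: the fault path takes mmap_sem and then
(via direct reclaim) iprune_mutex, while the kswapd/shrinker path
takes iprune_mutex and then the XFS iolock, whose existing chain
(the #1 and #2 entries above) leads back to mmap_sem. A minimal
userspace sketch of that shape, with pthread mutexes standing in for
the kernel locks (an analogy only, not kernel code; all names are
invented, and the i_iolock hop is collapsed into one edge):

/* Two threads acquiring the same pair of locks in opposite order,
 * mirroring the mmap_sem -> iprune_mutex -> ... -> mmap_sem cycle. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t mmap_sem     = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t iprune_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Path 1: do_page_fault() -> direct reclaim -> shrink_icache_memory() */
static void *fault_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&mmap_sem);          /* page fault */
        usleep(1000);                           /* allocation under pressure */
        pthread_mutex_lock(&iprune_mutex);      /* shrink_icache_memory */
        pthread_mutex_unlock(&iprune_mutex);
        pthread_mutex_unlock(&mmap_sem);
        return NULL;
}

/* Path 2: kswapd -> shrink_icache_memory() -> XFS reclaim; lockdep says
 * the iolock taken there already depends on mmap_sem, hence this edge. */
static void *reclaim_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&iprune_mutex);      /* shrink_icache_memory */
        usleep(1000);
        pthread_mutex_lock(&mmap_sem);          /* i_iolock -> mmap_sem edge */
        pthread_mutex_unlock(&mmap_sem);
        pthread_mutex_unlock(&iprune_mutex);
        return NULL;
}

int main(void)
{
        pthread_t a, b;
        pthread_create(&a, NULL, fault_path, NULL);
        pthread_create(&b, NULL, reclaim_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("survived this run; the ordering cycle is still there");
        return 0;
}

Build with "gcc -pthread"; some runs will simply deadlock, which is the
point - lockdep flags the ordering before the stall ever has to happen.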
-- 
Daniel J Blueman

