
RE: nfsd lockups with xfs during SPEC SFS testing

To: "Steve Lord" <lord@xxxxxxx>, "HABBINGA,ERIK (HP-Loveland,ex1)" <erik_habbinga@xxxxxx>
Subject: RE: nfsd lockups with xfs during SPEC SFS testing
From: "HABBINGA,ERIK (HP-Loveland,ex1)" <erik_habbinga@xxxxxx>
Date: Mon, 18 Feb 2002 11:32:33 -0800
Cc: "linux-xfs@xxxxxxxxxxx" <linux-xfs@xxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
Steve + crew,

Here's the latest list of our lockups during SPEC SFS NFS testing.  Two
consecutive tests locked up in lock_wait, either through mraccessf or
mrupdatef.  What are these functions trying to accomplish, and who could be
holding the locks that mraccessf/mrupdatef are waiting for?

Here are the call traces from magic_sysrq for the locked up processes for
the two tests:

run #1:
task: kswapd (pid: 7)
        c01e60ed: c01e6040 T lock_wait
        c01e61dd: c01e61b4 T mrupdatef
        c01d5329: c01d5310 T xfs_finish_reclaim
        c01b88d5: c01b8870 T xfs_ilock_ra
        c01b8913: c01b8900 T xfs_ilock
        c01d5329: c01d5310 T xfs_finish_reclaim
        c01d5329: c01d5310 T xfs_finish_reclaim
        c01d52ec: c01d5110 t xfs_reclaim
        c01e5088: c01e5068 T vn_reclaim
        c01e54ad: c01e541c T vn_purge
        c01e55f5: c01e55b0 T vn_remove
        c01e44db: c01e44c0 T linvfs_clear_inode
        c01476a4: c014761c T clear_inode
        c014771f: c01476e4 t dispose_list
        c014796c: c01478b4 T prune_icache
        c01479a7: c014798c T shrink_icache_memory
        c012d0e9: c012d07c t shrink_caches
        c012d13c: c012d100 T try_to_free_pages
        c012d1d3: c012d190 t kswapd_balance_pgdat
        c012d22e: c012d21c t kswapd_balance
        c012d33d: c012d2a4 T kswapd
        c0105594: c010556c T kernel_thread
 
run #2:
        c01e60ed: c01e6040 T lock_wait
        c01e619c: c01e614c T mraccessf
        c01cf9d4: c01cf990 T xfs_getattr
        c01b88f6: c01b8870 T xfs_ilock_ra
        c01b8913: c01b8900 T xfs_ilock
        c01cf9d4: c01cf990 T xfs_getattr
        c01cf9d4: c01cf990 T xfs_getattr
        c01e5347: c01e5310 T vn_revalidate
        c01cd57a: c01cd448 T xfs_dir_lookup_int
        c01b8a1f: c01b89dc T xfs_iunlock
        c01b886b: c01b885c T xfs_iunlock_map_shared
        c01d1baf: c01d1ac8 t xfs_lookup
        c01dfe5e: c01dfe48 T linvfs_revalidate_core
        c01df6ce: c01df634 T linvfs_lookup
        c013e145: c013e098 T lookup_hash
        c013e1eb: c013e194 T lookup_one_len
        c016262d: c0162360 T nfsd_lookup
        c0167fe8: c0167f14 t nfsd3_proc_lookup
        c015f943: c015f874 t nfsd_dispatch
        c02bd765: c02bd4d8 T svc_process
        c015f6e9: c015f49c t nfsd
        c0105594: c010556c T kernel_thread

Both hung processes reached mrupdatef or mraccessf through the following
code in fs/xfs/xfs_iget.c:xfs_ilock_ra():

        if (lock_flags & XFS_ILOCK_EXCL) {
                mrupdatef(&ip->i_lock, PLTWAIT);
                ip->i_ilock_ra = return_address;
        } else if (lock_flags & XFS_ILOCK_SHARED) {
                mraccessf(&ip->i_lock, PLTWAIT);
        }  


-----Original Message-----
From: Steve Lord
To: HABBINGA,ERIK (HP-Loveland,ex1)
Cc: 'linux-xfs@xxxxxxxxxxx'
Sent: 2/1/02 10:34 AM
Subject: RE: nfsd lockups with xfs during SPEC SFS testing

On Thu, 2002-01-31 at 16:38, HABBINGA,ERIK (HP-Loveland,ex1) wrote:
> Steve,
>    A co-worker of mine sent me a patch containing the CVS bits at 3:21pm
> Mountain time 1/30/02; he started the CVS download at 2:37pm Mountain time
> 1/30/02 and ended it at 2:38pm Mountain time 1/30/02.  I've been working on
> a patch to remove the BKL from the nfsd process, and have been seeing
> lockups in the xfs code.  I saw these lockups with XFS CVS downloads from
> 1/10/02 and 1/18/02.  I finally started running SPEC tests without my
> nfsd-BKL-removal patch and still got lockups in the XFS code.  So, I don't
> think this is a regression.
> 
> I ran another test this morning, and got a different profile of lockups.
> I've attached the decoded output from alt-sysrq.  kupdated is locked up in
> xlog_grant_log_space, and all the nfsd processes are locked up in one of:
> 
> - fh_lock (all the nfsd3_proc_create->stext_lock->__down_failed->__down
>   cases)
> - nfsd_sync (the nfsd_commit->stext_lock->__down_failed->__down cases)
> - pagebuf_grab_lock (the
>   _pagebuf_find_lockable_buffer->stext_lock->__down_failed->__down cases)
> - lock_wait (the xfs_access->mraccessf->lock_wait cases)
> - xlog_grant_log_space
> 
> I'll help any way I can to track this problem down.
> 
> Erik
> 

Are you using anything unusual as the block device here? I would be
suspicious of it not coming back with I/O completions. Basically
all the places threads are waiting are places we wait blocked for
a read or a write to complete. If you have other filesystems on
the same device, can you do I/O to them?
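
One rough way to run that check from userspace is to write a block to a
file on the other filesystem and wait for fsync() to return.  This is a
sketch only; probe_io and the path passed to it are hypothetical, and a
call that never returns (rather than returning an error) would be the sign
that completions are not coming back.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one 4 KiB block to 'path' and flush it.
 * Returns 0 once fsync() has completed, -1 on any error.  The file is
 * removed afterwards.
 */
int probe_io(const char *path)
{
    char buf[4096];
    int fd;
    int rc = -1;

    memset(buf, 0xA5, sizeof(buf));

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* fsync() only returns after the device reports the write as done. */
    if (write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf) && fsync(fd) == 0)
        rc = 0;

    close(fd);
    unlink(path);
    return rc;
}
```

Pointing it at a scratch file on each filesystem that shares the device
would show whether the stall is specific to the XFS volume.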

Steve
 
-- 

Steve Lord                                      voice: +1-651-683-3511
Principal Engineer, Filesystem Software         email: lord@xxxxxxx

