
RE: nfsd lockups with xfs during SPEC SFS testing

To: "'Steve Lord'" <lord@xxxxxxx>, "HABBINGA,ERIK (HP-Loveland,ex1)" <erik_habbinga@xxxxxx>
Subject: RE: nfsd lockups with xfs during SPEC SFS testing
From: "HABBINGA,ERIK (HP-Loveland,ex1)" <erik_habbinga@xxxxxx>
Date: Thu, 31 Jan 2002 14:38:46 -0800
Cc: "'linux-xfs@xxxxxxxxxxx'" <linux-xfs@xxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
Steve,
   A co-worker of mine sent me a patch containing the CVS bits at 3:21pm
Mountain time on 1/30/02; he started the CVS download at 2:37pm and finished
it at 2:38pm Mountain time that same day.  I've been working on a patch to
remove the BKL from the nfsd process, and have been seeing lockups in the XFS
code.  I saw these lockups with XFS CVS downloads from 1/10/02 and 1/18/02.
I finally started running SPEC tests without my nfsd-BKL-removal patch and
still got lockups in the XFS code.  So, I don't think this is a regression.

I ran another test this morning and got a different profile of lockups.
I've attached the decoded output from Alt-SysRq.  kupdated is locked up in
xlog_grant_log_space, and each nfsd process is locked up in one of:

- fh_lock (all the nfsd3_proc_create->stext_lock->__down_failed->__down
cases)
- nfsd_sync (the nfsd_commit->stext_lock->__down_failed->__down cases)
- pagebuf_grab_lock (the
_pagebuf_find_lockable_buffer->stext_lock->__down_failed->__down cases)
- lock_wait (the xfs_access->mraccessf->lock_wait cases)
- xlog_grant_log_space

I'll help any way I can to track this problem down.

Erik

> -----Original Message-----
> From: Steve Lord [mailto:lord@xxxxxxx]
> Sent: Thursday, January 31, 2002 3:16 PM
> To: HABBINGA,ERIK (HP-Loveland,ex1)
> Cc: 'linux-xfs@xxxxxxxxxxx'
> Subject: Re: nfsd lockups with xfs during SPEC SFS testing
> 
> 
> On Thu, 2002-01-31 at 11:59, HABBINGA,ERIK (HP-Loveland,ex1) wrote:
> > I'm running Linux 2.4.17 with a version of XFS downloaded via CVS on Jan
> > 30th.  When I run the SPEC SFS NFS test against this kernel, nfsd stops
> > responding after a while.  I captured the state of all of the system
> > processes via magic SysRq, and found 24 nfsd processes locked up in
> > various stages of the nfsd_lookup code:
> > 
> > - 20 of them were locked up in the fh_lock call before lookup_one_len in
> >   nfsd_lookup().
> > - 2 processes were locked up in the _pagebuf_grab_lock call inside
> >   _pagebuf_find_lockable_buffer().
> > - 2 processes were locked up in the pagebuf_iowait() call in
> >   pagebuf_iostart().
> > 
> > Any ideas on what may be wrong, and how I can help debug and solve this
> > problem?  I've attached the call traces for the locked-up nfsd processes.
> > I can provide vmlinux and System.map for this kernel to help debugging.
> 
> Is this a regression, i.e. did it use to work? And can you say when on
> the 30th?
> 
> Thanks
> 
>    Steve
> 
> 
> -- 
> 
> Steve Lord                                      voice: +1-651-683-3511
> Principal Engineer, Filesystem Software         email: lord@xxxxxxx
> 

Attachment: sysrq_log.txt.out
Description: Binary data
