Steve,
A co-worker of mine sent me a patch containing the CVS bits at 3:21pm
Mountain time on 1/30/02. He started the CVS download at 2:37pm Mountain
time on 1/30/02 and it finished at 2:38pm. I've been working on a patch
to remove the BKL from the nfsd process, and have been seeing lockups in
the XFS code. I also saw these lockups with XFS CVS downloads from
1/10/02 and 1/18/02. I finally started running the SPEC tests without my
nfsd-BKL-removal patch and still got lockups in the XFS code. So I don't
think this is a regression.
I ran another test this morning and got a different profile of lockups.
I've attached the decoded output from alt-sysrq. kupdated is locked up in
xlog_grant_log_space, and every nfsd process is locked up in one of:
- fh_lock (all the nfsd3_proc_create->stext_lock->__down_failed->__down
cases)
- nfsd_sync (the nfsd_commit->stext_lock->__down_failed->__down cases)
- pagebuf_grab_lock (the
_pagebuf_find_lockable_buffer->stext_lock->__down_failed->__down cases)
- lock_wait (the xfs_access->mraccessf->lock_wait cases)
- xlog_grant_log_space
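For anyone sorting through the attached traces, the grouping above can be sketched as a small script. This is only an illustration and rests on assumptions: each trace is taken to be already reduced to a list of symbol names, the SIGNATURES table and the classify/tally helpers are hypothetical names of mine, and the real alt-sysrq output would need its own parsing step first.

```python
from collections import Counter

# Hypothetical table: (symbol that identifies the case, lock site it maps to).
# The symbol names are taken from the list above; the matching order is an
# assumption made for this sketch.
SIGNATURES = [
    ("nfsd3_proc_create", "fh_lock"),
    ("nfsd_commit", "nfsd_sync"),
    ("_pagebuf_find_lockable_buffer", "pagebuf_grab_lock"),
    ("xfs_access", "lock_wait"),
    ("xlog_grant_log_space", "xlog_grant_log_space"),
]


def classify(trace):
    """Return the lock site for one trace, given as a list of symbol names."""
    for symbol, site in SIGNATURES:
        if symbol in trace:
            return site
    return "unknown"


def tally(traces):
    """Count how many traces fall into each lock site."""
    return Counter(classify(trace) for trace in traces)


if __name__ == "__main__":
    # Made-up sample traces, only to show the expected input shape.
    sample = [
        ["nfsd3_proc_create", "stext_lock", "__down_failed", "__down"],
        ["nfsd_commit", "stext_lock", "__down_failed", "__down"],
        ["xfs_access", "mraccessf", "lock_wait"],
    ]
    for site, count in tally(sample).most_common():
        print(site, count)
```

Counting per lock site makes it easier to compare one run's lockup profile against another's.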
I'll help any way I can to track this problem down.
Erik
> -----Original Message-----
> From: Steve Lord [mailto:lord@xxxxxxx]
> Sent: Thursday, January 31, 2002 3:16 PM
> To: HABBINGA,ERIK (HP-Loveland,ex1)
> Cc: 'linux-xfs@xxxxxxxxxxx'
> Subject: Re: nfsd lockups with xfs during SPEC SFS testing
>
>
> On Thu, 2002-01-31 at 11:59, HABBINGA,ERIK (HP-Loveland,ex1) wrote:
> > I'm running linux 2.4.17 with a version of XFS downloaded via CVS on Jan
> > 30th. When I run the SPEC SFS NFS test against this kernel, nfsd stops
> > responding after awhile. I captured the state of all of the system
> > processes via magic sysrq, and found 24 nfsd processes locked up in various
> > stages of the nfsd_lookup code:
> >
> > - 20 of them were locked up in the fh_lock call before lookup_one_len in
> > nfsd_lookup().
> > - 2 processes were locked up in the _pagebuf_grab_lock call inside
> > _pagebuf_find_lockable_buffer().
> > - 2 processes were locked up in the pagebuf_iowait() call in
> > pagebuf_iostart()
> >
> > Any ideas on what may be wrong, and how I can help debug and solve this
> > problem? I've attached the call traces for the locked up nfsd processes. I
> > can provide vmlinux and System.map for this kernel to help debugging.
>
> Is this a regression, i.e. did it used to work? And can you say when on
> the 30th?
>
> Thanks
>
> Steve
>
>
> --
>
> Steve Lord voice: +1-651-683-3511
> Principal Engineer, Filesystem Software email: lord@xxxxxxx
>
sysrq_log.txt.out
Description: Binary data