
Re: panic on 4.20 server exporting xfs filesystem

To: Christoph Hellwig <hch@xxxxxx>
Subject: Re: panic on 4.20 server exporting xfs filesystem
From: Kinglong Mee <kinglongmee@xxxxxxxxx>
Date: Fri, 20 Mar 2015 12:06:18 +0800
Cc: "J. Bruce Fields" <bfields@xxxxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>, Linux NFS Mailing List <linux-nfs@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20150305131731.GA16235@xxxxxx>
References: <20150303221033.GB19439@xxxxxxxxxxxx> <20150303224456.GV4251@dastard> <20150304020826.GD19439@xxxxxxxxxxxx> <20150304155421.GE1627@xxxxxxxxxxxx> <20150304220900.GX18360@dastard> <20150304222709.GI1627@xxxxxxxxxxxx> <20150304224557.GY4251@dastard> <54F78BE5.1020608@xxxxxxxxxxx> <20150304225623.GZ4251@dastard> <20150305040849.GJ1627@xxxxxxxxxxxx> <20150305131731.GA16235@xxxxxx>
On Thu, Mar 5, 2015 at 9:17 PM, Christoph Hellwig <hch@xxxxxx> wrote:
> On Wed, Mar 04, 2015 at 11:08:49PM -0500, J. Bruce Fields wrote:
>> Ah-hah:
>>
>>       static void
>>       nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
>>       {
>>               ...
>>               nfsd4_cb_layout_fail(ls);
>>
>> That'd do it!
>>
>> Haven't tried to figure out why exactly that's getting called, and why
>> only rarely.  Some intermittent problem with the callback path, I guess.
>>
>> Anyway, I think that solves most of the mystery....
>
> Ooops, that was a nasty git merge error in the last rebase, see the fix
> below.  But I really wonder if we need to make the usage of pnfs explicit
> after all, otherwise we'll always hand out layouts on any XFS-exported
> filesystems, which can't be used and will eventually need to be recalled.
>
> ---
> From ad592590cce9f7441c3cd21d030f3a986d8759d7 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@xxxxxx>
> Date: Thu, 5 Mar 2015 06:12:29 -0700
> Subject: nfsd: don't recursively call nfsd4_cb_layout_fail
>
> Due to a merge error when creating c5c707f9 ("nfsd: implement pNFS
> layout recalls"), we recursively call nfsd4_cb_layout_fail from itself,
> leading to stack overflows.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  fs/nfsd/nfs4layouts.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
> index 3c1bfa1..1028a06 100644
> --- a/fs/nfsd/nfs4layouts.c
> +++ b/fs/nfsd/nfs4layouts.c
> @@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
>
>         rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));
>
> -       nfsd4_cb_layout_fail(ls);
> -

Maybe you want to add "trace_layout_recall_fail(&ls->ls_stid.sc_stateid);" here?
I think the following is better:

@@ -587,7 +587,7 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)

        rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));

-       nfsd4_cb_layout_fail(ls);
+       trace_layout_recall_fail(&ls->ls_stid.sc_stateid);

        printk(KERN_WARNING
                "nfsd: client %s failed to respond to layout recall. "

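For readers following along, here is a rough userspace sketch of the shape
the function has after this change: it records the event once, prints the
warning, and returns, with no call back into itself. This is only a model,
not the nfsd code. The struct definitions, the model_ prefixed names, and
the trace_calls counter are hypothetical stand-ins, and the real
trace_layout_recall_fail() is a kernel tracepoint rather than a plain
function.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real nfsd layout. */
struct stateid {
	unsigned int generation;
};

struct layout_stateid {
	struct stateid sc_stateid;
};

/* Counts how often the model tracepoint fires (assumption, for illustration). */
static int trace_calls;

/* Stand-in for trace_layout_recall_fail(); here it only bumps a counter. */
static void model_trace_layout_recall_fail(const struct stateid *stid)
{
	(void)stid;
	trace_calls++;
}

/*
 * Model of the failure handler after the fix: trace once, warn, return.
 * The broken version called itself at the point where the trace call is
 * now, so every invocation pushed another frame until the stack ran out.
 */
static void model_cb_layout_fail(struct layout_stateid *ls, const char *addr_str)
{
	model_trace_layout_recall_fail(&ls->sc_stateid);

	fprintf(stderr,
		"nfsd: client %s failed to respond to layout recall.\n",
		addr_str);
}
```

Calling model_cb_layout_fail() once on a stateid should bump trace_calls by
exactly one and emit a single warning line, which is the behaviour the
replacement line is meant to give.
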
thanks,
Kinglong Mee
