xfs
[Top] [All Lists]

Re: [PATCH] xfs: shutdown filesystem if xfs_perag_get fails

To: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Subject: Re: [PATCH] xfs: shutdown filesystem if xfs_perag_get fails
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 26 Apr 2013 11:32:04 +1000
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, Mark Tinguely <tinguely@xxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1366929706.4098.6.camel@xxxxxxxxxxxxxxxxxx>
References: <20130421174107.007313126@xxxxxxx> <5174603A.8030208@xxxxxxxxxxx> <51753EDE.6000301@xxxxxxx> <51754A13.5000808@xxxxxxxxxxx> <5175532B.3050509@xxxxxxx> <20130422233033.GK30622@dastard> <51769111.3050103@xxxxxxx> <1366732475.3762.32402.camel@xxxxxxxxxxxxxxxxxx> <20130423204956.GM10481@dastard> <1366929706.4098.6.camel@xxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Apr 25, 2013 at 05:41:46PM -0500, Chandra Seetharaman wrote:
> In which case something along the lines of
> 
> --- 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 3806088..3fb2fa6 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -203,7 +203,13 @@ xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t
> agno)
>         if (pag) {
>                 ASSERT(atomic_read(&pag->pag_ref) >= 0);
>                 ref = atomic_inc_return(&pag->pag_ref);
> -       }
> +       } else 
> +               /*
> +                * xfs_perag_get() is called with invalid agno,
> +                * which cannot happen. This indicates a problem
> +                * in the calling code.
> +                */
> +               BUG();

That's btrfs-style error handling. ;)

In reality, your patch is much worse than just letting the caller
dereference a null pointer, because it happens inside an RCU read
lock. That's guaranteed to hang the system, not just have the thread
that triggers a null pointer dereference go away. And further....

>         rcu_read_unlock();
>         trace_xfs_perag_get(mp, agno, ref, _RET_IP_);

... it means we can't catch the tracepoint with the bad agno in it.

>         return pag;
> --------
> 
> would be useful ?. Since we have a NULL pag, we will trip somewhere
> else. At least with this, there is a pointer to the debugger/sysadmin
> about where/what to look for (may be with more valuable/correct comment
> than above).

The current failure case is pretty damn obvious, so adding a BUG
doesn't make it any better. In fact, it makes it worse because it
prevents a caller from being able to handle a NULL perag return....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>