xfs
[Top] [All Lists]

Re: [PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 4 Jan 2013 09:02:45 +1100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20130103212222.GJ27055@xxxxxxx>
References: <1355870625-26638-1-git-send-email-david@xxxxxxxxxxxxx> <20130103212222.GJ27055@xxxxxxx>
Resent-date: Fri, 4 Jan 2013 10:27:01 -0600
Resent-from: Ben Myers <bpm@xxxxxxx>
Resent-message-id: <20130104162701.GB30652@xxxxxxx>
Resent-to: xfs@xxxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Jan 03, 2013 at 03:22:22PM -0600, Ben Myers wrote:
> Dave,
> 
> On Wed, Dec 19, 2012 at 09:43:45AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > When _xfs_buf_find is passed an out of range address, it will fail
> > to find a relevant struct xfs_perag and oops with a null
> > dereference. This can happen when trying to walk a filesystem with a
> > metadata inode that has a partially corrupted extent map (i.e. the
> > block number returned is corrupt, but is otherwise intact) and we
> > try to read from the corrupted block address.
> > 
> > In this case, just fail the lookup. If it is readahead being issued,
> > it will simply not be done, but if it is real read that fails we
> > will get an error being reported.  Ideally this case should result
> > in an EFSCORRUPTED error being reported, but we cannot return an
> > error through xfs_buf_read() or xfs_buf_get() so this lookup failure
> > may result in ENOMEM or EIO errors being reported instead.
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_buf.c |   18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index a80195b..16249d9 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -487,6 +487,7 @@ _xfs_buf_find(
> >     struct rb_node          *parent;
> >     xfs_buf_t               *bp;
> >     xfs_daddr_t             blkno = map[0].bm_bn;
> > +   xfs_daddr_t             eofs;
> >     int                     numblks = 0;
> >     int                     i;
> >  
> > @@ -498,6 +499,23 @@ _xfs_buf_find(
> >     ASSERT(!(numbytes < (1 << btp->bt_sshift)));
> >     ASSERT(!(BBTOB(blkno) & (xfs_off_t)btp->bt_smask));
> >  
> > +   /*
> > +    * Corrupted block numbers can get through to here, unfortunately, so we
> > +    * have to check that the buffer falls within the filesystem bounds.
> > +    */
> > +   eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
> > +   if (blkno >= eofs || blkno + numblks > eofs) {
>                            ^^^^^^^^^^^^^^^^^^^^^^
> 
> That looks suspect to me.  I think you need to go over each buffer
> individually.

I'm not trying to validate every single part of a buffer here -
there is no need to do that as the block numbers are validated
against device overruns during IO. i.e. we'll get an EIO and a log
message telling us an attempt to access beyond the end of the device
occurring during IO.

I.e. we aren't doing validity checks on whether a buffer has a sane
block number or not (that's up to the caller), what we are
avoiding is attempting to look up a buffer that is outside of the
range of the cache indexing. i.e. it's validating the cache index we
are about to use, not passing judgement on whether the caller has
asked for a valid set of blocks or not.

> I bounced it off Mark and this was his suggestion:
> 
>  for (i = 0; i < nmaps; i++) {
>          if (map[i].bm_bn >= eofs ||
>                    map[i].bm_bn + map[i].bm_len >= eofs)
>                        ...

Sure, that would work, but we really don't care about the secondary
block numbers here - there are completely unused by the buffer cache
except for when IO is issued. And given that _xfs_buf_find is
probably the hottest function in the XFS code base, avoiding
unnecessary checks is somewhat important...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>