xfs
[Top] [All Lists]

Re: [PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem

To: Mark Tinguely <tinguely@xxxxxxx>
Subject: Re: [PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 8 Jan 2013 09:41:54 +1100
Cc: Ben Myers <bpm@xxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <50EAE554.5020504@xxxxxxx>
References: <1355870625-26638-1-git-send-email-david@xxxxxxxxxxxxx> <20130103212222.GJ27055@xxxxxxx> <20130103220244.GH3120@dastard> <50EAE554.5020504@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Jan 07, 2013 at 09:10:12AM -0600, Mark Tinguely wrote:
> On 01/03/13 16:02, Dave Chinner wrote:
> >On Thu, Jan 03, 2013 at 03:22:22PM -0600, Ben Myers wrote:
> >>Dave,
> >>
> >>On Wed, Dec 19, 2012 at 09:43:45AM +1100, Dave Chinner wrote:
> >>>From: Dave Chinner<dchinner@xxxxxxxxxx>
> >>>
> >>>When _xfs_buf_find is passed an out of range address, it will fail
> >>>to find a relevant struct xfs_perag and oops with a null
> >>>dereference. This can happen when trying to walk a filesystem with a
> >>>metadata inode that has a partially corrupted extent map (i.e. the
> >>>block number returned is corrupt, but is otherwise intact) and we
> >>>try to read from the corrupted block address.
> >>>
> >>>In this case, just fail the lookup. If it is readahead being issued,
> >>>it will simply not be done, but if it is real read that fails we
> >>>will get an error being reported.  Ideally this case should result
> >>>in an EFSCORRUPTED error being reported, but we cannot return an
> >>>error through xfs_buf_read() or xfs_buf_get() so this lookup failure
> >>>may result in ENOMEM or EIO errors being reported instead.
> >>>
> >>>Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
> >>>---
> >>>  fs/xfs/xfs_buf.c |   18 ++++++++++++++++++
> >>>  1 file changed, 18 insertions(+)
> >>>
> >>>diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> >>>index a80195b..16249d9 100644
> >>>--- a/fs/xfs/xfs_buf.c
> >>>+++ b/fs/xfs/xfs_buf.c
> >>>@@ -487,6 +487,7 @@ _xfs_buf_find(
> >>>   struct rb_node          *parent;
> >>>   xfs_buf_t               *bp;
> >>>   xfs_daddr_t             blkno = map[0].bm_bn;
> >>>+  xfs_daddr_t             eofs;
> >>>   int                     numblks = 0;
> >>>   int                     i;
> >>>
> >>>@@ -498,6 +499,23 @@ _xfs_buf_find(
> >>>   ASSERT(!(numbytes<  (1<<  btp->bt_sshift)));
> >>>   ASSERT(!(BBTOB(blkno)&  (xfs_off_t)btp->bt_smask));
> >>>
> >>>+  /*
> >>>+   * Corrupted block numbers can get through to here, unfortunately, so we
> >>>+   * have to check that the buffer falls within the filesystem bounds.
> >>>+   */
> >>>+  eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
> >>>+  if (blkno>= eofs || blkno + numblks>  eofs) {
> >>                    ^^^^^^^^^^^^^^^^^^^^^^
> >>
> >>That looks suspect to me.  I think you need to go over each buffer
> >>individually.
> >
> >I'm not trying to validate every single part of a buffer here -
> >there is no need to do that as the block numbers are validated
> >against device overruns during IO. i.e. we'll get an EIO and a log
> >message telling us an attempt to access beyond the end of the device
> >occurring during IO.
> >
> >I.e. we aren't doing validity checks on whether a buffer has a sane
> >block number or not (that's up to the caller), what we are
> >avoiding is attempting to look up a buffer that is outside of the
> >range of the cache indexing. i.e. it's validating the cache index we
> >are about to use, not passing judgement on whether the caller has
> >asked for a valid set of blocks or not.
> 
> I did not like the second part of the if statement because first
> block number in a "discontiguous" buffer does not have to be the
> lowest block number.

Again, _xfs_buf_find() does not care about secondary blocks in
discontiguous buffers. It cares about the parameters of the cached
items - which are described by the {blkno, numblks} tuple - and
doesn't care about sub-block ranges in discontiguous buffers at all.

> The first half of the if statement alone would prevent the oops.

Right, that fixes the initial cache index lookup problem - I wanted
to validate the entire tuple. But you're right in that discontigous
buffers could have blkno + numblks beyond EOFS so validating cache
item ranges like this is wrong. I'll drop that check.

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>