xfs
[Top] [All Lists]

Re: [PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Mon, 07 Jan 2013 09:10:12 -0600
Cc: Ben Myers <bpm@xxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20130103220244.GH3120@dastard>
References: <1355870625-26638-1-git-send-email-david@xxxxxxxxxxxxx> <20130103212222.GJ27055@xxxxxxx> <20130103220244.GH3120@dastard>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 01/03/13 16:02, Dave Chinner wrote:
On Thu, Jan 03, 2013 at 03:22:22PM -0600, Ben Myers wrote:
Dave,

On Wed, Dec 19, 2012 at 09:43:45AM +1100, Dave Chinner wrote:
From: Dave Chinner<dchinner@xxxxxxxxxx>

When _xfs_buf_find is passed an out of range address, it will fail
to find a relevant struct xfs_perag and oops with a null
dereference. This can happen when trying to walk a filesystem with a
metadata inode that has a partially corrupted extent map (i.e. the
block number returned is corrupt, but is otherwise intact) and we
try to read from the corrupted block address.

In this case, just fail the lookup. If it is readahead being issued,
it will simply not be done, but if it is real read that fails we
will get an error being reported.  Ideally this case should result
in an EFSCORRUPTED error being reported, but we cannot return an
error through xfs_buf_read() or xfs_buf_get() so this lookup failure
may result in ENOMEM or EIO errors being reported instead.

Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
---
  fs/xfs/xfs_buf.c |   18 ++++++++++++++++++
  1 file changed, 18 insertions(+)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index a80195b..16249d9 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -487,6 +487,7 @@ _xfs_buf_find(
        struct rb_node          *parent;
        xfs_buf_t               *bp;
        xfs_daddr_t             blkno = map[0].bm_bn;
+       xfs_daddr_t             eofs;
        int                     numblks = 0;
        int                     i;

@@ -498,6 +499,23 @@ _xfs_buf_find(
        ASSERT(!(numbytes<  (1<<  btp->bt_sshift)));
        ASSERT(!(BBTOB(blkno)&  (xfs_off_t)btp->bt_smask));

+       /*
+        * Corrupted block numbers can get through to here, unfortunately, so we
+        * have to check that the buffer falls within the filesystem bounds.
+        */
+       eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
+       if (blkno>= eofs || blkno + numblks>  eofs) {
                        ^^^^^^^^^^^^^^^^^^^^^^

That looks suspect to me.  I think you need to go over each buffer
individually.

I'm not trying to validate every single part of a buffer here -
there is no need to do that as the block numbers are validated
against device overruns during IO. i.e. we'll get an EIO and a log
message telling us an attempt to access beyond the end of the device
occurring during IO.

I.e. we aren't doing validity checks on whether a buffer has a sane
block number or not (that's up to the caller), what we are
avoiding is attempting to look up a buffer that is outside of the
range of the cache indexing. i.e. it's validating the cache index we
are about to use, not passing judgement on whether the caller has
asked for a valid set of blocks or not.

I did not like the second part of the if statement because first block number in a "discontiguous" buffer does not have to be the lowest block number.

The first half of the if statement alone would prevent the oops. It seems to me that if a length check is desired to see if the first segment is valid, then the correct thing is to use the first segment length; something like:

        if (blkno >= eofs || blkno + map[0].bm_len >= eofs)
                ...

I bounced it off Mark and this was his suggestion:

  for (i = 0; i<  nmaps; i++) {
          if (map[i].bm_bn>= eofs ||
                     map[i].bm_bn + map[i].bm_len>= eofs)
                         ...

Sure, that would work, but we really don't care about the secondary
block numbers here - there are completely unused by the buffer cache
except for when IO is issued. And given that _xfs_buf_find is
probably the hottest function in the XFS code base, avoiding
unnecessary checks is somewhat important...

Cheers,

Dave.


--Mark.

<Prev in Thread] Current Thread [Next in Thread>