[PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
Mark Tinguely
tinguely at sgi.com
Mon Jan 7 09:10:12 CST 2013
On 01/03/13 16:02, Dave Chinner wrote:
> On Thu, Jan 03, 2013 at 03:22:22PM -0600, Ben Myers wrote:
>> Dave,
>>
>> On Wed, Dec 19, 2012 at 09:43:45AM +1100, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner at redhat.com>
>>>
>>> When _xfs_buf_find is passed an out of range address, it will fail
>>> to find a relevant struct xfs_perag and oops with a null
>>> dereference. This can happen when trying to walk a filesystem with a
>>> metadata inode that has a partially corrupted extent map (i.e. the
>>> block number returned is corrupt, but is otherwise intact) and we
>>> try to read from the corrupted block address.
>>>
>>> In this case, just fail the lookup. If it is readahead being issued,
>>> it will simply not be done, but if it is a real read that fails we
>>> will get an error being reported. Ideally this case should result
>>> in an EFSCORRUPTED error being reported, but we cannot return an
>>> error through xfs_buf_read() or xfs_buf_get() so this lookup failure
>>> may result in ENOMEM or EIO errors being reported instead.
>>>
>>> Signed-off-by: Dave Chinner <dchinner at redhat.com>
>>> ---
>>> fs/xfs/xfs_buf.c | 18 ++++++++++++++++++
>>> 1 file changed, 18 insertions(+)
>>>
>>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>>> index a80195b..16249d9 100644
>>> --- a/fs/xfs/xfs_buf.c
>>> +++ b/fs/xfs/xfs_buf.c
>>> @@ -487,6 +487,7 @@ _xfs_buf_find(
>>> struct rb_node *parent;
>>> xfs_buf_t *bp;
>>> xfs_daddr_t blkno = map[0].bm_bn;
>>> + xfs_daddr_t eofs;
>>> int numblks = 0;
>>> int i;
>>>
>>> @@ -498,6 +499,23 @@ _xfs_buf_find(
>>> ASSERT(!(numbytes < (1 << btp->bt_sshift)));
>>> ASSERT(!(BBTOB(blkno) & (xfs_off_t)btp->bt_smask));
>>>
>>> + /*
>>> + * Corrupted block numbers can get through to here, unfortunately, so we
>>> + * have to check that the buffer falls within the filesystem bounds.
>>> + */
>>> + eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
>>> + if (blkno >= eofs || blkno + numblks > eofs) {
>> ^^^^^^^^^^^^^^^^^^^^^^
>>
>> That looks suspect to me. I think you need to go over each buffer
>> individually.
>
> I'm not trying to validate every single part of a buffer here -
> there is no need to do that as the block numbers are validated
> against device overruns during IO. i.e. we'll get an EIO and a log
> message telling us that an attempt to access beyond the end of the
> device occurred during IO.
>
> I.e. we aren't doing validity checks on whether a buffer has a sane
> block number or not (that's up to the caller), what we are
> avoiding is attempting to look up a buffer that is outside of the
> range of the cache indexing. i.e. it's validating the cache index we
> are about to use, not passing judgement on whether the caller has
> asked for a valid set of blocks or not.

I did not like the second part of the if statement because the first
block number in a "discontiguous" buffer does not have to be the lowest
block number.

The first half of the if statement alone would prevent the oops. It
seems to me that if a length check is desired to see if the first
segment is valid, then the correct thing is to use the first segment
length; something like:

	if (blkno >= eofs || blkno + map[0].bm_len >= eofs)
		...
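
To make the difference between the variants in this thread concrete,
here is a minimal standalone sketch in userspace C. It is not the kernel
code: struct sketch_map and all four function names are hypothetical
stand-ins, and it assumes numblks is the summed length of every segment
in the map. The end comparison uses ">" as in the posted patch, on the
assumption that a segment ending exactly at eofs is still in range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one buffer segment; not the kernel definition. */
struct sketch_map {
	int64_t bm_bn;	/* first basic block of the segment */
	int64_t bm_len;	/* segment length in basic blocks */
};

/* Variant A: first block only; enough to reject an out-of-range index. */
static bool first_block_beyond_eofs(const struct sketch_map *map,
				    int64_t eofs)
{
	return map[0].bm_bn >= eofs;
}

/* Variant B: as posted; first block plus the summed length of all segments. */
static bool total_length_beyond_eofs(const struct sketch_map *map,
				     size_t nmaps, int64_t eofs)
{
	int64_t numblks = 0;

	for (size_t i = 0; i < nmaps; i++)
		numblks += map[i].bm_len;
	return map[0].bm_bn >= eofs || map[0].bm_bn + numblks > eofs;
}

/* Variant C: first segment only, using that segment's own length. */
static bool first_segment_beyond_eofs(const struct sketch_map *map,
				      int64_t eofs)
{
	return map[0].bm_bn >= eofs ||
	       map[0].bm_bn + map[0].bm_len > eofs;
}

/* Variant D: every segment checked individually. */
static bool any_segment_beyond_eofs(const struct sketch_map *map,
				    size_t nmaps, int64_t eofs)
{
	for (size_t i = 0; i < nmaps; i++) {
		if (map[i].bm_bn >= eofs ||
		    map[i].bm_bn + map[i].bm_len > eofs)
			return true;
	}
	return false;
}
```

With eofs = 112 and a two-segment map { {100, 8}, {10, 8} }, every
segment lies inside the filesystem, yet variant B flags it because
100 + 16 runs past eofs. Variants A, C and D accept it. Only variant D
notices a bad second segment, at the cost of a loop in the lookup path.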
>
>> I bounced it off Mark and this was his suggestion:
>>
>> for (i = 0; i < nmaps; i++) {
>> if (map[i].bm_bn >= eofs ||
>> map[i].bm_bn + map[i].bm_len >= eofs)
>> ...
>
> Sure, that would work, but we really don't care about the secondary
> block numbers here - they are completely unused by the buffer cache
> except for when IO is issued. And given that _xfs_buf_find is
> probably the hottest function in the XFS code base, avoiding
> unnecessary checks is somewhat important...
>
> Cheers,
>
> Dave.
--Mark.