[PATCH] xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end

Mark Tinguely tinguely at sgi.com
Mon Jan 7 09:10:12 CST 2013


On 01/03/13 16:02, Dave Chinner wrote:
> On Thu, Jan 03, 2013 at 03:22:22PM -0600, Ben Myers wrote:
>> Dave,
>>
>> On Wed, Dec 19, 2012 at 09:43:45AM +1100, Dave Chinner wrote:
>>> From: Dave Chinner<dchinner at redhat.com>
>>>
>>> When _xfs_buf_find is passed an out of range address, it will fail
>>> to find a relevant struct xfs_perag and oops with a null
>>> dereference. This can happen when trying to walk a filesystem with a
>>> metadata inode that has a partially corrupted extent map (i.e. the
>>> block number returned is corrupt, but is otherwise intact) and we
>>> try to read from the corrupted block address.
>>>
>>> In this case, just fail the lookup. If it is readahead being issued,
>>> it will simply not be done, but if it is real read that fails we
>>> will get an error being reported.  Ideally this case should result
>>> in an EFSCORRUPTED error being reported, but we cannot return an
>>> error through xfs_buf_read() or xfs_buf_get() so this lookup failure
>>> may result in ENOMEM or EIO errors being reported instead.
>>>
>>> Signed-off-by: Dave Chinner<dchinner at redhat.com>
>>> ---
>>>   fs/xfs/xfs_buf.c |   18 ++++++++++++++++++
>>>   1 file changed, 18 insertions(+)
>>>
>>> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
>>> index a80195b..16249d9 100644
>>> --- a/fs/xfs/xfs_buf.c
>>> +++ b/fs/xfs/xfs_buf.c
>>> @@ -487,6 +487,7 @@ _xfs_buf_find(
>>>   	struct rb_node		*parent;
>>>   	xfs_buf_t		*bp;
>>>   	xfs_daddr_t		blkno = map[0].bm_bn;
>>> +	xfs_daddr_t		eofs;
>>>   	int			numblks = 0;
>>>   	int			i;
>>>
>>> @@ -498,6 +499,23 @@ _xfs_buf_find(
>>>   	ASSERT(!(numbytes<  (1<<  btp->bt_sshift)));
>>>   	ASSERT(!(BBTOB(blkno)&  (xfs_off_t)btp->bt_smask));
>>>
>>> +	/*
>>> +	 * Corrupted block numbers can get through to here, unfortunately, so we
>>> +	 * have to check that the buffer falls within the filesystem bounds.
>>> +	 */
>>> +	eofs = XFS_FSB_TO_BB(btp->bt_mount, btp->bt_mount->m_sb.sb_dblocks);
>>> +	if (blkno>= eofs || blkno + numblks>  eofs) {
>> 			^^^^^^^^^^^^^^^^^^^^^^
>>
>> That looks suspect to me.  I think you need to go over each buffer
>> individually.
>
> I'm not trying to validate every single part of a buffer here -
> there is no need to do that as the block numbers are validated
> against device overruns during IO. i.e. we'll get an EIO and a log
> message telling us an attempt to access beyond the end of the device
> occurring during IO.
>
> I.e. we aren't doing validity checks on whether a buffer has a sane
> block number or not (that's up to the caller), what we are
> avoiding is attempting to look up a buffer that is outside of the
> range of the cache indexing. i.e. it's validating the cache index we
> are about to use, not passing judgement on whether the caller has
> asked for a valid set of blocks or not.

I did not like the second part of the if statement because first block 
number in a "discontiguous" buffer does not have to be the lowest block 
number.

The first half of the if statement alone would prevent the oops. It 
seems to me that if a length check is desired to see if the first 
segment is valid, then the correct thing is to use the first segment 
length; something like:

	if (blkno >= eofs || blkno + map[0].bm_len >= eofs)
		...
>
>> I bounced it off Mark and this was his suggestion:
>>
>>   for (i = 0; i<  nmaps; i++) {
>>           if (map[i].bm_bn>= eofs ||
>> 		     map[i].bm_bn + map[i].bm_len>= eofs)
>> 	                 ...
>
> Sure, that would work, but we really don't care about the secondary
> block numbers here - there are completely unused by the buffer cache
> except for when IO is issued. And given that _xfs_buf_find is
> probably the hottest function in the XFS code base, avoiding
> unnecessary checks is somewhat important...
>
> Cheers,
>
> Dave.


--Mark.



More information about the xfs mailing list