xfs
[Top] [All Lists]

Re: Unexpected xfs_buf_item_init initialization and inflexible xfs_buf_i

To: Shailendra Tripathi <stripathi@xxxxxxxxx>
Subject: Re: Unexpected xfs_buf_item_init initialization and inflexible xfs_buf_item_zone
From: Timothy Shimmin <tes@xxxxxxx>
Date: Wed, 11 Jan 2006 19:19:59 +1100
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <43BA04AF.2060703@agami.com>
References: <14BC3454F4B4614FBC4F3FB19A84C372050C5F@FPNYEXCBE01.opus-i.corp> <43BA04AF.2060703@agami.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5 (X11/20051201)
Hi Shailendra,

Thanks for the report.

Shailendra Tripathi wrote:
Section A:
----------
xfs_buf_item_init()

/*
* chunks is the number of XFS_BLI_CHUNK size pieces
* the buffer can be divided into. Make sure not to
* truncate any pieces. map_size is the size of the
* bitmap needed to describe the chunks of the buffer.
*/
chunks = (int)((XFS_BUF_COUNT(bp) + (XFS_BLI_CHUNK - 1)) >> XFS_BLI_SHIFT);
map_size = (int)((chunks + NBWORD) >> BIT_TO_WORD_SHIFT); ---> 2


For 4096 block page (linux default), one map is required per page. Given this, line 2 appears odd. Instead, it should be

map_size = (int)((chunks + NBWORD -1) >> BIT_TO_WORD_SHIFT);

Yes I agree, that the map_size is going to be unnecessarily large
(potentially by 1).
where N = number of total bits in map = number of 128 byte chunks ("chunks")
and  n = number of bits in an int ("NBWORD")
map_size = ((N + (n-1))/n)  number of ints

The existing calculation of ((N + n)/n) will be wrong when N is a multiple of n,
it will go over by 1. i.e. it will be wrong when N is a multiple of 32.
The question, however, is if this will cause a problem and not just
a slight waste of space.


Some concerns that spring to mind:

1) Will the allocation of space for the blf_data_map be matched by the accesses
of the map?
The allocation size is set in the setting up of the xfs_buf_item_zone.
It does this to:
sizeof(xfs_buf_log_item_t) + XFS_MAX_BLOCK_SIZE / XFS_BLI_CHUNK / NBWORD * sizeof(int)
= sizeof(xfs_buf_log_item_t) + 64K / 128 / 32 * 4
And this basically sizes it for a buffer of the largest block (64K) and
for 1 extra in xfs_buf_log_item_t's blf_data_map[1].
So we do seem to actually cater for 1 extra here.
And so the accesses which use xfs_next_bit & blf_map_size when they go over
what covers the buffer, they will not go over the zone limit.


2) Will the oversized map cause us to access beyond the buffer and its pages?
Well, looking at xfs_buf_item_format() and xfs_buf_item_size(), they call
upon xfs_buf_offset(). However, it looks like they would do so only for a bit
in the map which was turned on (1). And the extra bits at the end of the map
will be zeroed at xfs_buf_item_init() stage using kmem_zone_zalloc().
And should never be turned on by xfs_buf_item_log() as the byte offsets
given would always be within the buffer.
So I think xfs_next_bit() will just scan over these final zeroes and then
return -1.
But maybe I've missed something?
Also, chunks calculation may go well off the mark of the blocks are not page aligned. The code does not block alignment check. It only checks for sectoral alignment. For example, lets say I need 32768 pb_buffer_length. However, the offset is not 4096 aligned. Lets say it starts at, 2048. Given this, _pagebuf_lookup_pages will prepare 8 + 1 pages.
page_count = page_buf_btoc(end) - page_buf_btoct(bp->pb_file_offset);


In this case, chunks calculated will suffice for only 8 pages. In this particular case, the statement 2 makes amend for it and allocates correct map size. But, it oversteps when the offset is block (4096) aligned. That is, if offset was 0, still it will allocate 9 pages.

I'm not that au fait with the page buf code.
But I do not understand your page alignment concern. The map is just storing
which parts of the buffer (down to a 128 byte level) need to be logged.
(During xfs_buf_item_format() when it transforms the map into vectors it
will look up the regions to see if the adjacent bits are actually
contiguous (and not in another page for example)
and if not then it will use an extra vector)
It affects the xfs_next_bit called in xfs_buf_item_size and xfs_buf_item_format. If last chunk of the 8 blocks allocated are dirty, the code will over step into 9th (which might be not even cached) page and potentially return in "unable to handle paging request". This is because the xfs_next_bit will not return -1 as it does not see the end due to map size overstepping.
But I would think it won't overstep because this extra end part of the map will
be zeroed out.
Have I missed something?


Thanks muchly,
Tim.


<Prev in Thread] Current Thread [Next in Thread>