Hi Shailendra,
Thanks for the report. I'll get back to you on this tomorrow, once I've
looked more at the code.
Regards,
Tim.
Shailendra Tripathi wrote:
Hi,
While changing the (current default) number of inodes allocated at
once, I found two things a bit unexpected:
Section A:
----------
xfs_buf_item_init()
/*
* chunks is the number of XFS_BLI_CHUNK size pieces
* the buffer can be divided into. Make sure not to
* truncate any pieces. map_size is the size of the
* bitmap needed to describe the chunks of the buffer.
*/
chunks = (int)((XFS_BUF_COUNT(bp) + (XFS_BLI_CHUNK - 1)) >>
XFS_BLI_SHIFT);
map_size = (int)((chunks + NBWORD) >> BIT_TO_WORD_SHIFT); ---> 2
For a 4096-byte block on a 4096-byte page (the Linux default), one map
word is required per page. Given this, the line marked 2 appears odd.
Instead, it should be:
map_size = (int)((chunks + NBWORD - 1) >> BIT_TO_WORD_SHIFT);
Also, the chunks calculation may go well off the mark if the buffer is
not page aligned. The code does not check for block alignment, only for
sector alignment. For example, suppose I need a pb_buffer_length of
32768, but the offset is not 4096-aligned; let's say it starts at 2048.
In that case _pagebuf_lookup_pages will prepare 8 + 1 pages:
page_count = page_buf_btoc(end) - page_buf_btoct(bp->pb_file_offset);
Here the chunks calculated will suffice for only 8 pages. In this
particular case, statement 2 compensates for that and allocates the
correct map size. But it oversteps when the offset is block (4096)
aligned: if the offset were 0, only 8 pages are prepared, yet the map
still covers 9.
This affects xfs_next_bit as called from xfs_buf_item_size and
xfs_buf_item_format. If the last chunk of the 8 blocks allocated is
dirty, the code will step over into a 9th page (which might not even be
cached) and can potentially fault with "unable to handle kernel paging
request". This is because xfs_next_bit will not return -1: due to the
oversized map it never sees the end.
Instead, the calculation should rely on the page count directly, e.g.
XFS_PAGE_COUNT(bp), i.e. bp->pb_page_count.
Section B:
---------
xfs_init()
.....
/*
* The size of the zone allocated buf log item is the maximum
* size possible under XFS. This wastes a little bit of memory,
* but it is much faster.
*/
xfs_buf_item_zone =
kmem_zone_init((sizeof(xfs_buf_log_item_t) +
(((XFS_MAX_BLOCKSIZE / XFS_BLI_CHUNK) /
NBWORD) * sizeof(int))),
"xfs_buf_item");
This restricts the maximum number of blocks that can be dirtied in a
single buf item to 17 pages' worth (xfs_buf_log_item_t itself contains
one map word, and the added part yields 16 more). It appears to be a
very inflexible scheme.
I am trying to allocate inodes one data stripe width at a time. One
representative stripe width is 32 blocks (of 4096 bytes each). However,
due to this limitation (which I noticed very late), it looks impossible.
I would really appreciate it if someone could comment on whether I have
misunderstood something.
Regards,
Shailendra