
To: linux-xfs@xxxxxxxxxxx
Subject: Unexpected xfs_buf_item_init initialization and inflexible xfs_buf_item_zone
From: Shailendra Tripathi <stripathi@xxxxxxxxx>
Date: Tue, 03 Jan 2006 10:29:27 +0530
In-reply-to: <14BC3454F4B4614FBC4F3FB19A84C372050C5F@xxxxxxxxxxxxxxxxxxxxxxx>
References: <14BC3454F4B4614FBC4F3FB19A84C372050C5F@xxxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.9 (X11/20041127)
Hi,
While changing the (current default) number of inodes allocated at once, I found two things a bit unexpected:

Section A:
----------
xfs_buf_item_init()

       /*
         * chunks is the number of XFS_BLI_CHUNK size pieces
         * the buffer can be divided into. Make sure not to
         * truncate any pieces.  map_size is the size of the
         * bitmap needed to describe the chunks of the buffer.
         */
        chunks = (int)((XFS_BUF_COUNT(bp) + (XFS_BLI_CHUNK - 1)) >> XFS_BLI_SHIFT);
        map_size = (int)((chunks + NBWORD) >> BIT_TO_WORD_SHIFT);    ---> 2

For a 4096-byte page (the Linux default), one bitmap word is required per page. Given this, the statement marked 2 appears odd. Instead, it should be

map_size = (int)((chunks + NBWORD -1) >> BIT_TO_WORD_SHIFT);

Also, the chunks calculation may go well off the mark if the blocks are not page aligned. The code does not check for block alignment; it only checks for sector alignment. For example, suppose I need a pb_buffer_length of 32768, but the offset is not 4096-aligned and starts at, say, 2048. In that case _pagebuf_lookup_pages will prepare 8 + 1 pages:
page_count = page_buf_btoc(end) - page_buf_btoct(bp->pb_file_offset);

In this case, the chunks calculated will suffice for only 8 pages, and in this particular case statement 2 makes amends for it and allocates the correct map size. But it oversteps when the offset is block (4096) aligned: even if the offset were 0, it would still allocate a map covering 9 pages.

This affects xfs_next_bit as called from xfs_buf_item_size and xfs_buf_item_format. If the last chunk of the 8 pages allocated is dirty, the scan will overstep into a 9th page (which might not even be cached) and can potentially result in an "unable to handle paging request". This is because xfs_next_bit will not return -1: it does not see the end, due to the map size overstepping.

Instead, the calculation should rely on the page count directly, i.e. XFS_PAGE_COUNT(bp) (bp->pb_page_count).

Section B:
---------
xfs_init()
.....           
        /*
         * The size of the zone allocated buf log item is the maximum
         * size possible under XFS.  This wastes a little bit of memory,
         * but it is much faster.
         */
        xfs_buf_item_zone =
                kmem_zone_init((sizeof(xfs_buf_log_item_t) +
                                (((XFS_MAX_BLOCKSIZE / XFS_BLI_CHUNK) /
                                  NBWORD) * sizeof(int))),
                               "xfs_buf_item");

This restricts the maximum number of blocks that can be dirtied in a buf item to 17 pages (xfs_buf_log_item_t itself contains one map word, and the added term yields 16 more). It appears to be a very inflexible scheme.

I am trying to allocate inodes one data stripe width at a time. One representative stripe width is 32 blocks (of 4096 bytes each). However, due to this limitation (which I noticed very late), that looks impossible.

I would really appreciate it if someone could comment on whether I have misunderstood something.

Regards,
Shailendra



