| To: | David Chinner <dgc@xxxxxxx> |
|---|---|
| Subject: | Re: [RFC 0/3] Convert XFS inode hashes to radix trees |
| From: | Shailendra Tripathi <stripathi@xxxxxxxxx> |
| Date: | Tue, 14 Nov 2006 17:09:03 -0800 |
| Cc: | xfs-dev@xxxxxxx, xfs@xxxxxxxxxxx |
| In-reply-to: | <20061003060610.GV3024@melbourne.sgi.com> |
| References: | <20061003060610.GV3024@melbourne.sgi.com> |
| Sender: | xfs-bounce@xxxxxxxxxxx |
| User-agent: | Thunderbird 1.5.0.8 (X11/20061025) |

Hi David,

I regret commenting on this so late (somehow I missed the email).

It does appear to me that this approach can potentially help with manipulations of the cluster hash list. However, this appears (to me) to come at the cost of regular inode lookup. As of now, each hash bucket has its own lock, which helps keep xfs_iget from becoming hot. I have not seen xfs_iget anywhere near the top in my profiling of Linux for SPECFS. With the hash table, the number of hash buckets can be sized appropriately (based on memory availability).

However, it appears to me that the radix trees (even with 15 of them) can become a bottleneck. Let's assume there are 600K inodes on a reasonably large system. Assuming a fair distribution, each radix tree will hold 600K/15 ~ 40K inodes. Insertions and deletions have to take the writer lock, and given their frequency, both readers (lookups) and writers will be affected. That means that if one tree is locked for an insertion or deletion, access to the remaining 40K inodes in that tree is simply serialized.

In the current design, however, by sacrificing a little extra memory we can allocate more hash buckets, so the number of inodes locked out at any one time can be made quite small. My knowledge of radix trees is a little limited, but I think increasing the number of trees would be much more costly in memory terms. Given its lower memory usage and its performance, I tend to believe that a hash table is more scalable than a radix tree for inode tables.

Have you done any performance testing with these patches? I am quite curious to know the results. If not, maybe I can try to do some performance testing with these changes, albeit on an old kernel tree.

Am I missing something here? Please let me know.

Thanks and Regards,
Shailendra

David Chinner wrote:

> One of the long-standing problems with XFS on large machines and filesystems is the sizing of the inode cache hashes used by XFS to index the xfs_inode_t structures.
> The mount option ihashsize became a necessity because the default calculations simply can't get it right for all situations.
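
The serialization concern raised in the reply comes down to simple arithmetic: how many cached inodes sit behind a single lock. The following is a minimal sketch of that calculation, assuming an even distribution of inodes across lock domains. Only the 600K inode count and the figure of 15 radix trees come from the message; the function name and the hash bucket counts are hypothetical values chosen for illustration, not XFS defaults.

```python
def inodes_per_lock(total_inodes: int, lock_domains: int) -> int:
    """Average number of cached inodes covered by one lock.

    Assumes inodes are spread evenly across the lock domains
    (radix trees or hash buckets), as the message does.
    """
    if lock_domains <= 0:
        raise ValueError("lock_domains must be positive")
    return total_inodes // lock_domains


TOTAL_INODES = 600_000  # figure used in the message

# 15 trees is the figure from the message; the bucket counts are
# hypothetical examples, not values taken from XFS.
scenarios = [
    ("15 radix trees", 15),
    ("1024 hash buckets", 1024),
    ("65536 hash buckets", 65536),
]

for label, domains in scenarios:
    count = inodes_per_lock(TOTAL_INODES, domains)
    print(f"{label}: about {count} inodes behind each lock")
```

This only models how many inodes share a lock. It says nothing about lookup cost inside a tree or bucket, or about memory use, which the message treats as separate questions.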