(warning: crystal ball engaged to parse from the quoted mail snippets.
Maybe missing context. I'm just reading netfilter-devel)
> On Mon, 2002-07-29 at 13:56, Andi Kleen wrote:
> > here is a patch for 2.4 that just makes it use get_free_pages to test the
> > TLB theory.
I presume this is about the vmalloc()ed hash bucket table? If yes, it's
certainly an interesting experiment to try making it allocated from an
area without TLB issues. We can expect a TLB miss on every packet with
the current setup; allocating the bucket table from large-TLB memory
would be a clear win of one memory roundtrip.
The netfilter hook statistics patch I mentioned in the other mail
should be able to show the difference. If my guess is right, you
could see a 5-10% improvement on the ip_conntrack hook functions.
> > Another obvious improvement would be to not use list_heads
> > for the hash table buckets - a single pointer would likely suffice and
> > it would cut the hash table in half, saving cache, TLB and memory.
> Martin Josefsson wrote:
> I think the list_heads are used for only one thing currently, for the
> early eviction in case of overload,
Don't forget the nonscanning list_del(), called whenever a conntrack
is unhashed at its death. However, with a suitable bucket number,
i.e. low chain lengths, the scan on conntrack removal would be OK.
The early_drop() scanning, if it wants to work backward, may as well
work forward, keeping a "last unreplied found" pointer, and returning
that when falling off the single list end.
Thus, I also think that the list could be simple.
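To make the single-pointer idea concrete, here is a rough userspace sketch,
not the actual ip_conntrack code. All names (ct_entry, ct_bucket, bucket_add,
bucket_del, bucket_early_drop_victim) are made up for illustration, and it
assumes new entries are inserted at the head of the chain, so that the last
unreplied entry found by a forward scan is also the oldest one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a conntrack hash entry. */
struct ct_entry {
    struct ct_entry *next;  /* single forward link, no prev pointer */
    int replied;            /* nonzero once a reply has been seen */
    int id;                 /* for illustration only */
};

/* A bucket is one pointer instead of a two-pointer list_head. */
struct ct_bucket {
    struct ct_entry *first;
};

/* Assumption: insert at the head, so older entries sit nearer the tail. */
void bucket_add(struct ct_bucket *b, struct ct_entry *e)
{
    assert(b != NULL && e != NULL);
    e->next = b->first;
    b->first = e;
}

/* Unhash by scanning the chain; acceptable only while chains are short.
 * Returns 1 if the entry was found and unlinked, 0 otherwise. */
int bucket_del(struct ct_bucket *b, struct ct_entry *e)
{
    struct ct_entry **pp;

    for (pp = &b->first; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Forward scan that keeps a "last unreplied found" pointer and returns
 * it when falling off the end of the list (NULL if there is none). */
struct ct_entry *bucket_early_drop_victim(const struct ct_bucket *b)
{
    struct ct_entry *e, *last_unreplied = NULL;

    for (e = b->first; e != NULL; e = e->next)
        if (!e->replied)
            last_unreplied = e;
    return last_unreplied;
}
```

The price is that bucket_del() is O(chain length) where list_del() is O(1),
which is why the bucket count matters.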
Off the top of my head, here are other fields that we could get rid of:
- the ctrack backpointer in each tuple.
- the protocol field in each tuple.
- the 20 byte infos array in ip_conntrack.
- we could out-of-line ip_nat_info.
With the current layout, when lists must be walked on a 32-byte-cacheline
box, we are sure to always read two cache lines for the skipped-over
> I know I've had plans on rewriting the locking in conntrack which is
> quite frankly horrible, one giant rwlock used for almost everything
> (including the hashtable).
I'd like to see lockmeter statistics before this change. If you split
the one lock into a sectored lock, remember that each conntrack is hashed
twice, so you need to be careful with lock order when adding or removing.
(well, there is another possibility, but I won't go into that now)
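To illustrate the lock order concern, here is a minimal userspace sketch.
It uses pthread mutexes as a stand-in for kernel locks, and everything in it
(CT_LOCK_SECTORS, ct_sector, ct_lock_order, ct_lock_pair, ct_unlock_pair) is
a hypothetical name, not existing conntrack code. The one rule it shows:
always take the lower-numbered sector first, and take the lock only once
when both hashes fall into the same sector.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CT_LOCK_SECTORS 16  /* hypothetical number of lock sectors */

pthread_mutex_t ct_sector_lock[CT_LOCK_SECTORS];

void ct_locks_init(void)
{
    int i;

    for (i = 0; i < CT_LOCK_SECTORS; i++)
        pthread_mutex_init(&ct_sector_lock[i], NULL);
}

/* Map a bucket hash to the sector lock that guards it. */
unsigned int ct_sector(unsigned int hash)
{
    return hash % CT_LOCK_SECTORS;
}

/* Compute the canonical order: lower sector first, higher second. */
void ct_lock_order(unsigned int hash_orig, unsigned int hash_reply,
                   unsigned int *first, unsigned int *second)
{
    unsigned int a = ct_sector(hash_orig);
    unsigned int b = ct_sector(hash_reply);

    assert(first != NULL && second != NULL);
    *first = a < b ? a : b;
    *second = a < b ? b : a;
}

/* Lock the sectors for both tuple directions; returns locks taken. */
int ct_lock_pair(unsigned int hash_orig, unsigned int hash_reply)
{
    unsigned int first, second;

    ct_lock_order(hash_orig, hash_reply, &first, &second);
    pthread_mutex_lock(&ct_sector_lock[first]);
    if (second == first)
        return 1;  /* same sector: must not lock it twice */
    pthread_mutex_lock(&ct_sector_lock[second]);
    return 2;
}

/* Release in the reverse order of acquisition. */
void ct_unlock_pair(unsigned int hash_orig, unsigned int hash_reply)
{
    unsigned int first, second;

    ct_lock_order(hash_orig, hash_reply, &first, &second);
    if (second != first)
        pthread_mutex_unlock(&ct_sector_lock[second]);
    pthread_mutex_unlock(&ct_sector_lock[first]);
}
```

Without a fixed order, two CPUs adding conntracks whose two hashes land in
the same pair of sectors, but in opposite directions, can deadlock.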
> One idea that has come to mind is using RCU
I don't see RCU solving hash link list update problems. Care to explain
how that would work?
> And this eviction which occurs at overload needs to be redone, we can't
> go around dropping one unreplied connection at a time, we need
> gang-eviction of unreplied connections.
I propose to put them all on a separate LRU list, and reap the oldest.
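A rough userspace sketch of what I mean, with made-up names (lru_node,
ct_unreplied, lru_add_tail, lru_del, lru_reap_oldest) and no locking at all.
It assumes a conntrack is appended to the list when it is created and taken
off again when the first reply is seen, so the head of the list is always
the oldest unreplied one.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node; the list head acts as a sentinel. */
struct lru_node {
    struct lru_node *prev, *next;
};

/* Hypothetical stand-in for an unreplied conntrack. */
struct ct_unreplied {
    struct lru_node lru;  /* kept first so a node pointer can be cast back */
    int id;               /* for illustration only */
};

void lru_init(struct lru_node *head)
{
    head->prev = head->next = head;
}

/* New unreplied connections go to the tail, i.e. the young end. */
void lru_add_tail(struct lru_node *head, struct lru_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* O(1) removal, used when a reply arrives or when the entry is reaped. */
void lru_del(struct lru_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Gang eviction: unlink up to max of the oldest entries in one pass and
 * hand them back in out[], oldest first. Returns the number reaped. */
unsigned int lru_reap_oldest(struct lru_node *head,
                             struct ct_unreplied **out, unsigned int max)
{
    unsigned int n = 0;

    assert(head != NULL && out != NULL);
    while (n < max && head->next != head) {
        struct lru_node *victim = head->next;

        lru_del(victim);
        out[n++] = (struct ct_unreplied *)victim;
    }
    return n;
}
```

The point is that eviction no longer has to scan hash chains at all: the
victims are found in O(1) each, and a whole batch can go in one call.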