netdev
[Top] [All Lists]

Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()

To: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Subject: Re: [BUG] overflow in net/ipv4/route.c rt_check_expire()
From: Eric Dumazet <dada1@xxxxxxxxxxxxx>
Date: Sat, 02 Apr 2005 11:21:30 +0200
Cc: davem@xxxxxxxxxxxxx, netdev@xxxxxxxxxxx, Robert.Olsson@xxxxxxxxxxx
In-reply-to: <E1DHdsP-0003Lr-00@xxxxxxxxxxxxxxxxxxxxxxxx>
References: <E1DHdsP-0003Lr-00@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)
Herbert Xu a écrit :
Eric Dumazet <dada1@xxxxxxxxxxxxx> wrote:

OK this patch includes everything...

- Locking abstraction
- rt_check_expire() fixes
- New gc_interval_ms sysctl to be able to have timer gc_interval < 1 second
- New gc_debug sysctl to let sysadmin tune gc
- Less memory used by hash table (spinlocks moved to a smaller table)
- sizing of spinlocks table depends on NR_CPUS
- hash table allocated using alloc_large_system_hash() function
- header fix for /proc/net/stat/rt_cache


This patch is doing too many things.  How about splitting it up?

For instance the spin lock stuff is pretty straightforward and
should be in its own patch.

The benefits of the GC changes are not obvious to me.  rt_check_expire
is simply meant to kill off old entries.  It's not really meant to be
used to free up entries when the table gets full.

Well, I began my work because of the overflow bug in rt_check_expire()...
Then I realize this function could not work as expected. On a loaded machine, 
one timer tick is 1 ms.
During this time, number of chains that are scanned is ridiculous.
With the standard timer of 60 second, fact is rt_check_expire() is useless.


rt_garbage_collect on the other hand is designed to free entries
when it is needed.  Eric raised the point that rt_garbage_collect
is pretty expensive.  So what about amortising its cost a bit more?

Yes. rt_garbage_collect() has serious problems. But this function is sooo complex I dont want to touch it and let experts do it if they want. But then one may think why we have two similar functions that are doing basically the same thing : garbage collection.

One of a production machine rtstat -i 1 output is :

rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|rt_cache|
entries| in_hit|in_slow_|in_slow_|in_no_ro| in_brd|in_marti|in_marti| out_hit|out_slow|out_slow|gc_total|gc_ignor|gc_goal_|gc_dst_o|in_hlist|out_hlis| | | tot| mc| ute| | an_dst| an_src| | _tot| _mc| | ed| miss| verflow| _search|t_search| 2618087| 28581| 7673| 0| 0| 0| 0| 0| 1800| 1450| 0| 0| 0| 0| 0| 37630| 4783| 2618689| 25444| 4918| 0| 0| 0| 0| 0| 2051| 1699| 0| 0| 0| 0| 0| 27741| 5461| 2619369| 25000| 4567| 0| 0| 0| 0| 0| 1860| 1304| 0| 0| 0| 0| 0| 26606| 4563| 2618396| 24830| 4633| 0| 0| 0| 0| 0| 1959| 1492| 0| 0| 0| 0| 0| 26643| 4930|


Without serious tuning, this machine could not handle this load, or even half 
of it.

Crashes usually occurs when secret_interval interval is elapsed : 
rt_cache_flush(0); is called, and the whole machine begins to die.


For instance, we can set a new threshold that's lower than gc_thresh
and perform GC on the chain being inserted in rt_intern_hash if we're
above that threshold.

We could also try to perform GC on L1_CACHE_SIZE/sizeof(struct rt_hash_bucket) chains, not only the 'current chain', to fully use the cache miss.


Cheers,

Thank you

<Prev in Thread] Current Thread [Next in Thread>