netdev

Re: 2.6.0-test11: dst_cache_overflow causing unresponsive box

Subject: Re: 2.6.0-test11: dst_cache_overflow causing unresponsive box
From: "Francois Baligant" <francois@xxxxxxxxxxxx>
Date: Wed, 3 Dec 2003 01:15:42 +0100
Cc: <netdev@xxxxxxxxxxx>
References: <072501c3b874$2542ae70$15fea8c0@fortress><16332.27919.502097.988522@robur.slu.se> <20031202032606.28db927b.davem@redhat.com>
Sender: netdev-bounce@xxxxxxxxxxx
Thanks all for your suggestions.

Actually, I have noticed that with 90k established TCP sessions, I have about
twice as many entries in the routing cache. For each TCP session there is an
inbound and an outbound cache entry like this:

39.125.111.131  81.64.64.96     39.125.111.129         1500 0          0 eth0
81.64.64.96   39.125.111.131  39.125.111.131  l         0 0          0 lo
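
One way to check the "two entries per session" observation is to group the cache entries by their unordered address pair. The following is only a minimal sketch: count_flow_pairs is a hypothetical helper, and it assumes the input is a list of cache lines with no header rows, where the first two whitespace-separated fields are the two addresses as in the entries above.

```python
from collections import Counter


def count_flow_pairs(lines):
    """Count cache entries per unordered pair of addresses.

    Assumption: each line starts with two address fields, as in the
    two sample entries; lines with fewer than two fields are skipped.
    """
    pairs = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        # frozenset makes (a, b) and (b, a) count as the same pair
        pairs[frozenset(fields[:2])] += 1
    return pairs


sample = [
    "39.125.111.131  81.64.64.96     39.125.111.129         1500 0          0 eth0",
    "81.64.64.96   39.125.111.131  39.125.111.131  l         0 0          0 lo",
]
pairs = count_flow_pairs(sample)
entries_per_pair = sum(pairs.values()) / len(pairs)
```

If the average stays close to 2 over the whole cache, 90k sessions would correspond to roughly 180k cache entries.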

This system accepted that many sessions before when running 2.4, and this
problem surfaced with 2.6. However, I can't be sure that the traffic patterns
are exactly the same, so I'm not drawing conclusions about 2.6.

I will try raising gc_thresh and keep you informed.

Thanks,
Francois

----- Original Message ----- 
From: "David S. Miller" <davem@xxxxxxxxxx>
To: "Robert Olsson" <Robert.Olsson@xxxxxxxxxxx>
Cc: <francois@xxxxxxxxxxxx>; <netdev@xxxxxxxxxxx>
Sent: Tuesday, December 02, 2003 12:26 PM
Subject: Re: 2.6.0-test11: dst_cache_overflow causing unresponsive box


> On Tue, 2 Dec 2003 11:44:31 +0100
> Robert Olsson <Robert.Olsson@xxxxxxxxxxx> wrote:
>
> > No experience with 90k TCP-flows but it seems GC is not able to free some
> > of the dst-entries for some reason. This will slowly kill your box with
> > symptoms you describe. We have ask TCP-experts for timer settings to avoid
> > pending sessions etc. Also check slab for any other objects growing as
> > dst cache overflow is most likely secondary effect in your case. rtstat
> > looks sane except for the high number of dst-entries. Tuning is another
> > story.
>
> Let us assume, for the sake of back of the envelope calculations, that
> all 90k TCP connections speak to unique destinations.  Let us further
> assume that all of them have at least one packet in flight.
>
> This means the routing cache must be able to hold at least 90k entries.
> All of these routing cache entries will be referenced by the packets
> in the TCP retransmission queues of all the sockets, and thus the
> entries are unreclaimable.
>
> You are setting net.ipv4.route.max_size to 655360 which should be more
> than enough.  But you also have to make the net.ipv4.route.gc_thresh
> more reasonable as well, perhaps 90K as a test.
>
> If net.ipv4.route.gc_thresh is lower than 90K and my assertions above
> hold, then the kernel will try to garbage collect too early, all the
> routing cache entries will be in use and therefore uncollectable,
> and you'll get the message you're seeing.
>
> Try to pump up gc_thresh and see if that helps.
>

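The back-of-the-envelope argument quoted above can be written down as a small check. This is a sketch of the reasoning only, not of the kernel's garbage collector: diagnose_route_cache is a hypothetical helper, the gc_thresh value of 65536 is a made-up example (the current setting is not given in this thread), and entries_per_session=2 assumes that both of the observed entries per session stay in use, which may not hold.

```python
def diagnose_route_cache(sessions, entries_per_session, gc_thresh, max_size):
    """Compare the estimated number of in-use entries with the two limits.

    Simplification: every session is assumed to keep
    entries_per_session cache entries referenced and unreclaimable.
    """
    needed = sessions * entries_per_session
    if needed > max_size:
        return "max_size too small"
    if needed > gc_thresh:
        # GC would start while all entries are still in use
        return "gc_thresh below in-use entries"
    return "ok"


MAX_SIZE = 655360  # net.ipv4.route.max_size as quoted in the thread

# One entry per session, hypothetical gc_thresh of 65536
one_entry = diagnose_route_cache(90_000, 1, 65_536, MAX_SIZE)

# Two entries per session with the suggested test value of 90K
two_entries = diagnose_route_cache(90_000, 2, 90_000, MAX_SIZE)
```

Under these assumptions both calls report that gc_thresh is below the number of in-use entries, so with two entries per session a test value nearer 180k may be needed.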
