kuznet@xxxxxxxxxxxxx wrote:
>
> Hello!
>
> > * tcp_init() wants to set sysctl_tcp_max_tw_buckets to 180,000. This
> > seems too high (Andi says 22 megs). I think the patch here is more
> > consistent.
>
> For a machine with 256MB it is not a very big number.
>
> Failure to create a tw bucket is a hard bug. The default value should be
> as high as possible.
So with 180k connections and a 60 second TCP_TIMEWAIT_LEN, the machine
is limited to a maximum sustained rate of 3,000 connections per second?
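For reference, the arithmetic behind that (assuming the 180,000 default and
the 60 second TCP_TIMEWAIT_LEN):

	180,000 buckets / 60 sec in TIME_WAIT = 3,000 conn/sec sustained
	before tw bucket creation starts failing.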
> > * tcp_twkill() can consume a huge amount of time if it has enough
> > connections to deal with. When running lmbench I have observed it
> > killing 2,500 connections in a single pass, which means we spend 15
> > milliseconds in the timer handler. This is crazy.
>
> If you expect superb latency from a machine doing 700 conn/sec,
> you are expecting an impossible thing.
OK, this isn't a very important issue for scheduling latency. Can be
lived with. But...
> We do much more, particularly because the work is batched whenever possible.
>
> If you want to do sound recording on your web server,
> it is better to increase the granularity of the tw calendar.
>
> > So I just kill a hundred and then reschedule the timer to run in a
> > couple of jiffies time. The downside: this limits the tw reaping to
> > 2,500 connections per second.
>
> You have simply limited your capacity to 2,500 conn/sec, no more.
> We must destroy buckets at the rate at which they are created.
Tell me if this is wrong:
A uniprocessor server is handling 3,000 connections per second (probably
not achievable in practice; what is the maximum 2.4 can do?).
That server will reap the time-wait connections once per 7.5 seconds.
So the timer handler will kill 22,500 tw buckets in a single pass.
So that timer handler will not return for 0.14 seconds.
But the machine is under heavy interrupt load - the timer handler will
probably take 0.3 seconds or more.
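The arithmetic, for anyone checking (assuming TCP_TWKILL_SLOTS = 8 and the
~6 usec per bucket implied by the 2,500 buckets in 15 msec above):

	slot interval:     60 sec TIME_WAIT / 8 slots   = 7.5 sec
	buckets per pass:  3,000 conn/sec * 7.5 sec     = 22,500
	handler time:      22,500 * 6 usec              = ~0.135 sec (unloaded)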
So once per 7.5 seconds:
- the machine's backlog queue will fill up and it will drop
around two thousand packets. Subsequent client retransmits
will add even more load.
- No timer handlers or softirqs will run for 0.3 seconds.
- Process accounting will stop.
- Other scary things :)
- The machine probably won't reach backlog equilibrium for
half a second or more.
This all sounds pretty bad and suggests that either TCP_TWKILL_SLOTS is
far too small or I've missed something obvious :)
Increasing TCP_TWKILL_SLOTS to 256 would smooth things out, but I
suggest that something load-adaptive would be more efficient, and it is
easy to code.
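For concreteness, here is a minimal sketch of that idea: the "kill a batch,
reschedule a couple of jiffies out" approach quoted above, extended to grow
the batch when the queue does not drain. The helper names (tw_pending(),
tw_reap_one(), tw_timer) and the batch constant are hypothetical stand-ins,
not the real tcp_minisocks.c symbols:

#include <linux/timer.h>

#define TW_BATCH_MIN	100	/* buckets killed per pass (assumed) */

static struct timer_list tw_timer;	/* armed when the first bucket is queued */
static int tw_batch = TW_BATCH_MIN;

extern int tw_pending(void);	/* hypothetical: any buckets left to kill? */
extern void tw_reap_one(void);	/* hypothetical: unlink and free one bucket */

static void tw_reaper(unsigned long data)
{
	int killed = 0;

	/* Kill a bounded batch instead of draining a whole 7.5 second
	 * slot (22,500 buckets, ~0.14 sec) in one timer run. */
	while (tw_pending() && killed < tw_batch) {
		tw_reap_one();
		killed++;
	}

	if (tw_pending()) {
		/* Still behind: grow the batch so the reap rate tracks
		 * the connection rate, and come back in a couple of
		 * jiffies rather than waiting for the next slot. */
		tw_batch *= 2;
		mod_timer(&tw_timer, jiffies + 2);
	} else {
		tw_batch = TW_BATCH_MIN;
	}
}

The fixed-batch version caps the reap rate (the 2,500/sec figure above);
doubling the batch whenever buckets remain lets the reap rate track the
creation rate instead.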