> The disparity between this test and the available send window is the
> cause of the bursts.
The analysis is correct, I think.
> this behaviour. Following patch changes wmem_alloc to only include
> the actual data and it seems to work. This is a hackish approach at
> best though.
It is not hackish, it is rather buggish. 8)
You cannot mangle truesize, even with good intentions.
Better to try tuning tcp_min_write_space(). I would like to think it is tunable.
Also, you could select a larger sndbuf for such funny links.
The combination of a ridiculously low MSS with an utterly high cwnd
is a highly non-standard situation.
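
For the sndbuf suggestion, a minimal userspace sketch follows. It uses the
standard SO_SNDBUF socket option; the helper name set_sndbuf() is made up
for illustration. The kernel may round or clamp the value (on Linux the
limit is wmem_max), so the helper reads back what was actually granted.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative helper (name is an assumption, not an existing API):
 * request a send buffer of `bytes` and return the size the kernel
 * actually granted, or -1 on error. The granted size can differ from
 * the request because the kernel may round it up or clamp it. */
static int set_sndbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```

Call it right after socket() and before connect(), e.g.
set_sndbuf(fd, 256 * 1024). The 256 KiB figure is arbitrary; a suitable
value depends on the link.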
> Our second problem with this disparity is on the receive side. The scenario
> is essentially the same but with an unreliable link (read wireless) which
> drops packets. In case of packet drop receiver keeps building an
> out-of-order queue which grows to the limit of the receive buffer
> quite quickly. However sender keeps sending more because of the difference
> between advertised window and the actual allocated space. This triggers
> tcp_input.c:prune_queue() which purges the whole out-of-order queue to
> free up space, thus killing the TCP performance quite effectively.
TCP performance is killed not by pruning, but rather by packet drop. 8)
Yes, pruning should be a bit less aggressive. I will repair this.
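
To make the disparity concrete, here is a toy model of the accounting.
It is not kernel code. It only assumes that each queued segment is
charged its payload plus a fixed per-buffer overhead, which is roughly
what truesize means. The struct, the function names and the numbers are
all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: every queued segment is charged payload + overhead
 * against the socket buffer limit. */
struct seg_model {
    size_t payload;   /* data bytes per segment (roughly the MSS) */
    size_t overhead;  /* assumed bookkeeping bytes charged per segment */
};

/* Memory charged against the buffer for n queued segments. */
static size_t charged_bytes(const struct seg_model *m, size_t n)
{
    return n * (m->payload + m->overhead);
}

/* Number of whole segments that fit under buf_limit. */
static size_t segs_that_fit(const struct seg_model *m, size_t buf_limit)
{
    assert(m->payload + m->overhead > 0);
    return buf_limit / (m->payload + m->overhead);
}

/* Data bytes that really fit, to compare with a window counted in data bytes. */
static size_t payload_that_fits(const struct seg_model *m, size_t buf_limit)
{
    return segs_that_fit(m, buf_limit) * m->payload;
}
```

With a 100-byte payload and an assumed 300 bytes of overhead, a 65536-byte
buffer holds 163 segments, that is 16300 bytes of data. A window counted
in data bytes alone would promise about four times more than the buffer
can hold, and the gap widens as the MSS gets smaller.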
> The fix in our internal use is similar to the rmem_alloc case. I do think
> both of these situations are quite valid. I am not so sure about the correct
> fix though.
The common fix is not to allow cwnd to grow to such huge values on lossy links.
In any case, try net-xxyyzz.dif.gz from ftp://ftp.inr.ac.ru/ip-routing/.
It will not be better, I think, but at least you will discover when
it is worse. 8) Finer pruning will appear tomorrow, I hope.