On Mon, 27 Sep 2004 18:38:42 -0400 (EDT)
John Heffner <jheffner@xxxxxxx> wrote:
> On Thu, 23 Sep 2004, David S. Miller wrote:
>
> >
> > I think I know what may be going on here.
> >
> > Let's say that we even get the congestion window opened up
> > so that we can build 64K TSO frames, that's around 43 or 44
> > 1500 mtu frames.
> >
> > That means as the window fills up, we have to see 44 ACKs
> > before we are able to send the next TSO frame. Needless to
> > say that breaks ACK clocking completely.
>
> More specifically, I think it is an interaction with delayed ack (acking
> less than 1 virtual segment), and the small cwnd. This works for me, but
> I'm not sure there aren't some lurking problems still.

Yes, this is supposed to work around the problem, but:

1) It is a hack :-)
2) It doesn't help Andi's case, and I think I know why.

The reason Andi Kleen didn't see any improvement from limiting
'factor' is that he is using short-lived connections. If a
connection stays up long enough, the congestion window has time
to grow and the problem stops mattering.
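
To put rough numbers on the short-lived connection case, here is a
back-of-the-envelope sketch in user-space C. Everything in it is an
assumption for illustration: the function names are made up, the MSS
is taken as 1460 bytes (1500 mtu, no TCP options), the initial cwnd
as 2 segments, and slow start is idealized as doubling once per round
trip with no delayed ACKs or loss.

```c
#include <assert.h>

/* Number of whole MSS-sized segments that fit in one TSO frame.
 * Hypothetical helper, not a kernel function.
 */
static unsigned int segs_per_tso_frame(unsigned int tso_bytes, unsigned int mss)
{
	assert(mss > 0);
	return tso_bytes / mss;
}

/* Round trips of idealized slow start (cwnd doubles every RTT)
 * needed before cwnd reaches target_cwnd segments.
 * Hypothetical helper, not a kernel function.
 */
static unsigned int slow_start_rtts(unsigned int initial_cwnd,
				    unsigned int target_cwnd)
{
	unsigned int cwnd = initial_cwnd;
	unsigned int rtts = 0;

	assert(initial_cwnd > 0);
	while (cwnd < target_cwnd) {
		cwnd *= 2;
		rtts++;
	}
	return rtts;
}
```

Under those assumptions segs_per_tso_frame(65536, 1460) is 44 and
slow_start_rtts(2, 44) is 5, so a connection that only lives for a
few round trips never opens the window far enough to cover a full
TSO frame.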

Something like the following is what I have been talking about.
I am able to reproduce the problem here locally and the following
patch makes it go away.

Andi, Anton, and niv, can you confirm it does so for you too?

If tcp_clean_rtx_queue() doesn't report FLAG_DATA_ACKED, then no
congestion window growth is allowed to occur. So we only get a
snd_cwnd bump once for every tso_factor frames, that stinks :)

This is not the final fix. I need to do something like record the
upper-most virtual ACK within a TSO frame so we don't say DATA acked
for dup-acks that happen to fall in the middle of a TSO frame.
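
For anyone who wants to poke at the condition outside the kernel,
here is a small user-space model of the check the patch adds. It is
a sketch only: struct model_skb, seq_after() and partial_ack_flags()
are hypothetical names, the flag value is just a stand-in for the
model, and seq_after() merely imitates the wrap-safe comparison done
by the kernel's after().

```c
#include <stdint.h>

/* Stand-in value for this model only. */
#define FLAG_DATA_ACKED 0x04

/* Hypothetical cut-down view of the fields the patch looks at. */
struct model_skb {
	uint32_t seq;            /* first sequence number in the frame */
	uint32_t end_seq;        /* one past the last sequence number  */
	unsigned int tso_factor; /* virtual segments in the frame      */
};

/* Wrap-safe "a is strictly after b" on 32-bit sequence numbers. */
static int seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Models the patched branch: the frame at the head of the queue is
 * not fully acked, but snd_una has moved somewhere inside it.
 */
static int partial_ack_flags(const struct model_skb *skb, uint32_t snd_una)
{
	int acked = 0;

	if (seq_after(skb->end_seq, snd_una)) {
		if (skb->tso_factor && seq_after(snd_una, skb->seq))
			acked |= FLAG_DATA_ACKED;
	}
	return acked;
}
```

Calling partial_ack_flags() twice with the same snd_una inside the
frame returns FLAG_DATA_ACKED both times. That is exactly the
dup-ack problem described above, and it is why the upper-most
virtual ACK would have to be remembered somewhere.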
===== net/ipv4/tcp_input.c 1.75 vs edited =====
--- 1.75/net/ipv4/tcp_input.c 2004-09-27 12:00:32 -07:00
+++ edited/net/ipv4/tcp_input.c 2004-09-27 15:35:12 -07:00
@@ -2373,8 +2373,12 @@
 		 * discard it as it's confirmed to have arrived at
 		 * the other end.
 		 */
-		if (after(scb->end_seq, tp->snd_una))
+		if (after(scb->end_seq, tp->snd_una)) {
+			if (scb->tso_factor &&
+			    after(tp->snd_una, scb->seq))
+				acked |= FLAG_DATA_ACKED;
 			break;
+		}
 
 		/* Initial outgoing SYN's get put onto the write_queue
 		 * just like anything else we transmit. It is not