Summary:
--------
Linux 2.4.9 seems to have a bug in TCP where it does not handle duplicate
retransmissions properly, causing TCP transfers to stall.

Short description:
------------------
The attached tcpdump extract shows part of an HTTP download, right before
the TCP connection stalled. The client was on a PPP dialup link. I have
been able to reproduce this kind of stall for some time. Turning off SACK
through /proc fixed the problem.
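
For anyone trying the same workaround: the switch is the net.ipv4.tcp_sack
sysctl, exposed as /proc/sys/net/ipv4/tcp_sack, so as root either
`echo 0 > /proc/sys/net/ipv4/tcp_sack` or `sysctl -w net.ipv4.tcp_sack=0`
turns SACK off. The minimal Python sketch below wraps the same read and
write. The function names are made up for illustration, and the `path`
parameter exists only so the functions can be exercised against a scratch
file instead of the real /proc entry.

```python
from pathlib import Path

# The SACK switch (net.ipv4.tcp_sack) as exposed under /proc on Linux.
TCP_SACK = Path("/proc/sys/net/ipv4/tcp_sack")


def sack_enabled(path: Path = TCP_SACK) -> bool:
    """Return True if the sysctl file holds a non-zero value."""
    return int(path.read_text().strip()) != 0


def set_sack(enabled: bool, path: Path = TCP_SACK) -> None:
    """Write 1 (on) or 0 (off); writing the real /proc entry needs root."""
    path.write_text("1\n" if enabled else "0\n")
```
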
Long description:
-----------------
I've been having TCP transfers stall on me for no good reason for a long
time, with various kernel versions. So I usually use wget for transfers,
hit Ctrl-C when a transfer stalls, and restart it. Stalls happen about
every 50-200 KB.

Today I tracked down what seems to be the problem. The attached tcpdump
extract shows a transfer between my machine (my_dialup_ip, a
dual-processor machine running Linux 2.4.9 SMP) and a web server
(web_server). Except for the IP addresses, nothing has been edited. The
following sequence happens:
- The transfer is proceeding normally until segment 157498:158946 is
  dropped. The next segment, 158946:160394, is received.
- My machine ACKs up to 157498, with a SACK block for the 158946 segment.
- Two more data segments arrive. The first (160394:161842) extends that
  SACK block; the second (163290:164202) creates another hole in the
  sequence space, at 161842:163290.
- My machine now ACKs 157498, with two SACK blocks.
- The server's retransmission of 157498:158946 (to fill the first hole)
  arrives.
- My machine fills the first hole and ACKs up to 161842, with one SACK
  block (163290:164202, the data beyond the second hole).
- My machine receives a second copy of the server's retransmission of
  157498:158946, i.e. data it has already ACKed.
- My machine now adds a SACK block for 157498:158946, even though that
  range lies below the sequence number in the ACK field (161842). This is
  definitely wrong.
- The server keeps retransmitting the 157498 segment, with the interval
  doubling each time (roughly 2, 4, 8, 16 and 32 seconds apart). This
  looks like a bug too, on the server side. (I don't know what OS the
  server is running.)
- The transfer stalls completely at this point.
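
The receiver-side behavior in the steps above can be modeled in a few
lines. The Python sketch below is a simplified model, not the Linux 2.4.9
code: it tracks the cumulative ACK point and a list of out-of-order
blocks, puts the block containing the newest segment first as RFC 2018
requires, and follows RFC 2883 (D-SACK), which reports a duplicate
segment in the first SACK block even when that block lies below the ACK
field. The class and method names are hypothetical, and sequence-number
wraparound and duplicates of already-SACKed data are ignored.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # half-open range [start, end) of sequence numbers


class SackReceiver:
    """Toy TCP receiver: cumulative ACK plus SACK blocks (RFC 2018) and a
    D-SACK report (RFC 2883) for segments entirely below the ACK point.
    Plain integer comparisons: sequence-number wraparound is ignored."""

    # With the timestamp option present, at most 3 SACK blocks fit in the
    # 40 bytes of TCP option space.
    MAX_BLOCKS = 3

    def __init__(self, rcv_nxt: int) -> None:
        self.rcv_nxt = rcv_nxt      # next in-order byte expected (ACK field)
        self.ooo: List[Block] = []  # out-of-order blocks, most recent first

    def receive(self, start: int, end: int) -> Tuple[int, List[Block]]:
        """Process one data segment; return (ACK number, SACK blocks)."""
        dsack: Optional[Block] = None
        if end <= self.rcv_nxt:
            # Duplicate of data already ACKed: report it as a D-SACK block.
            dsack = (start, end)
        elif start <= self.rcv_nxt:
            # In-order data: advance the ACK point, then absorb any queued
            # blocks that have become contiguous with it.
            self.rcv_nxt = end
            absorbed = True
            while absorbed:
                absorbed = False
                for blk in self.ooo:
                    if blk[0] <= self.rcv_nxt:
                        self.rcv_nxt = max(self.rcv_nxt, blk[1])
                        self.ooo.remove(blk)
                        absorbed = True
                        break
        else:
            # Out-of-order data: merge with any overlapping or adjacent
            # blocks and move the merged block to the front of the list.
            lo, hi = start, end
            rest: List[Block] = []
            for s, e in self.ooo:
                if s <= hi and e >= lo:
                    lo, hi = min(lo, s), max(hi, e)
                else:
                    rest.append((s, e))
            self.ooo = [(lo, hi)] + rest
        blocks = ([dsack] if dsack else []) + self.ooo
        return self.rcv_nxt, blocks[: self.MAX_BLOCKS]
```

Fed the data segments from the trace in order (starting from
`SackReceiver(156050)`: 156050:157498, 158946:160394, 160394:161842,
163290:164202, then 157498:158946 twice), the model returns the same ACK
numbers and SACK blocks the client sent, ending with
`(161842, [(157498, 158946), (163290, 164202)])`.
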
I repeated the transfer many times with SACK turned off, and the stalls
went away.

Sorry, I don't have time to track this further through the source code.
Do let me know if you need further information or testing (I'm not on
these lists any more, so please cc: me directly).

Thanks,
-Vijay

15:10:51.924697 < web_server.www > my_dialup_ip.34186: P 156050:157498(1448) ack 135 win 10136 <nop,nop,timestamp 78787524 4838464> (DF)
15:10:51.925015 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 157498 win 63712 <nop,nop,timestamp 4838514 78787524> (DF)
15:10:52.004706 < web_server.www > my_dialup_ip.34186: P 158946:160394(1448) ack 135 win 10136 <nop,nop,timestamp 78787546 4838486> (DF)
15:10:52.004795 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 157498 win 63712 <nop,nop,timestamp 4838522 78787524,nop,nop, sack 1 {158946:160394} > (DF)
15:10:52.114694 < web_server.www > my_dialup_ip.34186: P 160394:161842(1448) ack 135 win 10136 <nop,nop,timestamp 78787555 4838495> (DF)
15:10:52.114769 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 157498 win 63712 <nop,nop,timestamp 4838533 78787524,nop,nop, sack 1 {158946:161842} > (DF)
15:10:52.254671 < web_server.www > my_dialup_ip.34186: P 163290:164202(912) ack 135 win 10136 <nop,nop,timestamp 78787567 4838505> (DF)
15:10:52.254744 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 157498 win 63712 <nop,nop,timestamp 4838547 78787524,nop,nop, sack 2 {163290:164202}{158946:161842} > (DF)
15:10:52.824666 < web_server.www > my_dialup_ip.34186: . 157498:158946(1448) ack 135 win 10136 <nop,nop,timestamp 78787636 4838514> (DF)
15:10:52.824746 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 161842 win 59368 <nop,nop,timestamp 4838604 78787636,nop,nop, sack 1 {163290:164202} > (DF)
15:10:54.864609 < web_server.www > my_dialup_ip.34186: . 157498:158946(1448) ack 135 win 10136 <nop,nop,timestamp 78787839 4838514> (DF)
15:10:54.864699 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 161842 win 63712 <nop,nop,timestamp 4838808 78787636,nop,nop, sack 2 {157498:158946}{163290:164202} > (DF)
15:10:58.914469 < web_server.www > my_dialup_ip.34186: . 157498:158946(1448) ack 135 win 10136 <nop,nop,timestamp 78788245 4838514> (DF)
15:10:58.914560 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 161842 win 63712 <nop,nop,timestamp 4839213 78787636,nop,nop, sack 2 {157498:158946}{163290:164202} > (DF)
15:11:07.034200 < web_server.www > my_dialup_ip.34186: . 157498:158946(1448) ack 135 win 10136 <nop,nop,timestamp 78789057 4838514> (DF)
15:11:07.034287 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 161842 win 63712 <nop,nop,timestamp 4840025 78787636,nop,nop, sack 2 {157498:158946}{163290:164202} > (DF)
15:11:23.273663 < web_server.www > my_dialup_ip.34186: . 157498:158946(1448) ack 135 win 10136 <nop,nop,timestamp 78790682 4838514> (DF)
15:11:23.273741 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 161842 win 63712 <nop,nop,timestamp 4841649 78787636,nop,nop, sack 2 {157498:158946}{163290:164202} > (DF)
15:11:55.792591 < web_server.www > my_dialup_ip.34186: . 157498:158946(1448) ack 135 win 10136 <nop,nop,timestamp 78793933 4838514> (DF)
15:11:55.792672 > my_dialup_ip.34186 > web_server.www: . 135:135(0) ack 161842 win 63712 <nop,nop,timestamp 4844901 78787636,nop,nop, sack 2 {157498:158946}{163290:164202} > (DF)
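
The retransmission pattern and the out-of-range SACK blocks can also be
checked mechanically. The Python sketch below parses tcpdump lines in the
format of the trace, assuming one packet per line as tcpdump prints it.
`send_gaps` measures the time between successive packets carrying data
that starts at a given sequence number, and `sack_blocks_below_ack` lists
SACK blocks that end at or below the packet's ACK field. Both names are
hypothetical; the parser ignores sequence-number wraparound and captures
that cross midnight.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Matches the start of a tcpdump line such as
# "15:10:52.824666 < web_server.www > my_dialup_ip.34186: . 157498:158946(1448) ack 135 ..."
PACKET = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d\.\d+) (?P<dir>[<>]) (?P<src>\S+) > (?P<dst>\S+): "
    r"\S+ (?P<seq>\d+):(?P<end>\d+)\(\d+\) ack (?P<ack>\d+)"
)
SACK_BLOCK = re.compile(r"\{(\d+):(\d+)\}")


def _time(stamp: str) -> datetime:
    # Time of day only; assumes the capture does not cross midnight.
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def send_gaps(lines: Iterable[str], seq: int) -> List[float]:
    """Seconds between successive data packets whose payload starts at seq."""
    times = []
    for line in lines:
        m = PACKET.match(line)
        # Skip pure ACKs (zero-length, e.g. 135:135(0)).
        if m and int(m["seq"]) == seq and m["end"] != m["seq"]:
            times.append(_time(m["time"]))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def sack_blocks_below_ack(line: str) -> List[Tuple[int, int]]:
    """SACK blocks ending at or below the packet's cumulative ACK
    (plain integer comparison; sequence wraparound is ignored)."""
    m = PACKET.match(line)
    if not m:
        return []
    ack = int(m["ack"])
    return [(int(s), int(e)) for s, e in SACK_BLOCK.findall(line) if int(e) <= ack]
```

Applied to the trace, `send_gaps(lines, 157498)` returns gaps that double
each time, starting at about 2.04 seconds, and `sack_blocks_below_ack`
returns [(157498, 158946)] for every ACK from 15:10:54.864699 onward and
an empty list for the earlier ones.
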