Opened 7 years ago

Closed 7 years ago

#13704 closed bug (fixed)

TCP window rescale fails to trigger data sending

Reported by: waddlesplash
Owned by: axeld
Priority: normal
Milestone: Unscheduled
Component: Network & Internet/TCP
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

This only started happening after the TCP changes were merged.

Essentially it manifests itself as just a total stop in data transmission. I can have it happen 100% of the time when doing a large push to GitHub. Packet inspection shows that the transmission stops immediately after a window rescale.

Analysis from mmlr on IRC:

6:24 PM <•mmlr> the window collapses on the github side, which isn't really our fault
6:24 PM <•mmlr> but then there's the window update that resets it back to the full 64k
6:24 PM <•mmlr> but for 600ms nothing happens
6:24 PM <•mmlr> then the server side seems to give up by closing the connection
6:25 PM <•mmlr> so looks like there's an issue with the window update not triggering data sending

mmlr and jessicah have the pcap files from me; I'm OK with giving them to any other developer with commit access who'd like to work on it.

Attachments (1)

0001-tcp-fixed-no-response-from-window-update-removed-ide.patch (7.8 KB) - added by a-star 7 years ago.

Change History (5)

comment:1 by waddlesplash, 7 years ago

Reverted the offending commit in hrev51419. It should be fixed and reapplied, though.

comment:2 by a-star, 7 years ago

patch: 0 → 1

comment:3 by a-star, 7 years ago

The problem is exactly what mmlr analyzed. There is no response to a window update message.

The reason for the erratic behavior is that the TCP implementation notes a window update but then silently drops the message without triggering any data send event. Before the new TCP patches were applied, the implementation relied on a retransmission timeout to trigger a send event after a window update. One of the new patches, dealing with the ideal timer, changed the semantics of the retransmit function call and caused the behavior observed.

But a retransmission timeout is not the correct response to a window update; in fact, a retransmission is not a desired effect of a window update at all. So in the attached patch, I have changed the implementation to immediately acknowledge the window update (along with data from SendQueue), which solves the complete halt in data transmission.
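To illustrate the difference described above, here is a minimal toy model in Python. It is only a sketch under stated assumptions and is not the code from the attached patch or from Haiku's TCP stack: the class `ToySender`, its `send_on_window_update` flag, and the helper `run_scenario` are all hypothetical names invented for this example. With the flag off, a window update is recorded but nothing is sent; with the flag on, the update immediately triggers a send attempt.

```python
from collections import deque


class ToySender:
    """Toy model of a TCP sender (hypothetical, not Haiku's implementation)."""

    def __init__(self, send_on_window_update):
        # Hypothetical switch: False models "note the update, send nothing",
        # True models "window update triggers a send attempt".
        self.send_on_window_update = send_on_window_update
        self.send_window = 0   # bytes the peer currently allows
        self.in_flight = 0     # bytes sent but not yet acknowledged
        self.queue = deque()   # pending segments (bytes objects)
        self.sent = []         # segments handed to the "wire"

    def enqueue(self, data):
        self.queue.append(data)
        self.try_send()

    def try_send(self):
        # Send queued segments for as long as they fit in the peer's window.
        while (self.queue
               and self.in_flight + len(self.queue[0]) <= self.send_window):
            segment = self.queue.popleft()
            self.in_flight += len(segment)
            self.sent.append(segment)

    def on_window_update(self, advertised_window):
        # A pure window update: no new data acknowledged, only a new window.
        self.send_window = advertised_window
        if self.send_on_window_update:
            self.try_send()


def run_scenario(send_on_window_update):
    """Window collapses to 0, data is queued, window reopens to 64k."""
    sender = ToySender(send_on_window_update)
    sender.on_window_update(0)          # peer's window collapses
    for _ in range(3):
        sender.enqueue(b"x" * 1000)     # data piles up in the queue
    sender.on_window_update(65535)      # peer reopens the window
    return len(sender.sent)
```

In this model, `run_scenario(False)` leaves all three segments queued after the window reopens, so only some unrelated event (such as a timer) could get data moving again, while `run_scenario(True)` sends them as soon as the update arrives.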

The patch also re-implements the changes that were reverted but had nothing to do with the issue at hand. For the time being, I have also removed the "ideal timer" part from the patch (although it wasn't creating any conflict). I initially decided to implement the ideal timer using the same timer used for retransmission, to avoid adding an additional timer. But as I have seen, that can be problematic. So I will be re-implementing the ideal timer, and it is not included in this patch.

The patch solved the problem for me. If it doesn't work for anybody, kindly let me know; it would be great if you could provide the pcap file for the failing case.

comment:4 by waddlesplash, 7 years ago

Resolution: fixed
Status: new → closed

Applied in hrev51540. Thanks!
