Opened 7 years ago

Closed 3 years ago

Last modified 2 years ago

#13681 closed enhancement (fixed)

tcp: 2018: implementing TCP SACK option

Reported by: a-star Owned by: axeld
Priority: normal Milestone: R1/beta3
Component: Network & Internet/TCP Version: R1/Development
Keywords: tcp, sack Cc:
Blocked By: Blocking:
Platform: All

Description

Problem: Multiple packet losses from a window of data can have a catastrophic effect on TCP throughput/ generally cause TCP to lose its ACK clock.

Solution: Selective Acknowledgement (SACK) - the data receiver can inform the sender about all the segments that have arrived successfully, so the sender need restransmit only the segments that have actually been lost.

Two TCP options are used:

1) SACK-permitted option at the time of handshake (sent with the SYN segment) to indicate SACK option can be used. (Kind = 4; Length = 2)

2) SACK option - can be used once permission has been given by SACK-permitted. (Kind = 5; Length = 8*n + 2 where n in the number of sack blocks)

Total bytes available in TCP header for options = 40 bytes. Therefore max value of n is 4. In presence of the timestamp option it reduces to 3.

If it has received permission, the data receiver MAY choose to generate SACK options. Remember if the receiver generates SACK under any circumstance, it SHOULD generate them under all circumstances. Also if sent at all, SACK option SHOULD be included in all ACKs which do not ACK the highest seq number in receiver's queue.

The SACK option SHOULD be filled out by repeating the most recently reported SACK blocks (based on first SACK blocks in previous SACK options) that are not subsets of a SACK block already included in the SACK option being constructed. For examples, checkout rfc 2018 (https://www.ietf.org/rfc/rfc2018.txt) section 7.

Note: The SACK option is advisory. While it notifies the data sender extended ack information, the data receiver is permitted to later discard data which has been reported in the SACK option. This is known as "Data reneging".

The use of time-outs as a fall-back mechanism for detecting dropped packets is unchanged by the SACK option. Because the data receiver is allowed to discard SACKed data, when a retransmit timeout occurs the data sender SHOULD ignore prior SACK information in determining which data to retransmit. Linux doesn't follow this rule as mentioned in the errata for the rfc.

Attachments (1)

0008-tcp-rfc-2018-implemented-SACK-option.patch (7.6 KB ) - added by a-star 7 years ago.

Download all attachments as: .zip

Change History (16)

comment:1 by a-star, 7 years ago

patch: 01

comment:2 by a-star, 7 years ago

Implemented the Data Receiver behavior completely.

Regarding the Data Sender behavior, some parts of it are implemented. Will look into Linux source code to study how it deals with data reneging. Currently reading more about the different possibilities for the sender before implementing the data sender's behavior fully.

comment:3 by axeld, 7 years ago

Have you measured the throughput differences with your test application yet?

comment:4 by axeld, 7 years ago

First, the coding style issues:

  • BufferQueue.cpp, line 438: Two blank lines between functions
  • Line 440: We use camel case for variables: sack_count -> sackCount.
  • Line 443: "ptr" is a bad variable name. It should be "buffer" in this case, as it refers to a net_buffer.
  • Line 447: Extraneous parenthesis.
  • TCPEndpoint.cpp, line 2035: Operators go to the next line.

Then

  • If the loop in BufferQueue.cpp line 446 ends with sack_count == maxSackCount, the code in line 462 may overwrite memory behind the array.
  • Since the PopulateSackInfo() method does not have a return value yet, I'd prefer to return the sack count instead of setting it via a by-reference variable.
  • In TCPEndpoint.cpp 2039, the code does not handle a failed allocation gracefully; in fact, it'll crash trying to access segment.sacks later on.

comment:5 by a-star, 7 years ago

Fixed coding style issues. Actually the name sack_count was already there so I didn't think about changing it.

Fixed the sack_count issue with maxSackCount. Handled failed allocation check at line 2039.

And about the throughput differences, I made the comparisons against patches till the ideal timer patch. I wrote about it in my last weekly report. I yet to do the same for the NewReno and this SACK patch. Actually I am reworking on my test setup to make it more stable. But I did perform black box testing on these two patches using my packet injector tool.

comment:6 by jessicah, 7 years ago

Resolution: fixed
Status: newclosed

Applied in hrev51385.

comment:7 by korli, 6 years ago

Resolution: fixed
Status: closedreopened

I reverted the patch in hrev51460. Just reverting this one patch restores TCP functionality on a local box wired to a wlan repeater.

comment:8 by pulkomandy, 6 years ago

patch: 10

comment:9 by pulkomandy, 6 years ago

patch: 0

Patch migrated to Gerrit: https://review.haiku-os.org/53

comment:10 by korli, 3 years ago

Milestone: UnscheduledR1/beta3
Resolution: fixed
Status: reopenedclosed

Reapplied with fixes in hrev55087

comment:11 by Skik, 2 years ago

TCP functionality breaks with TCP SACK option due to my company firewall (SonicWall) thinking its an invalid Option #4 and dropping the packet. Looking at reference packet captures https://packetlife.net/media/captures/TCP_SACK.cap and local Linux packet captures. The SACK option is sent before the TCP Timestamp option and used as padding instead of the two NOP Options.

I reordered the operations and also added padding when there is no SACK_PERMITTED. And TCP works now with my company firewall. This is attached as a reference. I am sure there is a better way to handle it.

diff --git a/src/add-ons/kernel/network/protocols/tcp/tcp.cpp b/src/add-ons/kernel/network/protocols/tcp/tcp.cpp
index 5ad4014dfc..5449a35b2e 100644
--- a/src/add-ons/kernel/network/protocols/tcp/tcp.cpp
+++ b/src/add-ons/kernel/network/protocols/tcp/tcp.cpp
@@ -106,13 +106,25 @@ add_options(tcp_segment_header &segment, uint8 *buffer, size_t bufferSize)
 		bump_option(option, length);
 	}
 
+	if ((segment.options & TCP_SACK_PERMITTED) != 0
+		&& length + 2 <= bufferSize) {
+		option->kind = TCP_OPTION_SACK_PERMITTED;
+		option->length = 2;
+		bump_option(option, length);
+	}
+
 	if ((segment.options & TCP_HAS_TIMESTAMPS) != 0
-		&& length + 12 <= bufferSize) {
-		// two NOPs so the timestamps get aligned to a 4 byte boundary
+		&& (segment.options & TCP_SACK_PERMITTED) == 0
+		&& length + 2 <= bufferSize) {
+		// two NOPs so the timestamps get aligned to a 4 byte boundary when no SACK
 		option->kind = TCP_OPTION_NOP;
 		bump_option(option, length);
 		option->kind = TCP_OPTION_NOP;
 		bump_option(option, length);
+	}
+
+	if ((segment.options & TCP_HAS_TIMESTAMPS) != 0
+		&& length + 10 <= bufferSize) {
 		option->kind = TCP_OPTION_TIMESTAMP;
 		option->length = 10;
 		option->timestamp.value = htonl(segment.timestamp_value);
@@ -132,13 +144,6 @@ add_options(tcp_segment_header &segment, uint8 *buffer, size_t bufferSize)
 		bump_option(option, length);
 	}
 
-	if ((segment.options & TCP_SACK_PERMITTED) != 0
-		&& length + 2 <= bufferSize) {
-		option->kind = TCP_OPTION_SACK_PERMITTED;
-		option->length = 2;
-		bump_option(option, length);
-	}
-
 	if (segment.sackCount > 0) {
 		int sackCount = ((int)(bufferSize - length) - 4) / sizeof(tcp_sack);
 		if (sackCount > segment.sackCount)
@@ -435,15 +440,15 @@ tcp_options_length(tcp_segment_header& segment)
 	if (segment.max_segment_size > 0)
 		length += 4;
 
+	if (segment.options & TCP_SACK_PERMITTED)
+		length += 2;
+		
 	if (segment.options & TCP_HAS_TIMESTAMPS)
-		length += 12;
+		length += (segment.options & TCP_SACK_PERMITTED) ? 10 : 12;
 
 	if (segment.options & TCP_HAS_WINDOW_SCALE)
 		length += 4;
 
-	if (segment.options & TCP_SACK_PERMITTED)
-		length += 2;
-
 	if (segment.sackCount > 0) {
 		int sackCount = min_c((int)((kMaxOptionSize - length - 4)
 			/ sizeof(tcp_sack)), segment.sackCount);

comment:12 by pulkomandy, 2 years ago

We should do something better to manage the padding/alignment for each option. There is too much guessing and manual updating of length and pointers in this code.

comment:13 by Coldfirex, 2 years ago

To prevent this from being lost should this ticket be reopened, or a new one opened.

comment:15 by korli, 2 years ago

Patch was merged in hrev55533

Note: See TracTickets for help on using tickets.