Opened 7 years ago

Closed 5 years ago

Last modified 5 years ago

#13769 closed bug (fixed)

git-push failure following TCP changes

Reported by: waddlesplash Owned by: axeld
Priority: normal Milestone: R1/beta2
Component: Network & Internet/TCP Version: R1/Development
Keywords: Cc: a-star
Blocked By: Blocking:
Platform: All

Description

I have a git-push that always fails about 10% of the way through (after a few megabytes or so). The most recent TCP patch I applied (#13704) didn't fix it, and the pcap dump seems to indicate that it's another window problem.

CC'ing Ayush, and also emailing him a copy of the pcap file. If anyone else would like to investigate, I can email them the pcap file also.

Attachments (1)

0001-tcp-fixed-RTO-update-and-dup-ACKs-generation.patch (5.8 KB) - added by a-star 7 years ago.

Change History (20)

comment:1 by taos, 7 years ago

In addition to git push failing, I'm also experiencing problems cloning the haiku git repository:

git clone https://git.haiku-os.org/haiku -v
Cloning into 'haiku'...
POST git-upload-pack (chunked)
error: RPC failed; HTTP 408 curl 22 The requested URL returned error: 408 Request Timeout
fatal: The remote end hung up unexpectedly

comment:2 by a-star, 7 years ago

I have been really busy with exams and MS admissions.

I have started to look into the problem and found two possible issues. I will start working on them in a day or two. You can expect a patch by the end of the week.

Sorry for the wait.

comment:3 by waddlesplash, 7 years ago

That's alright. Thanks for looking into this!

comment:4 by a-star, 7 years ago

Made some changes:

i) there was an integer promotion problem in updating the retransmission timeout: a signed int was being divided by an unsigned int. This caused the values to overflow, leading to huge timeout values, which showed up as apparent pauses in the data flow.

ii) for an ACK to be recognised as a duplicate ACK, the advertised window must remain the same. The code did not check this, so I added the check.
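To illustrate point i), here is a minimal sketch of how dividing a signed value by an unsigned one can blow up a smoothed RTT update. This is not the actual Haiku code: the names kGainDivisor, SmoothBuggy and SmoothFixed are hypothetical, and the 1/8 gain is simply the value RFC 6298 uses for SRTT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical smoothing divisor (gain of 1/8, as in RFC 6298).
static const uint32_t kGainDivisor = 8;

// Buggy variant: delta is converted to unsigned before the division,
// so a negative delta turns into a huge positive value.
int32_t
SmoothBuggy(int32_t srtt, int32_t sample)
{
	assert(srtt >= 0 && sample >= 0);
	int32_t delta = sample - srtt;
	// int32_t / uint32_t is evaluated as an unsigned division.
	return static_cast<int32_t>(srtt + delta / kGainDivisor);
}

// Fixed variant: the divisor is cast to a signed type first, so the
// division keeps the sign of delta.
int32_t
SmoothFixed(int32_t srtt, int32_t sample)
{
	assert(srtt >= 0 && sample >= 0);
	int32_t delta = sample - srtt;
	return srtt + delta / static_cast<int32_t>(kGainDivisor);
}
```

With srtt = 100 and sample = 20, the fixed variant moves the estimate down to 90, while the buggy one jumps to a value in the hundreds of millions. When the new sample is larger than the estimate, both variants agree, which is probably why such a bug can go unnoticed for a while.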

I tried my latest patch with files as large as 10 MB (both git pull and git push) and everything worked fine. I hope the issue will be resolved now.
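For point ii), the duplicate ACK test can be sketched as below, following the definition in RFC 5681. The struct and function names are made up for illustration and do not correspond to the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of an incoming segment.
struct AckInfo {
	uint32_t ackNumber;
	uint32_t window;
	uint32_t payloadLength;
	bool syn;
	bool fin;
};

// Hypothetical sender-side state remembered from the last ACK.
struct SenderState {
	uint32_t lastAck;
	uint32_t lastWindow;
	bool hasUnackedData;
};

// An ACK counts as a duplicate only if all RFC 5681 conditions hold,
// including an unchanged advertised window.
bool
IsDuplicateAck(const SenderState& state, const AckInfo& ack)
{
	return state.hasUnackedData
		&& ack.payloadLength == 0
		&& !ack.syn && !ack.fin
		&& ack.ackNumber == state.lastAck
		&& ack.window == state.lastWindow;
}
```

Without the window comparison, a pure window update that repeats the same ACK number would be counted as a duplicate ACK and could wrongly trigger fast retransmit.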

comment:5 by a-star, 7 years ago

patch: 0 → 1

comment:6 by waddlesplash, 7 years ago

That seems to improve things slightly: now instead of permanently stalling around 18%, it instead halts there for about 20 seconds and then jumps to 21% before stalling again. I'll send you another pcap file.

comment:7 by waddlesplash, 7 years ago

Committed the patch in hrev51655.

comment:8 by taos, 7 years ago

With hrev51703 the error message during git push changed for me to:

remote: fatal: early EOF

Error for git clone is still the same.

comment:9 by pulkomandy, 7 years ago

patch: 1 → 0

comment:10 by pulkomandy, 7 years ago

patch: 0

@waddlesplash: please remember to set patches to "obsolete" if applying them and leaving the ticket open.

comment:11 by taos, 7 years ago

It seems that - at least for me - the problems with git disappeared with (or around the time of) the migration of the repositories to the new server.

comment:12 by waddlesplash, 7 years ago

patch: 0

The git push in question was to GitHub, so it hasn't changed.

comment:13 by waddlesplash, 7 years ago

I have another git-push failure that looks extremely similar to this one. Any developer who wants the pcap file can have it...

comment:14 by waddlesplash, 6 years ago

Resolution: fixed
Status: new → closed

This specific git-push succeeded, so I don't have a way to reproduce this now.

comment:15 by mmu_man, 6 years ago

Resolution: fixed
Status: closed → reopened

I still have this issue here, trying to push to qtkeychains.

I did some tcpdump recording, and I believe we confuse the receiver with our Zero Window Probe, because we don't resend the probe's data even though the receiver only ACKs the previous segment, and we keep sending things after it:

  • we send the last segment before window is closed, next seg = N,
  • we get ACK for N, with window=0
  • we send Zero Window Probe with 1 byte, next seg = N+1,
  • we eventually get a window>0 ACK still at N,
  • we start sending stuff again, but starting from N+1,
  • receiver keeps ACKing N because it never accepted it.

It seems sending either 1 or 0 byte is valid for a ZWP, although I'm not entirely sure. At least it's easier to handle 0 than 1, and it seems to work. Wireshark shows them as duplicate ACKs, but they get the thing going.
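The sequence above can be reproduced with a toy model of the sender's sequence numbers. This is only a sketch under assumptions: ToySender, SendProbe and ResumeAfterWindowOpens are hypothetical names, sequence number wraparound is ignored, and the rewindToAck option is one conceivable way of handling a 1-byte probe, not what the merged patch necessarily does.

```cpp
#include <cassert>
#include <cstdint>

struct ToySender {
	uint32_t sendNext;	// sequence number of the next new byte to send
};

// Sends a zero window probe with 0 or 1 payload bytes and returns the
// sequence number the probe starts at.
uint32_t
SendProbe(ToySender& sender, uint32_t probeBytes)
{
	assert(probeBytes <= 1);
	uint32_t start = sender.sendNext;
	sender.sendNext += probeBytes;
	return start;
}

// Called when an ACK reopens the window. Returns the sequence number
// the sender resumes from. With rewindToAck, a probe byte that was not
// acknowledged is sent again (wraparound is ignored in this toy model).
uint32_t
ResumeAfterWindowOpens(ToySender& sender, uint32_t ackNumber,
	bool rewindToAck)
{
	if (rewindToAck && ackNumber < sender.sendNext)
		sender.sendNext = ackNumber;
	return sender.sendNext;
}
```

With N = 1000, a 1-byte probe followed by a window-opening ACK for N makes the sender resume at N+1 unless it rewinds, leaving a hole at N that the receiver keeps ACKing. A 0-byte probe never advances sendNext, so the sender resumes at N.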

References:

comment:17 by a-star, 6 years ago

If you want, I can look into the issue - it would be helpful if you could provide the pcap file as well. There might be some other underlying issue here since, as you mentioned, "we start sending stuff again, but starting from N+1", which is weird and not desirable behavior.

comment:18 by waddlesplash, 5 years ago

Resolution: fixed
Status: reopened → closed

Patch merged in hrev53831.

comment:19 by nielx, 5 years ago

Milestone: Unscheduled → R1/beta2

Assign tickets with status=closed and resolution=fixed within the R1/beta2 development window to the R1/beta2 Milestone
