Opened 23 months ago

Last modified 7 months ago

#13769 reopened bug

git-push failure following TCP changes

Reported by: waddlesplash Owned by: axeld
Priority: normal Milestone: Unscheduled
Component: Network & Internet/TCP Version: R1/Development
Keywords: Cc: a-star
Blocked By: Blocking:
Has a Patch: no Platform: All

Description

I have a git-push that always fails about 10% of the way through (after a few megabytes or so). The most recent TCP patch I just applied (#13704) didn't fix it, and the pcap dump seems to indicate that it's another window problem.

CC'ing Ayush, and also emailing him a copy of the pcap file. If anyone else would like to investigate, I can email them the pcap file also.

Attachments (1)

0001-tcp-fixed-RTO-update-and-dup-ACKs-generation.patch (5.8 KB) - added by a-star 22 months ago.

Change History (18)

comment:1 by taos, 22 months ago

In addition to git push failing, I'm also experiencing problems cloning the haiku git repository:

git clone https://git.haiku-os.org/haiku -v
Cloning into 'haiku'...
POST git-upload-pack (chunked)
error: RPC failed; HTTP 408 curl 22 The requested URL returned error: 408 Request Timeout
fatal: The remote end hung up unexpectedly

comment:2 by a-star, 22 months ago

I have been really busy with exams and MS admissions.

I have started to look into the problem and found two possible issues. I will start working on them in a day or two. You can expect a patch by the end of the week.

Sorry for the wait.

comment:3 by waddlesplash, 22 months ago

That's alright. Thanks for looking into this!

comment:4 by a-star, 22 months ago

Made some changes:

i) There was an integer promotion problem in updating the retransmission timeout: a signed int was being divided by an unsigned int. This caused the values to overflow, leading to huge timeout values, which showed up as apparent pauses in the data flow.

ii) For an ACK to be recognised as a duplicate ACK, the advertised window must remain the same. The code did not check this, so I added the check.
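
A minimal sketch of the pitfall described in i), assuming a platform with a 32-bit int. The function names are hypothetical and not taken from the Haiku TCP sources; the sketch only shows how dividing a negative signed value by an unsigned one produces a huge result, and one possible way to avoid it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names for illustration; not taken from the Haiku code.

// Mixed form: with a 32-bit int, the usual arithmetic conversions turn the
// signed delta into an unsigned value before the division happens, so a
// small negative delta becomes a very large positive number.
int32_t scale_delta_mixed(int32_t delta, uint32_t divisor)
{
	assert(divisor != 0);
	return static_cast<int32_t>(delta / divisor);
}

// Signed form: convert the divisor to a signed type first, so the division
// is carried out in signed arithmetic.
int32_t scale_delta_signed(int32_t delta, uint32_t divisor)
{
	assert(divisor != 0 && divisor <= static_cast<uint32_t>(INT32_MAX));
	return delta / static_cast<int32_t>(divisor);
}
```

With a delta of -8 and a divisor of 4, the mixed form yields a value above one billion, while the signed form yields -2.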

I tried my latest patch with files as large as 10 MB (both git pull and git push) and everything worked fine. I hope the issue is resolved now.
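
For reference, the duplicate-ACK condition from ii) can be sketched as follows. This follows the definition in RFC 5681 as I understand it; the struct and field names are assumptions made for illustration and do not mirror the actual Haiku data structures.

```cpp
#include <cstdint>

// Hypothetical types for illustration; the field names are assumptions.
struct IncomingAck {
	uint32_t acknowledge;    // acknowledgment number in the segment
	uint32_t window;         // advertised window in the segment
	uint32_t payloadLength;  // bytes of data carried by the segment
	bool     synOrFin;       // true if SYN or FIN is set
};

struct SenderState {
	uint32_t highestAcknowledged;   // greatest acknowledgment seen so far
	uint32_t lastAdvertisedWindow;  // window from the previous ACK
	bool     hasOutstandingData;    // data sent but not yet acknowledged
};

// An ACK counts as a duplicate only if all conditions hold, including the
// one this patch adds: the advertised window must be unchanged.
bool is_duplicate_ack(const SenderState& state, const IncomingAck& ack)
{
	return state.hasOutstandingData
		&& ack.payloadLength == 0
		&& !ack.synOrFin
		&& ack.acknowledge == state.highestAcknowledged
		&& ack.window == state.lastAdvertisedWindow;
}
```

An ACK that repeats the acknowledgment number but advertises a different window is a window update, not a duplicate, so it should not count towards fast retransmit.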

comment:5 by a-star, 22 months ago

Has a Patch: set

comment:6 by waddlesplash, 22 months ago

That seems to improve things slightly: now, instead of permanently stalling around 18%, it halts there for about 20 seconds and then jumps to 21% before stalling again. I'll send you another pcap file.

comment:7 by waddlesplash, 22 months ago

Committed the patch in hrev51655.

comment:8 by taos, 21 months ago

With hrev51703 the error message during git push changed for me to:

remote: fatal: early EOF

Error for git clone is still the same.

comment:9 by pulkomandy, 21 months ago

Has a Patch: unset

comment:10 by pulkomandy, 21 months ago

Has a Patch: unset

@waddlesplash: please remember to set patches to "obsolete" if applying them and leaving the ticket open.

comment:11 by taos, 21 months ago

It seems that - at least for me - the problems with git disappeared with (or around the time of) the migration of the repositories to the new server.

comment:12 by waddlesplash, 21 months ago

Has a Patch: unset

The git push in question was to GitHub, so the server on that end hasn't changed.

comment:13 by waddlesplash, 18 months ago

I have another git-push failure that looks extremely similar to this one. Any developer who wants the pcap file can have it...

comment:14 by waddlesplash, 13 months ago

Resolution: fixed
Status: new → closed

This specific git-push succeeded, so I don't have a way to reproduce this now.

comment:15 by mmu_man, 7 months ago

Resolution: fixed
Status: closed → reopened

I still have this issue here, trying to push to qtkeychains.

I recorded some traffic with tcpdump, and I believe we confuse the receiver with our Zero Window Probe, because we don't resend it even though the receiver only ACKs the previous segment, and we keep sending data after it:

  • we send the last segment before window is closed, next seg = N,
  • we get ACK for N, with window=0
  • we send Zero Window Probe with 1 byte, next seg = N+1,
  • we eventually get a window>0 ACK still at N,
  • we start sending stuff again, but starting from N+1,
  • receiver keeps ACKing N because it never accepted the probe byte.

It seems that sending either 0 or 1 bytes is valid for a ZWP, although I'm not entirely sure. At least it's easier to handle 0 than 1, and it seems to work. Wireshark shows the 0-byte probes as duplicate ACKs, but they get the thing going.
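
The sequence above can be modelled with a toy sender. This is only a sketch under stated assumptions: the struct and function names are hypothetical, sequence-number wraparound is ignored, and it does not reflect how the Haiku TCP code is actually organised. It just shows why a 1-byte probe that is never resent leaves a hole at N, while a 0-byte probe does not.

```cpp
#include <cassert>
#include <cstdint>

// Toy model; all names here are hypothetical.
struct ToySender {
	uint32_t next;  // next sequence number we intend to send
};

// Sending a probe advances the sender's sequence number by the number of
// payload bytes the probe carries (0 or 1).
void send_zero_window_probe(ToySender& sender, uint32_t probeBytes)
{
	assert(probeBytes <= 1);
	sender.next += probeBytes;
}

// Sequence number the sender resumes from once the window reopens, if it
// never goes back to resend the probe byte.
uint32_t resume_sequence(const ToySender& sender)
{
	return sender.next;
}

// True if resuming leaves a hole: the receiver still expects receiverAck,
// but the sender starts past it.
bool leaves_gap(const ToySender& sender, uint32_t receiverAck)
{
	return resume_sequence(sender) > receiverAck;
}
```

With N = 1000, a 1-byte probe makes the sender resume from 1001 while the receiver still expects 1000. With a 0-byte probe, both sides agree on 1000.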

comment:17 by a-star, 7 months ago

If you want, I can look into the issue - it would be helpful if you could provide the pcap file as well. There might be some other underlying issue(s) as well.

Last edited 7 months ago by a-star