Opened 19 months ago

Last modified 2 months ago

#13769 reopened bug

git-push failure following TCP changes

Reported by: waddlesplash
Owned by: axeld
Priority: normal
Milestone: Unscheduled
Component: Network & Internet/TCP
Version: R1/Development
Keywords:
Cc: a-star
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

I have a git-push that always fails about 10% of the way through (after a few megabytes or so). The most recent TCP patch I just applied (#13704) didn't fix it, and the pcap dump seems to indicate that it's another window problem.

CC'ing Ayush, and also emailing him a copy of the pcap file. If anyone else would like to investigate, I can email them the pcap file also.

Attachments (1)

0001-tcp-fixed-RTO-update-and-dup-ACKs-generation.patch (5.8 KB) - added by a-star 18 months ago.


Change History (18)

comment:1 Changed 18 months ago by taos

In addition to git push failing, I'm also experiencing problems cloning the haiku git repository:

git clone https://git.haiku-os.org/haiku -v
Cloning into 'haiku'...
POST git-upload-pack (chunked)
error: RPC failed; HTTP 408 curl 22 The requested URL returned error: 408 Request Timeout
fatal: The remote end hung up unexpectedly

comment:2 Changed 18 months ago by a-star

I have been really busy with exams and MS admissions.

I have started to look into the problem and found two possible issues. I will start to work on them in a day or two. You can expect a patch by the end of the week.

Sorry for the wait.

comment:3 Changed 18 months ago by waddlesplash

That's alright. Thanks for looking into this!

comment:4 Changed 18 months ago by a-star

Made some changes:

i) there was an integer promotion problem in updating the retransmission timeout: a signed int was being divided by an unsigned int. This caused the values to overflow, leading to huge timeout values, which showed up as an apparent pause in the data flow.

ii) for an ACK to be recognised as a duplicate ACK, the advertised window must remain the same. The code did not check this, so I added the check.
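To make point i) concrete, here is a minimal sketch of the general C/C++ behaviour involved. It is not taken from the patch or from the Haiku TCP sources; the function names are hypothetical, and it assumes a platform where `int` is 32 bits wide. When a signed operand is divided by an unsigned one of the same width, the signed operand is converted to unsigned first, so a small negative value turns into a very large positive one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the actual Haiku code.

// Mixed signed/unsigned division: `delta` is converted to uint32_t before
// the division, so a negative delta wraps around to a huge value.
uint32_t DivideMixed(int32_t delta, uint32_t divisor)
{
	return delta / divisor;
}

// Same division with the divisor cast to a signed type first, so the
// arithmetic stays signed and a negative delta stays negative.
int32_t DivideSigned(int32_t delta, uint32_t divisor)
{
	return delta / static_cast<int32_t>(divisor);
}
```

With a delta of -8 and a divisor of 4, the mixed version yields 1073741822 while the signed version yields -2. A timeout computed from the first value would be enormous, which matches the stall described above.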

I tried my latest patch with files as large as 10 MB (both git pull and git push) and everything worked fine. I hope the issue will be resolved now.
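For point ii), the duplicate-ACK test can be sketched as below. This follows the definition of a duplicate ACK in RFC 5681 rather than the code in the attached patch; the struct and field names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration; field names are not from Haiku.
struct AckSegment {
	uint32_t ackNumber;
	uint32_t advertisedWindow;
	uint32_t payloadLength;
	bool syn;
	bool fin;
};

struct SenderState {
	uint32_t highestAck;		// greatest ACK number received so far
	uint32_t lastWindow;		// window advertised in the last incoming ACK
	bool hasOutstandingData;	// unacknowledged data is in flight
};

// An ACK counts as a duplicate only if all five conditions hold,
// including an unchanged advertised window.
bool IsDuplicateAck(const SenderState& state, const AckSegment& segment)
{
	return state.hasOutstandingData
		&& segment.payloadLength == 0
		&& !segment.syn && !segment.fin
		&& segment.ackNumber == state.highestAck
		&& segment.advertisedWindow == state.lastWindow;
}
```

Without the window comparison, a pure window update that repeats the same ACK number would be counted as a duplicate and could trigger fast retransmit by mistake.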

comment:5 Changed 18 months ago by a-star

Has a Patch: set

comment:6 Changed 18 months ago by waddlesplash

That seems to improve things slightly: now instead of permanently stalling around 18%, it instead halts there for about 20 seconds and then jumps to 21% before stalling again. I'll send you another pcap file.

comment:7 Changed 18 months ago by waddlesplash

Committed the patch in hrev51655.

comment:8 Changed 17 months ago by taos

With hrev51703 the error message during git push changed for me to:

remote: fatal: early EOF

Error for git clone is still the same.

comment:9 Changed 17 months ago by pulkomandy

Has a Patch: unset

comment:10 Changed 17 months ago by pulkomandy

Has a Patch: unset

@waddlesplash: please remember to set patches to "obsolete" if applying them and leaving the ticket open.

comment:11 Changed 16 months ago by taos

It seems that - at least for me - the problems with git disappeared with (or around the time of) the migration of the repositories to the new server.

comment:12 Changed 16 months ago by waddlesplash

Has a Patch: unset

The git push in question was to GitHub, so it hasn't changed.

comment:13 Changed 14 months ago by waddlesplash

I have another git-push failure that looks extremely similar to this one. Any developer who wants the pcap file can have it...

comment:14 Changed 8 months ago by waddlesplash

Resolution: fixed
Status: new → closed

This specific git-push succeeded, so I don't have a way to reproduce this now.

comment:15 Changed 2 months ago by mmu_man

Resolution: fixed
Status: closed → reopened

I still have this issue here, trying to push to qtkeychains.

I did some tcpdump recording, and I believe we confuse the receiver with our Zero Window Probe, because we don't resend the probe byte even though the receiver only ACKs the previous segment, and we keep sending data after it:

  • we send the last segment before window is closed, next seg = N,
  • we get ACK for N, with window=0
  • we send Zero Window Probe with 1 byte, next seg = N+1,
  • we eventually get a window>0 ACK still at N,
  • we start sending stuff again, but starting from N+1,
  • receiver keeps ACKing N because it never accepted it.

It seems sending either 1 or 0 bytes is valid for a ZWP, although I'm not entirely sure. At least it's easier to handle 0 than 1, and it seems to work. Wireshark shows them as duplicate ACKs, but they get the thing going.
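The sequence above can be replayed with a toy model. This is only a sketch under simplifying assumptions, not the Haiku implementation: the receiver accepts nothing but in-order data that fits its window, there is no retransmission timer, and all names and numbers (N = 1000, a 100-byte follow-up segment) are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Toy receiver: accepts only in-order data that fits in the window and
// always answers with the next sequence number it expects.
struct ToyReceiver {
	uint32_t nextExpected;
	uint32_t window;

	uint32_t Receive(uint32_t sequence, uint32_t length)
	{
		if (sequence == nextExpected && length <= window)
			nextExpected += length;
		return nextExpected;
	}
};

// Replays the scenario: the window is closed at N, the sender probes with
// `probeLength` bytes and advances its next sequence number by that much
// without ever resending, then the window opens and 100 bytes follow.
// Returns the last ACK number the receiver sends.
uint32_t FinalAckAfterProbe(uint32_t probeLength)
{
	const uint32_t n = 1000;
	ToyReceiver receiver{n, 0};
	uint32_t senderNext = n;

	// Zero Window Probe; a 1-byte probe does not fit a zero window.
	receiver.Receive(senderNext, probeLength);
	senderNext += probeLength;

	// The receiver opens its window, still ACKing N.
	receiver.window = 4096;

	// The sender resumes from wherever it thinks it is.
	return receiver.Receive(senderNext, 100);
}
```

In this model, a 1-byte probe leaves the receiver stuck at ACK 1000 because the sender resumes at 1001, while a 0-byte probe lets the next segment start at 1000 and the ACK moves to 1100. Rewinding the sender's next sequence number after an unacknowledged 1-byte probe would be another way to get the same result.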

References:

comment:17 Changed 2 months ago by a-star

If you want, I can look into the issue - it would be helpful if you could provide the pcap file as well. There might be some other underlying issue(s) as well.

Last edited 2 months ago by a-star