Opened 9 years ago

Closed 3 years ago

Last modified 3 years ago

#12414 closed enhancement (fixed)

[package kit] Package downloads not retried / resumable

Reported by: waddlesplash Owned by: leavengood
Priority: critical Milestone: R1/beta3
Component: Kits/Package Kit Version: R1/Development
Keywords: Cc:
Blocked By: Blocking: #11637, #12691, #13816, #14948, #15312, #16387
Platform: All

Description

While downloading a very large system update, my internet cut out for a few seconds. This was long enough for the cURL package download to cut out, which stopped the download entirely.

There should be a few retries (at a minimum) before giving up, and there should be a way to resume if there's a transaction directory with some packages already downloaded.

Change History (22)

comment:1 by bonefish, 9 years ago

Owner: changed from bonefish to nobody
Status: new → assigned

comment:2 by diver, 9 years ago

Blocking: 11637 added

comment:3 by diver, 7 years ago

Blocking: 12691 added

comment:4 by diver, 6 years ago

Blocking: 13816 added

comment:5 by diver, 5 years ago

Blocking: 14948 added

comment:6 by leavengood, 4 years ago

Owner: changed from nobody to leavengood

I'll look at this within the next few weeks, hopefully.

comment:7 by leavengood, 4 years ago

I started looking over the code for this. I think downloads could be retried fairly easily in the FetchFileJob::Execute method, but "resuming" a previous broken update seems more difficult.

There would need to be some way for the package kit to determine if a previous transaction directory represents an incomplete update, in which case it could try again using that directory. I also don't yet know if the entire directory is deleted when there is a download failure.

I am thinking over the options, but I will need to do some testing and may need to experiment.
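As a starting point, here is a minimal sketch of what a retry loop around the fetch could look like; kMaxRetries and _DownloadPackage() are illustrative placeholders, not the actual FetchFileJob API:

```cpp
// Hypothetical sketch only: kMaxRetries and _DownloadPackage() are
// illustrative names, not part of the real FetchFileJob implementation.
#include <OS.h>

static const int kMaxRetries = 3;

status_t _DownloadPackage(); // placeholder for the actual fetch

status_t
FetchWithRetries()
{
    status_t result = B_ERROR;
    for (int attempt = 0; attempt < kMaxRetries; attempt++) {
        result = _DownloadPackage();
        if (result == B_OK)
            break;
        // assume the failure was transient; wait a bit longer each time
        snooze(1000000LL * (attempt + 1));
    }
    return result;
}
```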

comment:8 by waddlesplash, 4 years ago

Blocking: 16387 added

comment:9 by bitigchi, 4 years ago

leavengood, would this be available for beta3? This is currently my biggest frustration with the package system: I don't get high-speed downloads from the Haiku servers, and the connection tends to break very often.

comment:10 by pulkomandy, 4 years ago

Milestone: Unscheduled → R1/beta3

I would approach this in the opposite direction: when doing an update, before downloading files, check if they are already stored in a transaction folder, and in that case, don't re-download them.

I think that would work well enough. The packages in the repos don't change that often, after all, and downloads from a previous attempt a few minutes earlier are unlikely to be obsolete yet.
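A rough sketch of that check, assuming the expected size is available from the repository metadata (the names are illustrative, not the package kit API):

```cpp
// Illustrative only: not the package kit API. Assumes the expected package
// size is known from the repository metadata.
#include <Entry.h>

bool
AlreadyDownloaded(const char* pathInTransactionFolder, off_t expectedSize)
{
    BEntry entry(pathInTransactionFolder);
    if (!entry.Exists())
        return false;

    off_t size = 0;
    if (entry.GetSize(&size) != B_OK)
        return false;

    // Only skip the download when the file looks complete; a stricter check
    // would also compare the checksum against the repository information.
    return size == expectedSize;
}
```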

comment:11 by pulkomandy, 4 years ago

Blocking: 15312 added

comment:12 by pulkomandy, 4 years ago

Priority: normal → critical

Raising priority because this is really annoying with an unstable connection.

comment:14 by Starcrasher, 3 years ago

For now, a transaction fails as soon as a package fails to download. The transaction folder then contains all the packages that were downloaded completely, plus the incomplete one. We know that the corrupted file is the most recently modified one. When the transaction is aborted, could we delete that file?

It would ensure that all package files in the transaction folder are complete and reusable.

As an option, we could even try to proceed with the files that were already downloaded.

Example:
I use SoftwareUpdater. There is a new Haiku revision and a KWrite update available. If the KWrite package fails to download, the Haiku update itself could still proceed.

Additionally, we could put a text file in the transaction folder indicating the reason for the abnormal end of the transaction.

Example:
Transaction cancelled by user. (only if some packages were already downloaded)
Transaction aborted, package name_of_the_guilty.hpkg failed to download completely.
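A minimal sketch of the cleanup idea above, deleting the most recently modified file in an aborted transaction folder (the function name is illustrative; this is not the actual package kit code):

```cpp
// Illustrative only: delete the most recently modified file in an aborted
// transaction folder, on the assumption that it is the interrupted download.
#include <Directory.h>
#include <Entry.h>

status_t
RemoveNewestFile(const char* transactionFolder)
{
    BDirectory directory(transactionFolder);
    BEntry entry;
    BEntry newest;
    time_t newestTime = 0;

    while (directory.GetNextEntry(&entry) == B_OK) {
        time_t modified = 0;
        if (entry.GetModificationTime(&modified) == B_OK
            && modified >= newestTime) {
            newestTime = modified;
            newest = entry;
        }
    }

    if (newest.InitCheck() != B_OK)
        return B_ENTRY_NOT_FOUND;

    // Removing the incomplete file leaves only complete, reusable packages.
    return newest.Remove();
}
```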

comment:15 by stippi, 3 years ago

I have worked on this patch. Multiple retries are not implemented at the level of a transaction, though there is now code in FetchFileJob to retry requests in certain error situations. That part is untested. What is tested and working is that all downloads of a previously failed transaction are re-used, including partial ones, so manually retrying an update will pick up exactly where the last try aborted.
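For reference, resuming works at the HTTP level by requesting only the missing byte range; the helper below is a generic sketch of that header, not a specific BHttpRequest call:

```cpp
// Generic illustration of how a resume works at the HTTP level; this is not
// a specific BHttpRequest call, just the header value a client would send.
#include <cstdio>
#include <string>
#include <sys/types.h>

std::string
BuildRangeHeaderValue(off_t bytesAlreadyDownloaded)
{
    // "Range: bytes=N-" asks the server for everything from offset N to the
    // end of the file; a server that honors it replies with 206.
    char buffer[64];
    snprintf(buffer, sizeof(buffer), "bytes=%lld-",
        (long long)bytesAlreadyDownloaded);
    return std::string(buffer);
}
```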

comment:16 by stippi, 3 years ago

Resolution: fixed
Status: assigned → closed

Fixed in hrev54860. Please re-open if this is not yet working satisfactorily, for example if automatic retries for unattended updates are still needed. I've only tested manually stopping a transaction and manually retrying it; for that situation, see comment:15 above.

comment:17 by waddlesplash, 3 years ago

stippi: Did you test disconnecting the network cable? IIRC I tried that with an earlier version of the patch and it caused strange errors and was not automatically recoverable.

comment:18 by stippi, 3 years ago (in reply to comment:17)

Replying to waddlesplash:

stippi: Did you test disconnecting the network cable? IIRC I tried that with an earlier version of the patch and it caused strange errors and was not automatically recoverable.

I will try that. If it doesn't work correctly, though, the problem will not be with the code that this patch touches, but rather with the existing error handling. The only change I've made in the error handling is that HTTP status 206 (Partial Content) is no longer treated as an error, since it is the expected response when completing a range request.
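In other words, the status check has to treat both 200 and 206 as success when a range was requested; a minimal sketch (the function name is hypothetical, not code from the patch):

```cpp
// Minimal illustration of the status handling described above; the function
// name is hypothetical, not code from the patch.
bool
IsSuccessfulDownloadStatus(int httpStatus, bool rangeWasRequested)
{
    if (httpStatus == 200)      // full response
        return true;
    if (httpStatus == 206)      // Partial Content: expected when resuming
        return rangeWasRequested;
    return false;
}
```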

comment:19 by stippi, 3 years ago

When I unplug the cable during a transfer, the download progress in the Terminal output just stalls. When I re-plug the cable, the download continues for a little bit, then stops prematurely. I've added debug output to the entire error handling in FetchFileJob. The BHttpRequest calls the RequestCompleted() hook with status B_OK and success "true". The hook then checks the HTTP status from the response, which also indicates everything went fine. The transaction then fails because the package checksum does not match the repo information. The situation is made worse by my recent changes: because the BHttpRequest finishes with B_OK, the download is marked as complete, and the user has to manually delete the incomplete package from the transaction folder in order to have a chance at retrying.

I will now add some debug output to the transfer loop in BHttpRequest in order to dig deeper.

The Package Kit should delete downloads that fail the checksum check. Maybe I can provide a patch for that as well.
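A sketch of that cleanup, assuming a hypothetical ComputeSha256Hex() helper (this is not the actual package kit verification code):

```cpp
// Sketch only: ComputeSha256Hex() is a hypothetical helper, and this is not
// the actual package kit verification code.
#include <Entry.h>
#include <String.h>

BString ComputeSha256Hex(const char* path); // assumed to exist elsewhere

status_t
VerifyOrDiscard(const char* packagePath, const BString& expectedSha256)
{
    if (ComputeSha256Hex(packagePath) == expectedSha256)
        return B_OK;

    // The checksum does not match the repository information: remove the
    // corrupt download so the next attempt starts from a clean state.
    BEntry entry(packagePath);
    entry.Remove();
    return B_BAD_DATA;
}
```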

comment:20 by pulkomandy, 3 years ago

The request should not be complete in that case. It seems there is a bug in BSecureSocket which I have fixed here: https://review.haiku-os.org/c/haiku/+/3620

It should result in the request failing if that call times out. Additionally, we should probably set a timeout to make sure this happens when the connection drops. Otherwise, OpenSSL will keep retrying forever and only fail when the connection comes back and it notices that the server has closed the connection.
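As a generic illustration of the timeout idea at the plain-socket level (POSIX code, not the actual BSecureSocket change):

```cpp
// Generic POSIX illustration of the timeout idea, not the actual BSecureSocket
// change: with SO_RCVTIMEO set, a read on a dead connection eventually fails
// instead of blocking forever.
#include <sys/socket.h>
#include <sys/time.h>

bool
SetReceiveTimeout(int socketFd, int seconds)
{
    struct timeval timeout;
    timeout.tv_sec = seconds;
    timeout.tv_usec = 0;
    return setsockopt(socketFd, SOL_SOCKET, SO_RCVTIMEO,
        &timeout, sizeof(timeout)) == 0;
}
```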

comment:21 by stippi, 3 years ago

https://review.haiku-os.org/c/haiku/+/3621 results in the desired retry behavior when unplugging the cable.

https://review.haiku-os.org/c/haiku/+/3620 does not (yet) result in a change of behavior. "bytesRead" is used to generate the error via BSecureSocket::Private::ErrorCode(int returnValue), but "0" maps to "SSL_NO_ERROR", which is in turn mapped to "B_NO_ERROR". This results in BSecureSocket::Read() still returning "0".
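For context, a 0 return from SSL_read() indicates that the peer closed the connection; a hedged sketch of how such a result could be distinguished from "no error" (illustrative only, not the code under review):

```cpp
// Illustrative only (not the code under review): a 0 return from SSL_read()
// means the peer closed the connection, which should not be reported to the
// caller as "no error".
#include <openssl/ssl.h>
#include <sys/types.h>

ssize_t
ReadWithShutdownDetection(SSL* ssl, void* buffer, size_t size)
{
    int bytesRead = SSL_read(ssl, buffer, (int)size);
    if (bytesRead > 0)
        return bytesRead;

    int sslError = SSL_get_error(ssl, bytesRead);
    if (sslError == SSL_ERROR_ZERO_RETURN)
        return 0;   // clean TLS shutdown: genuine end of stream
    return -1;      // anything else (e.g. a dropped connection) is an error
}
```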

comment:22 by stippi, 3 years ago

With https://review.haiku-os.org/c/haiku/+/3635, retries should finally work.
