Opened 2 years ago
Closed 2 years ago
#18115 closed bug (fixed)
"Checksum error" while attempting to update packages from the Haiku repo.
| Reported by: | bipolar | Owned by: | nobody |
|---|---|---|---|
| Priority: | high | Milestone: | R1/beta4 |
| Component: | - General | Version: | R1/beta4 |
| Keywords: | | Cc: | |
| Blocked By: | | Blocking: | |
| Platform: | All | | |
Description
Not sure if this ticket belongs in here, or in Haiku/infrastructure.
This is on beta4 hrev56578+31 (64 bits).
After SoftwareUpdater failed with a 'Refreshing repository "Haiku" failed' message, I ran pkgman full-sync, and got the following output:
> pkgman full
100% repochecksum-1 [65 bytes]
Validating checksum for BeSly Software Solutions...done.
100% repochecksum-1 [65 bytes]
Validating checksum for Haiku...done.
100% repocache-2 [2.37 KiB]
Checksum error:
*** expected '65f1d2b073bc31d6cc87b3930ca0b927c47f6cc983be69c73c346d929b3be79c'
*** got 'b834dc91ee4fbab736d1252e90ca42198511bc402d45fc9ce13e4f9d1cdff550'Refreshing repository "Haiku" failedValidating checksum for Haiku...: Bad data
100% repochecksum-1 [64 bytes]
Validating checksum for HaikuPorts...done.
The following changes will be made:
  in system:
    upgrade package harfbuzz-4.0.0-1 to 4.0.0-3 from repository HaikuPorts
    upgrade package libwebp-1.2.2-1 to 1.2.4-2 from repository HaikuPorts
Continue? [yes/no] (yes) : n
Change History (14)
comment:1 by , 2 years ago
comment:2 by , 2 years ago
Description: | modified (diff) |
---|
comment:3 by , 2 years ago
I'm seeing reports of these issues from Haiku *and* Haikuports repos.
- Haiku repos are served via wasabi s3 via a redirector on our infrastructure.
  - Populated by concourse
- Haikuports repos are served via our infrastructure directly.
  - Populated by haikuporter buildmaster
As you're getting unexpected checksums from haikuports (aka from Wasabi through the HTTP 302), and nielx reported a similar issue with the haiku repositories... I kinda suspect a potential regression in our network stack.
comment:4 by , 2 years ago
Never mind... I see now. There's a checksum error above those haikuports packages. Investigating.
comment:5 by , 2 years ago
Think I figured it out. I'm seeing a mix of dates in the build directory.
# ls -la
total 80182
drwxr-x--- 1 root root 0 Dec 31 1969 .
drwxr-x--- 1 root root 0 Dec 31 1969 ..
-rw-r----- 1 root root 2838530 Dec 3 01:46 haiku_datatranslators-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 4020479 Dec 3 01:46 haiku_devel-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 141049 Dec 3 01:46 haiku_extras-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 285960 Dec 1 17:04 haiku_loader-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 40922287 Dec 3 01:46 haiku-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 31569778 Dec 3 01:46 haiku_source-r1~beta4_hrev56578_46-1-any.hpkg
-rw-r----- 1 root root 9743 Dec 2 01:55 makefile_engine-r1~beta4_hrev56578_46-1-any.hpkg
-rw-r----- 1 root root 507748 Dec 3 01:46 netfs-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 392981 Dec 3 01:46 userland_fs-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 1413430 Dec 3 01:46 webpositive-r1~beta4_hrev56578_46-1-x86_64.hpkg
I bet I need a flag to "force overwrite" files. Multiple builds were likely done for this hrev, leaving the repo file out of sync with the packages.
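A listing like the one above can also be checked mechanically. The following is a minimal Python sketch, not part of the actual pipeline: the helper names `group_by_mtime_day` and `has_mixed_dates` are hypothetical, and it assumes that files from a single build should all share one modification date.

```python
import os
from collections import defaultdict
from datetime import datetime, timezone


def group_by_mtime_day(directory):
    """Map each modification date (UTC) to the sorted file names last written that day."""
    groups = defaultdict(list)
    for entry in os.scandir(directory):
        if entry.is_file():
            day = datetime.fromtimestamp(entry.stat().st_mtime, tz=timezone.utc).date()
            groups[day].append(entry.name)
    return {day: sorted(names) for day, names in groups.items()}


def has_mixed_dates(directory):
    """True when the files in `directory` were not all written on the same day."""
    return len(group_by_mtime_day(directory)) > 1
```

Running `has_mixed_dates` on a build directory would flag the situation shown here, where haiku_loader and makefile_engine carry older dates than the rest of the packages.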
comment:6 by , 2 years ago
Ack, OK. Yeah, found the bug. The rclone copy command is weird.
https://rclone.org/commands/rclone_copy/
Copy the source to the destination. Does not transfer files that are identical on source and destination, testing by size and modification time or MD5SUM. Doesn't delete files from the destination. If you want to also delete files from destination, to make it match source, use the sync command instead.
At face value that sounds good... however, reading it a few times highlights some problems. We're seeing multiple builds not updating the same older files.
If those files change... it won't delete them / overwrite them to update them? Super confused.
I'm digging more, but I definitely know "where" the issue is. It's definitely due to the pipeline rework I did a few days ago.
comment:7 by , 2 years ago
Milestone: | Unscheduled → R1/beta4 |
---|---|
Priority: | normal → high |
comment:8 by , 2 years ago
Slight side-track here but... notice how pkgman output/error messages are a bit confusing here.
I've noticed two things:
1- There are some missing \n between:
a- the last Checksum error: line.
b- the Refreshing repository "Haiku" failed message.
c- the Validating checksum for Haiku...: Bad data message.
A quick look at that code makes it seem (to me) that the messages expect the last '\n' to be added by the code actually doing the output (as is done by the UserInteractionHandler::Warn() implementation in pkgman's PackageManager::Warn()).
But either I get lost too easily (most likely), or I couldn't find where/why it fails to add those '\n' as it should.
2- There seems to be a Validating checksum for <reponame> message for two different stages of validation: the one for the repository checksum (performed by BRefreshRepositoryRequest::CreateInitialJobs()) and the one for the "repository cache" checksum (done by BRefreshRepositoryRequest::_FetchRepositoryCache()).
I guess the second should be changed, so in this case, instead of getting:
Validating checksum for Haiku...: Bad data
we should have something like:
Validating checksum for Haiku repository cache...: Bad data
no?
comment:9 by , 2 years ago
Yeah, the wording is a bit confusing which threw me off.
The problem it was complaining about is the repo.sha256 not matching the sha256sum of the repo file.
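The mismatch can be reproduced outside pkgman. Below is a rough Python sketch, not pkgman's actual code: `sha256_of_file` and `repo_matches_checksum` are hypothetical helper names, and the layout of repo.sha256 (a hex digest, optionally followed by whitespace or a file name) is an assumption.

```python
import hashlib


def sha256_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repo_matches_checksum(repo_path, checksum_path):
    """Compare the repo file's digest with the one stored alongside it.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by whitespace or a file name (sha256sum-style).
    """
    with open(checksum_path, "r", encoding="ascii") as handle:
        fields = handle.read().split()
    expected = fields[0].lower() if fields else ""
    return sha256_of_file(repo_path) == expected
```

A stale repo.sha256 next to a freshly uploaded repo file would make `repo_matches_checksum` return False, which corresponds to the "expected"/"got" pair in the pkgman output.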
I did some digging on our CI/CD pipelines, and rclone is attempting to use size + modified time to detect changes in files. The size of an already present repo file varies, but the size of an already present repo.sha256 file is consistent (as the checksums and filenames are always the same length).
More digging, and I found this awesomely described flag for rclone:
-c, --checksum Skip based on checksum (if available) & size, not mod-time & size
So, I'm hopeful that will solve it lol. Doing a test build now.
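Why a size-based comparison cannot see a changed repo.sha256 can be shown with a toy model. This Python sketch is not rclone's implementation: the two functions are hypothetical stand-ins for a size-only check and a checksum-based check, and the two digests from the pkgman output above are used as sample file contents.

```python
import hashlib

# The two digests reported by pkgman in this ticket, standing in for a stale
# and a fresh repo.sha256. Both are 64 hex characters long.
STALE = b"65f1d2b073bc31d6cc87b3930ca0b927c47f6cc983be69c73c346d929b3be79c\n"
FRESH = b"b834dc91ee4fbab736d1252e90ca42198511bc402d45fc9ce13e4f9d1cdff550\n"


def would_skip_by_size(source: bytes, destination: bytes) -> bool:
    """Toy size-only check: treat equal lengths as 'unchanged'."""
    return len(source) == len(destination)


def would_skip_by_checksum(source: bytes, destination: bytes) -> bool:
    """Toy checksum check: skip only when size and content hash both match."""
    same_size = len(source) == len(destination)
    same_hash = hashlib.sha256(source).digest() == hashlib.sha256(destination).digest()
    return same_size and same_hash


if __name__ == "__main__":
    print("size-only skips the upload:", would_skip_by_size(FRESH, STALE))
    print("checksum skips the upload:", would_skip_by_checksum(FRESH, STALE))
```

Under these assumptions the size-only check treats the fresh checksum file as unchanged, while the checksum-based check does not, which is the behavior the --checksum flag is meant to provide.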
comment:10 by , 2 years ago
Here's some evidence backing up the above from build 10 (I had to crank up debug to see it):
2022-12-04 19:50:30 DEBUG : x86_64/r1~beta4_hrev56578_46/repo.sha256: Sizes identical
2022-12-04 19:50:30 DEBUG : x86_64/r1~beta4_hrev56578_46/repo.sha256: Unchanged skipping
.
.
* x86_64/r1~beta4_hrev56578_46/repo:100% /2.375Ki, 0/s, -
If repo changed, you know the sha256 *should* have changed.
comment:11 by , 2 years ago
Sigh. I found the bug. The concourse resource implementing rclone is making a really bad assumption: https://github.com/warricksothr/concourse-rclone-resource/blob/master/assets/out#L114
I've removed that and am running a new build now (and I've enabled check-summing for good measure).
comment:12 by , 2 years ago
Ok, the bug has been fixed via https://github.com/kallisti5/concourse-rclone-resource/commit/c11f22d705dd177a1fc089d772371a57e5691e0a
I've triggered r1beta4 builds by hand which should fix those repos. The nightly build will fix any remaining nightly repo issues.
I'll follow up tomorrow morning and confirm things go as expected this evening.
comment:13 by , 2 years ago
Thanks a lot kallisti5. I could just update successfully again without checksum errors.
comment:14 by , 2 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thanks for confirming outsidecontext! Glad this one is resolved *and* we're now running rclone.
Another user on IRC had the same problem yesterday, with the 32 bit repos: https://0x0.st/okqf.png