Opened 2 years ago

Closed 2 years ago

#18115 closed bug (fixed)

"Checksum error" while attempting to update packages from the Haiku repo.

Reported by: bipolar
Owned by: nobody
Priority: high
Milestone: R1/beta4
Component: - General
Version: R1/beta4
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description (last modified by bipolar)

Not sure if this ticket belongs in here, or in Haiku/infrastructure.

This is on beta4 hrev56578+31 (64 bits).

After SoftwareUpdater failed with a 'Refreshing repository "Haiku" failed' message, I ran pkgman full-sync and got the following output:

> pkgman full
100% repochecksum-1 [65 bytes]
Validating checksum for BeSly Software Solutions...done.
100% repochecksum-1 [65 bytes]
Validating checksum for Haiku...done.
100% repocache-2 [2.37 KiB]
Checksum error:
*** expected '65f1d2b073bc31d6cc87b3930ca0b927c47f6cc983be69c73c346d929b3be79c'
*** got      'b834dc91ee4fbab736d1252e90ca42198511bc402d45fc9ce13e4f9d1cdff550'Refreshing repository "Haiku" failedValidating checksum for Haiku...: Bad data
100% repochecksum-1 [64 bytes]
Validating checksum for HaikuPorts...done.
The following changes will be made:
  in system:
    upgrade package harfbuzz-4.0.0-1 to 4.0.0-3 from repository HaikuPorts
    upgrade package libwebp-1.2.2-1 to 1.2.4-2 from repository HaikuPorts
Continue? [yes/no] (yes) : n

Change History (14)

comment:1 by bipolar, 2 years ago

Another user on IRC had the same problem yesterday, with the 32 bit repos: https://0x0.st/okqf.png

comment:2 by bipolar, 2 years ago

Description: modified (diff)

comment:3 by kallisti5, 2 years ago

I'm seeing reports of these issues from Haiku *and* Haikuports repos.

  • Haiku repos are served via wasabi s3 via a redirector on our infrastructure.
    • Populated by concourse
  • Haikuports repos are served via our infrastructure directly.
    • Populated by haikuporter buildmaster

As you're getting unexpected checksums from haikuports (aka from Wasabi through the HTTP 302), and nielx reported a similar issue with the haiku repositories... I kinda suspect a potential regression in our network stack.

comment:4 by kallisti5, 2 years ago

Never mind, I see now. There's a checksum error above those haikuports packages. Investigating.

comment:5 by kallisti5, 2 years ago

Think I figured it out. I'm seeing a mix of dates in the build directory.

# ls -la 
total 80182
drwxr-x--- 1 root root        0 Dec 31  1969 .
drwxr-x--- 1 root root        0 Dec 31  1969 ..
-rw-r----- 1 root root  2838530 Dec  3 01:46 haiku_datatranslators-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root  4020479 Dec  3 01:46 haiku_devel-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root   141049 Dec  3 01:46 haiku_extras-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root   285960 Dec  1 17:04 haiku_loader-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 40922287 Dec  3 01:46 haiku-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root 31569778 Dec  3 01:46 haiku_source-r1~beta4_hrev56578_46-1-any.hpkg
-rw-r----- 1 root root     9743 Dec  2 01:55 makefile_engine-r1~beta4_hrev56578_46-1-any.hpkg
-rw-r----- 1 root root   507748 Dec  3 01:46 netfs-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root   392981 Dec  3 01:46 userland_fs-r1~beta4_hrev56578_46-1-x86_64.hpkg
-rw-r----- 1 root root  1413430 Dec  3 01:46 webpositive-r1~beta4_hrev56578_46-1-x86_64.hpkg

I bet I need a flag to "force overwrite" files. Multiple builds were likely done for this hrev, so the repo no longer matches the packages.

comment:6 by kallisti5, 2 years ago

Ack. OK. Yeah, found the bug. The rclone copy command is weird.

https://rclone.org/commands/rclone_copy/

Copy the source to the destination.  Does not transfer files that are
identical on source and destination, testing by size and modification
time or MD5SUM.  Doesn't delete files from the destination. If you
want to also delete files from destination, to make it match source,
use the [sync](/commands/rclone_sync/) command instead.

At face value that sounds good. However, reading it a few times highlights some problems. We're seeing multiple builds that don't update the same older files.

If those files change... it won't delete them / overwrite them to update them? Super confused.

I'm digging more, but I definitely know "where" the issue is. It's definitely due to the pipeline rework I did a few days ago.

comment:7 by pulkomandy, 2 years ago

Milestone: Unscheduled → R1/beta4
Priority: normal → high

comment:8 by bipolar, 2 years ago

Slight side-track here but... notice how pkgman output/error messages are a bit confusing here.

I've noticed two things:

1- There are some missing \n between:

a- the last Checksum error: line.

b- the Refreshing repository "Haiku" failed message.

c- the Validating checksum for Haiku...: Bad data message.

A quick look at that code makes it seem (to me) that the messages expect the last '\n' to be added by the code actually doing the output (as is done by UserInteractionHandler::Warn() implementation on pkgman's PackageManager::Warn()).

But either I get lost too easily (most likely), or I couldn't find where/why it fails to add those '\n' as it should.

2- There seems to be a Validating checksum for <reponame> message for two different stages of validation: the one for the repository checksum (performed by BRefreshRepositoryRequest::CreateInitialJobs()) and the one for the "repository cache" checksum (done by BRefreshRepositoryRequest::_FetchRepositoryCache()).

I guess the second should be changed, so in this case, instead of getting:

Validating checksum for Haiku...: Bad data,

we should have something like:

Validating checksum for Haiku repository cache...: Bad data

no?
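
To illustrate point 1, here is a minimal Python sketch of the idea that the code doing the output is responsible for terminating each line. It is not pkgman's implementation (which is C++); `emit` is a hypothetical helper, and the two messages are taken from the output in the description.

```python
import io


def emit(stream, message):
    """Write `message` and terminate the line if the caller did not.

    Hypothetical helper for illustration only, not a pkgman function.
    """
    stream.write(message)
    if not message.endswith("\n"):
        stream.write("\n")


messages = [
    'Refreshing repository "Haiku" failed',
    "Validating checksum for Haiku...: Bad data",
]

# Writing the messages back to back reproduces the run-together output.
raw = "".join(messages)

# Routing every message through emit() keeps each one on its own line.
buffer = io.StringIO()
for message in messages:
    emit(buffer, message)
terminated = buffer.getvalue()

print(raw)
print(terminated, end="")
```

With a helper like this, a message that already ends in a newline is written unchanged, so callers that do terminate their lines are not affected.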

comment:9 by kallisti5, 2 years ago

Yeah, the wording is a bit confusing which threw me off.

The problem it was complaining about is the repo.sha256 not matching the sha256sum of the repo file.
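
The check described above can be sketched in a few lines of Python. This is only an illustration, not the package kit's code: the helper names are made up, the files are temporary stand-ins, and the layout of the checksum file (hex digest first, optionally followed by a file name) is an assumption.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repo_matches_checksum(repo_path, checksum_path):
    """Compare the repo file's digest with the digest stored next to it.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by whitespace and a file name (sha256sum style).
    """
    with open(checksum_path, "r", encoding="ascii") as handle:
        expected = handle.read().split()[0].lower()
    return sha256_of_file(repo_path) == expected


# Demo with temporary stand-in files (not real repository data).
with tempfile.TemporaryDirectory() as workdir:
    repo = os.path.join(workdir, "repo")
    checksum = os.path.join(workdir, "repo.sha256")
    with open(repo, "wb") as handle:
        handle.write(b"first build")
    with open(checksum, "w", encoding="ascii") as handle:
        handle.write(sha256_of_file(repo) + "\n")
    fresh_ok = repo_matches_checksum(repo, checksum)

    # A rebuild replaces the repo file, but the old checksum file stays behind.
    with open(repo, "wb") as handle:
        handle.write(b"second build")
    stale_ok = repo_matches_checksum(repo, checksum)

print(fresh_ok, stale_ok)  # True False
```

The second result is the situation in this ticket: the repo file is newer than the checksum file published alongside it, so validation fails.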

I did some digging on our CI/CD pipelines, and rclone is attempting to use size + modified time to detect changes in files. The size of an already-present repo file varies between builds, but the size of an already-present repo.sha256 file is constant (as the checksums and filenames are always the same length), so a changed repo.sha256 can look unchanged.

More digging, and I found this awesomely described flag for rclone:

  -c, --checksum                             Skip based on checksum (if available) & size, not mod-time & size

So, I'm hopeful that will solve it lol. Doing a test build now.
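
Why a size-based comparison misses a changed checksum file can be shown with a simplified model in Python. This is a sketch, not rclone's actual logic: the two `should_skip_*` functions are hypothetical stand-ins, and the checksum file layout (hex digest plus newline) is an assumption.

```python
import hashlib


def checksum_file_contents(repo_bytes):
    """Stand-in for repo.sha256: the hex digest plus a newline (assumed layout)."""
    return (hashlib.sha256(repo_bytes).hexdigest() + "\n").encode("ascii")


def should_skip_by_size(source, destination):
    """Simplified model of a size-only comparison."""
    return len(source) == len(destination)


def should_skip_by_checksum(source, destination):
    """Simplified model of a size-plus-checksum comparison."""
    if len(source) != len(destination):
        return False
    return hashlib.sha256(source).digest() == hashlib.sha256(destination).digest()


old_repo = b"repo contents from the first build"
new_repo = b"repo contents from a later, slightly longer build"

old_sum = checksum_file_contents(old_repo)
new_sum = checksum_file_contents(new_repo)

print(len(old_sum), len(new_sum))                 # 65 65
print(should_skip_by_size(new_repo, old_repo))    # False: the repo is re-uploaded
print(should_skip_by_size(new_sum, old_sum))      # True: the stale checksum is kept
print(should_skip_by_checksum(new_sum, old_sum))  # False: the checksum file is re-uploaded
```

A SHA-256 hex digest is always 64 characters, so every version of the checksum file has the same size, and only a content-based comparison can tell two versions apart.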

comment:10 by kallisti5, 2 years ago

Here's some evidence backing up the above from build 10 (I had to crank up debug to see it):

https://ci.haiku-os.org/teams/r1beta4/pipelines/r1beta4-x86_64/jobs/compile-r1beta4-x86_64/builds/10

2022-12-04 19:50:30 DEBUG : x86_64/r1~beta4_hrev56578_46/repo.sha256: Sizes identical
2022-12-04 19:50:30 DEBUG : x86_64/r1~beta4_hrev56578_46/repo.sha256: Unchanged skipping
.
.
*             x86_64/r1~beta4_hrev56578_46/repo:100% /2.375Ki, 0/s, -

If repo changed, you know the sha256 *should* have changed.
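
That rule of thumb can be turned into a rough log check. The Python sketch below assumes the debug lines look like the excerpt above ("DEBUG : <path>: Unchanged skipping"); the function names are hypothetical, and the heuristic only flags a checksum file that was skipped while its companion file was not.

```python
import re

# Assumed line shape, based only on the excerpt quoted in this comment.
SKIPPED = re.compile(r"DEBUG\s*:\s*(?P<path>.+?): Unchanged skipping\s*$")


def skipped_paths(log_text):
    """Collect the paths that the log reports as skipped because unchanged."""
    paths = set()
    for line in log_text.splitlines():
        match = SKIPPED.search(line)
        if match:
            paths.add(match.group("path"))
    return paths


def suspicious_checksum_files(log_text, suffix=".sha256"):
    """Return skipped checksum files whose companion file was not skipped."""
    skipped = skipped_paths(log_text)
    return sorted(
        path
        for path in skipped
        if path.endswith(suffix) and path[: -len(suffix)] not in skipped
    )


sample = """\
2022-12-04 19:50:30 DEBUG : x86_64/r1~beta4_hrev56578_46/repo.sha256: Sizes identical
2022-12-04 19:50:30 DEBUG : x86_64/r1~beta4_hrev56578_46/repo.sha256: Unchanged skipping
"""

flagged = suspicious_checksum_files(sample)
print(flagged)  # ['x86_64/r1~beta4_hrev56578_46/repo.sha256']
```

If the log also reported the repo file itself as skipped, nothing would be flagged, since an unchanged repo with an unchanged checksum is consistent.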

comment:11 by kallisti5, 2 years ago

Sigh. I found the bug. The concourse resource implementing rclone is making a really bad assumption: https://github.com/warricksothr/concourse-rclone-resource/blob/master/assets/out#L114

I've removed that and am running a new build now (and I've enabled check-summing for good measure).

comment:12 by kallisti5, 2 years ago

Ok, the bug has been fixed via https://github.com/kallisti5/concourse-rclone-resource/commit/c11f22d705dd177a1fc089d772371a57e5691e0a

I've triggered r1beta4 builds by hand which should fix those repos. The nightly build will fix any remaining nightly repo issues.

I'll follow up tomorrow morning and confirm things go as expected this evening.

comment:13 by outsidecontext, 2 years ago

Thanks a lot, kallisti5. I could just update successfully again, without checksum errors.

comment:14 by kallisti5, 2 years ago

Resolution: fixed
Status: new → closed

Thanks for confirming outsidecontext! Glad this one is resolved *and* we're now running rclone.
