Opened 8 months ago

Closed 7 months ago

Last modified 6 months ago

#15474 closed bug (fixed)

Stable CI/CD

Reported by: kallisti5
Owned by: kallisti5
Priority: critical
Milestone: R1/beta2
Component: Build System
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Platform: All

Description

We use buildbot at the moment for our releases. While it has worked over the years, we have been slowly outgrowing it. We have 1000+ lines of Python code using the Buildbot API.

Some limitations of our current buildbot:

  • Branch selection broken. Unable to inject multiple codebases into a build without a major refactor.
  • Continuous series of CVEs that we have to keep bumping the buildbot version to resolve.
  • (opinion) buildbot is lacking maintainers. A lot of major bugs have crept into recent releases and have been breaking our builds.

Here are some of the pain points from a design standpoint:

  • buildtools / toolchains are compiled on each builder and stored locally. It's possible to get a random toolchain depending on your random builder assignment.
  • Since buildtools / toolchains are cached, we don't support "selecting buildtools branches"
  • Secrets don't work in all API objects, which prevents their proper usage.
  • We can't send artifacts to buildbot using the native APIs. Our artifacts are too big and break the buildbot web UI during the upload. (builders time out)

Change History (8)

comment:2 by kallisti5, 8 months ago

To help solve this issue, I've been playing with 'concourse-ci' for Haiku. It's vastly different and solves a *lot* of the problems we have with buildbot. It also creates a bunch of new issues :-|

Better:

  • Container-based builds. We build our toolchain into a versioned container, so all workers use the *exact* same toolchain.
  • Simple YAML configuration of pipelines.
  • Pretty modern public web ui
  • CLI control and monitoring of builds
  • Wide range of "plugins" which are just containers expecting special inputs/outputs
  • Support for branches based on multiple code branches. (r1beta1 toolchain, r1beta1 branch of haiku, etc)
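As a rough illustration of the per-branch idea above (one pipeline per branch, each pinned to a matching toolchain container), here is a minimal Python sketch that assembles a Concourse-style pipeline definition as a plain dict. The repository URI, the image name `example/toolchain-worker`, the resource name `haiku-src`, and the tagging scheme are all placeholders assumed for the example, not the real configuration.

```python
import json


def branch_pipeline(branch, arch, toolchain_image):
    """Return a Concourse-style pipeline definition as a plain dict.

    Every name, URI, and tag in here is a placeholder for illustration,
    not the configuration actually deployed.
    """
    return {
        "resources": [
            {
                "name": "haiku-src",
                "type": "git",
                # Hypothetical repository URI; the branch is the only real input.
                "source": {"uri": "https://example.org/haiku.git", "branch": branch},
            }
        ],
        "jobs": [
            {
                "name": f"image-{branch}-{arch}",
                "plan": [
                    # Trigger the job whenever the branch receives a new commit.
                    {"get": "haiku-src", "trigger": True},
                    {
                        "task": "build",
                        "config": {
                            "platform": "linux",
                            # Pin the task to a toolchain container tagged per
                            # branch and architecture (assumed tagging scheme).
                            "image_resource": {
                                "type": "registry-image",
                                "source": {
                                    "repository": toolchain_image,
                                    "tag": f"{branch}-{arch}",
                                },
                            },
                            "inputs": [{"name": "haiku-src"}],
                            # Stand-in for the real build command.
                            "run": {"path": "sh", "args": ["-c", "echo build goes here"]},
                        },
                    },
                ],
            }
        ],
    }


pipeline = branch_pipeline("r1beta1", "x86_64", "example/toolchain-worker")
print(json.dumps(pipeline, indent=2))
```

Because the toolchain tag is derived from the branch, two workers picking up the same job would pull the same container rather than whatever toolchain happens to be cached locally.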

Worse:

  • With remote workers, getting random network outages within build containers
  • With remote workers, getting random DNS resolution failures within build containers.
  • WebUI chokes on larger log outputs. I've worked with concourse folks to help reduce the pain here.
  • WebUI can't be put at a different prefix. (build.haiku-os.org/concourse)
  • S3 plugin resource is clunky and expects you to use versioned s3 artifacts.

https://ci.haiku-os.org is our current deployment.

There's a lot of promise here, but the network / DNS issues are greatly slowing the rollout.

comment:3 by kallisti5, 8 months ago

Owner: changed from bonefish to kallisti5
Status: new → assigned

comment:4 by kallisti5, 8 months ago

Priority: high → critical

While this one is important (and we need to figure it out), it really isn't a blocker for R1 Beta2 at this point. Our infrastructure is stable at the moment, and anything that needs to be done in the scope of repos can be done manually until the automation improves.

Building R1 Beta1 images has been working since October: https://ci.haiku-os.org/teams/r1beta1/pipelines/r1beta1-x86_64/jobs/image-r1beta1-x86_64/builds/17

These go to S3 buckets, where we could hand-adjust whatever other release stuff we want in them.

comment:5 by pulkomandy, 7 months ago

I think concourse is good enough for this now? Even if there is the occasional build failure, it allows us to run multiple branches and platforms in parallel in a more sane way, which is all we need here?

comment:6 by kallisti5, 7 months ago

Resolution: fixed
Status: assigned → closed

Agree. The new build system is in place. There is only one worker at the moment (a Dell i7 in my office), but it is a pretty strong machine with lots of RAM and an NVMe drive. (Donated by Haiku, Inc.) Waiting on Geist to get some free time so we can walk through configuring his systems.

We can build branches (and have branch-based buildtools, etc). All workers use an identical set of binary buildtools from Docker, and anyone is free to pull the pre-compiled buildtools and use them on their desktop. (I could see folks now doing Windows / OS X builds within these toolchain containers as well; the only requirement is Docker.)
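For anyone wanting to try the toolchain container on a desktop, a minimal sketch of how the `docker run` invocation could be assembled is below. The image name, tag, source path, and build command are hypothetical placeholders; only the `docker run` flags themselves (`--rm`, `-v`, `-w`) are standard.

```python
import shlex


def docker_build_command(source_dir, image, build_cmd):
    """Build a `docker run` argument list that mounts a source tree into
    a toolchain container and runs a build command inside it.

    The image and build command are supplied by the caller; nothing here
    is the project's actual image name or build invocation.
    """
    return [
        "docker", "run",
        "--rm",                        # remove the container when the build exits
        "-v", f"{source_dir}:/work",   # mount the local checkout at /work
        "-w", "/work",                 # run the build from the mounted checkout
        image,
        "sh", "-c", build_cmd,
    ]


cmd = docker_build_command(
    "/home/user/haiku",                         # hypothetical checkout location
    "example/toolchain-worker:r1beta1-x86_64",  # hypothetical image and tag
    "echo build goes here",                     # stand-in for the real build command
)
print(shlex.join(cmd))
```

The argument list can be passed directly to `subprocess.run(cmd, check=True)` on a machine with Docker installed.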

All the pipelines are in our infrastructure git, buildbot has been disabled, and links have been updated. Branch images upload to their associated S3 buckets at Wasabi, which cuts down on costs and speeds up downloads. Images (and the Haiku repo itself) are now signed as well with minisign (.minisig).
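Checking a downloaded image against its `.minisig` signature could look roughly like the sketch below. It assumes the `minisign` command-line tool is installed and that the public key has been saved to a local file; the file names are hypothetical. By default `minisign -V` looks for the signature next to the file, at `<file>.minisig`.

```python
import subprocess


def minisign_verify_command(artifact, pubkey_file):
    """Return the minisign argument list for verifying `artifact`.

    -V selects verification, -p names the public key file, and -m names
    the file to check; the signature is read from `artifact + ".minisig"`
    unless told otherwise.
    """
    return ["minisign", "-V", "-p", pubkey_file, "-m", artifact]


def verify(artifact, pubkey_file):
    """Return True if minisign accepts the signature, False otherwise.

    Also returns False when the minisign binary is not installed.
    """
    try:
        result = subprocess.run(
            minisign_verify_command(artifact, pubkey_file),
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0


# Hypothetical file names, shown only to illustrate the resulting command.
print(minisign_verify_command("haiku-image.zip", "haiku.pub"))
```

Calling `verify("haiku-image.zip", "haiku.pub")` would then gate any further use of the download on a valid signature.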

There are still occasional DNS issues. Concourse is working to move to plain containerd, which will hopefully solve them.
