Opened 5 weeks ago

Last modified 4 weeks ago

#15474 assigned bug

Stable CI/CD

Reported by: kallisti5
Owned by: kallisti5
Priority: critical
Milestone: R1/beta2
Component: Build System
Version: R1/Development
Keywords:
Cc:
Blocked By:
Blocking:
Has a Patch: no
Platform: All

Description

We currently use Buildbot for our releases. While it has worked over the years, we have been slowly outgrowing it. We have 1000+ lines of Python code using the Buildbot API.

Some limitations of our current buildbot:

  • Branch selection is broken. We can't inject multiple codebases into a build without a major refactor.
  • A continuous series of CVEs forces us to keep bumping the buildbot version.
  • (opinion) buildbot is lacking maintainers. A lot of major bugs have crept into recent releases and have been breaking our builds.

Here are some of the pain points from a design standpoint:

  • buildtools / toolchains are compiled on each builder and stored locally, so which toolchain a build gets depends on which builder it happens to be assigned to.
  • Since buildtools / toolchains are cached, we don't support "selecting buildtools branches".
  • Secrets don't work in all API objects, which prevents their proper usage.
  • We can't send artifacts to buildbot using the native APIs. Our artifacts are too big and break the buildbot web UI during the upload (builders time out).

Change History (4)

comment:2 by kallisti5, 5 weeks ago

To help solve this issue, I've been playing with 'concourse-ci' for Haiku. It's vastly different and solves a *lot* of the problems we have with buildbot. It also creates a bunch of new issues :-|

Better:

  • Container-based builds. We build our toolchain into a versioned container, so all workers use the *exact* same toolchain.
  • Simple YAML configuration of pipelines.
  • Pretty modern public web UI
  • CLI control and monitoring of builds
  • Wide range of "plugins" which are just containers expecting special inputs/outputs
  • Support for builds that combine multiple code branches (r1beta1 toolchain, r1beta1 branch of haiku, etc.)
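
As a rough illustration of the container-based approach, the sketch below assembles a Concourse-style pipeline definition in Python and serializes it as JSON (which YAML parsers generally accept). The image repository `example/haiku-toolchain`, the git URI, and the `ci/build.sh` script are hypothetical placeholders, not taken from the real Haiku pipelines. The only point is that every job for a branch names the same versioned toolchain image.

```python
import json


def build_job(branch, arch, image_repo="example/haiku-toolchain"):
    """Return one job whose task runs inside the branch's toolchain image."""
    return {
        "name": f"image-{branch}-{arch}",
        "plan": [
            {"get": "haiku-git", "trigger": True},
            {
                "task": "build-image",
                "config": {
                    "platform": "linux",
                    # The tag is derived from the branch alone, so every worker
                    # pulls the same toolchain image for that branch.
                    "image_resource": {
                        "type": "registry-image",
                        "source": {"repository": image_repo, "tag": branch},
                    },
                    "inputs": [{"name": "haiku-git"}],
                    # Hypothetical entry point; substitute the real build script.
                    "run": {"path": "haiku-git/ci/build.sh", "args": [arch]},
                },
            },
        ],
    }


def build_pipeline(branch, arches):
    """Return a pipeline dict with one git resource and one job per arch."""
    return {
        "resources": [
            {
                "name": "haiku-git",
                "type": "git",
                # Hypothetical repository location.
                "source": {"uri": "https://example.org/haiku.git", "branch": branch},
            }
        ],
        "jobs": [build_job(branch, arch) for arch in arches],
    }


def image_tags(pipeline):
    """Collect the toolchain image tags used by all task steps."""
    tags = set()
    for job in pipeline["jobs"]:
        for step in job["plan"]:
            if "task" in step:
                tags.add(step["config"]["image_resource"]["source"]["tag"])
    return tags


if __name__ == "__main__":
    print(json.dumps(build_pipeline("r1beta1", ["x86_64", "x86_gcc2"]), indent=2))
```

Checking `image_tags(...)` for a generated pipeline is a cheap way to confirm that no job drifted onto a different toolchain.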

Worse:

  • With remote workers, we get random network outages within build containers.
  • With remote workers, we get random DNS resolution failures within build containers.
  • The web UI chokes on larger log outputs. I've worked with the concourse folks to help reduce the pain here.
  • The web UI can't be served under a path prefix (e.g. build.haiku-os.org/concourse).
  • The S3 resource plugin is clunky and expects you to use versioned S3 artifacts.
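
To make the "versioned S3 artifacts" expectation concrete: as far as I understand it, the S3 resource finds versions either through bucket versioning or through a filename pattern whose first capture group is the version. The sketch below mimics the second mode in plain Python. The object names and the `hrev` naming scheme in the example are assumptions for illustration, not the actual bucket layout.

```python
import re


def latest_artifact(keys, pattern):
    """Return the key whose first capture group is the highest integer version.

    Keys that do not match the pattern are ignored. Returns None when
    nothing matches.
    """
    regex = re.compile(pattern)
    best_key = None
    best_version = -1
    for key in keys:
        match = regex.fullmatch(key)
        if match is None:
            continue
        # Compare numerically: as strings, "9999" would sort after "52300".
        version = int(match.group(1))
        if version > best_version:
            best_key, best_version = key, version
    return best_key


if __name__ == "__main__":
    # Hypothetical object names.
    keys = [
        "haiku-r1beta1-hrev52295-x86_64.zip",
        "haiku-r1beta1-hrev52300-x86_64.zip",
        "README.txt",
    ]
    print(latest_artifact(keys, r"haiku-r1beta1-hrev(\d+)-x86_64\.zip"))
```

Artifacts uploaded under a fixed name (for example a plain `latest.zip`) have no version to extract under this scheme, which is part of why the resource feels clunky for existing release layouts.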

https://ci.haiku-os.org is our current deployment.

There's a lot of promise here, but the network / DNS issues are greatly slowing the rollout.
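
Until the root cause is found, one possible stopgap inside build scripts is to retry name resolution with exponential backoff. The following is only a sketch under that assumption; the function name, the retry counts, and the delays are made up for illustration, and it would hide rather than fix the underlying worker networking problem.

```python
import socket
import time


def resolve_with_retry(host, attempts=5, base_delay=1.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host, retrying with exponential backoff on DNS errors.

    resolver and sleep are injectable so the logic can be exercised
    without touching the network or actually waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return resolver(host, None)
        except socket.gaierror:
            # Give up after the final attempt and let the caller see the error.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    print(resolve_with_retry("ci.haiku-os.org"))
```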

comment:3 by kallisti5, 5 weeks ago

Owner: changed from bonefish to kallisti5
Status: new → assigned

comment:4 by kallisti5, 4 weeks ago

Priority: high → critical

While this one is important (and we need to figure it out), it really isn't a blocker for R1 Beta2 at this point. Our infrastructure is stable at the moment, and anything that needs to be done in the scope of repos can be done manually until the automation improves.

Building R1 Beta1 images has been working since October: https://ci.haiku-os.org/teams/r1beta1/pipelines/r1beta1-x86_64/jobs/image-r1beta1-x86_64/builds/17

These go to S3 buckets; we could hand-adjust whatever other release stuff we want in them.
