Wikimedia Release Engineering Team/CI Futures WG/Phase2

THIS IS A DRAFT UNDER DISCUSSION

"I" in the text below is Lars.

---

Here are some initial thoughts on evaluation criteria for the next phase of CI engine evaluation. The overall approach is to install and use at least one of the candidates (Argo, GitLab, Zuul v3).

We should do just about everything during evaluation that we'll want to do once we choose one to use for real. About the only thing we don't need to do is deploy to production.

My thinking about the pipeline for evaluation is currently as follows:

  • Code review happens on Gerrit, but we might also want to try the merge-request style that GitLab provides natively, even if that means hosting the code on a GitLab instance. I'd be interested in the GitLab approach, since I have the feeling our developers would like it, but I realise it's a big change to the status quo. Possibly too big.
  • When a change is pushed to Gerrit, CI will trigger the pipeline automatically, without human intervention. This means the pipeline will need to listen for Gerrit notifications. (If we skip Gerrit, pushing the change or opening a merge request should still trigger the pipeline.)
    • Except for Zuul, we may need to write a component that does the Gerrit event handling and CI triggering. I don't know how tricky that will be, but it should be fairly isolated. (See the Zuul-style trigger sketch after this list.)
  • For the evaluation, the pipeline will have two stages: commit and acceptance test. For real use we'll want more stages, but since the real pipeline will have more than one stage, I feel at least two is the right minimum for the evaluation.
  • For both stages, some of the actions should be specified by the project's repository, and additional actions should be specified globally by RelEng. What the project specifies should live in the same repository as the source code, in something like a .gitlab-ci.yml file, although the name isn't important. What RelEng specifies should also live in a git repository somewhere. (A sketch of what such a project file might look like follows this list.)
  • The commit stage builds the project, producing binaries and other artifacts that will be used later for deployment. It also runs unit tests and other tests that can be run quickly without deploying the software anywhere.
    • In addition, the commit stage will run some tests specified globally by RelEng. These will include things like code health checks.
    • The project is required to specify a Blubber file (.pipeline/blubber.yaml or similar), and RelEng will provide the details of how the commit stage builds a Docker image and publishes it to the WMF Docker registry or another suitable location. (A rough sketch of a Blubber file also follows this list.)
    • If the commit stage fails in any way, due to errors during building or while running the tests, the pipeline fails, and the build log ("console output" in Jenkins lingo) is the only output of the commit stage. If the stage succeeds, the output also includes the Docker image.
    • Additionally, a notification is sent to the developer who pushed the change to Gerrit or GitLab with the result of the commit stage (PASS/FAIL) and a link to the build log. For the evaluation, these need not go back to Gerrit and can be communicated via other means.
  • The acceptance test stage takes the Docker image built by the commit stage, and runs it under Docker (or Minikube or Kubernetes, depending on which makes the most sense for the evaluation), and runs tests against the image. The tests are specified by a file in the repository.
    • Additionally, RelEng will specify tests that get run against all projects. These will test things that all Docker images deployed to production will need to pass, such as checking that the service in the container is healthy.
    • The output of the acceptance test stage is the build log only.
    • Additionally, a notification is sent to the developer who pushed the change to Gerrit or GitLab with the result of the acceptance test stage (PASS/FAIL), and a link to the build log.
  • The goal is to implement the above pipeline on at least one of the candidates, and use it in anger for a couple of carefully chosen projects, for long enough that we get a good feel for what the candidate is like in real use.
    • I propose Blubber as one of the test projects, since it is simple and entirely under our own control, plus at least one more.
    • I'm not familiar enough with our various projects to suggest which, but maybe RelEng has suggestions? An ambitious choice would be MediaWiki + Quibble, but that might be too difficult to get working. Suggestions?
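
To make the Gerrit-triggering bullet above more concrete: Zuul v3 declares this kind of trigger directly in its YAML pipeline configuration, so no separate component should be needed there. The sketch below is a minimal example of that style, written from memory; the exact key names should be checked against the Zuul documentation, and for Argo or GitLab we would need to provide the equivalent glue ourselves.

    # Sketch of a Zuul v3 pipeline triggered by Gerrit patchset uploads;
    # illustrative only, verify key names against the Zuul docs.
    - pipeline:
        name: check
        manager: independent
        trigger:
          gerrit:
            - event: patchset-created
        success:
          gerrit:
            Verified: 1
        failure:
          gerrit:
            Verified: -1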
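
To illustrate the split between project-specified and RelEng-specified actions, here is a rough sketch of what a project's pipeline file might contain. The file name, keys, and commands are all hypothetical; the point is only the shape of what a project declares, with RelEng's global checks coming from a separate RelEng-controlled repository.

    # .pipeline/config.yaml -- hypothetical name and schema
    stages:
      commit:
        blubber: .pipeline/blubber.yaml    # how to build the Docker image
        test:
          - make unit-tests                # fast tests, no deployment needed
      acceptance:
        test:
          - make acceptance-tests          # run against the built image
    # RelEng's globally specified actions (code health checks, container
    # health checks, etc.) would be merged in from a separate git repository.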
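
For reference, a minimal Blubber file might look roughly like the sketch below. This is simplified and written from memory; the actual schema version and key names should be taken from the Blubber documentation rather than from here.

    # .pipeline/blubber.yaml -- simplified, unverified sketch
    version: v4
    base: docker-registry.wikimedia.org/wikimedia-buster
    variants:
      build:
        apt:
          packages: [build-essential]
      test:
        includes: [build]
        entrypoint: [make, test]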

In addition to merely getting builds to work, we should consider at least the following:

  • How much work is it to install and upgrade and generally maintain the CI engine? How much of that can be automated, preferably using Puppet?
  • Can we run the CI engine entirely in Kubernetes containers, modulo databases and other persistence? If we need to run it on a bare metal host or in a VM, can we do that on WMF hardware? What hardware resources (CPU, RAM, disk space, disk I/O, network I/O) does the engine need?
  • What is the account management like? We will need to control access in a way similar to what we do on Gerrit; can we do that? Special emphasis on dealing with exciting situations that require revoking access, and on handling other unpleasant situations.
  • Can we handle security updates using the candidate? This would probably require being able to keep some source code changes and resulting builds secret during the embargo period.
  • Is there a global dashboard view of what's happening in the CI engine? What is building, what's in the queue waiting to build?
  • Is it possible to stop a running build, if it seems stuck? Can this be done automatically, based on a timer?
  • Can builds be triggered based on a schedule?
  • Can a build be re-triggered manually, in case there's a temporary external failure?
  • How much manual cleanup between builds is needed? This includes things like killing off runaway processes and removing unwanted files from the workspace.
  • How can artifacts other than build logs and Docker images be stored? Eventually we will want that. For example, a project might want to build minified JS files or language packs and store them for deployment to various environments for acceptance tests and production, and we will need a place for them.
  • How fast is the CI engine? If we make a minimal change to a project, how long until the whole pipeline has run? In other words, what is the minimum cycle time?
  • Can we collect other interesting metrics, reasonably easily?
  • Can we deploy and test the CI engine automatically, using itself?