As of February/March 2019, the Release Engineering team at the Wikimedia Foundation is evaluating what tooling the deployment pipeline should be built on. As the first step, we're collecting requirements. Feel free to add yours here; we are also reaching out to interview the most important stakeholders.

The Developer Satisfaction Survey had some answers related to this.

Requirements that are better than our current situation are labelled (NEW!) for clarity.

Very hard requirements

Most CI software will meet most hard requirements. This section helps to quickly rule out some solutions. These are meant to be easy and quick to check for each candidate.

Must be free software / open source. "Open core" like GitLab might be good enough.
Must be hostable by the Foundation. It's not acceptable to rely on outside services.
Must support git.
Must have a version we can easily use for evaluation.
Must be understandable without too much effort to our developers so that they can use CI/CD productively.
(NEW!) Must support self-serve CI, meaning we don't block people if they want CI for a new repo.

Hard requirements

Must be fast enough that it isn't perceived as a bottleneck by developers.
Must make its status and what-is-going-on visible so that its operation can be monitored and so that our developers can check the status of their builds themselves.
Must provide feedback to the developers as early as possible for the various stages of a build, especially the early stages ("can get source from git", "can build", "can run unit tests", etc.).
Must be secure enough that we can open it to community developers to use without too much supervision.
Must be maintained and supported upstream.
Must be able to handle the number of repositories, projects, builds, and deployments that we have, and will have in the foreseeable future.
Must enable us to instrument it to get metrics for CI use and effectiveness as we need. Things like cycle times, build times, build failures, etc.
Must empower our developers and remove the Release Engineering team as a bottleneck for their productivity.
Must work with Gerrit as well as other self-hostable code-review systems (e.g., GitLab), if we decide to move to that later.
Must enable us to have a short cycle time (from idea to running in production). CI is not the only thing that affects this, but it is an important factor.
Must promote (copy) Docker images and other build artifacts from "testing" to "staging" to "production", rather than rebuilding them, since rebuilding takes time and can fail.
Must allow developers to replicate locally the tests that CI runs. This is necessary to lower friction in development, as well as to aid debugging.
Must allow deployment to be fully automated.
Must be automatically deployable by us or SRE, using Puppet, onto a fresh server. I don't know if that means .deb packages are needed or not, but it certainly wouldn't be a bad thing.
Must be horizontally scalable: we need to be able to add more hardware easily to get more capacity.
Must be able to support all programming languages we currently support.
Must support linking to build results for easier reference and discussion.
Must support saving of build artifacts.
Must keep configuration in version control.
Must support gating / pre-merge testing.
Must support periodic / scheduled testing.
Must support post-merge testing.
Must support tooling to do the merging, instead of developers.
Must support storing tests in version control.
Must support collection of relevant metrics: cycle time, job run time, wait time, etc.
Must support reporting to Gerrit, IRC, and Phabricator.
Must have some way to declare dependent repositories / software needed for testing.
Must support services for tests — i.e., some PHPUnit tests require MySQL.
Must allow us to move the git repository hosting, code review, and ticketing systems away from Gerrit and Phabricator.
Must protect production by detecting problems before they're deployed, and must in general support a sensible CI/CD pipeline.
Must allow the Release Engineering team to enforce tests on top of what a self-serving developer specifies, to allow us to set minimal technical standards.
Must support dependency caching – we have castor, maybe we could do better? Maybe some CI systems have this figured out?
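
As an illustration of the metrics requirements above (wait time, job run time, build failures), the following is a minimal sketch of what we might compute from per-build timestamps. It assumes the CI system can export a queue time, start time, finish time, and result for each build; the `BuildRecord` type and the function names are hypothetical and not taken from any particular CI system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BuildRecord:
    """Hypothetical per-build export; field names are assumptions."""
    queued_at: datetime
    started_at: datetime
    finished_at: datetime
    succeeded: bool


def wait_time(build: BuildRecord) -> timedelta:
    """Time the build spent in the queue before a worker picked it up."""
    return build.started_at - build.queued_at


def run_time(build: BuildRecord) -> timedelta:
    """Time the build spent actually running."""
    return build.finished_at - build.started_at


def failure_rate(builds: list[BuildRecord]) -> float:
    """Fraction of builds that failed; 0.0 for an empty list."""
    if not builds:
        return 0.0
    failed = sum(1 for build in builds if not build.succeeded)
    return failed / len(builds)


# Usage example with made-up timestamps.
t0 = datetime(2019, 3, 1, 12, 0, 0)
builds = [
    BuildRecord(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=12), True),
    BuildRecord(t0, t0 + timedelta(minutes=5), t0 + timedelta(minutes=9), False),
]
for build in builds:
    print(wait_time(build), run_time(build), build.succeeded)
print(failure_rate(builds))
```

Cycle time (from idea to running in production) spans more than a single build, so it would need data from code review and deployment as well; it is left out of this sketch.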

Softer requirements

Should have a hosted instance we can play with for evaluation to avoid having to install everything from scratch.
(NEW!) Should not require software development from the Foundation.
Should allow builds to happen in K8s containers, but probably should also support running jobs on bare metal or VMs. For example, building Docker containers can't happen inside Docker containers.
(NEW!) Should allow the developers to define or declare at least parts of the pipeline jobs in the repository: what commands to run for building, testing, etc.
Should be highly available - can restart any component without disrupting service.
Should be fast at checking out code and running tests to give quick feedback to developers.
Should have live console output of build.
Should have build timeouts.
Should support secure storage of credentials / secrets.
Should provide a clean workspace for each test run - either a clean VM or container.
Should allow archiving build logs and possible artifacts for a long period, to allow extracting metrics from a long time period.
(NEW!) Should have rate limiting - one user/project cannot take over most/all resources.
(NEW!) Should support validation and creation of GPG/PGP-signed git commits.

Would be nice

(NEW!) Would be nice for the Release Engineering team to not be a bottleneck: we should not be required to approve a change to how a job runs for a repo.
(NEW!) Would be nice for tests not to have to be written in shell script.
Would be nice for tests not to have to be written in Groovy script.
(NEW!) Would be nice for tests to be written (most likely) not in a "language" but in an abstraction.
- The abstraction should be easily extended in a "real" language – more than shell, ideally.
Would be nice for test abstractions to limit boiler-plate, i.e., all of our services are tested roughly the same way.
Would be nice to prioritize jobs.
- Use case: if there is a queue of jobs, there should be some mechanism of jumping that queue for jobs that have a higher priority.
- We currently have a Gating queue that has a higher priority than the periodic jobs that calculate Code Coverage.
Would be nice to support isolation / sandboxing.
- Jobs should be isolated from one another.
- Jobs should be able to install apt-packages without affecting dependencies of other jobs.
(NEW!) Would be nice to have configurable job requirements/affinity.
- Be able to schedule a job only on nodes that have at least X available disk space/ram/cpu/whatever OR try to schedule on nodes where a current build of this job isn't already running.
Would be nice to build artifacts suitable for production.
- Currently we only do container images in a limited way – nice to haves: deb packages, java jars, go binaries, packagist downloads.
Would be nice to make it easy for developers to recreate failures locally.
(NEW!) Would be nice to run git-bisect post-merge to find the patch that caused a particular problem with a Selenium test.
(NEW!) Would be nice to have a mechanism for deployment to staging, production, pypi, packagist, toollabs.
(NEW!) Would be nice to have efficient matrix builds.
- E.g., we currently run PHPUnit tests and browser tests for the Cartesian product of [PHP7, PHP7.1, PHP7.2, HHVM][MySQL, SQLite, PostgreSQL][Composer, MediaWiki vendor], but we perform setup/git clone for all of those tests. Doing that in a space- and time-efficient way would be good.
(NEW!) Would be nice for developers to have an option to SSH into the VM/container that CI used to run the tests, for debugging.
(NEW!) Would be nice to support building and testing mobile applications (at minimum for iOS and Android).
(NEW!) Would be nice to be able to run CI for secret/security patches.
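
To make the matrix build item above concrete, here is a rough sketch of how the Cartesian product expands and how many git clones it costs with and without a shared checkout. The axis values come from the example above; the axis names (`php`, `database`, `dependencies`) and the functions `expand_matrix` and `count_clones` are hypothetical, and how a given CI system would actually share a checkout between cells is not specified here.

```python
from itertools import product

# Axis values are from the matrix example; axis names are assumptions.
AXES = {
    "php": ["PHP7", "PHP7.1", "PHP7.2", "HHVM"],
    "database": ["MySQL", "SQLite", "PostgreSQL"],
    "dependencies": ["Composer", "MediaWiki vendor"],
}


def expand_matrix(axes):
    """Return one dict per cell of the Cartesian product of all axes."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


def count_clones(cells, share_checkout):
    """Number of git clones needed: one in total if the checkout is
    shared between cells, otherwise one per cell."""
    return 1 if share_checkout else len(cells)


cells = expand_matrix(AXES)
print(len(cells))                                 # 4 * 3 * 2 = 24 cells
print(cells[0])
print(count_clones(cells, share_checkout=False))  # 24 clones
print(count_clones(cells, share_checkout=True))   # 1 clone
```

The point of the sketch is only the arithmetic: 24 cells means 24 setups today, and an efficient matrix build would bring the repeated setup work closer to one.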

Wikimedia Release Engineering Team/CI Futures WG/Requirements
