Continuous integration/Docker

As of August 2017, Docker containers can be used in Jenkins jobs in Wikimedia CI. We hope that in the future, these images could run on a Kubernetes cluster instead.

Overview

Administrative tasks are currently handled solely by Jenkins. The containers you create should be self-sufficient and leave nothing behind except logs. The behaviour of a container should depend only on the environment variables provided by Jenkins (ZUUL_URL, ZUUL_REF, etc.).
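As a minimal sketch of this contract, a container entrypoint could check that the variables it depends on are set before doing any work. The require_env helper and the example values below are hypothetical and are not taken from the Wikimedia images; only the names ZUUL_URL and ZUUL_REF come from the text above.

```shell
# Hypothetical helper: verify that every named environment variable is set
# and non-empty. Prints the first missing name and returns non-zero.
require_env() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            return 1
        fi
    done
    echo "ok"
}

# Example values for illustration only; Jenkins would provide the real ones.
ZUUL_URL='git://example.invalid'
ZUUL_REF='refs/zuul/master/Zexample'
require_env ZUUL_URL ZUUL_REF
```

Failing early on a missing variable keeps the container from depending on anything other than what Jenkins passes in.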

The Docker images for CI are published at docker-registry.wikimedia.org under the releng namespace.

These images are created from Dockerfiles in the integration/config.git repository.

Build images locally

docker-pkg is a Python 3 program used to build Docker images. It uses Jinja for additional templating.

Installing docker-pkg

Clone the code from operations/docker-images/docker-pkg and install via pip3:

$ git clone ssh://gerrit.wikimedia.org:29418/operations/docker-images/docker-pkg
$ cd docker-pkg
$ pip3 install --user -e .

Clone the integration/config project:

$ git clone https://gerrit.wikimedia.org/r/integration/config

At this point, the docker-pkg command should be available. By default, it will build all images you don't yet have cached in your docker installation. To get started you have three options:

  1. Run it normally and let it build all images in the CI system (this may take several hours and needs approximately 40 GB of disk space).
  2. Or, download the latest versions of these images from wikimedia.org instead (this downloads approximately 40 GB, so it is probably faster only if your connection is 20 MB/s or better).
  3. Or, use the --select option to build only some images, for example just the image you are changing or want to test. The option takes a glob pattern that is matched against the full reference name of the image, for example "docker-registry.wikimedia.org/releng/node10-test:0.3.0". As such, the value node10-test would not match any image, but */node10-test:* would. Usage:
    $ docker-pkg -c dockerfiles/config.yaml build --select '*/mediawiki-tarball:*' dockerfiles/
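The matching rule for --select can be illustrated with the shell's own glob matching. This is only a sketch: it assumes docker-pkg applies ordinary shell-style glob semantics to the whole reference name, and matches_select is a hypothetical helper that is not part of docker-pkg.

```shell
# Hypothetical helper: report whether a glob pattern matches a full image
# reference. $1 is the pattern, $2 is the reference.
matches_select() {
    case "$2" in
        # Unquoted, so the value of $1 is interpreted as a glob pattern.
        $1) echo "match" ;;
        *)  echo "no match" ;;
    esac
}

ref='docker-registry.wikimedia.org/releng/node10-test:0.3.0'

# The bare image name has to match the entire reference, so it fails.
matches_select 'node10-test' "$ref"

# Wildcards on both sides cover the registry prefix and the version tag.
matches_select '*/node10-test:*' "$ref"
```

The first call prints "no match" and the second prints "match", which is why the pattern needs the leading */ and the trailing :*.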
    

Build the images

This scans the dockerfiles/ folder. For each image, it finds the latest version tag in the image's changelog, and if that version is not present in your local Docker installation, it builds it from the Dockerfile.

$ cd path/to/integration/config
$ docker-pkg -c dockerfiles/config.yaml build dockerfiles

Example output:

== Step 0: scanning dockerfiles ==
Will build the following images:
* docker-registry.wikimedia.org/releng/ci-stretch:0.1.0
* docker-registry.wikimedia.org/releng/operations-puppet:0.1.0
* docker-registry.wikimedia.org/releng/ci-jessie:0.3.0
== Step 1: building images ==
=> Building image docker-registry.wikimedia.org/releng/ci-stretch:0.1.0
=> Building image docker-registry.wikimedia.org/releng/operations-puppet:0.1.0
=> Building image docker-registry.wikimedia.org/releng/ci-jessie:0.3.0
== Step 2: publishing ==
NOT publishing images as we have no auth setup
== Build done! ==
You can see the logs at ./docker-pkg-build.log
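The version check described in this section can be sketched roughly as follows. This is not docker-pkg's actual code: it assumes the changelog is Debian-style with the newest entry on the first line, and latest_version, needs_build and the sample changelog line are hypothetical.

```shell
# Assumption: the first changelog line looks like
#   node10-test (0.3.0) wikimedia; urgency=medium
# Print the version found in parentheses on the first line of stdin.
latest_version() {
    sed -n '1s/^[^ ]* (\([^)]*\)).*/\1/p'
}

# $1 is a full image reference; stdin lists locally present references,
# one per line. Reports whether the reference still has to be built.
needs_build() {
    if grep -qxF -e "$1"; then
        echo "up to date"
    else
        echo "needs build"
    fi
}

# Made-up changelog content for illustration.
version=$(printf '%s\n' 'node10-test (0.3.0) wikimedia; urgency=medium' \
    '' '  * Example entry' | latest_version)
echo "$version"

# Only 0.2.0 is present locally here, so 0.3.0 would be built.
printf '%s\n' 'docker-registry.wikimedia.org/releng/node10-test:0.2.0' |
    needs_build "docker-registry.wikimedia.org/releng/node10-test:$version"
```

In a real check, the list of local references could come from docker images --format '{{.Repository}}:{{.Tag}}'.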

Troubleshooting

Could not find a suitable TLS CA certificate bundle

The following error is known to affect macOS: gerrit:500417

ERROR - Could not load image in integration/config/dockerfiles/…: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt (builder.py:244)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 228, in cert_verify
    "invalid path: {}".format(cert_loc))
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /etc/ssl/certs/ca-certificates.crt

To work around this issue, run the following in your terminal before running a docker-pkg command:

$ export REQUESTS_CA_BUNDLE=;
$ docker-pkg …

No images are built

If you don't see any output in between "Step 1: building images" and "Step 2: publishing", this means docker-pkg did not find any images or did not find images that have unbuilt newer versions. Review the following:

  • "git status" in integration/config should show a change to the changelog file for the image you are updating.
  • Make sure that the name used in the changelog file is correct, and matches your intended image name.
  • Look for any errors in "docker-pkg-build.log".
  • Make sure that you ran "docker-pkg -c dockerfiles/config.yaml build dockerfiles" and not "docker-pkg -c dockerfiles/config.yaml build dockerfiles/path-to-image"; docker-pkg figures out which images to build by detecting modifications to the changelog.

Manifest not found

When adding a new version of an image and also incrementing the versions of dependent images, you may encounter the following error:

Build failed: manifest for example/my-parent-image:0.4.0 not found

This happens because docker-pkg (by default) fetches parent images from wikimedia.org and only rebuilds the child locally. If you are updating a parent from 0.3.0 to 0.4.0 and also incrementing the child images' versions, the parent will update fine, but then it will fail to build the children, despite the newer versions existing locally.

To work around this, pass --no-pull:

$ docker-pkg --no-pull -c dockerfiles/config.yaml build dockerfiles

Adjusting an image

Sometimes you need to edit an image, e.g. to add a new dependency or to update an existing one.

To do this, make your changes to the image's Dockerfile.template file, and then run the following command:

$ docker-pkg \
   -c dockerfiles/config.yaml \
   --info \
   update \
     --reason "Briefly explain your change" \
     --version NewImageVersion \
   ImageYouAreChanging \
   dockerfiles/

This adds a properly formatted entry to the changelog of the image you are changing and of all dependent images. You can then build the images locally to check that they build correctly, and use the debug-image script to check that they work as intended. Once you are happy with your change, bundle it into a git commit for review and merging.

Download images

To download all CI Docker images from wikimedia.org that are not yet in your local Docker installation, run the following commands:

$ cd integration/config
$ ack -o -h -s 'docker-registry.*:[.\d]+' jjb/ | sort | uniq | xargs -n1 docker pull
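To see what the pipeline above would pull before actually pulling, the same extraction can be tried on sample input. The sketch below swaps ack for grep -oE so that it runs without ack installed, and it prints the docker pull commands instead of running them. The list_image_refs helper and the sample lines are made up for illustration and are not real JJB content.

```shell
# Extract image references of the form docker-registry...:<version> from
# stdin and de-duplicate them.
list_image_refs() {
    grep -oE 'docker-registry.*:[.0-9]+' | sort -u
}

# Hypothetical sample of job definitions.
sample='    image: docker-registry.wikimedia.org/releng/node10-test:0.3.0
    image: docker-registry.wikimedia.org/releng/castor:0.2.1
    image: docker-registry.wikimedia.org/releng/node10-test:0.3.0'

# Dry run: print one "docker pull" command per unique reference.
printf '%s\n' "$sample" | list_image_refs | xargs -n1 echo docker pull
```

Dropping the echo from the last line would perform the pulls for real.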

Manage local images

List local images:

$ docker images

Remove local images that came from wikimedia.org:

$ docker rmi $(docker images --format '{{.Repository}}:{{.Tag}}' | grep 'wikimedia.org')
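Because the removal command deletes every matching image at once, it can be useful to preview the selection first. The following is a sketch only; plan_removal is a hypothetical helper, and the sample image list is made up.

```shell
# Read "repository:tag" lines on stdin and print the docker rmi command
# for each wikimedia.org image, without running anything.
plan_removal() {
    grep -F 'wikimedia.org' | while IFS= read -r image; do
        echo "docker rmi $image"
    done
}

# With Docker available, the input would come from:
#   docker images --format '{{.Repository}}:{{.Tag}}' | plan_removal
printf '%s\n' \
    'docker-registry.wikimedia.org/releng/castor:latest' \
    'debian:stretch' | plan_removal
```

Only the wikimedia.org image appears in the output; images from other sources are left alone.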

Deploy images

Deploying a change to CI Dockerfiles requires shell access to the Docker registry on contint1001.wikimedia.org (shell group: contint-docker). Ask the Release Engineering team for help.

The change to the integration/config repository should first be merged in Gerrit.

After that, deploy it to the CI infrastructure. To do this, run ./fab deploy_docker in the integration/config directory. This connects to the contint1001.wikimedia.org server and instructs it to build the newer versions of the Docker images in integration/config.

Testing new images

Test an image locally

Use the steps below to test a Docker image locally. This can be an unpublished image that you have built locally with docker-pkg, or one that was pulled from the wikimedia.org registry.

Note that the commands below refer to images by their full registry names, but these resolve to the images you have locally (either built or pulled); they do not need to have been deployed or uploaded yet. You can list the images you have locally with the docker images command.

$ cd my-gerrit-project
$ mkdir -m 777 cache log
$ docker run \
    --rm --tty \
    --volume /"$(pwd)"/log://var/lib/jenkins/log \
    --volume /"$(pwd)"/cache://cache \
    --volume /"$(pwd)"://src \
    docker-registry.wikimedia.org/releng/node10-test:0.3.0

Debug an image locally

The debug-image script can be used to run a RelEng docker image locally.

$ cd integration/config/dockerfiles
$ ./debug-image node10-test

nobody@bfee2a999b20:/src$ 

The default behaviour for docker run is to start the container and execute the entrypoint/cmd specified in the Dockerfile. To inspect the container instead, specify -i to make it interactive, and override --entrypoint to a shell (such as /bin/bash). For example:

$ cd my-gerrit-project/
$ docker run \
    --rm --tty \
    --interactive --entrypoint /bin/bash \
    docker-registry.wikimedia.org/releng/node10-test:0.3.0

nobody@5f4cdb0ab167:/src$
nobody@5f4cdb0ab167:/src$ env
CHROMIUM_FLAGS=--no-sandbox
XDG_CACHE_HOME=/cache

Test an image in CI

Once the new image is pushed to Docker Hub, it should be tested on one of the integration-agent-docker-100x machines. As of August 2017 there are 4 such machines: integration-agent-docker-100[1:4].

To test:

  1. SSH to one of the integration-agent-docker machines and switch to the jenkins-deploy user:
    you@laptop:~$ ssh integration-agent-docker-1004
    you@integration-agent-docker:~$ sudo su - jenkins-deploy
    
  2. Create a new directory and an environment file that contains the information passed from Jenkins in the form of ZUUL_* variables:
    jenkins-deploy@integration-agent-docker:~$ mkdir docker-test && cd docker-test
    jenkins-deploy@integration-agent-docker:docker-test$ printf "ZUUL_PROJECT=operations/puppet\nZUUL_URL=git://contint2001.wikimedia.org\nZUUL_REF=refs/zuul/production/Ze59ae894f02248d9888835dbaa14dfdf\nZUUL_COMMIT=045fcb14e9fd7885957d900b9a97c883fc5cd26d\n" > .env
    
  3. Run the new Docker image with the environment file and ensure that it runs correctly:
    jenkins-deploy@integration-agent-docker:docker-test$ mkdir log
    jenkins-deploy@integration-agent-docker:docker-test$ docker run --rm -it --env-file .env --volume "$(pwd)"/log:/var/lib/jenkins/log contint/operations-puppet
    
  4. If everything is working as anticipated, update JJB with the Dockerfile version that has been pushed to the Wikimedia Docker registry.
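The environment file from step 2 can also be written with a here-document, which may be easier to read and edit than a single printf string. This is a sketch: write_env_file is a hypothetical helper, the values are copied from step 2, and the temporary path is only used so that the example runs anywhere.

```shell
# Write the four ZUUL_* variables from step 2 to the file named by $1.
write_env_file() {
    cat > "$1" <<'EOF'
ZUUL_PROJECT=operations/puppet
ZUUL_URL=git://contint2001.wikimedia.org
ZUUL_REF=refs/zuul/production/Ze59ae894f02248d9888835dbaa14dfdf
ZUUL_COMMIT=045fcb14e9fd7885957d900b9a97c883fc5cd26d
EOF
}

# In step 2 the target would be .env inside the docker-test directory.
env_file="${TMPDIR:-/tmp}/docker-test.env"
write_env_file "$env_file"

# Count the lines that have the KEY=value shape expected by --env-file.
grep -c '^ZUUL_[A-Z]*=' "$env_file"
```

The quoted 'EOF' delimiter stops the shell from expanding anything inside the here-document, so the values are written exactly as typed.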


Jenkins Agent

To create an additional Jenkins node that can run Docker-based Jenkins jobs:

  • Create a new VM instance in Horizon with a name following the pattern 'integration-agent-docker-100X'.
  • Wait for the first puppet run to complete and log in.
  • Run the following to finish switching to the integration puppet master:
sudo rm -fR /var/lib/puppet/ssl
sudo mkdir -p /var/lib/puppet/client/ssl/certs
sudo puppet agent -tv
sudo cp /var/lib/puppet/ssl/certs/ca.pem /var/lib/puppet/client/ssl/certs
sudo puppet agent -tv
  • Add the 'role::ci::slave::labs::docker' class to the instance in Horizon.
    • For larger instance types (m1.xlarge and bigram), specify true for the docker_lvm_volume parameter.
  • Run a final Puppet update: 'sudo puppet agent -tv'.
  • Pull an initial set of Docker images onto the host (using latest tags) to avoid doing this in test runs:
sudo docker pull docker-registry.wikimedia.org/releng/castor:latest
sudo docker pull docker-registry.wikimedia.org/releng/quibble-stretch:latest
sudo docker pull docker-registry.wikimedia.org/wikimedia-stretch:latest
  • Add the agent in the Jenkins UI.