Continuous integration/Data center switch

The core of the CI infrastructure is hosted on two production machines, one in each datacenter. Most services are active on only one of the hosts, with the other acting as a cold spare. For hardware maintenance or operating system upgrades, we move the services and their data from one host to the other. This document describes the steps needed to do the swap.

Hosts and services

We have two bare-metal hosts, one in each of our primary datacenters. They host a variety of services:

Switching over

The general idea is to synchronize the Jenkins files from the primary to the spare server before anything else. Once that is done, the sequence is:

  • Synchronize build artifacts
  • Stop all services on the primary
  • rsync data and states
  • Change DNS for
  • Change primary in Puppet / Hiera
  • Start Jenkins
  • Start Zuul scheduler

Synchronize build artifacts

This step should be done ahead of time, since the transfer takes hours.

The Jenkins build history and its artifacts exist only on the primary Jenkins, in /srv/jenkins/builds. They amount to hundreds of gigabytes spread over millions of files and directories.

On the spare server, ensure /srv/jenkins/builds is empty.
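A minimal sketch of this pre-sync, assuming it is run as root on the primary and that root can ssh to the spare. SPARE_HOST is a placeholder for the spare's hostname, and DRY_RUN=1 is a hypothetical switch that prints the rsync command instead of running it. The script refuses to copy if /srv/jenkins/builds on the spare is missing or already has entries.

```shell
#!/usr/bin/env bash
# Sketch only: pre-seed /srv/jenkins/builds on the spare from the primary.
# Run as root on the primary. SPARE_HOST is a placeholder (hypothetical);
# DRY_RUN=1 prints the rsync command and skips the remote emptiness check.

BUILDS_DIR=/srv/jenkins/builds

# Succeeds only if BUILDS_DIR exists on the spare and has no entries.
spare_builds_empty() {
  local spare=$1 first
  first=$(ssh "root@${spare}" "find ${BUILDS_DIR} -mindepth 1 -print -quit") || return 1
  [ -z "$first" ]
}

presync_builds() {
  local spare=$1
  # -a keeps permissions, ownership, timestamps and symlinks; --numeric-ids
  # keeps uid/gid numbers as-is. The trailing slashes copy the directory's
  # contents instead of creating builds/builds on the spare.
  local -a cmd=(rsync -a --numeric-ids -e ssh "${BUILDS_DIR}/" "root@${spare}:${BUILDS_DIR}/")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  if ! spare_builds_empty "$spare"; then
    echo "refusing to sync: ${BUILDS_DIR} on ${spare} is missing or not empty" >&2
    return 1
  fi
  "${cmd[@]}"
}

# Only act when a target is given, so the file can also be sourced.
if [ -n "${SPARE_HOST:-}" ]; then
  presync_builds "$SPARE_HOST"
fi
```

For example, `SPARE_HOST=spare.example.org DRY_RUN=1 bash presync.sh` (hostname and filename hypothetical) shows the command before committing to a multi-hour transfer.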

TODO: check or MariaDB/ImportTableSpace.

Stop all services

On the primary:

  systemctl stop jenkins
  systemctl stop zuul
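Before copying any state, it is worth confirming that both units are really down, since copying files that are still being written can produce an inconsistent copy. A minimal sketch, assuming the unit names jenkins and zuul from the commands above; SYSTEMCTL is a hypothetical override that lets a stub stand in for systemctl when testing.

```shell
#!/usr/bin/env bash
# Sketch only: stop Jenkins and Zuul on the primary, then verify that
# neither unit is still active. Run as root with the argument "run".
# SYSTEMCTL (hypothetical) can point at a stub instead of systemctl.

UNITS=(jenkins zuul)

stop_and_verify() {
  local sysctl=${SYSTEMCTL:-systemctl} unit
  local -a active=()
  for unit in "${UNITS[@]}"; do
    "$sysctl" stop "$unit" || return 1
  done
  # is-active --quiet exits 0 only while the unit is active.
  for unit in "${UNITS[@]}"; do
    if "$sysctl" is-active --quiet "$unit"; then
      active+=("$unit")
    fi
  done
  if [ "${#active[@]}" -gt 0 ]; then
    echo "still active: ${active[*]}" >&2
    return 1
  fi
  echo "all units stopped: ${UNITS[*]}"
}

if [ "${1:-}" = run ]; then
  stop_and_verify
fi
```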

rsync data and states

Using rsync over ssh as root:

  • Refresh /srv/jenkins/builds on the spare with the artifacts from the primary.

Transfer the Jenkins and Zuul states:

  • /var/lib/jenkins: job configurations, build indices, plugins, etc.
  • /var/lib/zuul/times: durations of past function executions, used to estimate the ETA of each build
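A minimal sketch of these transfers, assuming it runs as root on the primary after the services are stopped and that root can ssh to the spare. SPARE_HOST is a placeholder and DRY_RUN=1 is a hypothetical switch that prints each rsync command instead of running it. The --delete flag is a design choice of this sketch, not something the steps above prescribe: it makes each directory on the spare an exact mirror of the primary, which also means a reversed source and destination would be destructive.

```shell
#!/usr/bin/env bash
# Sketch only: mirror build artifacts and Jenkins/Zuul state to the spare.
# Run as root on the primary once jenkins and zuul are stopped.
# SPARE_HOST is a placeholder (hypothetical); DRY_RUN=1 only prints commands.

# Build history and artifacts (refreshes the earlier pre-sync), the Jenkins
# home (job configurations, build indices, plugins, ...), and Zuul's
# function execution durations used to estimate build ETAs.
SYNC_DIRS=(/srv/jenkins/builds /var/lib/jenkins /var/lib/zuul/times)

sync_states() {
  local spare=$1 dir
  local -a cmd
  for dir in "${SYNC_DIRS[@]}"; do
    # --delete removes files on the spare that no longer exist on the
    # primary, so double-check the direction before a real run.
    cmd=(rsync -a --numeric-ids --delete -e ssh "${dir}/" "root@${spare}:${dir}/")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}

if [ -n "${SPARE_HOST:-}" ]; then
  sync_states "$SPARE_HOST"
fi
```

Since the builds directory was pre-seeded, this refresh only has to transfer what changed since then, which keeps the downtime window short.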

Change DNS

The Varnish/ATS layer reaches the backend through a DNS entry, which in turn points to the primary host. Point that entry at the new primary.
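Once the entry has been changed, one way to check it is to compare the addresses that the service name and the new primary resolve to. A minimal sketch, assuming glibc's getent is available; the DNS name itself is not given on this page, so SERVICE_NAME and NEW_PRIMARY are placeholders. Resolvers that cached the old answer can keep returning it until its TTL expires.

```shell
#!/usr/bin/env bash
# Sketch only: check that the CI service DNS name now resolves to the same
# addresses as the new primary. SERVICE_NAME and NEW_PRIMARY are
# placeholders (hypothetical); the real names are not listed here.

# Prints the unique addresses a name resolves to, sorted, one per line.
resolve() {
  getent ahosts "$1" | awk '{ print $1 }' | sort -u
}

# Succeeds when NAME resolves and its address set equals HOST's.
dns_points_to() {
  local name=$1 host=$2 a b
  a=$(resolve "$name")
  b=$(resolve "$host")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

if [ -n "${SERVICE_NAME:-}" ] && [ -n "${NEW_PRIMARY:-}" ]; then
  if dns_points_to "$SERVICE_NAME" "$NEW_PRIMARY"; then
    echo "${SERVICE_NAME} resolves to ${NEW_PRIMARY}"
  else
    echo "${SERVICE_NAME} does not resolve to ${NEW_PRIMARY} yet" >&2
    exit 1
  fi
fi
```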

Change primary in Puppet / Hiera

TODO: identify the changes that need to happen. Ideally this should just be a role change.

Start services

On the new primary, start Jenkins first, then Zuul:

  systemctl start jenkins
  systemctl start zuul
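Zuul is started last, following the order in the sequence overview. A minimal sketch that starts Jenkins, waits until it answers HTTP, and only then starts Zuul. Assumptions: JENKINS_URL defaults to Jenkins' stock http://localhost:8080/, which may not match this setup; the check relies on Jenkins answering 503 while it is still loading; SYSTEMCTL, JENKINS_READY, READY_TRIES and READY_INTERVAL are hypothetical overrides, mainly so stubs can be used for testing.

```shell
#!/usr/bin/env bash
# Sketch only: start Jenkins, wait for its web interface, then start Zuul.
# Run as root on the new primary with the argument "run".
# The JENKINS_URL default is an assumption (Jenkins' stock port); SYSTEMCTL
# and JENKINS_READY (hypothetical) can be replaced by stubs for testing.

# Succeeds once Jenkins answers HTTP with anything other than 503.
# curl reports 000 when nothing is listening yet.
jenkins_ready() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' "${JENKINS_URL:-http://localhost:8080/}")
  [ "$code" != 000 ] && [ "$code" != 503 ]
}

start_services() {
  local sysctl=${SYSTEMCTL:-systemctl}
  local ready=${JENKINS_READY:-jenkins_ready}
  local tries=${READY_TRIES:-60} i
  "$sysctl" start jenkins || return 1
  for ((i = 0; i < tries; i++)); do
    if "$ready"; then
      "$sysctl" start zuul || return 1
      echo "jenkins and zuul started"
      return 0
    fi
    sleep "${READY_INTERVAL:-5}"
  done
  echo "jenkins not ready after ${tries} checks; zuul not started" >&2
  return 1
}

if [ "${1:-}" = run ]; then
  start_services
fi
```

If Jenkins never becomes ready, the sketch leaves Zuul stopped rather than starting it against a backend that is not serving yet.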