Deployment tooling/Notes/Deployment system requirements

Requirements for the next iterations of deployment tooling projects at WMF

| Requirement | MediaWiki | Parsoid etc. | Elasticsearch |
|---|---|---|---|
| provision artifacts on multiple servers comprising a cluster | MUST | MUST | MUST |
| manage publication of at least 2 versions of mediawiki-core | MUST | NON-ISSUE | NON-ISSUE |
| keep old versions of mediawiki-core around to support bits servers (30 days?) | MUST | NON-ISSUE | NON-ISSUE |
| manage l10n cache files on target servers | MUST | NON-ISSUE | NON-ISSUE |
| support publication of patches to internal servers that cannot be shared publicly | MUST | MUST | MUST |
| complete in a timely manner [1] | MUST | MUST | NON-ISSUE |
| time-to-complete for a deploy is proportional to magnitude of change | MUST | MUST | NON-ISSUE |
| be able to update/provision a new server joining the cluster without forcing update of all other servers (server pull) | MUST | MUST | MUST |
| handle rollbacks easily | MUST | MUST | NON-ISSUE |
| handle downgrades (of packages) easily | MUST | MUST (including dependencies) | NON-ISSUE |
| have atomicity equal to or greater than rsync --delay-updates | MUST | ? | ? |
| be able to lock (to prevent folks overlapping) | MUST | MUST | SHOULD |
| support multiple datacenters | MUST | SHOULD | SHOULD |
| have documentation for use and maintenance | MUST | MUST | MUST |
| handle security patches cleanly but separate from released versions | MUST | MUST | MUST |
| support eventual consistency (servers update to latest version on reboot) | MUST | SHOULD | ? |
| work well with Puppet-controlled config files (especially for ES and Parsoid) | MUST | MUST | MUST |
| support a variable number of versions | SHOULD | ? | ? |
| allow generation of caches and other content post-deploy on target servers | SHOULD | ? | NON-ISSUE |
| prevent concurrent modification of source | SHOULD | ? | ? |
| track versions installed on each target host | SHOULD | MUST | MUST |
| record errors and informational events publicly | SHOULD | SHOULD | SHOULD |
| record errors and informational events durably for use in troubleshooting | SHOULD | SHOULD | SHOULD |
| allow "canary"/rolling deploys where a subset of the cluster is updated | SHOULD | MUST | MUST |
| allow multiple production clusters (private vs non-private, and wikitech.wmf getting out of date) | SHOULD | SHOULD | SHOULD |
| rely on SSH agent forwarding | SHOULD NOT | SHOULD NOT | SHOULD NOT |
| require root on the target hosts (privilege separation/access control) | SHOULD NOT | SHOULD NOT | NON-ISSUE |
| allow multi-masters (especially for multi-datacenter) | SHOULD | SHOULD | SHOULD |
| be easily auditable (e.g. verifying the git commit hash on the deploy host) | SHOULD | SHOULD | SHOULD |
  1. MW: 10 min basic, 30 min with full l10n included
    Parsoid: 10 min
    ES:
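
To make the "be able to lock (to prevent folks overlapping)" requirement concrete, here is a minimal sketch of a deploy lock on a single deploy host. It is an illustration only, not how any existing WMF tool works: the names `deploy_lock` and `DeployLockHeld` and the lock file path are hypothetical, and it assumes a POSIX host where `fcntl.flock` is available. It does not cover multi-master or multi-datacenter locking.

```python
import contextlib
import fcntl
import os


class DeployLockHeld(Exception):
    """Raised when another deploy already holds the lock (hypothetical name)."""


@contextlib.contextmanager
def deploy_lock(path):
    """Hold an exclusive, non-blocking advisory lock on `path` for one deploy."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes a second deployer fail fast instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise DeployLockHeld(path)
        yield
    finally:
        # Closing the descriptor releases the lock, even after an error.
        os.close(fd)
```

A deploy script would wrap its work in `with deploy_lock("/var/lock/deploy.lock"):` (path assumed for illustration) and report `DeployLockHeld` to the second user rather than letting two deploys overlap.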

Copy/paste requirements from the etherpad while table massaging...

Parsoid / Mathoid / Rashomon / PDF renderer deploys

  • MUST make it easy to test packaging (init scripts, log rotation etc) outside the cluster -- debs? make scap/whatever be the thing that deploys beta cluster?
  • MUST use rolling deploys: not all nodes at once to avoid taking down the cluster
  • SHOULD allow non-roots to upgrade / downgrade / restart individual nodes for testing (ex: permissions by group membership)
  • SHOULD NOT create much additional overhead over normal debian packaging
  • SHOULD handle dependencies with system packages/libraries and other packages cleanly (including downgrades)
  • SHOULD be able to split packaging further (separate packages for library dependencies for example) -- support n versioned dependencies
  • SHOULD use known systems / avoid "unnecessary" complexity wherever possible
  • SHOULD make it easy for third parties to track regular (non-security) deployed code / be directly usable for third parties (Vagrant, labs, hosted VMs etc)
  • SHOULD support timely third-party security upgrades with systems like unattended-upgrades once the security release is published
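
The rolling-deploy requirement in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of scap or any other existing tool: `rolling_deploy`, `batches`, and `RollingDeployError` are hypothetical names, and `deploy_one` and `is_healthy` stand in for whatever per-host upgrade and health-check mechanisms the real system would provide.

```python
from typing import Callable, Iterator, List, Sequence


class RollingDeployError(Exception):
    """Raised when a batch fails its health check; the rollout stops there."""

    def __init__(self, updated: List[str], failed: List[str]):
        super().__init__(f"health check failed on: {', '.join(failed)}")
        self.updated = updated  # hosts from batches that passed the check
        self.failed = failed    # hosts in the current batch that failed it


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def rolling_deploy(
    hosts: Sequence[str],
    batch_size: int,
    deploy_one: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Update hosts one batch at a time, never the whole cluster at once."""
    updated: List[str] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            deploy_one(host)
        failed = [host for host in batch if not is_healthy(host)]
        if failed:
            # Hosts in later batches are left untouched on the old version.
            raise RollingDeployError(updated, failed)
        updated.extend(batch)
    return updated
```

Stopping at the first unhealthy batch keeps the remaining nodes serving the old version, which is the point of not deploying to all nodes at once. Choosing a small `batch_size` trades deploy time for a smaller blast radius.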

ES deploys


Deploys are rare, less than once a month. Deploys are time consuming due to rolling restarts of a data store. We don't have many folks with the expertise to recover from unexpected failure. I think of it more like MySQL than like MW or Parsoid.

  • MUST allow user to force Elasticsearch to be installed at the version on the rest of the Elasticsearch cluster
    • work around reprepro bug :)
  • MUST configure Elasticsearch settings
    • yaml files be stuff'd
  • MUST be able to provision a new server without restarting current servers
    • targeted deploys, canary deploys
  • MUST support deploying Elasticsearch plugins
  • MUST support ES plugin presence assurance
  • MUST verify that the Elasticsearch plugins are genuine (hash or something)
  • MUST NOT upgrade Elasticsearch without manual intervention
    • no "ensure => latest"
  • SHOULD be compatible with Elasticsearch's debian packages
  • SHOULD coordinate versions of Elasticsearch plugins deployed so they are compatible with Elasticsearch server
    • plugin compatibility matrix?
  • SHOULD NOT allow plugin undeployment. That'd break things.
  • NON-ISSUE locks aren't important because very few people are concerned with upgrading Elasticsearch and the upgrades themselves are pretty rare
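
As an illustration of the "verify that the Elasticsearch plugins are genuine (hash or something)" item above, the sketch below checks a downloaded plugin archive against a known SHA-256 digest before it would be installed. The function names are hypothetical, and where the expected digest comes from (for example a value checked into Puppet) is an assumption left open here. A matching hash only shows the file is the one whose digest was recorded; it says nothing about whether that recorded digest was trustworthy in the first place.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plugin_is_genuine(path, expected_sha256):
    """True if the file at `path` matches the expected hex digest."""
    actual = sha256_of(path)
    # compare_digest avoids leaking the mismatch position via timing.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A deploy step could refuse to install any plugin for which `plugin_is_genuine` returns `False`, which also gives a simple hook for the plugin presence check: a missing file fails to open and never reaches the install step.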