Deployment tooling/Notes/Deployment system requirements

Requirements for the next iterations of deployment tooling projects at WMF

Requirement	MediaWiki	Parsoid etc	Elasticsearch
provision artifacts on multiple servers comprising a cluster	MUST	MUST	MUST
manage publication of at least 2 versions of mediawiki-core	MUST	NON-ISSUE	NON-ISSUE
keep old versions of mediawiki-core around to support bits servers (30 days?)	MUST	NON-ISSUE	NON-ISSUE
manage l10n cache files on target servers	MUST	NON-ISSUE	NON-ISSUE
support publication of patches to internal servers that cannot be shared publicly	MUST	MUST	MUST
complete in a timely manner ^[1]	MUST	MUST	NON-ISSUE
keep time-to-complete for a deploy proportional to the magnitude of the change	MUST	MUST	NON-ISSUE
be able to update/provision a new server joining the cluster without forcing update of all other servers (server pull)	MUST	MUST	MUST
handle rollbacks easily	MUST	MUST	NON-ISSUE
handle downgrades (of packages) easily	MUST	MUST (including dependencies)	NON-ISSUE
have atomicity equal to or greater than rsync --delay-updates	MUST	?	?
be able to lock (to prevent folks overlapping)	MUST	MUST	SHOULD
support multiple datacenters	MUST	SHOULD	SHOULD
have documentation for use and maintenance	MUST	MUST	MUST
handle security patches cleanly but separate from released versions	MUST	MUST	MUST
support eventual consistency (servers update to latest version on reboot)	MUST	SHOULD	?
work well with Puppet-controlled config files (especially for ES and Parsoid)	MUST	MUST	MUST
support a variable number of versions	SHOULD	?	?
allow generation of caches and other content post-deploy on target servers	SHOULD	?	NON-ISSUE
prevent concurrent modification of source	SHOULD	?	?
track versions installed on each target host	SHOULD	MUST	MUST
record errors and informational events publicly	SHOULD	SHOULD	SHOULD
record errors and informational events durably for use in troubleshooting	SHOULD	SHOULD	SHOULD
allow "canary"/rolling deploys where a sub-set of the cluster is updated	SHOULD	MUST	MUST
allow multiple production clusters (privates vs non-private, and wikitech.wmf getting out of date)	SHOULD	SHOULD	SHOULD
rely on SSH agent forwarding	SHOULD NOT	SHOULD NOT	SHOULD NOT
require root on the target hosts (privilege separation/access control)	SHOULD NOT	SHOULD NOT	NON-ISSUE
allow multi-masters (especially for multi-datacenter)	SHOULD	SHOULD	SHOULD
be easily auditable (e.g. verifying the git commit hash on the deploy host)	SHOULD	SHOULD	SHOULD

[1] MW: 10 mins basic, 30 mins with full l10n included
Parsoid: 10 min
ES:
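
The locking requirement in the table above (be able to lock, to prevent folks overlapping) could be met on a single deploy host with an advisory file lock. The following is a minimal sketch under that assumption, not a description of any existing WMF tool: the names `deploy_lock` and `DeployLockedError` and the lock path are hypothetical, and `flock` only coordinates processes on one host, so multi-master or multi-datacenter setups would need a shared lock service instead.

```python
import fcntl
import os
from contextlib import contextmanager


class DeployLockedError(RuntimeError):
    """Raised when another deploy already holds the lock (hypothetical name)."""


@contextmanager
def deploy_lock(path):
    """Hold an exclusive advisory lock on `path` while a deploy runs.

    Fails immediately instead of waiting, so a second deployer gets a
    clear error rather than a silent queue behind the first deploy.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # LOCK_NB makes flock raise instead of blocking when the lock is taken.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError as exc:
            raise DeployLockedError(f"deploy already in progress ({path})") from exc
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A deploy script would wrap its work in `with deploy_lock("/path/to/deploy.lock"):` (path chosen by the operator) and report `DeployLockedError` to the user.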

Copy/paste requirements from the etherpad while table massaging...

Parsoid / Mathoid / Rashomon / PDF renderer deploys

MUST make it easy to test packaging (init scripts, log rotation etc) outside the cluster -- debs? make scap/whatever be the thing that deploys beta cluster?
MUST use rolling deploys: not all nodes at once to avoid taking down the cluster
SHOULD allow non-roots to upgrade / downgrade / restart individual nodes for testing (ex: permissions by group membership)
SHOULD NOT create much additional overhead over normal debian packaging
SHOULD handle dependencies with system packages/libraries and other packages cleanly (including downgrades)
SHOULD be able to split packaging further (separate packages for library dependencies for example) -- support n versioned dependencies
SHOULD use known systems / avoid "unnecessary" complexity wherever possible
SHOULD make it easy for third parties to track regular (non-security) deployed code / be directly usable for third parties (Vagrant, labs, hosted VMs etc)
SHOULD support timely third-party security upgrades with systems like unattended-upgrades once the security release is published
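
The rolling-deploy requirement above (not all nodes at once) can be sketched as a loop over small batches that stops as soon as a batch fails its health check. This is only an illustration: `rolling_deploy`, `batches`, and the `deploy` and `healthy` callables are hypothetical names, and how a node is actually upgraded or checked is left to the caller.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def rolling_deploy(
    hosts: Sequence[str],
    batch_size: int,
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Deploy batch by batch; stop after the first batch with an unhealthy host.

    Returns (updated, failed): every host that was touched, and the hosts in
    the last batch that failed the health check (empty on full success).
    """
    updated: List[str] = []
    for batch in batches(hosts, batch_size):
        for host in batch:
            deploy(host)
            updated.append(host)
        failed = [host for host in batch if not healthy(host)]
        if failed:
            # Remaining hosts are left on the old version so the cluster stays up.
            return updated, failed
    return updated, []
```

Returning the list of updated hosts, rather than only raising, gives the caller what it needs to roll back exactly the nodes that were changed. A canary deploy is the same loop with a first batch of one host.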

ES deploys

Deploys are rare, less than once a month. Deploys are time consuming due to rolling restarts of a data store. We don't have many folks with the expertise to recover from unexpected failure. I think of it more like MySQL than like MW or Parsoid.

MUST allow user to force Elasticsearch to be installed at the version on the rest of the Elasticsearch cluster
- work around reprepro bug :)
MUST configure Elasticsearch settings
- yaml files be stuff'd
MUST be able to provision a new server without restarting current servers
- targeted deploys, canary deploys
MUST support deploying Elasticsearch plugins
MUST support ES plugin presence assurance
MUST verify that the Elasticsearch plugins are genuine (hash or something)
MUST NOT upgrade Elasticsearch without manual intervention
- no "ensure => latest"
SHOULD be compatible with Elasticsearch's debian packages
SHOULD coordinate versions of Elasticsearch plugins deployed so they are compatible with Elasticsearch server
- plugin compatibility matrix?
SHOULD NOT allow plugin undeployment. That'd break things.
NON-ISSUE locks aren't important because very few people are concerned with upgrading Elasticsearch and the upgrades themselves are pretty rare
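
Two of the plugin requirements above (verify that plugins are genuine, and keep plugin versions compatible with the server) lend themselves to a small pre-deploy check. The sketch below assumes a SHA-256 digest published alongside each plugin and a hand-maintained compatibility matrix. Both are assumptions for illustration, as are the function names and the matrix layout; none of it reflects how Elasticsearch itself distributes plugins.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plugin_is_genuine(path, expected_sha256):
    """Compare the plugin archive's digest with the expected (trusted) one."""
    return sha256_of(path) == expected_sha256.strip().lower()


def plugin_is_compatible(matrix, plugin, plugin_version, es_version):
    """Look up a plugin version in a hypothetical compatibility matrix.

    `matrix` maps plugin name -> plugin version -> set of Elasticsearch
    versions it is known to work with. Unknown entries count as incompatible.
    """
    return es_version in matrix.get(plugin, {}).get(plugin_version, set())
```

Treating unknown plugins or versions as incompatible errs on the side of refusing a deploy, which fits the "MUST NOT upgrade without manual intervention" stance of this list.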
