Deployment tooling/Notes/Deployment system requirements
This page is currently a draft.
Requirements for the next iterations of deployment tooling projects at WMF
Requirement | MediaWiki | Parsoid etc | Elasticsearch |
provision artifacts on multiple servers comprising a cluster | MUST | MUST | MUST |
manage publication of at least 2 versions of mediawiki-core | MUST | NON-ISSUE | NON-ISSUE |
keep old versions of mediawiki-core around to support bits servers (30 days?) | MUST | NON-ISSUE | NON-ISSUE |
manage l10n cache files on target servers | MUST | NON-ISSUE | NON-ISSUE |
support publication of patches to internal servers that cannot be shared publicly | MUST | MUST | MUST |
complete in a timely manner [1] | MUST | MUST | NON-ISSUE |
have a time-to-complete for a deploy that is proportional to the magnitude of the change | MUST | MUST | NON-ISSUE |
be able to update/provision a new server joining the cluster without forcing update of all other servers (server pull) | MUST | MUST | MUST |
handle rollbacks easily | MUST | MUST | NON-ISSUE |
handle downgrades (of packages) easily | MUST | MUST (including dependencies) | NON-ISSUE |
have atomicity equal to or greater than rsync --delay-updates | MUST | ? | ? |
be able to lock (to prevent folks overlapping) | MUST | MUST | SHOULD |
support multiple datacenters | MUST | SHOULD | SHOULD |
have documentation for use and maintenance | MUST | MUST | MUST |
handle security patches cleanly but separately from released versions | MUST | MUST | MUST |
support eventual consistency (servers update to latest version on reboot) | MUST | SHOULD | ? |
work well with Puppet-controlled config files (especially for ES and Parsoid) | MUST | MUST | MUST |
support a variable number of versions | SHOULD | ? | ? |
allow generation of caches and other content post-deploy on target servers | SHOULD | ? | NON-ISSUE |
prevent concurrent modification of source | SHOULD | ? | ? |
track versions installed on each target host | SHOULD | MUST | MUST |
record errors and informational events publicly | SHOULD | SHOULD | SHOULD |
record errors and informational events durably for use in troubleshooting | SHOULD | SHOULD | SHOULD |
allow "canary"/rolling deploys where a sub-set of the cluster is updated | SHOULD | MUST | MUST |
allow multiple production clusters (privates vs non-private, and wikitech.wmf getting out of date) | SHOULD | SHOULD | SHOULD |
rely on SSH agent forwarding | SHOULD NOT | SHOULD NOT | SHOULD NOT |
require root on the target hosts (privilege separation/access control) | SHOULD NOT | SHOULD NOT | NON-ISSUE |
allow multi-masters (especially for multi-datacenter) | SHOULD | SHOULD | SHOULD |
be easily auditable (e.g. verifying the git commit hash on the deploy host) | SHOULD | SHOULD | SHOULD |
[1] MW: 10 min basic, 30 min full (l10n included)
Parsoid: 10 min
ES:
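The "be able to lock" requirement in the table could be met in several ways. The following is a minimal sketch of one option, assuming a single deploy host, a POSIX system, and a hypothetical lock file path chosen by the caller; `deploy_lock` and `DeployLockedError` are illustrative names, not part of any existing tool.

```python
import fcntl
import os
from contextlib import contextmanager


class DeployLockedError(RuntimeError):
    """Raised when another deploy already holds the lock."""


@contextmanager
def deploy_lock(path, owner):
    """Hold an exclusive, non-blocking lock on `path` for the duration of a deploy.

    `path` is a hypothetical lock file; `owner` is recorded in the file so a
    second deployer can see who is currently deploying.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Someone else holds the lock: report who, then give up.
            holder = os.read(fd, 1024).decode("utf-8", "replace").strip()
            raise DeployLockedError(
                "deploy in progress by %s" % (holder or "unknown")
            )
        # We own the lock: record ourselves as the current holder.
        os.ftruncate(fd, 0)
        os.write(fd, owner.encode("utf-8"))
        yield
    finally:
        # Closing the descriptor also releases the flock.
        os.close(fd)
```

Because the kernel drops the lock when the process exits, a crashed deploy does not leave a stale lock behind. This sketch does not cover multiple deploy hosts or multi-master setups, which would need a shared lock service instead.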
Copy/paste requirements from the etherpad while table massaging...
Parsoid / Mathoid / Rashomon / PDF renderer deploys
- MUST make it easy to test packaging (init scripts, log rotation etc) outside the cluster -- debs? make scap/whatever be the thing that deploys beta cluster?
- MUST use rolling deploys: not all nodes at once to avoid taking down the cluster
- SHOULD allow non-roots to upgrade / downgrade / restart individual nodes for testing (ex: permissions by group membership)
- SHOULD NOT create much additional overhead over normal debian packaging
- SHOULD handle dependencies with system packages/libraries and other packages cleanly (including downgrades)
- SHOULD be able to split packaging further (separate packages for library dependencies for example) -- support n versioned dependencies
- SHOULD use known systems / avoid "unnecessary" complexity wherever possible
- SHOULD make it easy for third parties to track regular (non-security) deployed code / be directly usable for third parties (Vagrant, labs, hosted VMs etc)
- SHOULD support timely third-party security upgrades with systems like unattended-upgrades once the security release is published
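The rolling-deploy requirement above (not all nodes at once) can be sketched as a batching loop with a canary batch first. This is only an illustration under assumptions: `deploy` and `healthy` are hypothetical callables supplied by the caller, and the batch sizes are arbitrary defaults, not values from these notes.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batches(hosts: Sequence[str], canary: int, size: int) -> Iterator[List[str]]:
    """Yield a canary batch first, then fixed-size batches of the remaining hosts."""
    if canary < 0 or size < 1:
        raise ValueError("canary must be >= 0 and size must be >= 1")
    if canary and hosts:
        yield list(hosts[:canary])
    for start in range(canary, len(hosts), size):
        yield list(hosts[start:start + size])


def rolling_deploy(
    hosts: Sequence[str],
    deploy: Callable[[str], None],    # hypothetical: updates one host
    healthy: Callable[[str], bool],   # hypothetical: post-deploy health check
    canary: int = 1,
    size: int = 2,
) -> Tuple[List[str], List[str]]:
    """Deploy batch by batch; stop at the first batch with an unhealthy host.

    Returns (hosts touched so far, unhealthy hosts in the last batch).
    """
    updated: List[str] = []
    for batch in batches(hosts, canary, size):
        for host in batch:
            deploy(host)
        updated.extend(batch)
        failed = [host for host in batch if not healthy(host)]
        if failed:
            # Leave the rest of the cluster on the old version.
            return updated, failed
    return updated, []
```

Stopping at the first unhealthy batch keeps the remaining nodes serving the old version, which is the point of not updating everything at once. Rolling the touched hosts back is left to the caller.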
ES deploys
Deploys are rare, less than once a month. Deploys are time consuming due to rolling restarts of a data store. We don't have many folks with the expertise to recover from unexpected failure. I think of it more like MySQL than like MW or Parsoid.
- MUST allow user to force Elasticsearch to be installed at the version on the rest of the Elasticsearch cluster
- work around reprepro bug :)
- MUST configure Elasticsearch settings
- yaml files be stuff'd
- MUST be able to provision a new server without restarting current servers
- targeted deploys, canary deploys
- MUST support deploying Elasticsearch plugins
- MUST support ES plugin presence assurance
- MUST verify that the Elasticsearch plugins are genuine (hash or something)
- MUST NOT upgrade Elasticsearch without manual intervention
- no "ensure => latest"
- SHOULD be compatible with Elasticsearch's debian packages
- SHOULD coordinate versions of Elasticsearch plugins deployed so they are compatible with Elasticsearch server
- plugin compatibility matrix?
- SHOULD NOT allow plugin undeployment. That'd break things.
- NON-ISSUE locks aren't important because very few people are concerned with upgrading Elasticsearch and the upgrades themselves are pretty rare
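For the "verify that the Elasticsearch plugins are genuine (hash or something)" item, one possible approach is comparing a SHA-256 digest of the plugin archive against a known-good value. The sketch below assumes the expected digest is obtained out of band (for example from a reviewed manifest); the function names and the choice of SHA-256 are assumptions, not something these notes specify.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_plugin(path, expected_sha256):
    """Return True if the plugin archive matches the expected digest.

    `expected_sha256` is assumed to come from a trusted source, not from
    the same place the archive was downloaded from.
    """
    actual = sha256_of(path)
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A deploy step would refuse to install any plugin archive for which `verify_plugin` returns False. A hash only proves the file matches what was reviewed; it says nothing about compatibility with the running Elasticsearch version.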