Wikimedia Release Engineering Team/SpiderPig/Technical design document

“SpiderPig” will be a web-based user interface for deploying MediaWiki at Wikimedia. It is intended to be a user-friendly front-end for our command-line-only deployment tool, scap.

This work is part of WE6.2.3:

If we create a new deployment UI that provides more information to the deployer and reduces the amount of privilege needed to deploy, it will make deployment easier and open deployments to more users, as measured by the number of unique deployers and the number of patches backported as a percentage of our overall deployments.

Goals and non-goals

For the first phase of development, SpiderPig will perform the actions of the scap backport command, syncing MediaWiki code changes from our deployment servers to WikiKube and a small number of bare-metal MediaWiki servers.

Goals

  • Simplify deployment interface
    • Provide deployers a simplified and automated experience
    • The deployer performs the same tasks as when they attend a backport window:
      • Deployer handles: specifying one or more patches for deployment, checking and confirming the changes on a test server that receives no user traffic, and monitoring the user interface for any problems the tool surfaces.
      • SpiderPig handles: merging patch(es), deployment to test servers, automated tests, prompting users for action, final deployment to production, and rollback if requested by users.
  • Provide actionable feedback to deployers
    • Provide a button for deployers to indicate that their patch is ready to be deployed and they can proceed.
    • Prompt deployers when patches are on test servers.
    • Prompt deployers when there is a significant change in error rate.
    • Provide a means to roll back a change.
    • Provide a visual indication that a deployment is underway.
    • Provide a button for deployers to indicate that their deployment has completed successfully, unlocking deployment for other patchsets/deployers.
    • Display simplified monitoring of the production environment during the deployment: a view of the error rate and other production metrics relevant to deployers.
  • Secure web-based tool
    • Use a secure method to authenticate users, integrating with existing Wikimedia authentication mechanisms.
    • Ensure users provide multi-factor authentication.
    • Ensure only authorized users are able to use the tooling.
    • Provide administrative means to immediately log out users and lock deployments made through the tool.
    • Follow security best practices for preventing escalation of privilege and remote code execution (RCE) outside of the scope of the tool.
    • Protect secrets stored on the deployment server and within MediaWiki from accidental exposure (e.g., copy-pasting to a public Phabricator task).
  • Support multiple concurrent users
    • Multiple users should be able to log into this tool at the same time.
    • Users should be able to request a deployment during another deployment, queuing their change until the previous deployments have finished.
  • Logging and monitoring
    • SpiderPig should keep high-fidelity logs of deployments.
    • Scap will continue to send logs to Logstash.
    • Additionally, SpiderPig will keep logs to send to deployers; these will need to be log-rotated.
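The queuing goal can be modelled as a first-in, first-out queue where only the request at the head may deploy. The head keeps the deployment lock until its deployer marks the deployment finished, which also covers the "unlocking deployment for other patchsets/deployers" goal.

This is a minimal in-memory sketch. The `Request` and `DeployQueue` names are hypothetical. A real implementation would need the persistent data storage listed under the necessary components rather than process memory.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional
import threading


@dataclass
class Request:
    """A hypothetical deployment request: who asked, and which Gerrit changes."""
    user: str
    changes: List[str] = field(default_factory=list)


class DeployQueue:
    """FIFO deployment queue: only the head request may deploy, and it keeps
    the deployment lock until its deployer marks the deployment finished."""

    def __init__(self) -> None:
        self._items: Deque[Request] = deque()
        self._lock = threading.Lock()  # many web sessions share one queue

    def enqueue(self, req: Request) -> int:
        """Add a request and return its position (0 means it may deploy now)."""
        with self._lock:
            self._items.append(req)
            return len(self._items) - 1

    def position(self, user: str) -> Optional[int]:
        """Position of the user's first queued request, or None if absent."""
        with self._lock:
            for i, req in enumerate(self._items):
                if req.user == user:
                    return i
            return None

    def current(self) -> Optional[Request]:
        """The request currently holding the deployment lock, if any."""
        with self._lock:
            return self._items[0] if self._items else None

    def finish(self, user: str) -> Optional[Request]:
        """Release the lock held by `user` and return the next request, if any."""
        with self._lock:
            if not self._items or self._items[0].user != user:
                raise PermissionError(f"{user} does not hold the deployment lock")
            self._items.popleft()
            return self._items[0] if self._items else None
```

The value returned by `position` is what the UI would display and refresh while a deployer waits. Refusing `finish` from anyone but the lock holder keeps one deployer from releasing another's deployment.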

Non-goals

  • Replace other deployment tools
    • SpiderPig is a wrapper for the command line scap tool rather than a replacement.
    • SpiderPig will not replace helm.
  • Replace monitoring tools
    • SpiderPig will include monitoring information from other tools, but is not intended to replace these tools.
  • Advanced deployment scenarios
    • For the first phase of development, SpiderPig will not provide access to lower-level scap subcommands that are infrequently used by advanced deployers; e.g., scap sync-wikiversions.
  • Support non-MediaWiki deployments (i.e., “scap deploy”)
    • For the first phase of development, SpiderPig will not support service deployments for tools like Phabricator and Gerrit.

Design

Necessary components

  • DNS subdomain: deploy.wikimedia.org
  • TLS termination and certificate
  • Python WSGI daemon on the deployment server
    • Monitoring, logging, log rotation
  • WebSocket support
    • Traffic Server mapping rules to route WebSocket requests
  • IDP integration
  • Multifactor authentication mechanism
    • Time-based one-time password
  • Data storage for queue, logs and deployment history.
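Time-based one-time passwords are standardized in RFC 6238, which builds on HOTP from RFC 4226. The sketch below shows the verification arithmetic using only the Python standard library. It assumes HMAC-SHA1, 30-second steps and 6-digit codes, which are the common authenticator-app defaults.

The design does not say whether SpiderPig verifies codes itself or relies on the IDP's second factor. Treat the function names and the one-step clock-skew window as assumptions.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte big-endian counter,
    then dynamic truncation to a `digits`-long decimal code."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with counter = Unix time // step."""
    now = time.time() if at is None else at
    return hotp(key, int(now) // step, digits)


def verify_totp(key: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes up to `window` steps either side of now (clock skew),
    comparing in constant time to avoid timing leaks."""
    now = time.time() if at is None else at
    counter = int(now) // step
    return any(
        hmac.compare_digest(hotp(key, c, digits).encode(), submitted.encode())
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )


def decode_secret(b32_secret: str) -> bytes:
    """Authenticator apps usually share secrets as unpadded base32."""
    s = b32_secret.replace(" ", "").upper()
    return base64.b32decode(s + "=" * (-len(s) % 8))
```

The RFC 6238 test secret `12345678901234567890` yields `287082` at Unix time 59 with six digits. This is a quick way to check any implementation against the specification.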

User flow

  1. User authenticates with the IDP and with a second factor.
  2. User passes the Gerrit patch number(s) to be deployed.
  3. SpiderPig queues the patch for processing and tells the user their place in the queue, updating it continually.
  4. Once the patch reaches the front of the queue, SpiderPig asks whether the user is ready to deploy.
  5. If the user confirms within the timeout period, SpiderPig executes scap backport on the deployment server.
  6. Code is synchronized to test servers.
  7. Smoke tests (httpbb) execute on the test servers. If they fail, the test servers are rolled back.
  8. Once the patch reaches the test servers, the user is prompted to continue (yes/no); if no, the change is rolled back.
  9. Code is synchronized to canary servers.
  10. A canary error-rate check runs on the deployment host, querying Logstash. If the error rate spikes, the canary and test servers are rolled back.
  11. Code is rolled out to all of production.
  12. During the production rollout, the log of the backport operation is streamed to the user. Best effort is made to mask security patch information to guard against logs being copy-pasted to public forums. Full-fidelity logs for the deployment are saved separately via scap’s logging mechanism (which sends them to Logstash).
  13. After the deployment, the deployer is prompted to “finish the deployment”, freeing the next item in the queue to deploy.
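Steps 6 to 11 of this flow form a staged pipeline in which each failure point rolls back only the server groups already reached. SpiderPig wraps scap backport rather than reimplementing it, so this is only a model of the control flow, not SpiderPig's implementation.

Every hook below is a hypothetical placeholder for the corresponding stage: syncing, the httpbb smoke tests, the deployer prompt and the canary error-rate check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hooks:
    """Placeholder callables for each stage; all names are hypothetical."""
    sync_test: Callable[[], None]           # sync code to test servers
    smoke_tests_pass: Callable[[], bool]    # e.g. run httpbb against test servers
    deployer_confirms: Callable[[], bool]   # prompt the deployer: continue yes/no
    sync_canaries: Callable[[], None]       # sync code to canary servers
    canary_errors_ok: Callable[[], bool]    # e.g. compare error rates in Logstash
    sync_production: Callable[[], None]     # roll out to all of production
    rollback: Callable[[List[str]], None]   # roll back the named server groups


def run_backport(h: Hooks) -> str:
    """Walk the stages in order, rolling back whatever has been reached."""
    h.sync_test()
    if not h.smoke_tests_pass():
        h.rollback(["test"])
        return "rolled back: smoke tests failed"
    if not h.deployer_confirms():
        h.rollback(["test"])
        return "rolled back: deployer declined"
    h.sync_canaries()
    if not h.canary_errors_ok():
        h.rollback(["canary", "test"])
        return "rolled back: canary error rate spiked"
    h.sync_production()
    return "deployed"
```

Passing the stages in as callables keeps the ordering and rollback rules in one place. It also makes them easy to exercise with fakes.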

Admin capabilities

For the initial phase, a command-line admin interface may suffice.

Admins should be able to:

  • Freeze deployments
  • Ban a SpiderPig user, or log out a specific user
  • Log out all SpiderPig users
  • View and edit the queue
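Since a command-line admin tool may suffice for the initial phase, here is one possible shape for it using Python's `argparse`. The `spiderpig-admin` program name, its subcommands and the in-memory `AdminState` are all hypothetical. A real tool would act on SpiderPig's persistent session and queue storage.

```python
import argparse
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AdminState:
    """Hypothetical in-memory state an admin CLI would act on."""
    frozen: bool = False
    banned: Set[str] = field(default_factory=set)
    sessions: Dict[str, Set[str]] = field(default_factory=dict)  # user -> session IDs
    queue: List[str] = field(default_factory=list)               # queued request IDs


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="spiderpig-admin")  # hypothetical name
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("freeze", help="block new deployments")
    sub.add_parser("unfreeze", help="allow deployments again")
    ban = sub.add_parser("ban", help="ban a user and end their sessions")
    ban.add_argument("user")
    logout = sub.add_parser("logout", help="log out one user, or everyone if omitted")
    logout.add_argument("user", nargs="?")
    sub.add_parser("queue-list", help="show queued requests")
    remove = sub.add_parser("queue-remove", help="drop a queued request")
    remove.add_argument("request_id")
    return parser


def apply(state: AdminState, args: argparse.Namespace) -> str:
    """Apply one parsed admin command to the state and describe the result."""
    if args.command == "freeze":
        state.frozen = True
        return "deployments frozen"
    if args.command == "unfreeze":
        state.frozen = False
        return "deployments unfrozen"
    if args.command == "ban":
        state.banned.add(args.user)
        state.sessions.pop(args.user, None)  # banning also logs the user out
        return f"banned {args.user}"
    if args.command == "logout":
        if args.user is None:
            state.sessions.clear()
            return "logged out all users"
        state.sessions.pop(args.user, None)
        return f"logged out {args.user}"
    if args.command == "queue-list":
        return "\n".join(state.queue) or "(queue empty)"
    if args.command == "queue-remove":
        state.queue.remove(args.request_id)  # ValueError if not queued
        return f"removed {args.request_id}"
    raise ValueError(f"unknown command: {args.command}")
```

For example, `spiderpig-admin ban alice` would record the ban and drop alice's sessions in one step. A bare `spiderpig-admin logout` would end every session.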

Web terminal view

The web terminal view streams the output of scap commands to the user's browser.

The web terminal consists of two components:

  • Server side:
    • Flask-SocketIO
  • Client side:
    • socket.io JS
    • Xterm.js

The “scap spiderpig” daemon listens for WebSocket connections at the /socket.io endpoint, where a bidirectional connection is established. SpiderPig emits log lines over the WebSocket connection, and Xterm.js renders them in the browser.
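The server-side half of this flow can be sketched with the standard library alone. The sketch:

- runs a command;
- reads its merged stdout/stderr line by line;
- optionally masks sensitive text, as in the best-effort masking from step 12 of the user flow;
- hands each line to an `emit` callback.

In the daemon, `emit` would be a Flask-SocketIO emit to the user's socket. That wiring is shown only as a comment. The `stream_command` helper, the `"log"` event name and the redaction patterns are assumptions.

```python
import re
import subprocess
from typing import Callable, Iterable, Pattern, Sequence


def stream_command(argv: Sequence[str], emit: Callable[[str], None],
                   redact: Iterable[Pattern[str]] = ()) -> int:
    """Run argv, pass each output line (stdout and stderr merged) through the
    redaction patterns, then to emit(); return the process exit code."""
    patterns = list(redact)
    with subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            for pattern in patterns:
                line = pattern.sub("[REDACTED]", line)
            emit(line)
    return proc.returncode  # Popen's context manager waits for exit


# Hypothetical wiring inside the Flask-SocketIO daemon (not executed here):
#   socketio.start_background_task(
#       stream_command, ["scap", "backport", change_number],
#       lambda line: socketio.emit("log", line, to=sid))
```

On the client, Xterm.js treats a bare `\n` as a line feed without a carriage return. Either enable the terminal's `convertEol` option or convert line endings to `\r\n` before emitting.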

Cross-cutting concerns

  • Security: the web application needs a security review before deployment
  • Observability: we’ll need to check in with the Observability team about including dashboard data
  • ServiceOps: consultation through the design, build, and deployment phases
  • Infrastructure Foundations: need to register an application to use the IDP