Wikimedia Release Engineering Team/Group -1

Group -1 is a multi-phase project that aims to create an environment for manual and automated integration and regression testing of MediaWiki changes before those changes progress through the Deployment train process to the Wikimedia movement's content wikis, such as Wikidata and English Wikipedia.

The project has its roots in discussions about T215217: deployment-prep (beta cluster): Code stewardship request and the needs of the Wikimedia Quality and Test Engineering Team. The work is currently associated with the 2024-2025 Annual Plan's Wiki Experiences (WE) WE6 objective's WE6.2 key result:

By the end of Q4, enhance an existing project and perform at least two experiments aimed at providing maintainable, targeted environments moving us towards safe, semi-continuous delivery.

Hypotheses

If we deploy MediaWiki multiple times per day into a contained area of production, we will create the most "production-like" environment for QTE staff to use for manual exploratory testing and automated regression checks prior to train deployment to major content wikis.

Implementation will progress via a series of smaller hypotheses that will eventually connect to realize the overarching hypothesis. This approach avoids tracking progress against a single "boil the ocean" hypothesis that could take a year or more to reach an easily measurable state.

[WE6.2.1] Publish pre-train single version containers

If we publish a versioned build of MediaWiki, extensions, skins, and Wikimedia configuration at least once per day, we will uncover new constraints and establish a baseline of the wall-clock time needed to perform a build.
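One way to collect such a wall-clock baseline is to time each build and summarize the samples. The following is a minimal sketch only: `time_build`, `baseline_seconds`, and `fake_build` are hypothetical names introduced for illustration, and `fake_build` stands in for whatever step actually produces the image.

```python
import statistics
import time
from typing import Callable, Iterable


def time_build(build: Callable[[], None]) -> float:
    """Run one build and return its wall-clock duration in seconds."""
    start = time.monotonic()  # monotonic clock: unaffected by system clock changes
    build()
    return time.monotonic() - start


def baseline_seconds(durations: Iterable[float]) -> float:
    """Summarize repeated build durations with the median, which is less
    sensitive than the mean to one unusually slow build."""
    return statistics.median(durations)


def fake_build() -> None:
    """Hypothetical stand-in for the real image build step."""
    time.sleep(0.01)


if __name__ == "__main__":
    samples = [time_build(fake_build) for _ in range(3)]
    print(f"baseline: {baseline_seconds(samples):.3f}s over {len(samples)} builds")
```

The median is only one reasonable choice of summary; a percentile such as p95 may be more useful once enough daily builds have been recorded.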

Step 0 toward the long-term goal of being capable of continuous delivery (CD) into production is being able to deliver faster than the current weekly train process. A daily process would be approximately 3-4 times faster than our current production delivery cadence. We currently envision our eventual capability goal as being able to deliver every 15 minutes. Setting the initial delivery interval roughly two orders of magnitude longer than that (once per 1440 minutes vs once per 15 minutes) will still expose us to a number of real-world constraints that are not addressed by current workflows. We expect to uncover more details about the challenges that will arise as we continue to accelerate the pace of delivery, without tipping too quickly into extreme difficulty that could endanger our ability to use an iterative development model.
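The gap between the initial daily interval and the envisioned 15 minute interval can be checked with a few lines of arithmetic. This sketch uses only the two intervals stated above; the function and variable names are illustrative.

```python
import math

MINUTES_PER_DAY = 24 * 60  # 1440


def builds_per_day(interval_minutes: int) -> float:
    """Number of deliveries per day at a fixed interval."""
    return MINUTES_PER_DAY / interval_minutes


daily = builds_per_day(1440)  # one delivery per day
target = builds_per_day(15)   # 96 deliveries per day
ratio = target / daily        # the target cadence is 96 times faster
orders = math.log10(ratio)    # about 1.98, so roughly two orders of magnitude

if __name__ == "__main__":
    print(f"ratio: {ratio:.0f}x, orders of magnitude: {orders:.2f}")
```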

We are explicitly not constraining where this publishing workflow will be measured at this time. We expect the SRE groups who will need to be involved in deploying into a wikikube environment to be occupied by other goals in the initial months of FY24/25. We are not currently certain what new capabilities would need to be produced to target the current beta cluster shared environment, or whether doing so would be of long-term benefit to the goal, team, or projects. We do expect to deliver beyond a single-user development environment, and we will be able to provide more details as design progresses and narrows the cone of uncertainty for the overall project.

Goals

Workflow goals

  • Enable testing of pre-train ("next") branches of MediaWiki, skins, and extensions in a stable environment where newly discovered defects are more likely to be caused by the next branches than by problems with support services or configuration in the environment itself.

Technical goals

  • Automate MediaWiki OCI image creation based on a timer or similar trigger.
  • Create an environment in the production network for running pre-train MediaWiki versions.
  • Automate MediaWiki OCI container deployment into the pre-train environment.
  • Enable overriding any staged wikiversions.json when scap is determining which MediaWiki versions need to be included in a container (allow, but do not require, single-version builds).
  • Enable overriding an in-container wikiversions.json with a hard-coded MediaWiki version inside of the container.
  • Enable creation of MediaWiki containers from arbitrary staging directories so that a single deployment or CI server can be used to build as many variant containers as we need.
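The two wikiversions.json override goals above can be illustrated with a small sketch. It is not scap's implementation: it assumes wikiversions.json is a flat mapping from wiki database name to a version string, and the function names and the "php-next" label are hypothetical.

```python
from typing import Dict, Optional, Set

# Assumed shape of wikiversions.json: {"<dbname>": "<version string>", ...}
WikiVersions = Dict[str, str]


def versions_to_build(staged: WikiVersions, override: Optional[str] = None) -> Set[str]:
    """Return the set of MediaWiki versions a container must include.

    With no override, every version referenced by the staged mapping is
    needed (a multiversion build). With an override, only that one version
    is needed (a single-version build).
    """
    if override is not None:
        return {override}
    return set(staged.values())


def pin_wikiversions(staged: WikiVersions, version: str) -> WikiVersions:
    """Return a copy of the mapping with every wiki pinned to one version,
    mimicking a hard-coded version inside the container."""
    return {dbname: version for dbname in staged}


if __name__ == "__main__":
    # Illustrative data only; version strings are made up for this example.
    staged = {"testwiki": "php-1.43.0-wmf.2", "enwiki": "php-1.43.0-wmf.1"}
    print(sorted(versions_to_build(staged)))
    print(versions_to_build(staged, override="php-next"))
    print(pin_wikiversions(staged, "php-next"))
```

Keeping the override optional reflects the "allow, but do not require" wording: the same code path still supports multiversion builds when no override is given.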

Out of scope

Workflow out of scope

  • Enabling testing of pre-release services and configuration not managed by scap is out of scope.

Technical out of scope

  • Replacing the weekly train progression with continuous delivery to all wikis is out of scope.
  • Building an image for every commit to a Train deployed repo/submodule is out of scope.
  • Keeping deployment-prep working in the face of production's migration to Kubernetes and containers as the MediaWiki deployment and runtime solution is out of scope.
  • Building a chain of images starting from public files only to produce images that can be used outside of production is out of scope.
  • Reimagining how configuration is delivered into MediaWiki containers is out of scope.
  • Runtime support for single-version images beyond the minimum functionality needed to support building and operating Group -1 directly is out of scope.

A number of active, planned, and imagined projects overlap with the Group -1 concept and implementation. When possible, we should avoid becoming a blocker to these projects. We should also avoid making systemic changes that will cause future headaches we can foresee today.

  • WE6.2.5 Move multiversion routing outside of the MediaWiki containers to unblock single version containers
  • WE5.4.2 PHP runtime upgrade process in a containerized world

Unknowns / open questions

  • Will QTE folks need any shell access or other special permissions to the containers running Group -1 wikis?
  • If we move existing wikis (testwiki, test2wiki, officewiki, mediawikiwiki, wikitech, etc.) to Group -1, will mwscript whatever --wiki=movedwiki work transparently, or will these wikis need to be addressed from a special place?

Reports
