Wikimedia Release Engineering Team/Group -1/Progress reports/2024-10-31

Report on activities in the Group -1 project for the week ending 2024-10-31.

[WE6.2.1] Publish pre-train single version containers

If we publish a versioned build of MediaWiki, extensions, skins, and Wikimedia configuration at least once per day, we will uncover new constraints and establish a baseline for the wallclock time needed to perform a build.

Progress update
  • Stage: Complete
Confirm whether the hypothesis was supported or contradicted
This hypothesis is supported by our implementation and analysis.
  • We are publishing a new build daily.
  • We have established that the baseline wallclock time needed to perform this build and publish step is 64 minutes.
  • The implemented job pipeline will continue to run once per day for the foreseeable future. Each successful run will provide additional data on wallclock runtime. Each failed run will guide us towards identifying operational challenges for the pipeline.
Briefly describe what was accomplished during the hypothesis work
  • We defined the initial workflow and technical goals for the overarching "Group -1" project.
  • We implemented a build pipeline via Jenkins jobs that:
    • creates or updates a wmf/next branch in MediaWiki core, each extension, and each skin deployed to Wikimedia wikis
    • updates submodules in the MediaWiki core repo's wmf/next branch to point to the equivalent extension and skin branches
    • builds an OCI container image from the MediaWiki core repo's wmf/next plus configuration, security patches, and production secrets
    • publishes the new image to the Wikimedia production container registry
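The four pipeline steps listed above can be sketched as an ordered runner that times each step and stops at the first failure. This is only an illustration under stated assumptions: the real pipeline is a set of Jenkins jobs, and `run_pipeline`, `StepResult`, `noop`, and `DAILY_STEPS` are hypothetical names introduced here, not part of scap or the Jenkins configuration.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    error: Optional[str] = None


def run_pipeline(steps: List[Tuple[str, Callable[[], None]]]) -> List[StepResult]:
    """Run steps in order, recording wallclock time; stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        start = time.monotonic()
        try:
            action()
        except Exception as exc:
            # A failed step is recorded and ends the run, so later steps never
            # act on a half-updated branch or a missing image.
            results.append(StepResult(name, False, time.monotonic() - start, str(exc)))
            break
        results.append(StepResult(name, True, time.monotonic() - start))
    return results


def noop() -> None:
    """Hypothetical stand-in; a real step would call git, the image build, or the registry push."""


# Step names mirror the list above; the callables are placeholders.
DAILY_STEPS: List[Tuple[str, Callable[[], None]]] = [
    ("create or update wmf/next branches", noop),
    ("update submodules in core wmf/next", noop),
    ("build OCI container image", noop),
    ("publish image to registry", noop),
]

for result in run_pipeline(DAILY_STEPS):
    status = "ok" if result.ok else f"FAILED: {result.error}"
    print(f"{result.name}: {status} ({result.seconds:.1f}s)")
```

Recording a duration for failed steps as well as successful ones keeps both kinds of run useful as data, in line with the progress update above.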
Technical goals achieved
  • Automate MediaWiki OCI image creation based on a timer or similar trigger.
  • Enable overriding any staged wikiversions.json when scap is determining which MediaWiki versions need to be included in a container (allow, but do not require single version builds).
  • Enable overriding an in-container wikiversions.json with a hard-coded MediaWiki version inside of the container.
  • Enable creation of MediaWiki containers from arbitrary staging directories so that a single deployment or CI server can be used to build as many variant containers as we find need for.
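The two wikiversions.json overrides above can be illustrated with a small sketch. It assumes wikiversions.json is a mapping from wiki database name to MediaWiki version string. The functions `versions_to_include` and `pin_to_single_version` are hypothetical helpers written for this example and are not scap's actual interface. The wiki names and version strings are made up.

```python
from typing import Dict, Optional, Set


def versions_to_include(
    staged: Dict[str, str],
    override: Optional[Dict[str, str]] = None,
) -> Set[str]:
    """Return the set of MediaWiki versions a container needs.

    `staged` stands in for the staged wikiversions.json. When `override` is
    given it replaces `staged` entirely, which allows a single-version build
    without requiring one.
    """
    source = override if override is not None else staged
    return set(source.values())


def pin_to_single_version(wikiversions: Dict[str, str], version: str) -> Dict[str, str]:
    """Return a copy of the mapping with every wiki pointed at one hard-coded version."""
    return {wiki: version for wiki in wikiversions}


# Illustrative input only; these are not real production values.
staged = {"wiki_a": "version-1", "wiki_b": "version-2"}

# Default behaviour: the container carries every staged version.
print(sorted(versions_to_include(staged)))

# Single-version build: override the staged mapping with a pinned copy.
pinned = pin_to_single_version(staged, "version-next")
print(sorted(versions_to_include(staged, override=pinned)))
```

Keeping the override optional reflects the stated goal: multi-version builds keep working unchanged, and a single-version build is just one possible override.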
Project artifacts
Metrics related to the hypothesis
  • MediaWiki branch and publish WMF single-version image job average wallclock runtime: 41 minutes
  • MediaWiki publish WMF single-version image job average wallclock runtime: 23 minutes
  • Total average pipeline runtime: 64 minutes
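As a quick check of the arithmetic, the two job averages above sum to the reported total, and each job's share of that total follows directly. The variable and function names below are illustrative only; the figures are the ones reported in this list.

```python
# Average wallclock runtimes reported above, in minutes.
BRANCH_AND_PUBLISH_JOB_MIN = 41
PUBLISH_JOB_MIN = 23


def pipeline_total(*job_minutes: int) -> int:
    """Total pipeline runtime, assuming the jobs run one after another."""
    return sum(job_minutes)


def share_of_total(job_minutes: int, total_minutes: int) -> float:
    """Percentage of the total runtime spent in one job, rounded to one decimal."""
    return round(100 * job_minutes / total_minutes, 1)


total = pipeline_total(BRANCH_AND_PUBLISH_JOB_MIN, PUBLISH_JOB_MIN)
print(total)                                              # 64
print(share_of_total(BRANCH_AND_PUBLISH_JOB_MIN, total))  # 64.1
print(share_of_total(PUBLISH_JOB_MIN, total))             # 35.9
```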
Outline the major lessons learned
  • This reinforces existing knowledge more than it teaches a new lesson: the fortnightly IC-level sync meeting between Release Engineering and SRE Service Operations, established to support the MediaWiki on Kubernetes project, continues to provide good value.
  • Raising perceived risks in weekly reports seems to help start conversations as hoped.
  • Differences in scap's runtime permissions across environments (local development, releases-jenkins, deployment.eqiad.wmnet, etc.) and across calling users complicate feature development.
  • "Gate-and-submit" tests are the major wallclock time cost of updating the wmf/next branch.
Next steps
We expect to follow up with additional hypothesis work in the remaining weeks of FY24/25 Q2 to complete strategic and high-level tactical planning. That planning is focused on the major project goal: creating the most "production-like" environment for QTE staff to use for testing and automated regression checks prior to train deployment to major content wikis.