Wikimedia Release Engineering Team/Checkin archive/2024-03-13


2024-03-13

🏆 Wins/winterrogation

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
Mar 2024
  • Merged the change that makes nightly security patch failures update Phabricator tasks; ready to release
  • Merged deploys-in-progress reset script
  • Two repos have patches for git-fat → git-lfs
  • scap: replaced canary swagger checks with test server httpbb checks
  • Phorge integration with GitLab in its third round of review
  • GitLab webhooks also still going, looks like it'll go through
  • People like scap backport - more patches, fewer things typed into terminals.
  • Security patch notification now working!
  • GitLab webhooks have a more accurate regex for "Bug: TXX"
  • Foreachwiki in beta
  • Getting rid of the /srv/mediawiki/php symlink
  • Upgraded GitLab k8s/cloud cluster to new k8s version and documented the process
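
For reference, a minimal sketch of the kind of matching the "Bug: TXX" item above is about. This is not the regex used by the GitLab webhooks; the pattern and the function name `extract_task_ids` are assumptions for illustration. It only accepts a `Bug:` trailer on its own line followed by a `T` and digits.

```python
import re

# Hypothetical pattern, not the one deployed in the webhooks:
# a "Bug:" trailer at the start of a line, then a task ID of the form T<digits>.
BUG_TRAILER = re.compile(r"^Bug:[ \t]*(T\d+)[ \t]*$", re.MULTILINE)


def extract_task_ids(commit_message: str) -> list[str]:
    """Return every task ID found in 'Bug: T<digits>' trailer lines."""
    return BUG_TRAILER.findall(commit_message)


if __name__ == "__main__":
    message = "Fix thing\n\nBug: T12345\nChange-Id: Iabc"
    print(extract_task_ids(message))
```

Anchoring on the start of the line keeps prose such as "See Bug: T1 for details" from being picked up as a trailer.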

Stuff from last time

📅 Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2024
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off (page needs updating for Dayforce)

Mar 2024

  • 29 Feb, 1 Mar, 4–8 Mar: Antoine
  • 14 Mar–14 May: Dan
  • 29 Mar: Brennen, Jeena

Apr 2024

  • Mon 22 Apr: Global holiday, all staff
  • 26 Apr: Brennen (tentative)
  • Fri 05 Apr–Fri 12 Apr: Tyler, eclipse viewing

May 2024

  • Mon 27 May: Memorial Day (US staff with reqs)

Future

  • A few days around July 4: Brennen
  • 25 Aug - 03 Sep: Brennen

🔥🚂 Train

https://tools.wmflabs.org/versions/
https://train-blockers.toolforge.org/
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar

Rotation

  • 3 Dec – 1.42.0-wmf.8 – No Train offsite
  • 11 Dec – 1.42.0-wmf.9 – Brennen + Antoine (Jaime out)
  • 18 Dec – 1.42.0-wmf.10 – Ahmon + Brennen (Jaime out)
  • 25 Dec – 1.42.0-wmf.11 – No Train
  • 1 Jan – 1.42.0-wmf.12 – Dan + Ahmon (Jaime out)
  • 8 Jan – 1.42.0-wmf.13 – Jeena + Dan (Jaime out)
  • 15 Jan – 1.42.0-wmf.14 – Jaime + Jeena
  • 22 Jan – 1.42.0-wmf.15 – Antoine + Jaime
  • 29 Jan – 1.42.0-wmf.16 – Ahmon + Antoine (Brennen out Wed–Fri)
  • 05 Feb – 1.42.0-wmf.17 – Brennen + Ahmon
  • 12 Feb – 1.42.0-wmf.18 – Brennen + Antoine (Friday)
  • 19 Feb – 1.42.0-wmf.19 – Jeena + Brennen
  • 26 Feb – 1.42.0-wmf.20 – Dan + Jeena
  • 04 Mar – 1.42.0-wmf.21 – Jaime + Dan (Antoine out)
People for train: Ahmon, Antoine, Brennen, Jeena, Jaime
  • 11 Mar – 1.42.0-wmf.22 – Antoine + Jaime (Dan out)
  • 18 Mar – 1.42.0-wmf.23 – Ahmon + Antoine
  • 25 Mar – 1.42.0-wmf.24 – Jeena + Ahmon
  • 1 Apr – 1.42.0-wmf.25 – Jaime + Jeena
  • 8 Apr – 1.42.0-wmf.26 – Antoine + Jaime
  • 15 Apr – 1.42.0-wmf.27 – Ahmon + Antoine
  • 22 Apr – 1.42.0-wmf.28 – Brennen + Ahmon (Global holiday Monday; Brennen out Friday)

Team Discussions

Annual planning

Meta page: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2024-2025/Goals

  • How this works: Goals → Buckets → Objectives → KRs → Hypotheses
  • Where our work fits: Infrastructure → WikiExperiences → WE6 Developer Services → WE6.2
  • WE6:

Technical staff and volunteer developers have the tools they need to effectively support the Wikimedia projects

    • WE6.2: By Q4, complete an intervention and run an experiment each aimed at providing maintainable, targeted environments to serve developers' high-priority testing needs

  • Experiment: the goal is we learn some things
  • Intervention: we make a thing based on stuff we learned

WE6.2 Long version

Developers and users depend on the Wikimedia Beta Cluster (beta) to catch bugs before they affect users. Over time, the uses of beta have grown and come into conflict: the uses are too diverse to fit in a single environment. We will perform one intervention and conduct one experiment, each aimed at replacing a single high-priority testing need currently fulfilled by beta with a maintainable alternative environment that better serves that use case's needs.

Hypotheses-areas:

  • Experiment: Group -1
  • Intervention: Catalyst

Discussion of our hypothesis (alongside ServiceOps):

  • Rollback faster
  • Smaller, single-version images
  • Wikiversions should be config rather than code (no deploy needed)
  • Continuous deployment to test wikis
    • ServiceOps open to the idea of testwikis being the victim here
    • We don't know how caching works when it's updated every minute
    • Social change here, working closely with developers to change expectations
    • User interface challenges: ssh to server, lots of output to interpret; we can present things in a less scary way, and a web UI would be really awesome
    • What's scary about deploys now is what's happening in production and what do I have to do about it as a deployer?
    • Logging and monitoring and alerts exposed in a way for developers to feel confident deploying themselves vs speeding up
      • Something about making the summary of the state of production more visible
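
To make the "wikiversions should be config rather than code" point concrete, here is a rough sketch, assuming the wiki-to-version routing lives in a small JSON document that is read at runtime. The data, the wiki names, and the helper names (`load_routing`, `version_for`, `move_wiki`) are all hypothetical and do not describe how scap or MediaWiki config handle this today.

```python
import json

# Hypothetical routing data; wiki names and version strings are illustrative only.
ROUTING_JSON = """
{
  "testwiki": "1.42.0-wmf.23",
  "enwiki": "1.42.0-wmf.22"
}
"""


def load_routing(raw: str) -> dict[str, str]:
    """Parse the wiki -> version mapping from a JSON string."""
    routing = json.loads(raw)
    if not isinstance(routing, dict):
        raise ValueError("routing config must be a JSON object")
    return routing


def version_for(routing: dict[str, str], wiki: str) -> str:
    """Look up which version a wiki should be served by."""
    try:
        return routing[wiki]
    except KeyError:
        raise LookupError(f"no version configured for {wiki}") from None


def move_wiki(routing: dict[str, str], wiki: str, version: str) -> dict[str, str]:
    """Return a new mapping with one wiki re-pointed; the input is left untouched."""
    updated = dict(routing)
    updated[wiki] = version
    return updated


if __name__ == "__main__":
    routing = load_routing(ROUTING_JSON)
    rolled_back = move_wiki(routing, "testwiki", "1.42.0-wmf.22")
    print(version_for(rolled_back, "testwiki"))
```

In this sketch a rollback is just a data change to the mapping, which is the property the discussion is after: no code deploy is needed to move a wiki between versions.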

Framing that might make sense, post-discussion:

Hypothesis one: group -1

  • Lots of work falling in ServiceOps, our work is building single-version images (+ wikiversion/mw/config work)
  • Single version makes actual deployment faster
  • Wikiversions outside of code means fewer deploys (makes deployment faster)
  • Draft hypothesis: If we build a single version container image and experiment to move wiki-to-version routing outside of code deployment, we'll

Hypothesis two: speeding time to deploy

  • Lots of work in our team, little work in the ServiceOps space
  • Making it less scary to deploy: rollback, web ui, giving deployers an easy way to see what's happening in production, making it obvious what to do about it