Wikimedia Release Engineering Team/Checkin archive/2024-03-13
2024-03-13
Wins/winterrogation
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Mar 2024
- Merged: nightly security patch failures now update Phabricator tasks; ready to release
- Merged deploys-in-progress reset script
- Two repos have patches for git-fat → git-lfs
- scap: replaced canary swagger checks with test server httpbb checks
- Phorge integration with GitLab in its third round of review
- GitLab webhooks also still going, looks like it'll go through
- People like scap backport - more patches, fewer things typed into terminals.
- Security patch notification now working!
- GitLab webhooks have a more accurate regex for "Bug: TXX"
- Foreachwiki in beta
- Getting rid of the /srv/mediawiki/php symlink
- Upgraded GitLab k8s/cloud cluster to new k8s version and documented the process
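The notes above mention a more accurate regex for "Bug: TXX" in the GitLab webhooks. The actual pattern is not recorded here, so the following is only a minimal sketch of what a stricter match could look like. The pattern, the `find_task_ids` helper, and the sample commit message are all hypothetical.

```python
import re

# Hypothetical pattern, NOT the one used by the real webhooks:
# a "Bug: T<digits>" trailer that occupies a whole line.
BUG_TRAILER = re.compile(r"^Bug:[ \t]*(T\d+)[ \t]*$", re.MULTILINE)


def find_task_ids(commit_message: str) -> list[str]:
    """Return task IDs named in Bug: trailers, in order of appearance."""
    return BUG_TRAILER.findall(commit_message)


# Illustrative commit message; only the two trailer lines should match.
message = """Fix canary check

Debugging T999 was hard, see Bug: notes.
Bug: T12345
Bug: T678
"""

print(find_task_ids(message))  # ['T12345', 'T678']
```

Anchoring the pattern to a full line is one way to avoid picking up task IDs that are only mentioned in passing in the message body.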
Stuff from last time
Vacations/Important dates
- https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2024
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off (page needs updating for Dayforce)
Mar 2024
- 29 Feb, 1 Mar, 4–8 Mar: Antoine
- 14 Mar–14 May: Dan
- 29 Mar: Brennen, Jeena
Apr 2024
- Mon 22 Apr: Global holiday, all staff
- 26 Apr: Brennen (tentative)
- Fri 05 Apr–Fri 12 Apr: Tyler, eclipse viewing
May 2024
- Mon 27 May: Memorial Day (US staff with reqs)
Future
- A few days around July 4: Brennen
- 25 Aug - 03 Sep: Brennen
Train
- https://tools.wmflabs.org/versions/
- https://train-blockers.toolforge.org/
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
Rotation
- 3 Dec – 1.42.0-wmf.8 – No Train offsite
- 11 Dec – 1.42.0-wmf.9 – Brennen + Antoine (Jaime out)
- 18 Dec – 1.42.0-wmf.10 – Ahmon + Brennen (Jaime out)
- 25 Dec – 1.42.0-wmf.11 – No Train
- 1 Jan – 1.42.0-wmf.12 – Dan + Ahmon (Jaime out)
- 8 Jan – 1.42.0-wmf.13 – Jeena + Dan (Jaime out)
- 15 Jan – 1.42.0-wmf.14 – Jaime + Jeena
- 22 Jan – 1.42.0-wmf.15 – Antoine + Jaime
- 29 Jan – 1.42.0-wmf.16 – Ahmon + Antoine (Brennen out Wed–Fri)
- 05 Feb – 1.42.0-wmf.17 – Brennen + Ahmon
- 12 Feb – 1.42.0-wmf.18 – Brennen + Antoine (Friday)
- 19 Feb – 1.42.0-wmf.19 – Jeena + Brennen
- 26 Feb – 1.42.0-wmf.20 – Dan + Jeena
- 04 Mar – 1.42.0-wmf.21 – Jaime + Dan (Antoine out)
People for train: Ahmon, Antoine, Brennen, Jeena, Jaime
- 11 Mar – 1.42.0-wmf.22 – Antoine + Jaime (Dan out)
- 18 Mar – 1.42.0-wmf.23 – Ahmon + Antoine
- 25 Mar – 1.42.0-wmf.24 – Jeena + Ahmon
- 1 Apr – 1.42.0-wmf.25 – Jaime + Jeena
- 8 Apr – 1.42.0-wmf.26 – Antoine + Jaime
- 15 Apr – 1.42.0-wmf.27 – Ahmon + Antoine
- 22 Apr – 1.42.0-wmf.28 – Brennen + Ahmon (Global holiday Monday; Brennen out Friday)
Team Discussions
Annual planning
Meta page: https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2024-2025/Goals
- How this works: Goals → Buckets → Objectives → KRs → Hypotheses
- Where our work fits: Infrastructure → WikiExperiences → WE6 Developer Services → WE6.2
- WE6:
Technical staff and volunteer developers have the tools they need to effectively support the Wikimedia projects
- WE6.2: By Q4, complete an intervention and run an experiment each aimed at providing maintainable, targeted environments to serve developers' high-priority testing needs
Experiment: the goal is we learn some things
Intervention: we make a thing based on stuff we learned
WE6.2 Long version
Developers and users depend on the Wikimedia Beta Cluster (beta) to catch bugs before they affect users. Over time, the uses of beta have grown and come into conflict: the uses are too diverse to fit in a single environment. We will perform one intervention and conduct one experiment each aimed at replacing a single high-priority testing need currently fulfilled by beta with a maintainable alternative environment that better serves each use case's needs.
Hypotheses-areas:
- Experiment: Group -1
- Intervention: Catalyst
Discussion of our hypothesis (alongside ServiceOps):
- Rollback faster
- Smaller, single-version images
- Wikiversions should be config rather than code (no deploy needed)
- Continuous deployment to test wikis
- ServiceOps open to the idea of testwikis being the victim here
- We don't know how caching works when it's updated every minute
- Social change here, working closely with developers to change expectations
- User interface challenges: ssh to server, lots of output to interpret, we can present things to be less-scary, web-ui would be really awesome
- What's scary about deploys now is what's happening in production and what do I have to do about it as a deployer?
- Logging and monitoring and alerts exposed in a way for developers to feel confident deploying themselves vs speeding up
- Something about making the summary of the state of production more visible
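As a rough illustration of the "wikiversions should be config rather than code" point above, here is a minimal sketch in which the wiki-to-version mapping is plain data read at runtime. Everything in it is an assumption for discussion purposes: the JSON layout, the `load_routing` and `version_for` helpers, and the wiki and version values are hypothetical and do not describe how routing works today.

```python
import json

# Hypothetical routing config: wiki database name -> MediaWiki version.
# In this sketch it is data loaded at runtime, so pointing a wiki at a
# different version is a config change rather than a code deploy.
ROUTING_JSON = """
{
  "default": "1.42.0-wmf.21",
  "wikis": {
    "testwiki": "1.42.0-wmf.22",
    "test2wiki": "1.42.0-wmf.22"
  }
}
"""


def load_routing(text: str) -> dict:
    """Parse the routing config and check that it has a default version."""
    routing = json.loads(text)
    if "default" not in routing:
        raise ValueError("routing config needs a 'default' version")
    routing.setdefault("wikis", {})
    return routing


def version_for(routing: dict, wiki: str) -> str:
    """Return the version a wiki is routed to, falling back to the default."""
    return routing["wikis"].get(wiki, routing["default"])


routing = load_routing(ROUTING_JSON)
print(version_for(routing, "testwiki"))  # 1.42.0-wmf.22
print(version_for(routing, "enwiki"))    # 1.42.0-wmf.21
```

With a shape like this, rolling a test wiki forward or back would mean editing one value in the config, which is the property the discussion is after.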
Framing that might make sense, post-discussion:
Hypothesis one: group -1
- Lots of work falling in ServiceOps, our work is building single-version images (+ wikiversion/mw/config work)
- Single version makes actual deployment faster
- Wikiversions outside of code means fewer deploys (makes deployment faster)
- Draft hypothesis: If we build a single version container image and experiment to move wiki-to-version routing outside of code deployment, we'll
Hypothesis two: speeding time to deploy
- Lots of work in our team, little work in the ServiceOps space
- Making it less scary to deploy: rollback, web ui, giving deployers an easy way to see what's happening in production, making it obvious what to do about it