Wikimedia Release Engineering Team/Checkin archive/2023-09-06



πŸ† Wins

Aug '23 recap
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
Sep '23 edition
  • Image published for Blubber that is native LLB, no dockerfile anymore
    • implications
      • The Dockerfile is unnecessary since no one sees it. We can customize each LLB instruction and what it displays to users: a name that corresponds to the blubber.yaml config
      • now we have the ability to create our own instructions
      • dockerfile2llb gone! No more unmaintained external helper images just to copy files around, and no more cross-platform compatibility/emulation issues
      • LLB gets new stuff first, e.g. diffop/mergeop: https://www.docker.com/blog/mergediff-building-dags-more-efficiently-and-elegantly/
  • Phorge working on the scap3∞ deployment environment
  • Landed 3 upstream Phorge patches; 1 is one we've had for years that blocks some tasks rendering (T284397)
  • Patch for T&S that outputs the MediaWiki SUL account along with the Phab username (T344303)

OKR update


Last week


The six questions I answer week-by-week about our work. This is pretty much all that CTPO/VP/Director-types see of what we're doing. If there are specific things to call out here, let's do so.

On track

  • Progress update on the hypothesis for the week
    • T345000 – Create a separate memory optimized GitLab runner pool for memory hungry jobs. We created CPU-optimized and memory-optimized GitLab runners this week
      • In the process, tweaked the size of our staging cluster to save cost
    • T300819 – Created UI to make stacked merge requests clearer (upstream)
    • T337570#9133281 – Local Gems for our GitLab instance are in testing on our devtools instance; hopefully this enables lots of UI customization.
  • Any new metrics related to the hypothesis
    • Repositories on Gerrit decreased (2022 last week → 2020 this week)
  • Any emerging blockers or risks
    • Reached out/set up conversations about pulling apart/scheduling migrations of repos (for T344739 – Old Platform Team projects + T344733 – Metrics Platform as I believe they're unblocked)
  • Any unresolved dependencies - do you depend on another team that hasn't already given you what you need? Are you on the hook to give another team something you aren't able to give right now?
    • No
  • Have there been any new learnings from the hypothesis?
    • No
  • Are you working on anything else outside of this hypothesis? If so, what?
    • MediaWiki 1.41.0-wmf.24
      • 309 Patches ▁▁▇█▂
      • 0 Rollbacks ██▁▁▁
      • 0 Days of delay ▁▁█▁▁
      • 1 Blockers ▅█▅▁▁
    • T345458 – Refactor Blubber's BuildKit frontend gateway to use LLB directly, which enables some nicer features in our Docker image builds
    • T343967 – Bugfixes for scap backport deploying two stacked patches
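
The short glyph runs after each train statistic above read as five-point trend charts. The notes do not say how they are produced, so the following is only a minimal sketch of one common way to render such a chart, assuming simple min-max scaling. The `sparkline` helper and the sample numbers are hypothetical and are not the team's tooling or data.

```python
# Hypothetical helper: render a series of numbers as a row of block glyphs.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block glyphs, scaled between min and max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series has no range to scale over; draw the lowest bar throughout.
        return BARS[0] * len(values)
    span = hi - lo
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


# Made-up values for five consecutive weeks (not real train numbers).
print(sparkline([0, 2, 7, 5, 1]))
```

Scaling against the minimum and maximum of the window means the bars show relative change within those five points only; two charts cannot be compared by bar height.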

This week


Progress update on the hypothesis for the week

Any new metrics related to the hypothesis

Any emerging blockers or risks

Any unresolved dependencies - do you depend on another team that hasn't already given you what you need?

Are you on the hook to give another team something you aren't able to give right now?

Have there been any new learnings from the hypothesis?

Are you working on anything else outside of this hypothesis? If so, what?

🌻 Open source/Upstream contributions

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Upstream

😢 Let's keep these empty


Code review


Gerrit Access requests


Private repo requests


https://phabricator.wikimedia.org/search/query/E7t2_WXX01bB/#R

Gerrit repo requests


GitLab Access requests


High priority tasks


📅 Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2023
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off

September 2023

  • 04 Sep: Labor day (US Staff with reqs)
  • 08 Sep: Tyler
  • 15, 18 Sep: Tyler
  • 26 Aug–05 Sep: Brennen (🔥)
  • 13 Weds–17 Sun: Brennen → KS (approximate)

October 2023

  • 2-16 Oct: Jaime

Future

  • 15 Jan – 15 Mar: Andre


🔥🚂 Train

https://tools.wmflabs.org/versions/
https://train-blockers.toolforge.org/
https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar


  • 2 Jan – wmf.17 – Dan + Antoine (Jaime out)
  • 9 Jan – wmf.18 – Jeena + Dan (Jaime out)
  • 16 Jan – wmf.19 – Jaime + Jeena
  • 23 Jan – wmf.20 – Brennen + Jaime
  • 30 Jan – wmf.21 – Ahmon + Brennen
  • 6 Feb – wmf.22 – Chad + Ahmon
  • 13 Feb – wmf.23 – Dan + Chad
  • 20 Feb – wmf.24 – Antoine + Dan
  • 27 Feb – wmf.25 – Jaime + Antoine
  • 6 Mar – wmf.26 – Jeena + Jaime
  • 13 Mar – wmf.27 – Brennen + Jeena
  • 20 Mar – wmf.1 – Ahmon + Brennen
  • 27 Mar – wmf.2 – Chad Dan + Ahmon
  • 3 Apr – wmf.3 – Antoine + Dan
  • 10 Apr – wmf.4 – Chad + Antoine
  • 17 Apr – wmf.5 – Jaime + Chad
  • 24 Apr – wmf.6 – Jeena + Jaime
  • 1 May – wmf.7 – Brennen + Jeena
  • 8 May – wmf.8 – Antoine + Brennen (Ahmon out + Antoine Out 8th)
  • 15 May – wmf.9 – Ahmon + Antoine (Dan out + Chad out)
  • 22 May – wmf.10 – Chad + Ahmon (Dan out + Jeena out 26th)
  • 29 May – wmf.11 – Dan + Chad (Memorial Day 29th)
  • 5 Jun – wmf.12 – Jeena + Dan (Brennen out, Jaime out)
  • 12 Jun – wmf.13 – Jaime + Jeena
  • 19 Jun – wmf.15 – Cancelled for offsite
  • 26 Jun – wmf.16 – Brennen + Jaime (Jeena out)
  • 3 Jul – wmf.17 – Antoine + Brennen (3rd + 4th holidays)
  • 10 Jul – wmf.18 – Dan + Antoine (Ahmon out)
  • 17 Jul – wmf.19 – Ahmon + Dan (Brennen out Friday)
  • 24 Jul – wmf.20 – Jaime + Ahmon
  • 31 Jul – wmf.21 – Ahmon + Jaime (Jeena out, Antoine out) (Ahmon volunteered)
  • 7 Aug – wmf.22 – No train
  • 14 Aug – wmf.23 – Ahmon + Jaime (Jeena out, Antoine out)
  • 21 Aug – wmf.24 – Dan (Brennen out, Jeena out, Antoine out)
  • 28 Aug – wmf.25 – Jeena + Dan
  • 04 Sep – wmf.26 – Antoine + Jeena
  • 11 Sep – wmf.27 – Jaime + Antoine + Andre as lurker!
  • 18 Sep – wmf.28 – Brennen + Jaime
  • 25 Sep – wmf.29 – 

Team discussions


Offsite!

  • SF
  • Approved Arrival Date: December 4, 2023
  • Approved Departure Date: December 9, 2023
  • In Person Meeting Days: December 5, 6, 7, 8

Please complete the survey by September 19

DX Runs the train

  • We want to work more closely with others in Developer Experience
  • We're looking for short, well-defined projects to tackle together
  • Initially, the projects should be time-bound and simple: we're trying to build our process for doing this and learn how to do this together
  • Later they will be bigger and gnarlier

We should still make sure at least one of us is on call for the train, like we do currently, to offer support, in particular because I'm assuming we are still taking care of the pre-train automated processes that run late Monday/early Tuesday (branch cut + train presync).

There are a few places where I think we could already review/improve the docs beforehand:

  • Security patches: They fail relatively often. This is the only documentation I could find about patches. It would be useful to have something that explains who to contact and how to get an updated security patch when necessary.
  • Triage/bug reporting: Especially the relevant people/teams to tag: https://www.mediawiki.org/wiki/Developers/Maintainers
  • Rollback/holding the train: I would consolidate the sections we have about breakage, holding/rolling back, and where to monitor into a single place. I would add more direct links to the relevant dashboards in Logstash and Grafana and revise the criteria themselves too. For example, based on how we normally operate, this criterion sounds too draconian: "In general, if there is an unexplained error that occurs within 1 hour of a train deployment – always roll back the train":
    • https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Breakage
    • https://wikitech.wikimedia.org/wiki/Deployments/Holding_the_train#Issues_that_hold_the_train

  • Update on "Investigate whether issues, operations, wikis, etc. can be disabled globally on GitLab" (https://phabricator.wikimedia.org/T264231)
    • Antoine tried it!
  • Continuous delivery all the things
    • Have to work with SRE on this since this is the deployment-charts repo
      • Need a different way of keeping track of state
    • Access control?
    • We want GitLab to do it?
    • Why do we store the version?
      • Information needs to be stored if we need to rebuild the cluster
    • Need the image name to run, so it should be stored somewhere
    • Git is a question of access: team A can bump versions for team B
      • Don't object to *a* git repo, but to the mechanism: it should give them control over what gets deployed
    • If you don't want something deployed, don't merge it
    • Image tags are currently not meaningful
    • Building on a tag (although this may not be the strictest definition of continuous deployment)
      • Enforcing main always deployable would constrain people
      • Keeping those decoupled would remove that constraint on folks
    • Agree having the mentality of main always deployable is a good mindset, but it's too restrictive if our goal is to make things easier for developers
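
As a rough illustration of the "building on a tag" option discussed above: one way to keep merging and deploying decoupled is to produce a deployable image reference only for release tags, and to reuse the git tag as the image tag so that it carries meaning. This is a minimal sketch under stated assumptions, not a description of how deployment-charts or the team's GitLab CI work. The function name, the registry host, the repository path, and the `v1.2.3` tag convention are all hypothetical.

```python
import re
from typing import Optional

# Assumed convention for this sketch only: release tags look like v1.2.3.
RELEASE_TAG = re.compile(r"v\d+\.\d+\.\d+")


def image_ref_for_git_ref(registry: str, repo: str, git_ref: str) -> Optional[str]:
    """Return an image reference for a release tag, or None for any other ref.

    Branch refs (including main) return None, so merging a change does not
    by itself produce something to deploy.
    """
    prefix = "refs/tags/"
    if not git_ref.startswith(prefix):
        return None
    tag = git_ref[len(prefix):]
    if not RELEASE_TAG.fullmatch(tag):
        return None
    # Reusing the git tag as the image tag keeps the two traceable to each other.
    return f"{registry}/{repo}:{tag}"


# Hypothetical registry and repository names.
print(image_ref_for_git_ref("registry.example.org", "repos/example/service", "refs/tags/v1.4.0"))
print(image_ref_for_git_ref("registry.example.org", "repos/example/service", "refs/heads/main"))
```

In this sketch the decision about what gets deployed sits with whoever can push a release tag, which is one possible answer to the access-control question raised above; where the resulting version is recorded for cluster rebuilds is left open.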