Wikimedia Release Engineering Team/Checkin archive/2021-02-24


2021-02-24


Vacations/Important dates

https://office.wikimedia.org/wiki/HR_Corner/Holiday_List
How to do it
  • 15 Feb: Presidents' Day -- US staff with reqs
  • 22 Feb: Dan out


  • 29 Mar: US staff with reqs
  • 12 Apr: US staff with reqs
  • 22 Apr: Earth Day -- US staff with reqs

Train

Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R
https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Important_dates


  • 16 Nov - wmf.18 - Ahmon + Antoine
  • 23 Nov - wmf.19 - No Train - Thanksgiving Thurs/Fri https://phabricator.wikimedia.org/T263185
  • 30 Nov - wmf.20 - Antoine + Mukunda
  • 7 Dec - wmf.21 - Mukunda + Dan
  • 14 Dec - wmf.22 - Dan + Jeena
  • 21 Dec - wmf.23 - No Train
  • 28 Dec - wmf.24 - No Train
  • 4 Jan - wmf.25 - Jeena + Antoine (in place of Lars)
    • NB: Lars is only back from holiday on Thursday Jan 7
  • 11 Jan - wmf.26 - Lars + Jeena
  • 18 Jan - wmf.27 - Brennen + Lars (Monday is a holiday)
  • 25 Jan - wmf.28 - Ahmon + Brennen
  • 1 Feb - wmf.29 - Antoine + Ahmon
  • 8 Feb - wmf.30 - Mukunda + Antoine
  • 15 Feb - wmf.31 - Dan + Mukunda (Monday is a holiday)
  • 22 Feb - wmf.32 - Jeena + Dan
  • 1 Mar - wmf.33 - Lars + Jeena
  • 8 Mar - wmf.34 - Brennen + Lars


Status

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Roles#Train_Conductor
  • 2019-08-14 onwards: Zeljko 🎸 🎷 \o/
  • 2020-08-26 onwards: Deb is in charge/SoS is async
  • 2020-11-25: Brennen
  • 2020-12-02: Ahmon
  • 2020-12-09: Tyler
  • 2020-12-16: Antoine
  • 2021-01-06: Tyler
  • 2021-01-13: Text only update
  • 2021-01-20: Mukunda
  • 2021-01-27: Text only update
  • 2021-02-03: Thcipriani
  • 2021-02-10: Thcipriani
  • 2021-02-24: Thcipriani

Outgoing


Thanks

  • Serviceops unsticking VMs for GitLab
  • Moritz, jbond, godog for input on GitLab things

Callouts


Incoming


Team Business


Incoming/Needs attention

  • TRAIN, train, train
    • Discuss the idea of having a Tuesday train checkin about current errors and whether to block/roll
    • Should it include the entire team?
    • Public, all WMF tech/product, or team private?
    • IRC or Slack?
      • Note from earlier discussion: Matters whether it's community-accessible.
    • Discussion notes
      • Question of whether signoff is meaningful before code rolls out
      • Question of timing for EU folks
      • Distinction between signoff for train to roll initially and log triage
      • Lars: Proposal to automate as far as deploy to group0
      • Tyler:
        • The pain of the process isn't that we have to deploy, it's that we have to care about other people's errors.
        • Want to make the overall process better, not just shift it around
        • Work to determine who knows what's going on is untracked
      • Jeena:
        • Re: Lars' proposal -
      • Lars: Augment previous proposal: How about we negotiate with Platform Engineering etc. to select a representative for each train. Go / no-go committee every week, we know who they are before the train starts
      • Brennen: fundamental problem -- if you have code going out you need to be watching logs -- however we get to that is how we make things better independent of the mechanisms of deploying
      • Jeena: Competing idea to go/no-go: Could have a RelEng partner for each product team that would help them do their CI and deployments in a more individualized way. So that they'd be on the hook but things could happen faster.
        • +1s from Lars, Mukunda
      • Tyler: Complementary to this idea, want to push forward https://wikitech.wikimedia.org/wiki/User:Thcipriani/Deployments/Patch_type_criteria
        • Please have a look at this.
        • Idea to get folks deploying their own code... People are relying more on the train. First step might be deciding whether a change should ride the train or be backported.
      • Tyler: Actions: Some sort of proposal. Is the representative a good first step?
      • Ahmon: are those folks looking at the logs?
      • Dan: Instead of representatives could we have individuals? All of the people who wrote patches.
      • Brennen: sometimes mechanisms are better than org structures. Formalized mechanisms rather than mandated meetings might work better. You have to push this button if you want your code to stay deployed.
      • Antoine: CR+2 is already sort of this
      • Brennen: I guess I'm advocating for something like +2 for verified-in-production
      • Antoine: Staging / beta
      • Lars: post deployment voting on patches -- how do I know that *my code* is working.
      • Jeena: Manual testing of new code after code rolls out to each group
      • Brennen: Add a comment to each patchset that your code has reached groupX
        • We have a release tagger bot, but it doesn't say anything about whether your patch is actually in production.
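
The "comment on each patchset once it reaches groupX" idea from the discussion above could look roughly like the sketch below. Everything in it is hypothetical and only illustrates the shape of the mechanism: the `Patch` record, the `GROUPS` list, the `deployed` mapping, and the comment wording are assumptions, and it is not how the existing release tagger bot works. It also simplifies by treating a patch as live on a group only when that group serves exactly the patch's train version.

```python
from dataclasses import dataclass

# Hypothetical rollout order; the notes only mention "group0" and "groupX".
GROUPS = ["group0", "group1", "group2"]


@dataclass(frozen=True)
class Patch:
    change_id: str      # hypothetical identifier for a merged change
    train_version: str  # train branch the change rode, e.g. "wmf.31"


def groups_reached(patch, deployed):
    """Return the groups currently serving the patch's train version.

    `deployed` maps a group name to the train version it is serving.
    Simplification: only an exact version match counts as "reached".
    """
    return [g for g in GROUPS if deployed.get(g) == patch.train_version]


def reach_comments(patches, deployed):
    """Build one comment per (patch, group) pair the patch has reached."""
    return [
        f"{p.change_id}: now live on {g} ({p.train_version}); "
        "please verify in production."
        for p in patches
        for g in groups_reached(p, deployed)
    ]
```

A real version would need an authoritative source for what each group is serving and a way to post to code review. A "verified in production" vote, as Brennen suggests, would be the author's reply to such a comment.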

Book club/Lunch and Learn


Monthly reflection on accomplishments - Feb '21 edition

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
Add as you have them!
  • PipelineLib fully working on releases-jenkins.wikimedia.org
  • Rust introduction talk (not strictly RelEng business)
  • logspam-watch minimum hits consolidation feature
  • Gearman plugin deployed. Merged a bunch of pending changes, plus a fork from GoodData that adds support for Pipeline jobs

Standup!


Ahmon

  • Updates:
    • Verified that rebuildLocalisationCache.php doesn't currently require DB access in production. Will attempt to turn that into a policy so that building l10n files can safely be treated as a fully offline operation.
    • Thinking about approaches to a no-etcd mode for mediawiki-config.
    • Design change: Include l10n files in the built MW images. It's just better.
  • Blocked by:
    • none
  • Blocking:
    • none


Antoine

  • Updates:
  • Blocked by:
    • Workflow jobs are not registered. They are tied to `master`, which has no executor, so no GearmanWorkerThread can elect itself to register the function (reproduced locally).
  • Blocking:

Brennen

  • Blocked by:
  • Blocking:
  • Updates:
    • Went to the airport. It was weird.
    • GitLab
      • Kickoff meeting yesterday.
      • "GitLab (Initialization)" milestone for the init project: https://phabricator.wikimedia.org/project/view/5212/
      • Wrote up some request numbers for Gerrit to give S&F a rough idea of what kind of traffic they should test the GitLab instance against.
      • Today: Finish bashing out a description of desired auth situation.
    • logspam-watch is crying out for emojis🎉

Dan

  • Blocked by:
    • Some docker container networking issue on releases1002
  • Blocking:
  • Updates:
    • Added a bunch of new features to pipelinelib to get m8s multiversion image build working
    • Releases jenkins can _almost_ build a multiversion image

Jeena

  • Blocked by:
  • Blocking:
  • Updates:
    • Choo choo🚂
    • pet-expedition

Lars

  • Blocked by:
    • Computers
  • Blocking:
    • Good things
  • Updates:
    • Fixing train-dev, which has been broken since Friday

Mukunda

  • Blocked by:
    • kibana is a bastard
  • Blocking:
  • Updates:
    • Have tried and failed at a bunch of different angles of recreating phatality. Latest idea is to pull data from the phabricator side instead of pushing from the kibana side. Details coming soon.

Tyler
