Wikimedia Release Engineering Team/Checkin archive/2024-10-16
2024-10-16
Agenda
- Old TODOs/Reminders
- Wins/anti-wins
- Important dates
- Train
- Discussions
- AK: MediaWiki New Errors ECS
- Backport deployers sadness
- Triage
Old TODOs
- [x] TODO: team training setup for backports
- Post notes
- TODO: thcipriani add thoughts to private repo for PrivateSettings task (https://phabricator.wikimedia.org/T355026)
- TODO: thcipriani/andre: Gerrit policy talk page follow-up
- https://www.mediawiki.org/wiki/Topic:Yczkcjnu474mmczl
- [ ] send to Wikitech
- TODO: screen/tmux for scap
- Update documentation
- Needed: email to wikitech
Wins/winterrogation
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Oct 2024
- Gave a 45-minute presentation at WikiConNA on Cloud Services/Toolforge https://www.mediawiki.org/wiki/File:What%27s_new_with_Wikimedia_Cloud_Services,_WikiConNA_2024.pdf
- Dan found a bug in localization syncing
- Fixed l10n CDB file handling on secondary masters:
- https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/457
- https://gerrit.wikimedia.org/r/c/operations/puppet/+/1076019
- Phabricator incoming mail works again! Broken since Feb.
- Phab deploy renaming "wikitech accounts" -> "ldap accounts"
- Updated users in bitergia database
- SpiderPig demo
- Toolforge standards committee all through NDA!
- Volunteer NDA steps are reduced and clearer on the wiki docs
- Deleting branches via train-branch bot
- https://tools-static.wmflabs.org/jenkins-build-stats/
- Increased quota for integration -- https://phabricator.wikimedia.org/T376847
Vacations/Important dates
- https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2024
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
Next week
- Thu Oct 17: Antoine's Wikiversary
- Fri Oct 18: Jaime out; Bryan out
- Oct 01-11: Jeena
- Oct 01–02: Dan
- Oct 03-07: Bryan, Tyler @ WikiConNA
- Oct 03-06: WikiCon North America (Indianapolis)
- Oct 06: Dancy
- Oct 08 (Tue; only first half of UTC day): Andre
- Sept 9-Oct 11: Jeena
- Oct 11: Bryan
- Oct 14: Indigenous Peoples' Day (also Columbus Day), US staff with reqs
- Oct 18: Bryan
- Oct 18: Jaime
- Oct 25: Bryan
- Oct 28: Andre public holiday
- Nov: Likely three weeks for Andre once he has sorted out eviction dates and a new flat
- Nov 1: Bryan
- Nov 8: Bryan, Jeena
- Nov 11-19 or so maybe: Andre
- Nov 11 (Mon): Veterans Day, US staff with reqs
- Nov 15: Bryan
- Nov 22: Bryan
- Nov 28–29 (Thu, Fri): Thanksgiving holiday, US staff with reqs
- Dec 23: Andre, Jeena
- Dec 24–31 (Tue–Tue): End of Year Holiday, Global Holiday
Future
Train
edit- https://versions.toolforge.org/
- https://train-blockers.toolforge.org/
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
Rotation
PAST
- 05 Aug (05–09) – 1.43.0-wmf.17 – Jaime + Brennen (Dan out, Global holiday Friday)
- 12 Aug (12–16) – 1.43.0-wmf.18 – Jeena + Jaime (Ahmon out, Antoine out)
- 19 Aug (19–23) – 1.43.0-wmf.19 – Andre + Jeena (Antoine out)
- 26 Aug (26–30) – 1.43.0-wmf.20 – Antoine + Andre (Brennen out)
- 02 Sep (02–06) – 1.43.0-wmf.21 – Ahmon + Antoine (US holiday Monday, Brennen out Tues)
- Group0 rollback due to warnings and errors
- Antoine backported fixes earlier + one going out now
- 09 Sep (09–13) – 1.43.0-wmf.22 – Dan + Ahmon
- 16 Sep (16–20) – 1.43.0-wmf.23 – Jaime + Dan (Brennen out)
- 23 Sep (23–27) – 1.43.0-wmf.24 – Brennen + Jaime (andre out)
- 30 Sep (30–Oct 4) – 1.43.0-wmf.25 – Antoine, Brennen (Out: Jeena, Dan, Andre (Tue), Ahmon (Fri), Bryan, Tyler)
- 07 Oct (07–11) – 1.43.0-wmf.26 – Andre, Antoine (Out: Jeena, Bryan (Fri))
NOW and NEXT
- 14 Oct (14–18) – 1.43.0-wmf.27 – Jeena, Andre (Holiday: Mon (US only), Out: Bryan (Fri), Jaime (Fri))
- 21 Oct (21–25) – 1.43.0-wmf.28 – Ahmon, Jeena (Out: Bryan (Fri))
- 28 Oct (28–Nov 1) – 1.44.0-wmf.1 – Dan, Ahmon
- 04 Nov (04–08) – 1.44.0-wmf.2 – Jaime, Dan
- 11 Nov (11–15) – 1.44.0-wmf.3 – Brennen, Jaime (Holiday: Mon (US only))
- 18 Nov (18–22) – 1.44.0-wmf.4 –
- 25 Nov (25–29) – 1.44.0-wmf.5 – (Out: <>, Holiday: Thu, Fri (US only))
- 02 Dec (02–06) – 1.44.0-wmf.6 –
- 09 Dec (09–13) – 1.44.0-wmf.7 –
- 16 Dec (16–20) – 1.44.0-wmf.8 –
- 23 Dec (23–27) – 1.44.0-wmf.9 – NO TRAIN (Holiday: Tue–Fri (Global))
- 30 Dec (30–Jan 03) – 1.44.0-wmf.10 – (oh noes) NO TRAIN (Holiday: Tue–Fri (Global))
- 06 Jan (06–10) – 1.44.0-wmf.11 –
Team Discussions
Backport deployer sadness
Before window:
- BP == backport; implies reviewed/ready for deploy, but unmerged
- ve [BP+2]
- operation/mwconfig [BP+2]
- Zuul runs gating tests: ve + operation/mwconfig
  - If it passes: generate artifact
  - If it does not pass: skip that patch from the artifact
  - BP-1 notifies the patch author so they can fix it (maybe before the window!)
- Automated artifact generation:
  - IMG0: mw deployment image + ve
  - IMG1: mw deployment image + ve + operation/mwconfig

During window:
- Scenarios:
  - I want to deploy in order, one at a time
  - I want to deploy all at the same time
  - I do not want to deploy a change
  - Need to deploy an emergency change first
  - Late entry to backport
  - mwdebug check is bad (or automated check is bad/canaries)
- One at a time:
  - scap backport ve
  - CR+2 ve
  - No gate and submit
  - Merge
  - Automatically deploys: IMG0
  - scap backport mwconfig
  - CR+2 mwconfig
  - No gate and submit
  - Merge
  - Automatically deploys: IMG1
- Same time:
  - scap backport ve mwconfig
  - CR+2 ve mwconfig
  - No gate and submit
  - Merge both
  - Automatically deploys: IMG1
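The pre-window flow can be sketched as a small Python model. This is a hypothetical illustration only: `Patch`, `Artifact`, `build_artifacts`, `images_to_deploy`, and the `gate`/`notify` callables are made-up stand-ins, not real Zuul, Gerrit, or scap interfaces. It shows how gating decides which BP+2 patches land in cumulative images (IMG0, IMG1), and how the "one at a time" and "same time" scenarios differ only in which images get deployed.

```python
"""Minimal, hypothetical sketch of the proposed pre-window backport pipeline.

Patch, Artifact, build_artifacts, images_to_deploy and the gate/notify
callables are illustrative stand-ins, not real Zuul, Gerrit, or scap APIs.
"""
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Patch:
    repo: str        # e.g. "ve" or "operation/mwconfig"
    author: str
    bp_plus_2: bool  # BP+2: reviewed/ready for deploy, but still unmerged


@dataclass
class Artifact:
    name: str                                        # "IMG0", "IMG1", ...
    layers: List[str] = field(default_factory=list)  # cumulative contents


BASE = "mw deployment image"


def build_artifacts(
    patches: List[Patch],
    gate: Callable[[Patch], bool],
    notify: Callable[[Patch], None],
) -> List[Artifact]:
    """Gate each BP+2 patch in order and build cumulative images.

    Each passing patch yields a new image holding the base image plus every
    passing patch so far (IMG0 = base + ve, IMG1 = base + ve +
    operation/mwconfig). A failing patch is left out and its author is
    notified (the BP-1 step).
    """
    artifacts: List[Artifact] = []
    layers = [BASE]
    for patch in patches:
        if not patch.bp_plus_2:
            continue  # only BP+2 patches are gated before the window
        if gate(patch):
            layers.append(patch.repo)
            artifacts.append(Artifact(f"IMG{len(artifacts)}", list(layers)))
        else:
            notify(patch)  # skipped from artifacts; author can fix it early
    return artifacts


def images_to_deploy(artifacts: List[Artifact], one_at_a_time: bool) -> List[str]:
    """One at a time deploys every image in order; same time deploys only the last."""
    if not artifacts:
        return []
    if one_at_a_time:
        return [a.name for a in artifacts]
    return [artifacts[-1].name]


if __name__ == "__main__":
    patches = [
        Patch("ve", "author-a", True),
        Patch("operation/mwconfig", "author-b", True),
    ]
    arts = build_artifacts(patches, gate=lambda p: True, notify=print)
    print(images_to_deploy(arts, one_at_a_time=True))   # ['IMG0', 'IMG1']
    print(images_to_deploy(arts, one_at_a_time=False))  # ['IMG1']
```

The "same time" case can deploy IMG1 alone only because each image in this sketch is cumulative; the other scenarios (emergency change first, late entry, bad mwdebug check) are not modeled here.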
Additional thoughts:
- Maybe these should have a relation chain in gerrit
- Might require same repo
- who are the BP+2 people?
- it seems like the burden of approving backports usually falls on the deployers
- Currently: wmf-deployment (could be mediawiki group tho)
- Hope: folks with more intimate knowledge of the change
- This might be a replacement for CR+1; good, since that's overloaded
- Could have bot ping people lacking BP+2
- Lowering cognitive load means more people willing to deploy
- TODO: write this up in phab/gather ideas from backporters
- TODO: set up another session
New Errors Dashboard
- AK: Am I supposed to accept it as normal that I have to scroll down a few miles on the "MediaWiki New Errors ECS" Logstash dashboard (<https://logstash.wikimedia.org/app/dashboards#/view/c7013c90-a487-11ec-be91-b3435f0c0c49>) to get past the filters, or how do I push eng teams into fixing stuff (or buy me a larger screen)?
- How do we decrease the number of filters?
- Talk to managers?
- Need to fix the script
- Have a script, but it broke when we moved Logstash behind the IDP: https://gitlab.wikimedia.org/repos/releng/release/-/tree/main/check-new-error-tasks
- Are all these issues still open?
- Is this the logspam problem? Hassle folks to achieve temporary noise reduction.
- PokeBot?
- Bot auth to get through IDM; script logic: if there is no instance of the error, close the task out
- TODO: Bryan to file a task with o11y + obs for getting login auth through IDP
- DISCUSSION: better error stuff
Open source/Upstream contributions