Wikimedia Release Engineering Team/Checkin archive/20190701
2019-07-01
Vacations/Important dates
- July 2 - Greg's birthday, taking off half day, JR to take half day too in celebration of Greg's birthday (and Debbie's).
- July 4 (US Independence Day) - US Staff
- July 5 - thcipriani, Greg, JR vacation
- July 5 - Lars off (swapping with weekend)
- July 10 - Lars off (swapping with weekend)
- July 22 - August 9 - Željko vacation
- August 7–19 - James off (inc. Wikimania)
- August 12 - September 8 - Dan leave
- August 12 (Glorious Twelfth) - US Staff
- August ??? - ??? - Antoine
- August 14–18 - Wikimania
- Attending: James, Lars, Jean-Rene
- August 15 - Željko, Assumption of Mary
- August 25 - September 4 - Brennen vacation
- September 2 (Labor Day) - US Staff
- October 14 (Indigenous Peoples' Day) - US Staff
- November 11 (Veterans' Day) - US Staff
- November 28–29 (Thanksgiving) - US Staff
- December 6 - Lars, Finnish Independence Day
- December 25–31 (Christmas) - US Staff
- December 25–26 - Lars, Christmas
- 2020 January 1 (New Year's Day) - US Staff, Lars
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R
- June 24 - wmf.11 - Jeena (with Mukunda)
- July 1 - wmf.12 - No train (Fourth of July)
- July 8 - wmf.13 - Jeena
- July 15 - wmf.14 - Lars (with Antoine)
- July 22 - wmf.15 - Lars
- July 29 - wmf.16 - Brennen (with Tyler)
- Aug 5 - wmf.17 - Brennen
- Aug 12 - wmf.18 - No Train (Wikimania)
- Aug 19 - wmf.19 - Zeljko 😱
- Aug 26 - wmf.20 - Zeljko 😭
SoS
- Zeljko 4eva! :)
Team Business
Timespent spreadsheet
- For the avoidance of doubt: fill out the sheet week number for the previous week
- link to week starting June 24: https://docs.google.com/spreadsheets/d/1urCLNQXeEi1DOR8Iu0qW0yPt-glxX1laqlMovbGyCW0/edit#gid=695570696
- Last one of the quarter!
- Please review any missing data for yourself!
Book club
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
- Notes: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
- Next: July 12th, chapters 14+15 (the rest of the book) (9am Pacific)
Fall Offsite + TechConf19
- First round invites went out
Annual Planning
Changes to the meeting
- Turn into more of a real stand-up (see new section: What I did last week) so that we can answer most of the other questions (e.g. what is the team blocked on?) from those individual updates.
- Might also move this meeting to not be on Monday, e.g. Thursday/Friday so the accuracy of "what I did this week" will be much higher.
- Annual plan/etc. discussions will move into one-off meetings rather than crashing the stand-up.
- Engineering Productivity won't meet as a whole each week. Sub-team meetings will continue (for RelEng and Performance) and be set up (for Q&T) :-) Annual planning managed by managers.
- SoS managed… somehow?
Monthly reflection on accomplishments - May '19 edition
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Add as you have them!
- Phabricator vandalism rollback tool completed 🎉 (blog post? 😉)
- Upgrade Zuul to 2.5.1-wmf6 (which unblocks the Gerrit upgrade to 2.16) - https://phabricator.wikimedia.org/T208426
- Team offsite in Chicago
- Repository-hosted CI/CD pipeline configurations now supported (.pipeline/config.yaml) - https://phabricator.wikimedia.org/T210267
- Train notes published on branch cut
- Codehealth pipeline beta - https://phabricator.wikimedia.org/phame/live/1/post/160/introducing_the_codehealth_pipeline_beta/
- Some baseline local development images published
- Speculative CI meta-architecture published within WMF for feedback (two versions)
- Old image versions automatically removed from jenkins agents when /var/lib/docker space > 80%
- scap 3.10.0 cut
- Jenkins build timings reports: https://people.wikimedia.org/~dduvall/jenkins/
- Helped Kask team sketch an outline of its architecture (https://www.mediawiki.org/wiki/Kask)
- Fatal Monitor with marker lines for deployments: https://logstash.wikimedia.org/app/kibana#/dashboard/77cc3e90-aa27-11e7-9109-51bd3197f7a9?_g=()
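The image-cleanup item above (old image versions removed from Jenkins agents once /var/lib/docker passes 80% usage) can be illustrated with a minimal sketch. This is not the actual cleanup job: the function names, the `(name, created, size)` tuple layout, and the oldest-first policy are all assumptions made for illustration. On a real agent the usage figures could come from something like `shutil.disk_usage("/var/lib/docker")`.

```python
from typing import List, Tuple

# Hypothetical record: (image name, creation timestamp, size in bytes).
Image = Tuple[str, int, int]


def usage_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of the filesystem that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def images_to_remove(images: List[Image], used_bytes: int,
                     total_bytes: int, threshold: float = 0.80) -> List[str]:
    """Pick images to delete, oldest first, until usage is at or below threshold.

    Assumption: removing an image frees exactly its recorded size.
    """
    removed = []
    for name, _created, size in sorted(images, key=lambda image: image[1]):
        if usage_fraction(used_bytes, total_bytes) <= threshold:
            break
        removed.append(name)
        used_bytes -= size
    return removed
```

Nothing is selected while usage is at or below the threshold, so the check is safe to run on a schedule.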
Incoming/Needs attention
- REL1_33 branching for extensions: https://phabricator.wikimedia.org/T220653
- Reedy said he'll move forward with rc0 announcement soon.
- Mukunda tried to run the script but it ran into trouble. Will re-try, manually.
- Switching on HTTP Auth again still seems blocked. Barricade should help with this; review when Tyler gets back.
- Update 2019-06-03: Fighting fires last week; should be able to do this week.
- Update 2019-06-10: Done with a quick hack by Reedy; do we need to fix the script for next time?
- http auth patches merged in upstream, next week is the earliest it'll be released
- Update 2019-06-17: Gerrit 2.15.14 is out, need to build and release, hopefully this week
- 2019-07-01: Gerrit on 2.15.13 after a brief and glorious day
- Documentation!
- Zuul and force merge: https://www.mediawiki.org/wiki/Topic:V14dlv7nt5ne7gsd
- Antoine to file task and reply
- Update 2019-06-24: https://phabricator.wikimedia.org/T225955 filed.
Scrum of Scrums
Incoming from last week
Outgoing this week (wrong section heading level is on purpose for copy/pasting into Scrum of Scrums etherpad)
Release Engineering
- Blocked by:
- Security team (already acknowledged): Make phan-taint-check-plugin work on PHP > 7.0 so we can move CI to PHP72 https://phabricator.wikimedia.org/T207344
- Core Platform Team:
- (low priority): https://phabricator.wikimedia.org/T205361 is blocking undeployment of CodeReview.
- MediaWiki installer silently ignores invalid extensions https://phabricator.wikimedia.org/T225512
- SRE:
- Traffic Team (low priority): https://phabricator.wikimedia.org/T213769 is blocking undeployment of Wikipedia Zero.
- ServiceOps Team:
- thanks for scap 3.10.0-1 deploy \o/
- Thanks to DC Ops, contint1001 now has extra drives; how do we get them mounted? https://phabricator.wikimedia.org/T207707
- Unknown team (?): wikimania-scholarships hosting needs to move to PHP7 so we can drop php56 from CI. https://phabricator.wikimedia.org/T224906
- Blocking:
- Updates:
- Train Health
- Last week: 1.34.0-wmf.11 - https://phabricator.wikimedia.org/T220736
- This week: 1.34.0-wmf.12 - NO TRAIN, WMF HOLIDAY (4 July)
- Next week: 1.34.0-wmf.13 - https://phabricator.wikimedia.org/T220738
- Code Health
- Log Health
- All: Input greatly wished for on the "Future of CI" planning document: https://lists.wikimedia.org/pipermail/wikitech-l/2019-June/092227.html
- Train Health
Callouts
- Release Engineering
- All: Input greatly wished for on the "Future of CI" planning document: https://lists.wikimedia.org/pipermail/wikitech-l/2019-June/092227.html
- Unknown team (?): wikimania-scholarships hosting needs to move to PHP7 so we can drop php56 from CI. https://phabricator.wikimedia.org/T224906
Train status and happenings
- New filtered fatal monitor dashboard including markers for scap deployments: https://logstash.wikimedia.org/app/kibana#/dashboard/77cc3e90-aa27-11e7-9109-51bd3197f7a9?_g=()
- Need to fix scap clean :\ (cf: Needs Attention)
Quarterly Goals for Q4
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q4
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Undeploy the CodeReview extension.
- WHO: James, need help from CPT
- Blocked. James will ping CPT about this this week (April 8th)
- … and again w/c 15 April.
- … and again w/c 6 May (in SoS).
- … and again w/c 27 May (in SoS).
- [Recurring item]
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Setup 1-3 of the CI WG options (Zuul v3, Argo, GitLab)
- WHO: Lars
- Gitlab:
- https://wmf-gitlab3.vm.liw.fi/ is up and accepts registrations with wikimedia.org (and liw.fi) email addresses
- Please play with it and tell Lars anything that seems iffy
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Instrument Quibble for data collection
- WHO: Mukunda, Antoine
- Blocked
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Create a graph of where time is spent and make a prioritized list for improvements.
- WHO: Mukunda, Antoine
- Blocked
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Prepare the Deployment Pipeline for changes to our CI tooling.
- WHO: ???, ???
- Blocked by not having new CI tooling yet
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOAL: Create a .pipeline/config.yaml standard to give users more control over how their tests are run in the pipeline and allow the easy saving of artifacts at pipeline completion. (RelEng)
- WHO: Dan, Tyler, ???
Done
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOALS:
- Adopt more services into Deployment pipeline - task T212801
- Wikidata Termbox SSR, Kask for Session Storage Service, cpjobqueue (stretch), ORES (stretch)
- WHO: Dan, Tyler, Lars
There are tasks: https://phabricator.wikimedia.org/T220403
- Wikidata Termbox SSR
- Done
- Kask for Session Storage Service
- Done
- cpjobqueue (stretch)
- Not done -> later
- ORES
- cf: Dan's comments
- Not done -> later
TEC12 (DevProd): Outcome 1 / Output 1.1
- GOAL: Provide an "Official" Docker base image for local development of MediaWiki based on the production tooling.
- WHO: Jeena, Brennen
- https://phabricator.wikimedia.org/T212449
- Done for MediaWiki, for some values of "done" and "MediaWiki". Production-likeness needs considerable work.
TEC13 (Code Health): Outcome 1 / Output 3
- GOALs: Presentation/session(s) at the Wikimedia Hackathon on the current state of Code Health projects (technical debt and code stewardship)
- WHO: JR
Done
TEC13 (Code Health): Outcome 1 / Output 1.1
- GOAL:
- Publish a re-imagination of the Review Queue process.
- Develop and implement metrics around task and code-review responsiveness
- WHO: Greg, JR (and Andre)
- Review Queue
- Blocked on Greg time
- Task and code-review responsiveness metrics
- No progress
TEC13 (Code Health): Outcome 4 / Output 4.2
- GOALs:
- Expand SonarQube reporting into CI infrastructure
- Perform SonarQube analysis on all extensions
- Engage user communities in direct feedback solicitation
- WHO: JR, Zeljko, Code Health Metrics
Other non-goal work
Release MW 1.33
- Handed off to Reedy along with security releases.
Selenium
Gerrit
Phabricator
- git-ssh is finally fixed!
Jenkins
QA/Code Health
SCAP
Standup!
Antoine
- What I did last week
- Quibble release, reviews etc
- What I plan to do this week
- Quibble review for parallelism. Clean up mess of jobs for mediawiki/* repos and multiple branches madness
- Get contint1001 partition with SRE
- stretch: look at Zuul phase out
- Rebuild Jenkins instances for Stretch, less RAM and s/slave/agent/ https://phabricator.wikimedia.org/T226233
- What I'm blocked on
- Other?
Brennen
- What I did last week
- Attended MW dependency resolution discussion, CTO meet & greets
- Advertised Docker SIG (without much success)
- Worked on understanding pipelinelib, jjb, etc.
- Turn pipeline stage steps into objects with run()/validate() methods
- What I plan to do this week
- pipelinelib config validation (and maybe some additions to docs)
- Probably Add a builder script to mediawiki/core
- Finish reading book
- What I'm blocked on
- Other?
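The "steps as objects with run()/validate() methods" item above describes a common pattern: each step checks its own configuration before anything executes. The sketch below shows the idea in Python purely for illustration. It is not pipelinelib code, and every class name and config key in it is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class Step(ABC):
    """A pipeline step that can check its own config before running."""

    def __init__(self, config: Dict[str, Any]):
        self.config = config

    @abstractmethod
    def validate(self) -> List[str]:
        """Return a list of human-readable config errors (empty if valid)."""

    @abstractmethod
    def run(self, context: Dict[str, Any]) -> None:
        """Execute the step, reading from and writing to the shared context."""


class BuildStep(Step):
    # Hypothetical step: "variant" is an illustrative config key.
    def validate(self) -> List[str]:
        if not isinstance(self.config.get("variant"), str):
            return ["build: 'variant' must be a string"]
        return []

    def run(self, context: Dict[str, Any]) -> None:
        context["image"] = "built:" + self.config["variant"]


class PublishStep(Step):
    # Hypothetical step: "tag" is an illustrative config key.
    def validate(self) -> List[str]:
        if not isinstance(self.config.get("tag"), str):
            return ["publish: 'tag' must be a string"]
        return []

    def run(self, context: Dict[str, Any]) -> None:
        if "image" not in context:
            raise RuntimeError("publish: no image was built")
        context["published"] = context["image"] + ":" + self.config["tag"]


def run_stage(steps: List[Step],
              context: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Validate every step first, then run them in order."""
    errors = [error for step in steps for error in step.validate()]
    if errors:
        raise ValueError("; ".join(errors))
    context = {} if context is None else context
    for step in steps:
        step.run(context)
    return context
```

Validating all steps up front means a typo in a later step's config is reported before any earlier step has done work.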
Dan
- What I did last week
- What I plan to do this week
- What I'm blocked on
- Other?
Greg
- What I did last week
- %&@!Q)!%^%
- TechConf invitations
- CTO meet and greets
- Chatted with a new Comms person re internal communications
- MW dependency meeting
- What I plan to do this week
- %^@()%*)@%
- Half day on Tuesday, out Thur/Fri
- Read
- What I'm blocked on
- Other?
James
- What I did last week
- Some light production UBN fixing, including a Friday deploy, whee.
- Bumping all CI PHP docker images to include the gmp PHP extension.
- What I plan to do this week
- [Again] Migrate some remaining node6 CI jobs to node10 (notably, OOUI) https://phabricator.wikimedia.org/T211784
- [Again] Building a proof of concept of shims in WikimediaMessages so we can undeploy things better: https://phabricator.wikimedia.org/T222918
- What I'm blocked on
- Blocked by SRE ServiceOps (migrate contint1001 to stretch -> php7x): Dropping php56 CI testing https://phabricator.wikimedia.org/T224906
- Blocked by Security (make phan-seccheck php72 compat): Migrating CI phan jobs over to php72 https://phabricator.wikimedia.org/T207344
- [low priority] Blocked by SRE Traffic (scary VCL management): Dropping WikipediaZero https://phabricator.wikimedia.org/T213769
- [low priority] Blocked by Core Platform (dump of review comments): Dropping CodeReview https://phabricator.wikimedia.org/T116948
- Other?
Jean-Rene
- What I did last week
- Worked on various Code Stewardship Reviews
- Attended as many CTO meet and greets as I could
- scheduled Code Health Office hours - circular dependency
- What I plan to do this week
- Help roll out more extensions into Code Health pipeline
- prep for Code Health Office Hours and Code Health Code Review WG kickoff meeting
- What I'm blocked on
- Other?
Jeena
- What I did last week
- Train
- Book club
- deployment-charts scaffold
- What I plan to do this week
- merge deployment-charts scaffold after reviews
- move charts to deployment-charts
- understanding beta cluster
- local-charts stuff
- Read book
- What I'm blocked on
- Other?
Lars
- What I did last week
- Made some further updates to the CI architecture document. Published it. Asked for feedback via wikitech-l and Mastodon.
- Meet and greet with two CTO candidates Rick Spencer and Graham Ingersoll.
- Participated in the Docker SIG meeting. It was very short, and we kind of agreed it's not very useful.
- Read chapter 13 of the CD book. Attended book club to discuss it.
- Made some more concrete plans to build a CI/CD system for WMF around GitLab. For now, just a prototype, and the GitLab component might be replaced with another candidate later. Discussed them with Greg, Tyler, and James a bit.
- Discussed MediaWiki development and testing, and possible CD with James. We'll be continuing this discussion.
- Sketched a skeleton of an HTTP API for the new CI components. (Very simple, very easy, very much not performant; Python, bottle.py, signed JWT access tokens, haproxy, Let's Encrypt.) http://git.liw.fi/wmf-ci-arch/tree/api.py if anyone's curious.
- What I plan to do this week
- Implement at least the VCS worker component for the new CI prototype around GitLab.
- Read book chapter 14.
- Chat with James about MediaWiki and new CI.
- Deployment Pipeline meeting on Thursday.
- What I'm blocked on
- Other?
- Spencer > Ingersoll > Noteboom
- I'll be away on Friday; I worked yesterday, Sunday, instead (lovely, quiet work day).
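Lars's API skeleton above mentions signed JWT access tokens. As a rough illustration of what signing and verifying such a token involves, here is a minimal standard-library sketch. It assumes HS256 (HMAC-SHA256 with a shared secret), which may not be what api.py uses, and the function names are hypothetical. A real service would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_token(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_signature(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    pieces = token.split(".")
    if len(pieces) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = pieces
    expected = _signature(header_b64 + "." + payload_b64, secret)
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

The `now` parameter exists only so expiry can be checked deterministically in tests; callers would normally omit it.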
Mukunda
- What I did last week
- Train (backup for Jeena)
- Read a book
- Tested the fix for phabricator git-ssh - https://phabricator.wikimedia.org/T224677
- Merged phabricator upstream changes
- Started to build a prototype kibana extension for linking errors to phab tasks - https://phabricator.wikimedia.org/T185155
- Started rethinking the scap swat workflow with Tyler - some details captured in https://phabricator.wikimedia.org/T226682
- What I plan to do this week
- Meeting with William Doran to explore workboard triggers
- Deploy phabricator upstream changes
- Continue working on kibana extension
- What I'm blocked on
- Other?
Tyler
- What I planned vs did last week
- Done (and reverted) Deploy Gerrit 2.15.14
- Done (and reverted) Turn back on HTTP auth + announce
- Done (but not merged) Bump blubber version (to finish policy file work)
- Done lib/extensions deps meeting
- Done More talking about Enhance MediaWiki deployments for support of php7.x
- Not done Cannot assign user name "XXX" to account ####; name already in use. https://phabricator.wikimedia.org/T216605
- What I plan to do this week
- Cannot assign user name "XXX" to account ####; name already in use. https://phabricator.wikimedia.org/T216605
- Merge Barricade remove lucene deps https://gerrit.wikimedia.org/r/519168/ + other barricade work
- Clear error dashboard of gerrit (on the chance that one of these errors is our problem)
- Email for pipeline cross-team meeting ( Thursday :( )
- Stretch: blubber update
- Stretch: pipeline docs touching (looks like
- What I'm blocked on
- Other?
- Done (thanks effie!) scap 3.10.0-1
Zeljko
- What I did last week
- catching up
- reading the book
- What I plan to do this week
- T188742 Run tests daily targeting beta cluster for all repositories with Selenium tests
- T199113 All repositories with Selenium tests should use wdio-mediawiki
- T215178 Self-hosted SonarQube - check with Amir if he made any progress
- T227009 selenium-daily-beta-Echo Jenkins job failing
- Read the book
- What I'm blocked on
- Other?
- Antoine: can we enable Sonar for more repos (related to CI)?
- Tyler: can we add new Sonar Gerrit bot?
- SoS notes
- paperwork :| (start of the month)
Grooming
Team Kanban Board Review and Triage
- closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...