Wikimedia Release Engineering Team/Checkin archive/20190408
2019-04-08
Vacations/Important dates
- April 9-12: Greg at tech-mgt F2F in Portland
- April 11: Dan out
- April 17-19 (Wednesday - Friday) - Željko vacation
- April 18-19 (Thursday, Friday) - Lars on vacation in Chicago
- April 22 (WMF Holiday) - US Staff
- April 22-27: Team offsite in Chicago
- April 29: Moved WMF Holiday for US staff at offsite
- May 1st - Lars, Antoine and Željko, Labor Day / May Day
- May 8th - Antoine, 1945 victory
- May 15 (Wednesday) - Željko vacation
- May 16-20 - Wikimedia Hackathon 2019 (Prague, Czechia)
- Attending: Greg, JR, Zeljko, James, and Jeena
- May 30th-31st - Antoine, Feast of the Ascension
- June 10th - Antoine, Pentecost -- see https://en.wikipedia.org/wiki/Eastertide for Antoine/France Easter holidays
- May 27 (Memorial Day) - US Staff
- June 6-7 - Brennen, Apogaea
- June 19 (Juneteenth) - US Staff
- July 22 - August 9 - Željko vacation
- August 25 - September 4 - Brennen vacation
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R
- Jan 07 - wmf.12 - Dan
- Jan 14 - wmf.13 - Dan
- Jan 21 - wmf.14 - Mukunda
- Jan 28 - wmf.15 - No Train (All Hands)
- Feb 04 - wmf.16 - Mukunda
- Feb 11 - wmf.17 - Tyler
- Feb 18 - wmf.18 - Tyler
- Feb 25 - wmf.19 - Antoine
- Mar 04 - wmf.20 - Antoine
- Mar 11 - wmf.21 - Zeljko 🐌
- Mar 18 - wmf.22 - Zeljko 💣
- Mar 25 - wmf.23 - Dan
- Apr 01 - wmf.24 - Dan [train not finished yet]
- Apr 08 - wmf.25 - Mukunda
- Apr 15 - 1.34.0-wmf.1 - Mukunda
- Apr 22 - wmf.2 - NO TRAIN, team offsite
- Apr 29 - wmf.3 - Tyler
- May 06 - wmf.4 - Tyler
- May 13 - wmf.5 - Antoine
- May 20 - wmf.6 - Antoine
- May 27 - wmf.7 - Zeljko
- June 03 - wmf.8 - Zeljko
SoS
- Zeljko 4eva! :)
Team Business
Timespent spreadsheet
- For the avoidance of doubt: fill out the sheet week number for the previous week
- W16 https://docs.google.com/spreadsheets/d/1urCLNQXeEi1DOR8Iu0qW0yPt-glxX1laqlMovbGyCW0/edit#gid=0
- James: Should I be doing this now? (I don't have access.)
- Greg: Yes, will deal with this later.
- TODO: Greg give James access
- TODO: Greg clarify distinction between "maintenance" and tec1
- TODO: CI/CD book, education/prof dev column? For now, use "Other"
Book club
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
- Notes: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
- Next:
- At the team offsite
- Up through Chapter 9
Spring Offsite
- Location: Chicago, IL (Central timezone, UTC-5 while we're there)
- Dates: Arrive Monday 4/22, Depart Saturday 4/27.
- Activity day
- Museum of Science and Industry on Friday
- Cubs game Tuesday night
- Program:
- Forming....
- Come prepared to discuss team mission and scope
- Current priority of topics based on the etherpad votes:
- 1) Future of WMF CI:
- 1a) what tooling do we commit to for the next phase, processes of using CI/CD, implementation plan for new tooling/versions
- 1b) Discussion of rubric (see mail - [RelEng] CI evaluation, phase 2: criteria)
- 1c) Showcase integration/pipelinelib Pipeline Builder and how it could enable self-serve CI
- 2) Continuation/"conclusion" of team scope/mission
- 3) Future of the Beta Cluster
- 3a) Things we said during annual plan discussions: https://etherpad.wikimedia.org/p/betaclusterwhat
- 4) Discussion of Prodlike and how to get there
- 5) How do we organise and track our own work? (Greg)
- 6) Maintenance of documentation
- 7) PGP training and keysigning (liw) see https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Onboarding/GPG
- 8) logspam cleanup epic (follow-up from the book club discussion on 3/21)
- 9) Book club discussion - Up through chapter 9
- 10) Everybody does deployments (p271 Every member of the team should know how to deploy)
Skill matrix redux
- I plan to have you update it next week (the week before the offsite).
- Should we add people outside the team who have significant skills in our matrix? / bus factor individuals.
- Yeah, we should note it somehow.
- Can we transpose the table now that it's so wide? +1
Here is the current table, please add/strike-through/leave comments for how to improve it/make it relevant to your work today.
- Developer Tools Support
- MediaWiki-Vagrant
- Elastic-search
- Gerrit maint
- Phabricator maint
- Maintenance of misc. tools like Docker image list, misc. monitoring stuff, etc.?
- Continuous Integration Infra
- Jenkins maint
- Zuul maint
- Nodepool maint
- CI config / JJB
- docker-pkg
- Quibble
- Testing Tooling and Education
- Unit test maint tooling
- Integration test maint tooling
- Acceptance test tooling
- MW-Selenium (Ruby) deprecated in 2017 https://phabricator.wikimedia.org/J79
- Selenium (NodeJS)
- Integration Test Environments
- Beta Cluster
- Deploying software
- Deploying new MW branches/The Train
- backports & SWAT deploys
- Developing scap
- Debugging and/or Reporting log errors
- Deployment Pipeline
- k8s
- minikube
- blubber
- pipelinelib
- local-charts
- MediaWiki Releases
- Doing major releases
- Doing point releases
- Doing security releases
Monthly reflection on accomplishments - April '19 edition
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Add as you have them!
- Phabricator vandalism rollback tool completed 🎉 (blog post? 😉)
Annual Planning
- https://etherpad.wikimedia.org/p/releng-fy1920ap-tec1
- https://etherpad.wikimedia.org/p/releng-fy1920ap-tec3
- https://etherpad.wikimedia.org/p/releng-fy1920ap-tec12
- https://etherpad.wikimedia.org/p/releng-fy1920ap-tec13
- https://etherpad.wikimedia.org/p/releng-fy1920ap-new
- Nothing new right now...
- I'm talking with Mark tomorrow morning (he won't be in Portland, sadly)
- apparently he's coming now, I'll talk to him there :)
Incoming/Needs attention
[Task] Add Scribunto to extension-gate in CI
- https://phabricator.wikimedia.org/T125050
- https://gerrit.wikimedia.org/r/#/c/integration/config/+/497574/
- calling into question time spent on unit tests in pre-merge tests.
- yes to having better guidelines
task-series scap plugin broken
- Friendly reminder to Mukunda :) Just need it by end of week
Scrum of Scrums
Incoming from last week
- Blocking:
Outgoing this week (wrong section heading is on purpose for copy/pasting into Scrum of Scrums etherpad)
Release Engineering
- Blocked by:
- Blocking:
- Updates:
- Train Health
- Last week: 1.33.0-wmf.24 - https://phabricator.wikimedia.org/T206678
- This week: 1.33.0-wmf.25 - https://phabricator.wikimedia.org/T206679
- Next week: 1.34.0-wmf.1 - [NEEDS TASK]
- Code Health
- Log Health
Callouts
- Release Engineering
Train status and happenings
- Blocked :/
- RefreshLinks, last action:
- [2019-04-05T16:02:11Z] <krinkle@deploy1001> Synchronized php-1.33.0-wmf.24/includes/jobqueue/jobs/RefreshLinksJob.php: Ib1ac31365f9c / T220037 (duration: 00m 59s)
- Need to fix scap clean :\
- thcipriani has a crappy fix in mind until http tokens in gerrit are back
- Any idea when HTTP tokens will come back? Weeks? Months? Never? :-(
- When security lets us, hopefully Soon™ I'm pushing for it :\
Quarterly Goals for Q4
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q4
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Undeploy the CodeReview extension.
- WHO: James, need help from CPT
- I will ping CPT about this this week
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Setup 1-3 of the CI WG options (Zuul v3, Argo, GitLab)
- WHO:
- Focus on a couple noteworthy repos: e.g.,
- core
- extensions
- ops/puppet
- Maybe setup in serial, i.e., a week per evaluation
- Questions:
- RelEng/Extended working group?
- At least in the WG eval it was good to have non-familiar people
- But for setting up the options it might be beneficial to have people experienced with the current setup.
- Folks outside the original working group to join-in to setup options; people TBD
- Do we need a rubric before we do this prototyping? (yes)
- DONE lars to work on rubric week of 2019-04-01
- See email 2019-04-08
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Instrument Quibble for data collection
- WHO: Mukunda, Antoine
- Still no progress / nowhere to store this data and other tasks taking priority
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Create a graph of where time is spent and make a prioritized list for improvements.
- WHO: Mukunda, Antoine
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Prepare the Deployment Pipeline for changes to our CI tooling.
- WHO: ???, ???
- Blocked by not having new CI tooling yet
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOAL: Create a .pipeline/config.yaml standard to give users more control over how their tests are run in the pipeline and allow the easy saving of artifacts at pipeline completion. (RelEng)
- WHO: Dan, Tyler, ???
- Dan has a patch up for pipelinelib https://gerrit.wikimedia.org/r/#/c/integration/pipelinelib/+/500134/
- needs review/is set to WIP
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOALS:
- Adopt more services into Deployment pipeline - task T212801
- Wikidata Termbox SSR, Kask for Session Storage Service, cpjobqueue (stretch), ORES (stretch)
- WHO: Dan, Tyler, Lars
There are tasks: https://phabricator.wikimedia.org/T220403
- changeprop
- In progress ORES
- cf: Dan's comments
- Wikidata Termbox SSR
- Kask for Session Storage Service
- cpjobqueue (stretch)
TEC12 (DevProd): Outcome 1 / Output 1.1
- GOAL: Provide an "Official" Docker base image for local development of MediaWiki based on the production tooling.
- WHO: Jeena, Brennen
TEC13 (Code Health): Outcome 1 / Outcome 3
- GOALs: Presentation/session(s) at the Wikimedia Hackathon on the current state of Code Health projects (technical debt and code stewardship)
- WHO: JR
- no progress
TEC13 (Code Health): Outcome 1 / Output 1.1
- GOAL:
- Publish a re-imagination of the Review Queue process.
- Develop and implement metrics around task and code-review responsiveness
- WHO: Greg, JR (and Andre)
- no progress
TEC13 (Code Health): Outcome 4 / Output 4.2
- GOALs:
- Expand SonarQube reporting into CI infrastructure
- Perform SonarQube analysis on all extensions
- Engage user communities in direct feedback solicitation
- WHO: JR, Zeljko, Code Health Metrics
- new CI patches submitted. Going to be moving off experimental.
- merged a patch Friday for polling results/printing results to stdout
- Perform SonarQube analysis on all extensions - done https://sonarcloud.io/organizations/wmftest
Other non-goal work
Selenium
- T217544 selenium-daily-beta-MediaWiki fails due to QuickSurveys inserting HTML in the content
- T220035 Drop Ruby Selenium CI jobs; we don't support them any more
- T174018 [EPIC] Port Minerva's browser tests to Selenium Node.js
- T219815 Create integration tests to cover potential issues with editing and uploading on Commons
Gerrit
- Deploy barricade tomorrow
- Revert tool work by EOW, hopefully
- Threads/crashing recently discussion: https://groups.google.com/forum/#!topic/repo-discuss/pBMh09-XJsw
Phabricator
- vandalism rollback tool
- Started working on some other phab security hardening
Jenkins
QA/Code Health
- Daniel met with Code Health Metric WG to discuss his work around Cycle dependency.
- Setting up meeting with Corey and Marcella to discuss next/steps annual planning re software maintenance.
SCAP
- Need to fix scap clean for ssh
- for the time being I will comment out until fix for: https://phabricator.wikimedia.org/T218750
Standup!
Antoine
- What I plan to do this week
- quibble and zuul upgrade
- upgraded zuul 5-6 hours ago
- should unblock gerrit 2.16
- Friday Gerrit outage (cf: https://groups.google.com/forum/#!topic/repo-discuss/pBMh09-XJsw )
- What I'm blocked on
- Other?
Brennen
- What I plan to do this week
- Get first pass at releng/dev-images to a reviewable state
- Got docker-pkg / CI image notes from Antoine on Friday, acting on that
- Push from last week: Follow up with Eric Gardner re: docs
- What I'm blocked on
- Other?
Dan
- What I plan to do this week
- Finish up 1.33.0-wmf.24 train
- Fix up an outstanding issue with the pipelinelib Pipeline Builder feature
- https://gerrit.wikimedia.org/r/c/integration/pipelinelib/+/499918
- It's a big change, so: refactor change into separate commits and write some nice commit messages
- Write up an email to Analytics (MUST do)
- What I'm blocked on
- Other?
Greg
- What I plan to do this week
- tech-mgt F2F Tue-Fri, slow response
- Quality Team follow-up
- Gerrit incident meeting (today) follow-up
- Annual Budget process kickoff/walk through meeting today
- Talking with Deb today about Offsite agenda and support
- Quarterly goal checkin meeting today (tyler attending?)
- What I'm blocked on
- Other?
James
- What I plan to do this week
- I owe Jeena some Mac testing of local-charts
- CodeReview status wrangling from CPT
- What I'm blocked on
- —
- Other?
- —
Jean-Rene
- What I plan to do this week
- reach out/start Code Review workgroup
- Continue work on Test Strategy
- What I'm blocked on
- Other?
Jeena
- What I plan to do this week
- Address comments on mac mw install script/readme for local-charts
- Work on volume mounting script in local-charts
- Work on using Xdebug in the local-charts env
- What I'm blocked on
- Other?
Lars
- What I plan to do this week
- read Go book
- discuss rubric for CI evaluation
- attempt to get Minikube working in a VM
- What I'm blocked on
- Other?
- six months at WMF today
Mukunda
- What I plan to do this week
- Train
- Phabricating phabricator phabulously
- Train task generation is broken
- Phabricator global search is partially broken
- Phabricator calendar is broken
- What I'm blocked on
- Time
- Other?
Tyler
- What I plan to do this week
- Gerrit revert tool
- Gerrit plugin deploys
- Gerrit explosion monitoring
- What I'm blocked on
- Other?
- Quick/dirty fix for scap clean
Zeljko
- What I plan to do this week
- T217544 selenium-daily-beta-MediaWiki fails due to QuickSurveys inserting HTML in the content
- T220035 Drop Ruby Selenium CI jobs; we don't support them any more
- T174018 [EPIC] Port Minerva's browser tests to Selenium Node.js
- T219815 Create integration tests to cover potential issues with editing and uploading on Commons
- What I'm blocked on
- Other?
Grooming
Team Kanban Board Review and Triage
- closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...