Wikimedia Release Engineering Team/Checkin archive/20190313
2019-03-13
Vacations/Important dates
- April 9-12: Greg at tech-mgt F2F in Portland
- April 17-19 (Wednesday - Friday) - Željko vacation
- April 22 (WMF Holiday) - US Staff
- April 22-27: Team offsite in Chicago
- April 29: Moved WMF Holiday for US staff at offsite
- May 1st - Lars, Antoine and Željko, Labor Day / May Day
- May 8th - Antoine, 1945 victory
- May 15 (Wednesday) - Željko vacation
- May 16-20 - Wikimedia Hackathon 2019 (Prague, Czechia)
- Attending: Greg, JR, Zeljko, James, and Jeena
- May 30th-31st - Antoine, Feast of the Ascension
- June 10th - Antoine, Pentecost -- see https://en.wikipedia.org/wiki/Eastertide for Antoine/France Easter holidays
- May 27 (Memorial Day) - US Staff
- June 6-7 - Brennen, Apogaea
- June 19 (Juneteenth) - US Staff
- June 17 - July 5 - Željko vacation
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/query/s3KW8bpsXhYF/#R
- Jan 07 - wmf.12 - Dan
- Jan 14 - wmf.13 - Dan
- Jan 21 - wmf.14 - Mukunda
- Jan 28 - wmf.15 - No Train (All Hands)
- Feb 04 - wmf.16 - Mukunda
- Feb 11 - wmf.17 - Tyler
- Feb 18 - wmf.18 - Tyler
- Feb 25 - wmf.19 - Antoine
- Mar 04 - wmf.20 - Antoine
- Mar 11 - wmf.21 - Zeljko
- Mar 18 - wmf.22 - Zeljko
- Mar 25 - wmf.23 - Dan
- Apr 01 - wmf.24 - Dan
- Apr 08 - wmf.25 - Mukunda
- Apr 15 - wmf.26 - Mukunda
- Apr 22 - 1.34.0-wmf.1 - NO TRAIN, team offsite
- Apr 29 - wmf.2 - Tyler
- May 06 - wmf.3 - Tyler
- May 13 - wmf.4 - Antoine
- May 20 - wmf.5 - Antoine
- May 27 - wmf.6 - Zeljko
- June 03 - wmf.7 - Zeljko
SoS
- Zeljko 4eva! :)
Team Business
Book club
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club
- Notes: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Book_club/Continuous_Delivery
- Next: March 21st at the "same" time (9am Pacific/16:00 UTC)
Spring Offsite
- Location: Chicago, IL (Central timezone, UTC-5 while we're there)
- Dates: Arrive Monday 4/22, Depart Saturday 4/27.
- BOOK YOUR FLIGHTS BY: March 21
- Activity day
- Fill out the spreadsheet: https://docs.google.com/spreadsheets/d/1zqO8Mk1wUU2ZtyAM9xU68CQTpJFEOPALfDKCj7aMNo4/edit
- Program:
- start listing your topics! https://etherpad.wikimedia.org/p/releng-offsite-201904-topics
Monthly reflection on accomplishments - March '19 edition
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Add as you have them!
- CI tooling future WG started, blogged
- GerritBot comments on patches going through the pipeline (with fancy badges and the like)
- Train deploy notes are now automatically generated on branch push
- Scap 3.9.2-1 released in production
- Phabricator upgrade: https://phabricator.wikimedia.org/phame/post/view/147/projects_forms_and_subtypes_oh_my/
- Published the ISOSTWG results and recommendation on officewiki and announced: https://office.wikimedia.org/wiki/Internal_Support_for_Open_Source_Tools_Working_Group
Q4 Goals planning
- etherpad: https://etherpad.wikimedia.org/p/releng-1819Q4-goals
- Due: Monday March 18th, aka this Friday
- Q3 goals question from debt:
Annual Planning is coming up
- I emailed mark re future testing/"evaluation" environments
Incoming/Needs attention
Pywikibot CI
- https://phabricator.wikimedia.org/T132138
- Antoine to take a time boxed look into this, this week
Post-mortem "MWException: No localisation cache found for English."
- https://phabricator.wikimedia.org/T217719
- next steps?
> I think we missed running a scap pull and the cache generation. [when the server was repooled]
> So that is a glitch in how we repool a MediaWiki server?
- greg to follow-up
Merge blocker: The table 'l10n_cache' is full in quibble-vendor-mysql-hhvm-docker
- https://phabricator.wikimedia.org/T217654
- "The bump from 256M to 320M must be good enough and I have updated the Jenkins jobs. Lowering priority to High." -- https://phabricator.wikimedia.org/T217654#5020364
Merge blocker: quibble-vendor-mysql-hhvm-docker in gate fails for most merges (exit status -11)
- https://phabricator.wikimedia.org/T216689
- "I have rollbacked the jobs container:" -- https://phabricator.wikimedia.org/T216689#5020757
- See T218209 though. :-(
Merge blocker: Failed to create /nonexistent/.pki/nssdb directory
- https://phabricator.wikimedia.org/T218209
- Caused by revert for T216689?
FYI: Wikimedia-production-error (Shared Build Failure)
Cannot access beta cluster db
- https://phabricator.wikimedia.org/T217938
- Mukunda to take a look
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
Incoming from last week
- Blocking:
Outgoing this week (wrong section heading is on purpose for copy/pasting into Scrum of Scrums etherpad)
Release Engineering
- Blocked by:
- Blocking:
- Language: Several CI failures
- Readers Infrastructure: Review needed for deploying Extension:WikimediaEditorTasks to production (https://phabricator.wikimedia.org/T218136 )
- Search Platform: Thanks RelEng for working on https://phabricator.wikimedia.org/T216689
- Updates:
- Work progresses on CI tool evaluation https://phabricator.wikimedia.org/phame/post/view/149/work_progresses_on_ci_tool_evaluation/
- Train Health:
- Last week: 1.33.0-wmf.20 - https://phabricator.wikimedia.org/T206674
- This week: 1.33.0-wmf.21 - https://phabricator.wikimedia.org/T206675
- Next week: 1.33.0-wmf.22 - https://phabricator.wikimedia.org/T206676
- Code Health:
- SonarQube is available as an experimental job for all extensions https://gerrit.wikimedia.org/r/c/integration/config/+/490950
Callouts
- Release Engineering
Train status and happenings
- minor issue in MFE yesterday (undeclared variable, somehow not caught somewhere first)
Quarterly Goals for Q3
https://www.mediawiki.org/wiki/Wikimedia_Technology/Goals/2018-19_Q3
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Automate the generation of change log notes
- WHO: Mukunda, (Tyler on backup)
- In progress should now run on branch cut https://integration.wikimedia.org/ci/job/train-deploy-notes/
- problem with ref filter: https://gerrit.wikimedia.org/r/#/c/integration/config/+/494778/
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Investigate notification methods for developers with changes that are riding any given train
- WHO: Mukunda, Tyler
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Instrument Quibble for data collection
- WHO: Mukunda, Antoine
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Create a graph of where time is spent and make a prioritized list for improvements.
- WHO: Mukunda, Antoine
TEC3 (Pipeline): Outcome 2 / Output 2.1
- GOAL: Select and integrate a code health metric solution into our tooling.
- WHO: JR, ...
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOALS:
- Adopt more services into Deployment pipeline - task T212801
- cxserver, ORES (partially), citoid, changeprop, cpjobqueue (stretch)
- Deploy eventgate
- WHO: Dan, Tyler, Lars
- In progress cxserver
- Images built via deployment pipeline
- Namespaces created for k8s eqiad/codfw
- helm charts created
- Done citoid
- Images built via deployment pipeline
- Deployed
- Traffic switched
- changeprop
- Done eventgate
- In progress ORES
- cf: Dan's comments
TEC12 (DevProd): Outcome 1 / Output 1.1
- GOAL: Conduct interviews with development stakeholders and compile a report that informs future work creation of a rubric.
- WHO: Jeena, Mukunda
- Results are posted: https://www.mediawiki.org/wiki/Developer_Satisfaction
TEC13 (Code Health): Outcome 1 / Output 1.1
- GOALs:
- Develop and communicate guidelines and best practices for successful Code Stewardship.
- (Continued from Q2) Update/refresh review queue (review process for initial code deployment)
- WHO: JR
- Created mockup for Code Stewardship dashboard
- Created metrics tracking spreadsheet
TEC13 (Code Health): Outcome 2 / Output 2.2
- GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test - task T206621
- WHO: Zeljko
TEC13 (Code Health): Outcome 2 / Output 2.3
- GOALs:
- Evolve/develop tools and processes to support the PE refactoring effort to improve code health.
- Develop a common test strategy that enables teams to engage in more effective and efficient testing practices. (maybe should be output 2.4?)
- WHO: JR, Core Platform Team
- made progress on addressing some of the action items from discussions with CPT
- Started putting strategy to paper
TEC13 (Code Health): Outcome 3 / Output 3.2
- GOALs:
- Speak at All Hands on the status of Technical Debt
- Engage and coach development teams on their approach to managing technical debt.
- WHO: JR, Core Platform Team
- This goal area to be absorbed into broader Code Health goals moving forward.
TEC13 (Code Health): Outcome 4 / Output 4.1
- GOALs: Code Health Dashboard with 50% of repositories covered.
- WHO: JR, Core Platform Team
- SonarQube is available as experimental job for all extensions. Key step towards general availability of Code Health metrics dashboard.
Other non-goal work
Selenium
Gerrit
Phabricator
- James: I'm very excited that https://secure.phabricator.com/T10578 and https://secure.phabricator.com/T10333 are now Resolved upstream. It's only been three years. ;-)
- The `user.transactions` API method is now deployed to production; this will facilitate rollback of vandalism should anyone get past the antivandalism extension.
Jenkins
- 2.15.11 still needs to be deployed due to healthcheck rollback
QA/Code Health
SCAP
Standup!
Antoine
- What I plan to do this week
- What I'm blocked on
- Other?
Brennen
- What I plan to do this week
- CI WG
- Evaluate Zuul v3 - https://phabricator.wikimedia.org/T218138
- Pivotal/Concourse discussion
- Rough out docker-pkg templates for use by local-charts
- Script sshfs setup in local-charts
- Revisit docs questions - https://phabricator.wikimedia.org/T217614
- What I'm blocked on
- Other?
Dan
- What I plan to do this week
- Evaluate Tekton for CI WG https://phabricator.wikimedia.org/T217912
- Modify blubber.yaml configs in projects for v4 https://phabricator.wikimedia.org/T218142
- Deploy blubberoid
- Draft email to Analytics about feedback on Jenkins/Gerrit event-log datastore
- Begin implementation of .pipeline/config.yaml https://phabricator.wikimedia.org/T210267
- What I'm blocked on
- Other?
Greg
- What I plan to do this week
- Slides for c-level/board(?) meeting at end of month
- Book reading
- TechConf planning with Deb (meeting with big group on Monday)
- What I'm blocked on
- Other?
James
- What I plan to do this week
- Mostly still working with the Multimedia team on SDC stuff
- Book reading!
- What I'm blocked on
- –
- Other?
Jean-Rene
- What I plan to do this week
- Work on stewardship best practices, including relocating the Code Stewardship page
- Work on test strategy goal
- What I'm blocked on
- Other?
Jeena
- What I plan to do this week
- Work on Localsettings in local-charts (automate manual config/install steps)
- Other local-charts work
- Read Book
- What I'm blocked on
- Other?
Lars
- What I plan to do this week
- CI WG
- Pivotal meeting
- Concourse
- Read CD book
- What I'm blocked on
- possibly getting ill
- Other?
Mukunda
- What I plan to do this week
- look into beta cluster db issue ( https://phabricator.wikimedia.org/T217938 )
- Phabricator, Phabricator, Phabricator
- Finish rolling out the Vandalism rollback stuff with Andre
- More dabbling with phabricator on minikube
- Read a book ( https://www.youtube.com/watch?v=GlKL_EpnSp8 )
- What I'm blocked on
- Other?
Tyler
- What I plan to do this week
- Deploy notes fix deployment
- Gerrit 2.15.11 re-rollforward
- GerritHealthCheckBot setup for healthcheck plugin
- blubber policyfile
- update docker-pkg documentation
- What I'm blocked on
- Other?
- code health metrics (Kosta) blocked on releng (Tyler/Antoine) https://gerrit.wikimedia.org/r/c/integration/config/+/494548
Željko
- What I plan to do this week
- T206675 1.33.0-wmf.21 deployment blockers
- T217901 Evaluate Phabricator Harbormaster
- Mukunda will be glad to have a 1:1 if you'd like help with this one.
- T214478 The first Selenium test for AbuseFilter
- T217051 Echo notifications automation smoke test
- What I'm blocked on
- code health metrics (Kosta) blocked on releng (Tyler/Antoine) https://gerrit.wikimedia.org/r/c/integration/config/+/494548
- thcipriani: I talked to Kosta a bit about this on Friday. I'd like to make sonarqube be triggered after the existing coverage jobs rather than reimplement the coverage jobs (I think that makes sense anyway)
- Other?
- Google calendar and Deployments calendar are not in sync :(
Grooming
Team Kanban Board Review and Triage
- closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...