Wikimedia Release Engineering Team/Checkin archive/20181010
2018-10-10
Vacations/Important dates
- Early to mid October - Antoine taking some weeks/days off or working part time (October 1-14 according to https://phabricator.wikimedia.org/E40)
- October 8th - Holiday (Indigenous People's Day, Independence Day - Željko)
- October 8th - New hire start date
- October 21-28 - Greg in Portland for TechConf+TechMgrs F2F
- November 1 (Thursday) - Holiday (All Saints' Day - Željko)
- November 12th - Holiday (Veterans Day, Observed)
- November 22+23 - Holidays (Thanksgiving)
- November 25 - December 2 - Mukunda vacation (in California ahead of the offsite)
- Week of December 3rd - Team offsite
- December 24-28 - Holidays (Christmas)
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
- Oct 08 - wmf.25 - Dan (No train due to DC switchover) <----
- Oct 15 - wmf.26 - Mukunda (last 1.32 wmf.XX release, 1.33 starts the next week)
- Oct 22 - wmf.1 - Mukunda (warning, TechConf happening, ping Greg if you need responses from anyone there...)
- Oct 29 - wmf.2 - Tyler
- Nov 05 - wmf.3 - Tyler
- Nov 12 - wmf.4 - Antoine
- Nov 19 - wmf.5 - No Train (Thanksgiving)
- Nov 26 - wmf.6 - Antoine
- Dec 03 - wmf.7 - No Train (Offsite)
- Dec 10 - wmf.8 - Zeljko
- Dec 17 - wmf.9 - Zeljko
- Dec 24 - wmf.10 - No Train (Holiday break)
- Dec 31 - wmf.11 - No Train (Holiday break)
- Jan 07 - wmf.12 - Dan
- Jan 14 - wmf.13 - Dan
- Jan 21 - wmf.14 - Mukunda
- Jan 28 - wmf.15 - No Train (All Hands)
- Feb 04 - wmf.16 - Mukunda
- Feb 11 - wmf.17 - Tyler
- Feb 18 - wmf.18 - Tyler
- Feb 25 - wmf.19 - Antoine
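The schedule above follows a simple pattern: one version per Monday, with the wmf number advancing even in weeks when no train runs. A minimal Python sketch of that arithmetic, assuming the 1.33 series starts at wmf.1 on 2018-10-22 as listed. The name `wmf_version_for` is hypothetical and not part of any Wikimedia tooling.

```python
from datetime import date

# Assumption taken from the schedule above: wmf.1 is the week of 2018-10-22.
SERIES_START = date(2018, 10, 22)


def wmf_version_for(monday: date) -> str:
    """Return the wmf.N label for the week starting on the given Monday."""
    delta_days = (monday - SERIES_START).days
    # The series start is a Monday, so any other Monday is a multiple of 7 days away.
    if delta_days < 0 or delta_days % 7 != 0:
        raise ValueError("expected a Monday on or after the series start")
    return f"wmf.{1 + delta_days // 7}"


print(wmf_version_for(date(2018, 11, 19)))
print(wmf_version_for(date(2019, 2, 25)))
```

The number counts calendar weeks rather than actual deployments, which is why "No Train" weeks still consume a version in the list.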
SoS
- Sep 26 - Zeljko
- Oct 03 - Zeljko
- Oct 10 - Zeljko <----
- Oct 17 - Zeljko
- Oct 24 - Zeljko
- Oct 31 - Zeljko
- Nov 07 - Zeljko
- Nov 14 - Zeljko
- Nov 21 - Zeljko
- Nov 28 - Zeljko
- Dec 05 - Zeljko
- Dec 12 - Zeljko
- Dec 19 - Zeljko
- Dec 26 - Zeljko
- Jan 02 - Zeljko
- Jan 09 - Zeljko
- Jan 16 - Zeljko
- Jan 23 - Zeljko
- Jan 30 - Zeljko
- Feb 06 - Zeljko
- Feb 13 - Zeljko
- Feb 20 - Zeljko
- Feb 27 - Zeljko
Team Business
Hiring
- Software Engineer position is open; currently reviewing candidates
- https://boards.greenhouse.io/wikimedia/jobs/1225258
- graded 3 take homes
First Offsite
Details:
- Week of December 3rd
- At the Queen Mary hotel in Long Beach
- Deb T will be facilitating
Topics!
Needs attention
- jenkins security release 2018-10-10
- https://phabricator.wikimedia.org/T206234
- thcipriani: moritz uploaded new version -- anyone want to pair? mukunda and tyler to pair
- gerrit security release 2018-10-08
- https://groups.google.com/forum/m/#!topic/repo-discuss/eH0iLt2XawU
- jGit update, we are unaffected
- may want to hold off until next week: https://bugs.chromium.org/p/gerrit/issues/detail?id=9836
- slide decks review
- Need to get Lars' key added to pwstore
- party time!
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
Release Engineering
- Blocked by:
- need review from SRE/services: "Support a literal body for POST requests in `fetch_url`"
- Blocking:
- Updates:
- Hired Lars Wirzenius
- Interviewing is ongoing for our Developer Productivity position: https://boards.greenhouse.io/wikimedia/jobs/1225258?gh_src=f15731e11
- Train Health:
- Last week: a few blockers, resolved in time, no problems - T191070 1.32.0-wmf.24 deployment blockers
- This week: No train this week due to DC switchover - T191071 1.32.0-wmf.25 deployment blockers
- Next week: the last 1.32 release, 1.33 starts the next week - T191072 1.32.0-wmf.26 deployment blockers
- Log Health:
- T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"
- Code Health:
Callouts
- Release Engineering
- Train Health: no train due to DC switchover - T191071 1.32.0-wmf.25 deployment blockers
- Log Health: T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"
Train status and happenings
Quarterly Goals for Q2
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Determine the procedure and requirements for an automated MediaWiki branch cut.
- WHO: Mukunda, Tyler, Antoine
- Locked down releases-jenkins too tightly, which caused a problem with icinga; will probably change the check to request /login instead
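One way to picture the proposed change: on a locked-down Jenkins, an unauthenticated request to /login should typically still answer 200, while other pages may answer 403. A rough Python sketch, assuming the standard Nagios/Icinga plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). The function names and the mapping of 4xx to WARNING are illustrative assumptions; the real check is whatever the monitoring configuration defines.

```python
import urllib.error
import urllib.request

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(status_code: int) -> int:
    """Map an HTTP status code to a plugin exit code (illustrative policy)."""
    if status_code == 200:
        return OK
    if 400 <= status_code < 500:
        # e.g. 403 from a page that now requires authentication
        return WARNING
    return CRITICAL


def check(url: str, timeout: float = 10.0) -> int:
    """Fetch the URL and return a plugin exit code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; classify the code.
        return classify(err.code)
    except OSError:
        # Connection failures and timeouts (URLError is an OSError subclass).
        return CRITICAL
```

Pointing `check` at the /login path rather than the front page would let access stay locked down while the monitor still confirms that the service is answering.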
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Formalize the collection of CI infrastructure and tooling metrics
- WHO: Dan, Antoine
- Installed the Prometheus plugin on the CI Jenkins
TEC3 (Pipeline): Outcome 2 / Output 2.3
- GOAL: Develop set of metrics to assess incident reports/post mortems
- WHO: Greg, Zeljko
- T206622 Develop set of metrics to assess incident reports/post mortems
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOALS:
- Adopt more services into Deployment pipeline
- Migrate graphoid to the Deployment pipeline
- Deploy zotero v2 to the Deployment pipeline
- Deploy blubberoid
- WHO: Dan, Tyler, Lars
- https://phabricator.wikimedia.org/T205919
- zoterov2 has a patch
- ideally release blubber v0.6.0 to make it a smaller patch (node_modules thing)
TEC12 (DevProd): Outcome 2 / Output 2.1
- GOAL: The Annual Developer Productivity Survey results are synthesized and shared, creating a first year baseline.
- WHO: Mukunda, Greg
- Mukunda sent to Legal to get a Privacy Policy for it.
- Should have a response from legal (re: privacy policy) sometime this week, need to start building the actual survey once that's in place.
TEC13 (Code Health): Outcome 1 / Output 1.1
- GOAL: Update/refresh review queue (review process for initial code deployment)
- WHO: JR
No progress
TEC13 (Code Health): Outcome 2 / Output 2.2
- GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test.
- WHO: Zeljko
- T206621 5 of the 15 prioritized repositories have at least 1 end-to-end test
TEC13 (Code Health): Outcome 2 / Output 2.3
- GOAL: Assess Platform unit test practices and define improvement plan
- WHO: JR, Core Platform Team
Met with Corey and Cindy to further refine this goal.
TEC13 (Code Health): Outcome 3 / Output 3.2
- GOAL: Core Platform and Search Platform teams are using TDM PoC
- WHO: JR, Core Platform Team
Met with Corey and Cindy to further refine this goal.
TEC13 (Code Health): Outcome 3 / Output 3.4
- GOALS:
- Identify key Tech Debt areas
- Put in place Tech Debt management process for PEP
- WHO: JR, Core Platform Team
Met with Corey and Cindy to further refine this goal. They have already identified some of the key areas of Tech Debt that they are addressing in the PEP.
TEC13 (Code Health): Outcome 4 / Output 4.1
- GOAL: Metrics defined and deployed for all 4 Code Health areas.
- WHO: JR, Code Health Metrics Working Group
The working group has met a few times and added a new member. The team shake-out phase is done and the group is starting to make progress.
Other work
Selenium
- T198389 Q1 Selenium framework improvements - moved remaining tasks to Q2 :(
- T206624 Q2 Selenium framework improvements
- T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
Gerrit
- Upgrade gerrit to 2.15.5 (previously 2.15.4)
- may want to hold off until next week: https://bugs.chromium.org/p/gerrit/issues/detail?id=9836
Phabricator
- Need to get phab1002 ready with Daniel
Jenkins
QA
SCAP
The scap pre-deploy fatal error check isn't catching fatals. Mukunda and Tyler have started investigating - https://phabricator.wikimedia.org/T121597#4652873
Standup!
Antoine
- What I plan to do this week
- Sanding hardwood in dining room
- paint entrance and dining room
- first layer of painting in
- What I'm blocked on
- Other?
- In my spare time, started a doc about improving how we run tests, especially filtering out tests coming from dependencies.
Dan
- What I plan to do this week
- Add Prometheus exporter to Jenkins instances
- Quibble docker instance running on CI instance for 6 hours
- Refactoring Docker JJB builders and implementing a `docker-reap-containers` publisher
- Blubberoid – create swagger spec
- What I'm blocked on
- Other?
Greg
- What I plan to do this week
- Quarterly Check-In slides
- Board presentation slides
- hiring
- TechConf session planning with Birgit
- localizationupdate.. update (tl;dr: won't be resuming l10nupdate nightly job until after we've collectively identified a suitable architecture/plan for it going forward)
- What I'm blocked on
- Other?
Jean-Rene
- What I plan to do this week
- Outcome 1 / Output 1.1 - Update/refresh review queue
- work on presentation for quarterly check-in
- QA Team Slideset
- Code Health Metrics WG tasks breakout
- What I'm blocked on
- Other?
Lars
- What I plan to do this week
- Read all the wiki pages
- What I'm blocked on
- Brain overheating
- Too many accounts to manage
- Other?
- N/A
Mukunda
- What I plan to do this week
- Figure out why scap fatal check isn't working
- Pairing with Tyler to hopefully solve this and get another scap release ready.
- Work with Daniel to get phab2001 / phab1002 in shape
- Take some time to answer any questions Lars has.
- Start building the dev productivity survey in Google Forms
- What I'm blocked on
- Response from legal re: privacy policy
- Other?
Tyler
- What I plan to do this week
- Jenkins updates
- Finish up l10n-bot-watcher setup
- Add liw to pwstore/resign
- new blubber release (maybe -- Dan?)
- new scap release (maybe -- pairing with Mukunda)
- Gerrit avatar followups
- Releases-jenkins icinga stuff
- Moar keyholder review
- Docs for ORES github sync problem (with heavy disclaimer)
- What I'm blocked on
- Other?
Zeljko
- What I plan to do this week
- T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
- T206466 Onboarding liw
- T204068 QA: Automation Testing - port Echo Notification tests to Node.js
- What I'm blocked on
- Other?
- T206620 Check 'Check endpoints for mwdebug2002.codfw.wmnet' failed: /wiki/{title} (Main Page) is WARNING: Test Main Page responds with unexpected body
- T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"
- Celebrating 41 tomorrow! 🎉🕺👴
Grooming
Team Kanban Board Review and Triage
- Closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...