Wikimedia Release Engineering Team/SSD Sync Up/2019-07-02
2019-07-02
Last Time: 2019-06-25
Deployment Pipeline
TODOs from Last Time
- In progress thcipriani -- Pipeline image build cleanup for contint1001
- In progress thcipriani -- Base Blubber policy file for CI
- need to bump blubber version
- TODO (next week) jeena + thcipriani to pair update blubber 0.8.0
- In progress pipeline config validation
- brennen: patches coming Soon™
- TODO Pipeline docs
- I'm getting notifications that things are linking to Wikitech/Blubber
- https://wikitech.wikimedia.org/wiki/Deploying_a_service_in_kubernetes
- https://wikitech.wikimedia.org/wiki/Docker
- so folks are poking at the edges
- Not done thcipriani: to scope task
- contint1001 store docker images on separate partition or disk
- Dzahn has claimed.
Other Work
- No additional work expected in the next week. We have enough.
New CI
- v2 of CI arch doc: https://docs.google.com/document/d/1EQuInEV-eY_5kxOZ8E1qEdLr8fb6ihwOD9V_tpVFWuU/edit
- Only a few comments received, no significant changes. v3 someday, but not immediately.
- Lars is hacking up new components around GitLab (mostly independent of which CI engine we choose; GitLab is just the first one to be tried)
- Starting setup of components
- VCSWorker should be done this week; works now, locally, but needs a deployment to a test instance
- http://git.liw.fi/wmf-ci-arch/tree/vcsworker.py
- HTTP API
- Signed JSON web tokens
- Controller (conductor)
- Listens to Gerrit and triggers events in GitLab
- DeploymentWorker
- GitLab + Gerrit Stream events
- Support will have to be written for stream events
- GitLab is written in Ruby
- lars: hopefully we will not have to touch GitLab code, but will use the API
- Future CI UI
- since any solution will be hidden from users, the UI must expose enough information to not frustrate our users
- Lars: doesn't want to expose GitLab, even just for logs, for security reasons -- zero-day exploits are possible even in read-only mode -- Jenkins is a good example of this
- Migration Plans
- Argo, Zuulv3, GitLab
- 437 existing zuul jobs
- doc publishing, pipeline image publishing, code-coverage + codehealth, periodic jobs in beta, and (oh right) test jobs
- Existing pipeline
- Docker-in-Docker seems hard to get right
- Zuulv3
- "Nodepool" and politics (also zookeeper for some reason)
- antoine: difficult to deal with the long tail (historically)
- Migration draft zuulv2 to zuulv3: (thcipriani can't find patchset :()
- One idea: move from zuulv2.5 -> zuulv3, then further migration with CI working group
- Gets rid of Jenkins
- Gets rid of python2 sooner (hard deadline given python2 EOL of 1 January 2020)
- Not necessarily the end position.
- Discussion of the current status quo: who owns maintenance of the existing Zuul config (Antoine mostly, James and others some). All of this is a nontrivial maintenance task.
- TODO Talk to SRE about Zuul v3 needs
- migration plan -- we don't have one
Local Development
TODOs from Last Time
- mediawiki/core blubber: Jam a shell script into the builder?
- Done ??? Decide on that post-Wednesday MW extensions meeting.
- TODO: the actual jamming of shell script
- TODO: make a phab task to describe changes needed to scaffold.sh in the charts repo to support local dev
- Done https://phabricator.wikimedia.org/T226660
- Done go ahead and make patch set https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/519485/
- Done poke SRE
- Not done (thcipriani to send email) Thursday: SRE, how much can we break?
Other Work
- In progress Porting from local-charts to deployment-charts ( https://phabricator.wikimedia.org/T224935 ):