This discussion took place at the 2019 WMF All-Hands at the Bently Reserve.

Last Time

Current Quarter Goals

TEC3:O6:O:6.1:Q3: Deployment Pipeline Documentation
TEC3:O3:O3.1:Q3: Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production CD Pipeline

General

ideal flow through the pipeline from the developer perspective and/or mapping the current flow from project inception to production.
More concrete idea: talk about what's missing for CDep Blubberoid to production
- Alex: the problem with option B is that it changes things drastically
- Joe: MVP for developers so we need to provide a decent experience, focus on what that is in 2 months time
- Greg: coming up with an ideal workflow depends on what the endpoint is, i.e., a CDep is going to be a different experience
- Joe: How many interactions do people need to have in order to get something through the pipeline. Right now it's coming to us to figure out what to do.
- Greg: Avoid the cargo culting.
- Tyler: This dovetails with conversation at the RelEng offsite. How many points of contact and how can we reduce that? Bringing in a bigger view would be useful.

Summarizing RelEng exercise:

Ideal next step: Like toolforge but for production.
Dev requests a project that goes into a proper namespace on Gerrit.
Sets up CI, etc.
On repo creation adds a dotfile that configures pipeline.

Discussion:

Dan: What's correct form of feedback for a developer?
Alex: Gerrit is the thing that developers interact with, so that should be where feedback shows up; we shouldn't make developers click through to several sites
Joe: This is a common problem with how we report feedback to gerrit, but the amount of indirection makes this look like a bigger problem than it actually is. There's more interaction and it's a more complicated set of jobs
Mukunda: Deciphering console output is a mess.
Joe: Summary: creation of a pipeline should be automatic as soon as someone puts a .pipeline/config into their repository. Feedback from the pipeline needs to be better. The link should not be many pages down in gerrit.
James: the "standard" is github, you get a comment from a bot, you click that, you see travis, you see a red X, you click that, you read that.
Travis output isn't that great either, basically.
Alex: for the failure scenario it's fine to send people down a deep path, but in a success scenario we need something simple
Joe: Do we publish an image for each successful merge? (Yes)
Alex: We publish for each successfully merged commit.
Joe: for whatever we merge we should get back the url of the artifact for the image
Lars: if you change the interface so that the link to the artifact is in the metadata area (???)
Dan: What's the MVP for a feedback mechanism in the short term?
Alex: docker image plus version, also nice to have a link to the entire pipeline state so you know it's step 1 vs step 2.
Alex: In Gerrit you know where you are in the process of getting it deployed. "I'm in step 2 of 5 steps to production"
Dan: adding another label to Gerrit would be simple, like an "image built" label with links to the docker registry url
Tyler: Summary of the ideal workflow for now:
- How do you request a project currently? A task. For now keep that for the MVP.
- Somehow get the url of the image into the Gerrit UI on successful build, and a link to the successful run
- QUESTION: do we want to change the image creation process?
- sidenote: no image per patchset :)
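
As a rough illustration of the feedback item in the summary above, the sketch below builds the JSON body a CI job might post back to Gerrit once an image is built. The `Image-Built` label, the registry and CI hostnames, and the helper name are all hypothetical. The `message` and `labels` fields follow Gerrit's ReviewInput entity for the set-review REST endpoint, but how the pipeline would actually report back was not decided in this discussion.

```python
import json


def build_review_input(image_ref: str, build_url: str, step: int, total_steps: int) -> dict:
    """Build a Gerrit ReviewInput-style payload reporting a built image.

    The "Image-Built" label is hypothetical and would have to be
    configured on the Gerrit project before it could be voted on.
    """
    message = (
        f"Pipeline step {step} of {total_steps} succeeded.\n"
        f"Image: {image_ref}\n"
        f"Build: {build_url}"
    )
    return {"message": message, "labels": {"Image-Built": 1}}


payload = build_review_input(
    "registry.example.org/service/example:2019-01-29-abcdef",  # made-up image reference
    "https://ci.example.org/job/example-pipeline/42/",  # made-up build URL
    step=2,
    total_steps=5,
)
print(json.dumps(payload, indent=2))
```

Keeping the image reference and the pipeline position in one comment matches the "success should be simple" point: a developer reads one message instead of following links into console output.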
Joe: retention of our (old?) images needs an answer.
Joe: if we move to CDep it'll be impossible to store everything; for most things moving in that direction, keep the latest N versions
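
A minimal sketch of the "keep the latest N versions" idea, assuming image tags are already sorted from oldest to newest. The function name and the sample tags are illustrative only; the notes do not specify how retention would be implemented.

```python
from typing import List, Tuple


def split_by_retention(tags: List[str], keep: int) -> Tuple[List[str], List[str]]:
    """Split image tags into (kept, prunable), keeping the `keep` newest.

    Assumes `tags` is ordered oldest to newest, e.g. by build time.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    cut = max(len(tags) - keep, 0)
    return tags[cut:], tags[:cut]


kept, prunable = split_by_retention(["v1", "v2", "v3", "v4", "v5"], keep=3)
print(kept)      # ['v3', 'v4', 'v5']
print(prunable)  # ['v1', 'v2']
```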
Joe: questions of the workflow in the CI pipeline
- a developer wants to build a nodejs project in the pipeline: are there things I need to do that are different here than what I used to do?
Mukunda: not much, just the .pipeline config
Dan: the blubber config has the entry point
Lars: we give a number of options to choose from, otherwise we end up with 100 projects copy/pasting but there turns out to be an issue so we have to upgrade them all
Joe: ... less free blubber templates...
Dan: blubber has proven to be flexible, which is good, without much modification at all. The importance of explicitness and tying in entrypoints/dependencies. Hesitant to make it more constrained than it is.
James: we have CI entry points across 2000 repos, we have bots to sync them together, not too worried about it being c/p and fix it later.
Lars: I'm convinced ^
Joe: there is a value in using containers so that developers are contained :)
Joe: would it be possible for someone to build their blubber image starting from an image not in our registry
Tyler/Dan: yes
Tyler: however we have a policy file that was built for this scenario
Joe: let's make it clear so that CI uses that policy file
Dan: the pipeline job references it, and it's centrally located (and away from developers ;) ). You can get really specific with it.
Lars: we can make it do what we need. We should allow our developers to do something useful without constraining them too much.
Lars: For an MVP of CDep, we need to get it started and then iterate.
Joe: we just want to build images that start from ones we (bless)
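
To make the "only blessed base images" idea concrete, here is a small sketch of the kind of check such a policy could express. It does not use Blubber's actual policy file format. It assumes an already-parsed, Blubber-style mapping with a top-level `base` and optional per-variant `base` overrides, and the registry prefix is a placeholder.

```python
from typing import Dict, Iterable, List

# Hypothetical registry prefix for blessed base images; not taken from the notes.
ALLOWED_PREFIXES = ("docker-registry.example.org/",)


def unblessed_variants(config: Dict, allowed: Iterable[str] = ALLOWED_PREFIXES) -> List[str]:
    """Return names of variants whose base image is outside the allowed registries.

    Simplified on purpose: a variant without its own `base` inherits the
    top-level one, and other inheritance mechanisms are ignored.
    """
    prefixes = tuple(allowed)
    default_base = config.get("base", "")
    offenders = []
    for name, variant in config.get("variants", {}).items():
        base = (variant or {}).get("base", default_base)
        if not base.startswith(prefixes):
            offenders.append(name)
    return offenders


config = {
    "base": "docker-registry.example.org/nodejs-slim",
    "variants": {
        "build": {"base": "docker-registry.example.org/nodejs-devel"},
        "test": {},
        "rogue": {"base": "docker.io/library/node:10"},
    },
}
print(unblessed_variants(config))  # ['rogue']
```

Running a check like this centrally, in the CI job rather than in each repository, is what keeps it "away from developers" as described above.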
Lars: we need to know what versions of what is in each image
Joe: that will be a part of debmonitor (as planned)
Fabian: sometimes updates have to be done. What happens when we update Debian? We need to figure out the underlying services
Joe: we will know because when we build an image through the pipeline we submit it to a thing that analyzes it with debmonitor. How do we update those images after building? TBD.
Tyler: we have a task about mass rebuilding all the images
Tyler: to answer your nodejs developer question:
- https://wikitech.wikimedia.org/wiki/Blubber
- https://wikitech.wikimedia.org/wiki/Blubber/Tutorial/HelloWorld
Joe: James needs to convince audiences to migrate to it
Joe: from SRE's side, what does a developer need to do..
Alex: you get your image, you're happy, the pipeline deployed it to CI staging...
- ssh deploy1001, scap-helm, 100 lines of bash, give it an image version, it deploys, to eqiad, or staging
- the user interface includes setting things via ENV variables
- moriel has already used it herself
- it's ugly UI
currently evaluating replacing it with helmfile ( https://github.com/roboll/helmfile )
- TODO pipeline should use helmfile
things devs can't do: LVS, DNS, etc
Lars: there are a few review points, e.g.: does this project make any sense for Wikimedia? Does it need a security review?
Joe: how it's done now ^
Lars: not just security but also SRE, design an implementation that's suitable for production
James: will the helmfile configs be in the repo itself or somewhere else? pros and cons...
Joe: <argues for centralization>
Alex: operations/deployment-charts
Joe: to get into production:
- create a helm chart via the scaffold script in deployment-charts
- review from SRE, setting up DNS, load balancing it
Greg's arms are getting tight... slowing down with note taking
thcipriani picks up the baton!
Dan: We could make this part of our setup skaffold project, i.e., filing a task, what the skaffold script creates is probably confusing for newcomers
James: how often are people going to do this?
Joe: if we make the process good enough then we would probably see more services being created, but I think making a few requests is a Good Thing
Lars: in 1996 someone wrote a packaging helper and we went from a very small amount of packages to 600 packages

Beta

Joe: has a solution, running an image in docker
Antoine: devs want to test stuff in beta with an updated service image in staging, and ask the backend to do their thing
Joe: open staging to public internet
Alex is sad about that idea
Dan: deploy to the service namespace automatically as part of the pipeline
Tyler: Can we have a k8s in labs for labs use
Alex: BGP, LVS, Calico -- none of these things exist in labs
Exposing staging to beta cluster would require staging to be open to the public internet
TODO we'll need some way to update this automagically in beta...restart and pull

next steps

Lars: If Dan and Lars want to do this, what do we do next?
Joe: we're a bit behind on SRE side
Lars: not a general person, me and Dan :)
Joe: oh ok :)
Alex: missing a token in production
Dan: try to include the reporting back to gerrit (image uri etc)

TODOs

TODO: write blubber policy to ensure that we're using only wmf base images
TODO: file task to automagically create job from seed job
TODO: file task about automagic setup of pipeline on .pipeline/config.yaml creation
TODO: continuous deployment, what's missing? a k8s api token on contint1001
TODO: support documentation like the one tyler did for the portal and pipeline/helmfile and deployment
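
For the TODO about automatic pipeline setup, the trigger condition can be sketched as a check for the dotfile in each repository checkout. This is only an illustration of the detection step under stated assumptions: the function names are made up, the checkouts are throwaway temporary directories, and actually creating CI jobs from a seed job is out of scope here.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

PIPELINE_CONFIG = Path(".pipeline") / "config.yaml"


def repos_with_pipeline_config(checkouts: Iterable[Path]) -> List[Path]:
    """Return the checkouts that contain a .pipeline/config.yaml file."""
    return [repo for repo in checkouts if (repo / PIPELINE_CONFIG).is_file()]


def demo() -> List[str]:
    """Create two throwaway checkouts, only one of which opts in."""
    with tempfile.TemporaryDirectory() as tmp:
        opted_in = Path(tmp) / "service-a"
        plain = Path(tmp) / "service-b"
        (opted_in / ".pipeline").mkdir(parents=True)
        # Placeholder content; the real config schema is not shown here.
        (opted_in / PIPELINE_CONFIG).write_text("# placeholder\n")
        plain.mkdir()
        return [repo.name for repo in repos_with_pipeline_config([opted_in, plain])]


print(demo())  # ['service-a']
```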

Wikimedia Release Engineering Team/Deployment pipeline/2019-01-29
