Platform Engineering Team/Event Platform Value Stream/Use case: Event Platform SDLC practices

This page is an RFC. Please add your comments to Talk:Platform Engineering Team/Event Platform Value Stream/Use case: Event Platform SDLC practices

Published: 2022-09-15.

Internal discussion draft (read-only): https://docs.google.com/document/d/1okEcCs2qiWJOtB8iHldX1mkoesgzB3dZK5xM9jEfIgE/edit#

Use case: Event Platform SDLC practices

Author: Gabriele Modena <gmodena@wikimedia.org>, 2022-09-05

In Event Platform we need to establish software development life cycle (SDLC) processes to orchestrate the build and release of our artifacts (applications, libraries and services).

We are in the process of taking ownership of a number of codebases hosted on Gerrit and Gitlab. As part of our Team Charter we are reviewing the maturity and support level of CI/CD capabilities and processes currently in place.

Desired workflow and current state

 
Figure 1. Desired end state for SDLC

A push (or merge) to a git remote should trigger a CI pipeline. Upon a successful build, an artifact should be generated with a version or tag that indicates its status (e.g. RELEASE vs SNAPSHOT). Deployments should be automated and predicated upon code review. Eventually, software artifacts (at different stages of the release cycle) should be published to a repository (e.g. Gitlab maven, docker registries).
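The tagging step in this workflow can be sketched as follows. This is a minimal illustration, not an existing Event Platform tool: the `Build` record, the `artifact_version` function and the `main` release branch are hypothetical names, and the version strings assume the Maven convention in which a snapshot carries a `-SNAPSHOT` suffix and a release does not.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Build:
    """Hypothetical summary of a finished CI pipeline run."""
    base_version: str  # e.g. "1.2.0"
    branch: str        # branch the pipeline ran on
    succeeded: bool    # did build and tests pass?
    reviewed: bool     # was the change approved in code review?


def artifact_version(build: Build,
                     release_branch: str = "main") -> Optional[Tuple[str, str]]:
    """Return (version, status) for the artifact, or None if nothing is published.

    Assumption: only reviewed changes on the release branch become RELEASE
    artifacts; every other successful build yields a SNAPSHOT.
    """
    if not build.succeeded:
        return None
    if build.branch == release_branch and build.reviewed:
        return build.base_version, "RELEASE"
    return f"{build.base_version}-SNAPSHOT", "SNAPSHOT"
```

Under these assumptions, a successful build of an unreviewed feature branch at `1.2.0` would be labelled `1.2.0-SNAPSHOT`, while a failed build produces no artifact at all.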

Repositories

Currently our codebases are stored in Gerrit (existing projects) and Gitlab (newly created projects). We lack uniform processes for contribution, code review and deployment. Some teams use a +2 model for approval in Gerrit. We should evaluate how this scales with our team norms and establish practices for Gitlab repositories. Other models might be a better fit (e.g. an MR must receive two distinct +1s).
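To make the two approval models easier to compare, they can be written as small predicates over reviewer votes. This is a sketch under stated assumptions: the function names and the vote dictionary are hypothetical, the +2 rule mirrors the common Gerrit setup in which a -2 blocks submission, and the two-reviewer rule assumes that the author's own vote does not count and that any negative vote blocks the merge.

```python
from typing import Dict


def approved_by_plus_two(votes: Dict[str, int]) -> bool:
    """+2 model: at least one +2 and no blocking -2."""
    scores = list(votes.values())
    return any(s >= 2 for s in scores) and not any(s <= -2 for s in scores)


def approved_by_two_plus_ones(votes: Dict[str, int], author: str) -> bool:
    """Alternative model: two distinct reviewers (not the author) vote +1 or higher.

    Keying the dictionary by reviewer name guarantees each reviewer is
    counted once. Assumption: any negative vote blocks the merge.
    """
    reviewer_votes = {name: s for name, s in votes.items() if name != author}
    if any(s < 0 for s in reviewer_votes.values()):
        return False
    return sum(1 for s in reviewer_votes.values() if s >= 1) >= 2
```

The second predicate spreads review responsibility across two people, which is one way to frame the scaling question raised above; whether negative votes should block is a policy decision the team would still need to make.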

Continuous Integration

We have established CI practices that rely on internal docker images and build steps. Integration testing, however, still requires developers to either run software locally or manually deploy artifacts to remote environments.

 
Figure 2. Cross project dependency. Service B might depend on different versions of Library A at different stages of its lifecycle.

Lack of automation for artifact publishing might break cross project dependencies. It also impacts development velocity, as well as code review effectiveness. It is essential to improve the feedback loop by shortening the time between code being committed and it being available to test (or, in the case of a library, available as an artifact).
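The cross project dependency in Figure 2 can be illustrated with a small resolver that picks which published version of Library A a build of Service B should use at each stage. This is only a sketch: the function names, the stage names and the policy (development may use snapshots, staging and production use releases only) are assumptions, and versions are assumed to be dotted integers with an optional `-SNAPSHOT` suffix.

```python
from typing import List, Optional, Tuple

SNAPSHOT_SUFFIX = "-SNAPSHOT"


def is_snapshot(version: str) -> bool:
    return version.endswith(SNAPSHOT_SUFFIX)


def sort_key(version: str) -> Tuple[Tuple[int, ...], int]:
    """Order by numeric parts; a release sorts after its own snapshot."""
    base = version[:-len(SNAPSHOT_SUFFIX)] if is_snapshot(version) else version
    numbers = tuple(int(part) for part in base.split("."))
    return numbers, 0 if is_snapshot(version) else 1


def resolve_dependency(published: List[str], stage: str) -> Optional[str]:
    """Pick the newest published library version allowed at the given stage."""
    if stage == "development":
        candidates = list(published)  # snapshots and releases
    elif stage in ("staging", "production"):
        candidates = [v for v in published if not is_snapshot(v)]  # releases only
    else:
        raise ValueError(f"unknown stage: {stage}")
    return max(candidates, key=sort_key) if candidates else None
```

If snapshots are not published automatically, the development candidates go stale and the resolver keeps returning an old version, which is the broken feedback loop described above.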

Deployments

We lack well-defined development, staging and production environments. Different components might target different technology stacks, with different levels of support and process maturity. This is a particular pain point for streaming applications built atop Apache Flink [1].

Next steps

To improve work efficiency and how we interface with other teams (e.g. by means of SLOs), our Team Charter should provide well-documented SDLC practices. As a next step we should document Gerrit processes and compare them with other workflows for Gitlab codebases [2][5][6][7]. Questions we seek to answer:

  • How do we carry out deployments?
  • How do we build the capability of running multiple versions of a service/application/library at different stages of its lifecycle (development, staging, production)?
  • What does our contribution model look like?
    • Do we provide standard development environments (tooling, automation)?
    • What is our code review process? We should have a CONTRIBUTION.md template for our repos.
  • What is the versioning and release process for software artifacts?
  • What is the versioning and release process for services?

References

  1. Use case: compute needs for streaming pipelines
  2. https://www.atlassian.com/git/tutorials/comparing-workflows
  3. https://wikitech.wikimedia.org/wiki/Deployment_pipeline
  4. https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow/Developer_guide
  5. https://docs.gitlab.com/ee/topics/gitlab_flow.html
  6. https://www.mediawiki.org/wiki/Gerrit/Tutorial/tl;dr
  7. https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Deploy/Refinery-source

Appendix

Components that are owned, or might be owned, by Event Platform. WIP.

| Component | Type | CI | Deployment | Environment | CD |
|---|---|---|---|---|---|
| eventgate-wikimedia (4 deployments in prod k8s) | Service | Gerrit+Jenkins | Dev | Vagrant? minikube? | Local |
| | | | Staging | k8s | Manual |
| | | | Prod | k8s | Manual |
| EventGate | NodeJS Library | Github+Travis | Dev | ? | Manual (npm publish) |
| EventBus | MW Extension | Gerrit+Jenkins | | | |
| wikimedia-event-utilities | Java Library | Gerrit+Jenkins | Dev | SNAPSHOT tag | Manual trigger? |
| | | | Staging | ? | |
| | | | Prod | Archiva RELEASE tag | Manual trigger? |
| EventStreamConfig | MW Extension: Lib and (action) API endpoint | Gerrit+Jenkins | Dev | ? | ? |
| | | | Staging | ? | ? |
| | | | Prod | ? | Manual trigger? |
| Schema Repositories (1, 2) and Service | Repository / Service | Gerrit+Jenkins | | | Manual |
| jsonschema-tools | NodeJS library | Github+Travis | | | Manual (npm publish) |
| Mediawiki Stream Enrichment PoC | Application | Gitlab | Dev | Local, YARN | No |
| | | | Staging | ? | No |
| | | | Prod | ? | No |
| Image Suggestions Feedback PoC | Application | Gitlab | Dev | Local, YARN | No |
| | | | Staging | ? | No |
| | | | Prod | ? | No |