
This page summarizes the learnings of T315428

[SPIKE] Assess what is required for the enrichment pipeline to run on k8s

Author: Gabriele Modena <gmodena@wikimedia.org>

Bug: https://phabricator.wikimedia.org/T315428


To bridge the gap between dev and prod environments we would like to run jobs on k8s. Our use case is described in Use case: compute needs for streaming pipelines.

The goal of this Spike is to determine whether local or WMF Cloud-based k8s instances can be suitable environments for learning, experimentation and development.

We would like to collect info to make an informed decision about the following:

  • do we want to invest resources developing k8s capabilities for development productivity and testing?
  • do we want to invest resources improving our release and deployment cycles targeting YARN?

The two are not mutually exclusive. Discarding this work for now is ok too.

Summary

I explored adapting the k8s workshop to Apache Flink. It boils down to running Flink on minikube. This can be done locally, without the need for a Cloud VPS VM.

Following are some considerations to bring into the next grooming session.

I'd say that Cloud VPS would not buy us much, other than _potentially_ granting multi-user access to a self-hosted minikube - or exposing a public-facing service. I don't think we want to go down the path of maintaining either (for dev workflows).

Setting up minikube is a well-documented and straightforward process (at least on macOS/Linux).
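For local experimentation, the bootstrap amounts to a couple of commands. Below is a minimal sketch that shells out to minikube and kubectl to bring up a local cluster and verify it is reachable; the profile name and resource sizes are arbitrary choices for illustration, not recommendations.

```python
# Minimal sketch: bring up a local minikube cluster and verify kubectl can reach it.
# Assumes minikube and kubectl are already installed; the profile name and resource
# sizes below are arbitrary choices for local experimentation, not recommendations.
import subprocess

PROFILE = "flink-dev"  # hypothetical profile name


def run(cmd: list[str]) -> None:
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Start (or resume) a local cluster sized for a small Flink session.
    run(["minikube", "start", "-p", PROFILE, "--cpus", "4", "--memory", "8g"])
    # minikube creates a kubectl context named after the profile; switch to it
    # and check that the node is Ready.
    run(["kubectl", "config", "use-context", PROFILE])
    run(["kubectl", "get", "nodes"])
```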

For running Flink on k8s, I explored two paths:

  1. Adapting the Search team's flink-session-cluster Helm charts.
  2. Using the recently released Apache Flink Kubernetes Operator.

While we should clearly adopt 1) for production use cases, both approaches offer interesting angles for experimentation and local development.

Path 1) requires a Docker image and decoupling the charts from the specific use case and WMF envs (https://github.com/wikimedia/operations-deployment-charts/blob/master/charts/flink-session-cluster/values.yaml). We should consider contributing a generic enough config, and making the setup more self-service for developers (who want to run things on minikube).
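As an illustration of what a more self-service workflow could look like, here is a hedged sketch that installs the flink-session-cluster chart into a local minikube cluster with a developer-provided values override via Helm. The checkout path, release name, namespace and override file name are hypothetical, and the keys that would actually need overriding live in the chart's values.yaml linked above.

```python
# Illustrative sketch only: install the flink-session-cluster chart into a local
# minikube cluster with a developer-provided values override. The repository
# checkout path, release name, namespace and override file name are hypothetical;
# the keys that need overriding live in the chart's values.yaml linked above.
import subprocess

CHART_PATH = "operations-deployment-charts/charts/flink-session-cluster"  # local checkout (hypothetical path)
VALUES_OVERRIDE = "minikube-values.yaml"  # developer-maintained overrides (hypothetical)
RELEASE = "flink-dev"
NAMESPACE = "flink"


def helm(*args: str) -> None:
    cmd = ["helm", *args]
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Render the manifests first to sanity-check the override against the chart.
    helm("template", RELEASE, CHART_PATH, "--values", VALUES_OVERRIDE)
    # Install (or upgrade) the session cluster into its own namespace.
    helm("upgrade", "--install", RELEASE, CHART_PATH,
         "--values", VALUES_OVERRIDE,
         "--namespace", NAMESPACE, "--create-namespace")
```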

Path 2) was easier to set up "out of the box". Setting up cluster deployments that can accept job submissions either interactively or programmatically is well documented at https://github.com/apache/flink-kubernetes-operator/tree/main/examples. The tutorial at https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/ gives the basic building blocks for setting up a Flink cluster ready to accept jobs.
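To make the "programmatically" part concrete, below is a minimal sketch that submits a FlinkDeployment custom resource with the official kubernetes Python client. The spec mirrors the operator quick-start's basic example (stock Flink image running the bundled StateMachineExample job); the image, resources and jarURI are placeholders to swap for an actual enrichment job, and the current CRD schema should be checked against the operator docs linked above.

```python
# Minimal sketch: create a FlinkDeployment custom resource programmatically with
# the official kubernetes Python client. The spec below mirrors the operator's
# basic quick-start example; image, resources and jarURI are placeholders to be
# replaced with an actual enrichment job. Assumes the Flink Kubernetes Operator
# (and its flink.apache.org/v1beta1 CRDs) is already installed on the cluster.
from kubernetes import client, config

flink_deployment = {
    "apiVersion": "flink.apache.org/v1beta1",
    "kind": "FlinkDeployment",
    "metadata": {"name": "basic-example"},
    "spec": {
        "image": "flink:1.15",
        "flinkVersion": "v1_15",
        "flinkConfiguration": {"taskmanager.numberOfTaskSlots": "2"},
        "serviceAccount": "flink",
        "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
        "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
        "job": {
            # Example job bundled with the Flink image; swap in the enrichment jar.
            "jarURI": "local:///opt/flink/examples/streaming/StateMachineExample.jar",
            "parallelism": 2,
            "upgradeMode": "stateless",
        },
    },
}

if __name__ == "__main__":
    config.load_kube_config()  # use the local (e.g. minikube) kubeconfig
    client.CustomObjectsApi().create_namespaced_custom_object(
        group="flink.apache.org",
        version="v1beta1",
        namespace="default",
        plural="flinkdeployments",
        body=flink_deployment,
    )
```

Once the resource is created, kubectl get flinkdeployments should show it, and the operator takes care of spinning up the JobManager and TaskManagers.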

References