Talk:Wikimedia Release Engineering Team/MW-in-Containers thoughts

Compiled configuration per wiki

Currently Wikimedia writes a JSON cache file to /tmp/ that it subsequently reads from until there is an mtime update to InitialiseSettings.php; I don't know how that works in a container. We could carry that forward; however, unless it uses a shared volume of some kind, each pod will pay a startup cost regenerating this configuration. Alternatively, regenerating it in the pipeline makes an already long process even longer. TCipriani (WMF) (talk) 17:57, 27 May 2020 (UTC)

My vision is that we stop making the temporary JSON files on-demand on each server, and instead pre-generate them (per-wiki or a single mega file, not sure) on the deployment server and sync the compiled config out in the scap process instead of InitialiseSettings.php. Then, in the container universe, this JSON blob gets written into pods through the k8s ConfigMap system, rather than as a file tied to a particular pod tag. Jdforrester (WMF) (talk) 22:11, 1 June 2020 (UTC)
+1 -- makes sense to me, that's what I'd like as well. Some investigation needs to happen -- a random server shows 941 JSON files totaling 74MB of config for each wiki version. TCipriani (WMF) (talk) 15:44, 2 June 2020 (UTC)
Note that ConfigMaps have a limit of 1MB (actually it's a bit more than that, but it's best to stick to a 1MB mental model). That stems from etcd having a max object size of 1MB (again a bit more, like 1.2MB, but I digress). So we aren't going to be able to use that approach to inject that content into the pods (unless we split it into many, many ConfigMaps).
We could alternatively populate the directory on the kubernetes hosts and bind-mount it into all pods (especially easy if it's read-only from the container's perspective). But then we would have to figure out how to populate it on the kubernetes nodes, which is starting to imply scap. AKosiaris (WMF) (talk) 15:37, 10 June 2020 (UTC)
Yeah. :-( Theoretically we could do one ConfigMap per wiki, but that means a new default setting would need 1000 ConfigMaps to be updated, which suggests a race condition/etc. as it rolls out. Jdforrester (WMF) (talk) 07:00, 11 June 2020 (UTC)
Does any one JSON file approach 1M in size? K8s has a "projected" volume feature that allows multiple volume sources (including ConfigMaps) to be mounted under the same mount point, so ostensibly we could have one ConfigMap per wiki but still have them under the same directory on a pod serving traffic for all wikis. Still a bit cumbersome from a maintenance perspective perhaps, but it might work around etcd's technical limitation. DDuvall (WMF) (talk) 16:03, 13 August 2020 (UTC)
Does any one JSON file approach 1M in size?
Nope. The biggest one currently is:
108K /tmp/mw-cache-1.36.0-wmf.4/conf2-commonswiki.json TCipriani (WMF) (talk) 22:14, 18 August 2020 (UTC)
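For concreteness, here is a minimal sketch of the projected-volume approach DDuvall describes above, assuming one ConfigMap per wiki; the names, image, and mount path are illustrative only, not an agreed convention. Since the largest compiled file is around 108K, each per-wiki ConfigMap stays well under etcd's ~1MB object limit, while all the files still appear under a single directory inside the pod:

# Sketch only: all names and paths are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: mediawiki-example
spec:
  containers:
    - name: mediawiki
      image: example/mediawiki-multiversion:latest      # placeholder image
      volumeMounts:
        - name: wiki-config
          mountPath: /srv/mediawiki/config              # hypothetical mount path
          readOnly: true
  volumes:
    - name: wiki-config
      projected:
        sources:
          - configMap:
              name: mw-config-commonswiki               # one ConfigMap per wiki...
          - configMap:
              name: mw-config-enwiki
          # ...and roughly 900 more sources, presumably templated by the chart

The open question from the thread above (updating ~900 ConfigMaps more or less atomically on a config change) still applies to this layout.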
What about doing the compilation in CI, and not during scap deployments? Would that be feasible? Maybe later? LarsWirzenius (talk) 15:51, 2 June 2020 (UTC)

What about doing the compilation in CI, and not during scap deployments? Would that be feasible? Maybe later?

Doable, but makes merges a mess unless we write a very special git merge driver, and it bloats the git repo with these files, which can get relatively big as Tyler points out. 🤷🏽‍♂️ Jdforrester (WMF) (talk) 15:56, 2 June 2020 (UTC)
One thing I would like, regardless of CI compilation, is a way to roll these back quickly: this is the advantage of having them generated on demand currently, and one thing that generating them on the fly at deploy time would slow down (maybe). TCipriani (WMF) (talk) 16:00, 2 June 2020 (UTC)
I had some ideas on configuration provided each pod is dedicated to a single wiki (which might be nice if we wanted to scale based on traffic per wiki).
I thought it would be ideal if we could inject the configuration (overrides) into the pod at deploy time, but it seems like there's too much configuration to do that.
We could add the configuration with a sidecar container. That could assist with the size issue of the configuration and the rolling-back issue as well, I think. JHuneidi (WMF) (talk) 21:30, 7 July 2020 (UTC)
I don't think we will go with one wiki per pod; that would be extremely impractical, as we would need to have 900 separate deployments.
When we start the migration, we will probably have one single deployment pod, and then at some point we might separate group0/1/2, but I don't see us going beyond that.
There are other practical reasons for this, but just imagine how long the train would take :) GLavagetto (WMF) (talk) 13:16, 9 July 2020 (UTC)
Yeah, 900 is a lot, but I don't think that is such a problem for Kubernetes. I thought we could have an umbrella chart with the 900 wikis, and then update the image tags and install with helm once. I have never tested helm with such a large chart, but since the release info is stored in ConfigMaps/Secrets (Helm 3) I guess we could run into a size limit issue there, so... maybe you are right :P
I think the sidecar container with configuration is feasible for a single deployment as well, though. JHuneidi (WMF) (talk) 20:17, 10 July 2020 (UTC)
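To make the config-container idea above a bit more concrete, here is a rough sketch, done with an init container rather than a long-running sidecar, and assuming a hypothetical config-only image that is rebuilt on each config deploy (every image, tag, and path below is a placeholder). The init container copies the compiled per-wiki JSON into a shared emptyDir, so rolling configuration back would mean rolling back to the previous config image tag:

# Sketch only: images, tags and paths are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: mediawiki-with-config-image
spec:
  initContainers:
    - name: compiled-config
      image: example/mediawiki-compiled-config:2020-07-10   # hypothetical config-only image
      command: ["cp", "-r", "/config/.", "/shared/"]         # copy compiled JSON into the shared volume
      volumeMounts:
        - name: config
          mountPath: /shared
  containers:
    - name: mediawiki
      image: example/mediawiki-multiversion:2020-07-10
      volumeMounts:
        - name: config
          mountPath: /srv/mediawiki/config
          readOnly: true
  volumes:
    - name: config
      emptyDir: {}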
My current thinking is:
  1. Give MediaWiki a way to load settings from multiple JSON files (YAML could be supported for convenience, but we probably don't want to use it in prod, for performance reasons). This needs to cover init settings and extensions to load as well as the actual configuration.
  2. pre-generate config for each of the 900 wikis when building containers (maybe by capturing the result of running CommonSettings.php). Mount the directory that contains the per-wiki config files (from a sidecar?). Let MediaWiki pick and load the correct one at runtime.
  3. pre-generate config files for each data center and for each server group (with all the service addresses, limit adjustments, etc.). Deploy them via ConfigMap (one chart per data center and server group). Let MediaWiki load these at runtime (see the sketch after this list).
  4. Let MediaWiki load and merge in secrets from a file managed by Kubernetes.
  5. Use a hook or callback to tell MediaWiki to merge live overrides from etcd at runtime.
  6. Extract all the hooks currently defined in CommonSettings.php into a "WmfQuirks" extension.
How does that sound? DKinzler (WMF) (talk) 07:36, 5 October 2021 (UTC)
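As referenced in step 3, here is a minimal sketch of what a per-data-centre/per-server-group ConfigMap might look like, carrying a single JSON file that MediaWiki would load and merge at runtime. The ConfigMap name, keys, and values are purely illustrative, not real settings:

# Sketch only: name, keys and values are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mw-settings-eqiad-appserver        # hypothetical: one per data centre + server group
data:
  settings.json: |
    {
      "wgExampleServiceEndpoint": "http://example-service.svc.eqiad.example:8000",
      "wgExampleRequestLimit": 500
    }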

Every commit vs branch cut

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


One thing that is still undecided about the "automatically packaged" part of this document is how often we'll build a container. You point out that we already have more commits than could be deployed continuously -- currently that's handled by the train acting as a pressure-release valve. As a first iteration, that might be preferable, and might be worth mentioning. TCipriani (WMF) (talk) 17:59, 27 May 2020 (UTC)

If we need 45 minutes to build/test a new container, and we get a change every 17 minutes, we can't do this on every commit. I think we need to do two things: a) do a time-based build/test, every hour on the hour; b) work on reducing the wall-clock time to build/test a container. LarsWirzenius (talk) 13:33, 29 May 2020 (UTC)
My starter-for-ten for this was maybe building an image every 24 hours, maybe at 04:00 UTC (our current trough of new commits). It'd be imperfect, but it'd reduce pressure significantly. However, this also reduces the ability to roll things out to only once a day, of course. Jdforrester (WMF) (talk) 16:37, 29 May 2020 (UTC)
The current limitation is once per week, so once per day seems like an improvement. We could iterate from there. TCipriani (WMF) (talk) 16:09, 1 June 2020 (UTC)
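Purely for illustration, the once-a-day trigger suggested above could look something like the following, expressed as a Kubernetes CronJob for concreteness; the real trigger would live in whatever CI system ends up driving the builds, and the image and arguments are hypothetical:

# Sketch only: the build trigger and its image/arguments are placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mediawiki-image-build
spec:
  schedule: "0 4 * * *"           # 04:00 UTC, the current trough of new commits
  concurrencyPolicy: Forbid       # never let two builds overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger
              image: example/pipeline-trigger:latest           # placeholder
              args: ["build", "mediawiki-multiversion"]        # hypothetical command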
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

On number of versions and rolling back

Here's an idea, possibly too crazy: we build new container versions, in sequential order somehow: v1, v2, ...


We deploy each new version, and after it's run acceptably with production traffic for time T, we label it golden. Any non-golden version can be rolled back (possibly automatically, based on error rate or UBN; possibly manually by RelEng/SRE/CPT/...). If rolling back one version isn't enough, roll back further, until we reach the newest golden version.


Every rollback results in an alert to RelEng, SRE, CPT, and anyone with changes between the newest golden version and the rolled-back version. LarsWirzenius (talk) 13:43, 29 May 2020 (UTC)

Definitely agreed on the "golden" label with possible auto-rollback, but sometimes golden labels turn sour over time, either temporarily (e.g. a configuration setting for which endpoint the DBs are loaded from) or permanently (e.g. a feature is intentionally removed), so we may need more humans in the loop than would be ideal. Jdforrester (WMF) (talk) 21:59, 1 June 2020 (UTC)
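As one way of picturing the scheme, here is a hypothetical record of the version/golden state. This is not an existing manifest or registry format; the tags, fields, and soak-time rule are all assumptions drawn from the proposal above:

# Hypothetical bookkeeping only, not a real Kubernetes or Helm object.
releases:
  - image: example/mediawiki:v42
    status: candidate        # still within the soak window T; eligible for (auto-)rollback
  - image: example/mediawiki:v41
    status: golden           # ran acceptably with production traffic for time T
  - image: example/mediawiki:v40
    status: golden
rollbackTarget: example/mediawiki:v41   # roll back as far as the newest golden version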

Exclusion of operations/mediawiki-config's InitialiseSettings.php from the "k8s" pod

What's the reason behind that? It feels kind of weird that CommonSettings.php is in there and InitialiseSettings.php isn't going to be there.


Furthermore, how do we plan on making the configuration from InitialiseSettings.php available to pods? AKosiaris (WMF) (talk) 15:42, 10 June 2020 (UTC)

Ah, I see https://phabricator.wikimedia.org/T223602; that answers it. AKosiaris (WMF) (talk) 15:44, 10 June 2020 (UTC)

"On some trigger, the new pod is added into the production pool and slowly scaled out to answer user requests until it is the only pod running or is removed"

Scaling out happens in Kubernetes by increasing the number of pods, so this sentence needs some rewording. Deployments just spin up a batch of new k8s pods (25% by default, though configurable) and kill the same number of pods from the previous deploy. Rinse and repeat in an A/B fashion until all of the pods have been replaced. AKosiaris (WMF) (talk) 15:53, 10 June 2020 (UTC)

Ah, yes, will re-word. Jdforrester (WMF) (talk) 07:01, 11 June 2020 (UTC)
Is this change sufficient? Jdforrester (WMF) (talk) 10:30, 11 June 2020 (UTC)
Yes, I think so. I am still a bit unclear on the "How do we tell the controller (?) which deployment state to route a given request to?" part, but judging from the "(?)" after "controller" we probably want to define that first. AKosiaris (WMF) (talk) 14:04, 11 June 2020 (UTC)
That's wrapped up in the decision about how we want to do the A/B split of traffic to the new deploy – do we just do it uniformly at random across all request types (standard k8s behaviour), do we do something closer to what we do now (roll out by request type, sharded by target wiki), or something else? I've left that open as I don't think it's been discussed. Jdforrester (WMF) (talk) 15:03, 12 June 2020 (UTC)
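For reference, a minimal sketch of the Deployment rollout behaviour described above; the 25% figures are the Kubernetes defaults (maxSurge/maxUnavailable) and are tunable per deployment, and all names, counts, and images here are placeholders:

# Sketch only: names, replica count and image are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mediawiki
spec:
  replicas: 80                     # illustrative
  selector:
    matchLabels:
      app: mediawiki
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%                # extra new pods allowed above the desired count
      maxUnavailable: 25%          # old pods that may be taken down before new ones are Ready
  template:
    metadata:
      labels:
        app: mediawiki
    spec:
      containers:
        - name: mediawiki
          image: example/mediawiki-multiversion:latest   # placeholder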

mediawiki-config size and Logos

I've had a quick look into mediawiki-config and its size. So, I see the following:


61MB, 50 of which are images, 47 of which are project logos. Is there a reason we ship project logos as configuration?

And most importantly to stay on track, how are we going to ship all those logos in the brave new k8s world?


du -h --exclude=.git | sort -rh | head -30
61M    .
50M    ./static/images
50M    ./static
47M    ./static/images/project-logos
AKosiaris (WMF) (talk) 15:44, 24 June 2020 (UTC)

We ship them as config because they don't vary by flavour of MediaWiki, we want them to be statically mapped, and we want to be able to change them swiftly. I think moving them into the containers (so new logos take a few hours/days to roll out) is probably acceptable, but it's a fair regression in config flexibility. Jdforrester (WMF) (talk) 16:27, 24 June 2020 (UTC)

Objective: MediaWiki* is automatically packaged into a k8s pod**, which is semi-automatically deployed*** into Wikimedia production

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I just noticed this; I wonder how my pedantic inner self did not already complain about the terminology.


Anyway, may I suggest we switch the wording to:

Objective: Mediawiki* is automatically packaged into (one or more) OCI container images, which are semi-automatically deployed** into Wikimedia Production as kubernetes pods


I am deliberately including the "(one or more)", as we shouldn't restrict ourselves: having more than one OCI container image might turn out to solve problems we have not yet foreseen. AKosiaris (WMF) (talk) 15:47, 24 June 2020 (UTC)

Oh, sure, good point. Will fix. Jdforrester (WMF) (talk) 16:28, 24 June 2020 (UTC)
(Done.) Jdforrester (WMF) (talk) 18:36, 24 June 2020 (UTC)
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.