Kubernetes SIG/Meetings/2024-11-26
Agenda:
- Introductions for new members (if any):
- SIG administrivia:
- Misc:
- Multiple topics from DPE above, anyone willing to present something to the group?
- Topics:
- Update on containerd migration
- https://wikitech.wikimedia.org/wiki/Kubernetes/Administration/containerd_migration
- Wikikube-staging and aux clusters completed
- Wikikube ~15%
- Update on the Kubernetes >1.25 upgrade
- Aiming for k8s 1.31
- So far not looking too scary
- Handling inbound IPIP traffic on low traffic LVS k8s based realservers
- We’re planning on lowering the MTU for all pod traffic via calico instead of iptables rules/eBPF (tcp-mss-clamper)
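As a rough illustration of the Calico-side MTU change (install method and values assumed, not confirmed for this cluster): in a manifest-based Calico install, the pod interface MTU comes from the `calico-config` ConfigMap, so avoiding fragmentation of inbound IPIP traffic could look like:

```yaml
# Hedged sketch: manifest-based Calico installs read the pod veth MTU
# from the calico-config ConfigMap. 1480 assumes a 1500-byte underlay
# minus the 20-byte IPIP header; adjust for the actual network.
apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-config
  namespace: kube-system
data:
  veth_mtu: "1480"
```

With the pod MTU lowered, the kernel advertises a correspondingly smaller MSS on its own, which is what the iptables/eBPF tcp-mss-clamper was doing manually.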
- LVS for Postgres Maps read replicas - https://phabricator.wikimedia.org/T322647
- Tegola is using envoy as tcp load balancer for postgresql
- More clean/defined way of depooling by using LVS?
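For context, a minimal Envoy TCP proxy in front of PostgreSQL looks roughly like this (listener port, cluster name, and replica hostname are invented for illustration, not Tegola's actual config):

```yaml
# Hedged sketch of an Envoy L4 (TCP) proxy for PostgreSQL read replicas.
static_resources:
  listeners:
  - name: postgres
    address:
      socket_address: { address: 0.0.0.0, port_value: 5432 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: postgres_tcp
          cluster: pg_replicas
  clusters:
  - name: pg_replicas
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: pg_replicas
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: maps-replica.example.org, port_value: 5432 }
```

Depooling a replica here means editing the endpoint list and reloading; LVS would instead give the standard pool/depool workflow used for other realservers.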
- Proposal to manually/periodically clean up the Docker Registry - https://phabricator.wikimedia.org/T375645
- There is a script that walks through Swift storage and could delete objects which refer to images that no longer have any tags (e.g. deleted via registryctl)
- Concerns were raised because we don’t know for sure that removing the objects in Swift won’t cause any side effects in the Docker registry
- For now this is not urgent
- It would help bring storage usage down (~6TB currently)
- That in turn would help migrating to a different data store in the future when eventually redesigning the registry
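The tag-filtering step of the cleanup could be sketched as a pure function (names hypothetical; the real script also has to walk Swift and query the registry for which digests are still tagged):

```python
# Hedged sketch of the cleanup decision described above (names hypothetical).
# Given the set of digests still reachable from at least one current tag,
# return the Swift object names that are candidates for deletion.

def deletable_objects(stored_objects, referenced_digests):
    """Return object names whose digest no tagged manifest references.

    stored_objects: iterable of (object_name, digest) tuples, as found
                    when walking the registry's Swift container.
    referenced_digests: set of digests reachable from a current tag.
    """
    return [name for name, digest in stored_objects
            if digest not in referenced_digests]
```

Keeping the decision separate from the deletion makes a dry-run mode trivial, which matters given the uncertainty about side effects noted above.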
Notes
- DPE might be happy to put some stuff on the table for a session in 2025 around Ceph/CSI stuff
- Code review - How are teams deciding when self +2 is acceptable and when it isn’t?
- JM: down to each engineer to decide for each patch; e.g. does it hurt to wait for review?
- BT: We are working on some Airflow stuff and are doing full CD there
- Overall consensus seems to be ‘+2 where it makes sense and you feel like you know the blast radius/what can go wrong’. We don’t have explicit guidelines
- BT: DPE has recently implemented a (semi) mandatory +1 policy for the airflow-dags repository: airflow-dags!933 and T368033#10358512 - this comes about because the Airflow migration to dse-k8s uses continuous deployment, instead of scap deploys.
- Upgrade: targeting version 1.31 of Kubernetes (from 1.23)
Action items
- BT to coordinate something around Ceph and CSI drivers for the next meeting