Data Platform Engineering/Data Platform SRE/Status Update/2024-12-13
Airflow migration to k8s
WDQS Graph Split
- T379330 Create pybal pools for wdqs-internal-main and wdqs-internal-scholarly (the internal SPARQL endpoints for the graph split are ready; we are waiting until January to reconfigure the clients to use them)
DB replica
Graphite deprecation
Hardware
- T378030 Q2:rack/setup/install wdqs102[567] (new servers are racked and have their initial install, but are not active yet)
- T376166 Q2:eqiad:(2)cloudelastic hosts - custom config (new servers received by DCOps)
- T376165 Q2:codfw:(6) elastic hosts - Config D (new servers received by DCOps)
- T376670 Q2:eqiad:(3)Refresh of wdqs101[1-3] (new servers received by DCOps)
Misc / Operations
- T377134 Create and distribute a flink base image with flink 1.20.0
- T381283 wdqs1025 fails to PXE boot, NIC shows "no link" in DRAC web UI
- T380258 Create an Airflow instance for ML (The work on the migration to k8s is already paying off! After some time spent defining the exact requirements, it took less than a day to create the new instance.)
- T381961 Increase the capacity of /var/lib/archiva on archiva1002.wikimedia.org
- T377266 DSE kubernetes namespace for llm-inference
- T380835 Exclude zram devices from disk health checks