Data Platform Engineering/Data Platform SRE/Status Update/2024-11-29

No update was sent last week, so this update covers two weeks!

Airflow migration to k8s

All Airflow webservers have been migrated to k8s. This brings some quality-of-life improvements. Users can now:

  • reach their Airflow UI via a public domain (no need for SSH tunnels)
  • manage roles and permissions via LDAP group management
  • get working links in alert emails

We've migrated the scheduler of our test instance to k8s. We'll need to replicate this work for all production instances, but we are now confident that it will work with only minor surprises.

A new automated DAG deployment process (T368033) has been discussed, implemented, documented, and communicated. Merge requests to Airflow DAGs now require formal approval from a peer before they are deployed.

Spark version upgrade (in support of Dumps 2.0)

Replace Archiva with Gitlab artifact repositories

Migration of the Search clusters to OpenSearch

Operations

We've had issues with disk space and the number of folders, related to changes in how we deploy Refine. The immediate issue has been resolved (big thanks to DC-Ops for reacting quickly and adding disks!). The underlying problem still needs to be addressed and has been communicated to Data Engineering.

Hardware

Access Requests
