Data Platform Engineering/Data Engineering

Responsibilities

The Data Engineering team is responsible for the core capabilities of the data platform, including data storage, batch and streaming infrastructure, and distributed query engines.

This platform supports ingestion of Wikimedia project content, web traffic, instrumentation, operational data and other datasets into the Data Lake. The team manages the ingress data pipelines, whereas the data producers manage their respective data pipelines and data products.

The team's responsibilities also include data quality, observability, and discoverability.

The Event Platform has been merged into this team.

Planning & Goal setting

The current quarterly plan (Q2) can be viewed here.

The corresponding OKRs are tracked in Asana.

Backlog & Sprint Backlog

The backlog and current sprint work of the Data Engineering team are tracked in the Data Engineering & Event Platform Phabricator board.

New backlog items are triaged every week. The current sprint cadence is three weeks.

Technical Documentation

We are currently reorganizing our documentation. In the meantime, have a look at Data Engineering.

Contact Us

Please see the Intake Process page to make a request or contact one of our Product Managers.