Data Platform Engineering
The mission of the Data Platform Engineering team is to empower Wiki Communities and the Wikimedia Foundation to gain insights, conduct research, and build compelling user experiences through access to privacy-aware data and data platform services. The Data Platform is a collection of systems and services providing end-to-end capabilities for our data producers and consumers to discover, use, and collect data. The group is led by Olja Dimitrijevic, Director of Engineering, and Virginia Poundstone, Group Product Manager.
Our teams are:
- Data Engineering is responsible for the core capabilities of the data platform, including data storage, batch and streaming infrastructure, and distributed query engines. This platform supports ingestion of Wikimedia project content, web traffic, instrumentation, operational data, and other datasets into the Data Lake. The team manages the foundational data pipelines, whereas the data producers manage their respective data pipelines and data products. The team's scope includes data quality, observability, and discoverability. Collaborating closely with Research & Data Science and other stakeholders, this team supports the development of various data products, such as curated datasets and instrumentation, while ensuring data management and modeling best practices.
- Experiment Platform team is responsible for delivering A/B testing capabilities for the projects through an Experimentation Platform that provides experiment configuration, flagging, data collection, a statistical engine, and analytics reporting.
- Search Platform team is responsible for the Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used to support Wikimedia projects. It also includes the Wikidata Query Service, the SPARQL endpoint used to query Wikidata. The team provides both a direct user experience around Search and an API on which higher-level features can be developed.
- Data Platform SRE team supports all of the above teams to manage their infrastructure, applications, and operations.
The Data Platform group is supported by principal engineers Adam Baso and Andrew Otto.
Mission
Our Mission is to empower Wiki Communities and the Wikimedia Foundation to gain insights, conduct research, and build compelling user experiences, through access to privacy-aware data and data platform services.
What we do
We provide the infrastructure and services that empower our users to collect, discover, and use trustworthy data to derive insights, conduct research, and build new data products.
The data platform provides capabilities that include:
- Ingestion
- Storage
- Transform and serve
- Search and query
- Exploration and analysis
- Visualization and reporting
- Publishing and reuse
For more details, see the Data Platform Overview or visit the Data Platform documentation on Wikitech.
Who we serve
We support the open-knowledge communities and the Wikimedia Foundation at large. Specifically:
- Wiki administrators
- Wiki readers and editors
- GLAM programs
- Analysts
- Researchers and machine learning practitioners
- WMF Trust and Safety
- WMF SRE and Traffic teams
- WMF Product feature teams
- WMF Fundraising
Contact Us
Please see the Intake Process page to make a request or contact one of our Product Managers.