Data Platform Engineering/Data Products/work focus
Data Products Goals
At a high level, our team is currently focused on two hypotheses within the WMF Product & Technology FY 23/24 annual plan:
- 2.5.1: "If we develop a data contract composed of schema fragments and a consistent cross-platform API, we can reduce the number of steps required to instrument for an experiment and produce consistent data used across teams and experiments."
- 3.4.1: "If our trusted datasets are all in the same place following the same conventions in dimension semantics, naming, and granularity considerations; it will be easier to combine and extract the data and serve data that can be easily evaluated in terms of privacy."
We are also working on the committed work of Commons Impact Metrics and on other essential work to maintain, and decrease the maintenance burden on, the systems we steward.
Sprint Goals
The goals for the current sprint (23/10/24 - 23/11/214) are:
- [HIGHEST] Commons Impact Metrics: Prep for GLAM Wiki Conference
- [HIGH] SDS 2.5: Core Interaction API Design, Implementation & Documentation
- [MEDIUM] Transition to 50/25/25 capacity structure
- [LOW] Sunset AQS 1
Past Sprints
- SDS 2.5.1: Prepare to onboard the rest of the team
- Traffic to all six services routed to AQS 2. AQS 1 is ready to sunset.
- Technical strategy for Commons Impact Metrics prototype including implementation draft
- Dumps 2: Bring to completion or pause with a plan for the future.
- Knowledge gaps: pause until we open work on SDS 3.4
- At least one client library is refactored to include the new data contract (core schema and schema fragments) and an existing instrument is prototyped [receiving live data?]
  - Not yet
  - Almost at two client libraries refactored
  - Merge requests not quite landed
- [Continue] Generate XML dumps for simplewiki
  - Not yet
  - XML generated with everything, but there are data quality issues from the input
  - How we import is the remaining work
- 100% of traffic routed to Media, Pageviews [Edit and Editor Analytics next]
  - Media done 🎉
  - Pageviews is waiting on SRE
- Knowledge Gaps Index metrics receive production traffic
  - Waiting on SRE
- Data dumps transition has been clearly communicated across stakeholders
  - Done 🎉
23/08/28 - 23/09/08
- Generate XML dumps for simplewiki
- Core interaction schema and schema fragments are prototyped and tested in preparation for updating Metrics Platform client libraries next sprint
- 100% of traffic routed to Geo and Media Analytics
- Identify and mitigate risks associated with the MediaWiki History pipeline