Analytics/Epics/Pageview API
For the documentation of the current pageview API, see: wikitech:Analytics/PageviewAPI.
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
Goals
Wikipedians need a reliable and accurate API for querying page views for articles. This epic describes the steps needed to build such an API. Initially, it focuses on the underlying infrastructure (e.g. Kafka/Hadoop) that must be built for this purpose. The epic is not finished and will be expanded with front-end requirements as the back-end work progresses.
Detailed Tracking Links
TBD
Users
User | Description |
---|---|
Product Managers | The people who are researching, designing and iterating on the page view metrics |
Researchers/Analytics Developers | The people who define the various page view metrics |
Analytics Developers | The people who write the software that produces the metrics |
Analytics Operators | The people who ensure the software is running and the data is updated |
Management | WMF management, who make decisions based on the data |
Community | The Wikipedians who look at the data to assess their success and the health of the community and their pages |
Readers | The people who read Wikipedia |
Prioritized Use Cases
High Priority
- As a Wikipedian, I need an API that allows me to query various page view stats
- As a Reader, I want any PII (IP address, UA, etc) to be removed from my page view information
- As a Product Owner, I want page views to be geo-coded at a country level
- As a Product Owner (and a lot of other stakeholders), I want raw logs to be deleted within 90 days
- As a Product Owner, I want page views to conform to a community-reviewed definition
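The privacy and retention use cases above can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the field names (`ip`, `user_agent`, `country`, `page`) are hypothetical, since this epic does not define a log schema, and the country code is assumed to have been resolved upstream before the IP address is dropped.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field names; the epic does not specify a log schema.
PII_FIELDS = {"ip", "user_agent"}
# Raw logs must be gone within 90 days.
RETENTION = timedelta(days=90)


def sanitize(record: dict) -> dict:
    """Return a copy of the record without PII fields.

    Non-PII fields, such as an already-resolved country code, are kept.
    """
    return {k: v for k, v in record.items() if k not in PII_FIELDS}


def is_expired(record_time: datetime, now: datetime) -> bool:
    """True once a raw log entry has reached the 90-day retention limit."""
    return now - record_time >= RETENTION
```

A raw record would be passed through `sanitize` before being stored long term, while `is_expired` would decide which raw entries a periodic cleanup job deletes.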
Later
Non-functional requirements
- Data should be updated daily, with hourly granularity
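As an illustration of hourly granularity, the sketch below truncates each page view timestamp to the start of its hour and counts views per (page, hour) bucket. The input shape, an iterable of `(page_title, timestamp)` pairs, is an assumption made for this example and is not specified by the epic.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(views):
    """Count page views per (page, hour) bucket.

    `views` is an iterable of (page_title, datetime) pairs; this input
    shape is hypothetical.
    """
    counts = Counter()
    for page, ts in views:
        # Truncate the timestamp to the start of its hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[(page, bucket)] += 1
    return counts
```

A daily update job could run this over the previous day's views and publish the resulting 24 hourly buckets per page.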
Additional information
We've done some planning with tech-ops, documented here: List of tasks for backend work