User:Milimetric (WMF)/Hackathon/Mediawiki history

This is a very rough, top-of-my-head proposal; feel free to improve, change, or pivot.

Background

Many tools on our cloud infrastructure access data through the cloud replicas, which use the same schema that MediaWiki uses in production. We've been wondering for a while whether some of these tools would be better served by different schemas. We (wikitech:Data Engineering) built the mediawiki history dataset to serve research and data analysis use cases, but we always imagined it might also help tools get to the data they need.

Description

In this session, it would be fun to work with aspiring, novice, and veteran tool maintainers to understand their data needs. We can try querying the mediawiki history dataset to see how it compares to the production schemas. Even just that would be very helpful. We can also use the opportunity to go over the mediawiki history dataset, which can be daunting at first glance.