Today, many barriers stand in the way of using Wikimedia application data for analysis, decision-making, intelligence, and applied data science. These include a lack of shared information describing the data, varying methods of access and access control, distributed and unclear data stewardship, technical and architectural impedance mismatches, and unclear responsibility for data policy enforcement. Although several teams around the organization perform data analysis and use data for a variety of purposes, their capacity is limited by these barriers.
Our purpose is to address these problems at a scope and scale that crosses organizational boundaries, to establish a home for clear answers to questions about data access, accountability, and organizational policy, and to dissolve the barriers, enabling and empowering the data capabilities of the entire community (staff, volunteers, and external users of data).
By establishing the data governance capabilities described in Key Result 1, we provide the organizational structure to manage data at the Foundation level. In fulfilling the use cases described in Key Result 2, we demonstrate the ability to deliver capabilities that have previously been stymied by the barriers described above. And in Key Result 3, we transform our machine learning capabilities to be modern, standardized, flexible, scalable, and transparent.
To fulfill these goals, we must create a data strategy that clearly articulates how enhancing the Foundation's data management capabilities enables us to better support Movement and Foundation strategy and to better measure our own performance and capabilities at the intersection of systems, programs, and people.