Core Platform Team/Initiatives/Dependency Tracking

Initiative Description

Summary

Create a system for storing the dependencies of generated artefacts (on other artefacts and primary resources). When one resource or artefact changes, that system should be notified, and generate notifications that cause any dependent artefacts to be invalidated and/or regenerated.
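The behaviour described above can be illustrated with a minimal in-memory sketch. It assumes dependencies are stored as a directed graph and that a change to one node invalidates everything reachable from it. The class name, function names, and identifiers such as "page:Foo" are hypothetical and chosen for illustration only; storage technology, APIs, and whether state lives in a graph at all are still open questions for this initiative.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Hypothetical in-memory store of artefact dependencies (illustration only)."""

    def __init__(self):
        # Maps a resource or artefact to the artefacts derived directly from it.
        self._dependents = defaultdict(set)

    def add_dependency(self, artefact, depends_on):
        """Record that `artefact` is generated from `depends_on`."""
        self._dependents[depends_on].add(artefact)

    def affected_by(self, changed):
        """Return every artefact that depends, directly or indirectly, on `changed`."""
        affected = set()
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            # .get() avoids creating empty entries for leaf nodes.
            for dependent in self._dependents.get(node, ()):
                if dependent not in affected:
                    affected.add(dependent)
                    queue.append(dependent)
        return affected


def notify_change(graph, changed, on_invalidate):
    """Call `on_invalidate` once for each artefact affected by a change."""
    for artefact in sorted(graph.affected_by(changed)):
        on_invalidate(artefact)


# Usage: a template change invalidates the rendered pages that use it,
# and anything generated from those pages in turn.
graph = DependencyGraph()
graph.add_dependency("html:Foo", "page:Foo")
graph.add_dependency("html:Foo", "template:Infobox")
graph.add_dependency("html:Bar", "template:Infobox")
graph.add_dependency("pdf:Foo", "html:Foo")

invalidated = []
notify_change(graph, "template:Infobox", invalidated.append)
```

In this sketch the callback stands in for whatever purge or regeneration notification the real system would emit. A production service would also need persistence, batching, and protection against very large fan-out, none of which is modelled here.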

Significance and Motivation

MediaWiki, as configured on the Wikimedia cluster, generates a number of different artefacts that are based, directly or indirectly, on page content and other primary resources such as page titles. MediaWiki tracks these dependencies and implements the corresponding purging mechanisms in a variety of ways, and any extension that introduces a new derived artefact needs to implement its own tracking and purging mechanism. This adds considerable code complexity. It also makes the overall system state complex and hard to reason about, and the resulting congestion and starvation issues are frequent and hard to debug.

Providing a unified system for tracking dependencies and triggering invalidation and regeneration would allow core code and extensions to define new kinds of derived artefacts without having to implement a new tracking and purging mechanism each time. It would also allow us to manage the overall system state more simply, by routing the flow of events through a single service.

Outcomes

A centralized dependency tracking system is available

Baseline Metrics

No centralized dependency tracking system exists

Target Metrics

A centralized dependency tracking system is in place

Stakeholders

TBD

Known Dependencies/Blockers

None

Epics, User Stories, and Requirements

  • Document the set of target use cases
  • Define abstract operation model
  • Plan system architecture
  • Specify APIs
  • Select technology for graph storage
  • Implement service(s)
  • Implement a MediaWiki interface for the dependency service
  • Start maintaining the dependency graph, initially for analytics only
  • Start using the dependency graph for purging/regeneration in MediaWiki

Open Questions

  • Maintain state in the graph, or use an event-stream-based architecture?
  • What technology to use for storing the graph?
  • How far can we make this event-based?
  • Do we have to wait for the new frontend architecture before finalizing this design?
