Core Platform Team/Initiative/Shard Revision Table/Initiative Description
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. The Core Platform Team and its initiatives do not exist anymore. See MediaWiki Engineering Group instead since 2023. |
- Summary
The ability to do multiple MySQL read queries in parallel, hitting multiple shards at the same time, would allow sharding of the revision table to be done without duplication of data, and without any client latency impact. However, this comes at the expense of duplicating the read query rate. A read query that includes an index can be done without duplicating read query traffic if shards are split by the same index. However, this adds complexity in the application.
Joins on small tables could be supported by replicating those tables into all shards. Joins on large tables, such as a join of the revision table with itself, would have to be emulated in MediaWiki.
The minimum viable product is to distribute rows into shards randomly and to duplicate all read queries asynchronously. If approved, the project plan would then be:
- Evaluate existing asynchronous/parallel MySQL clients in PHP. Choose one.
- Provide an interface within MediaWiki with the following capabilities:
- Distributing SELECT queries to all shards, reassembling and sorting results.
- Distributing DELETE and UPDATE queries to all shards.
- Sending INSERT queries to a single shard based on some simple policy, e.g. fixed or random.
- Remove problematic joins from MediaWiki.
- Provide a maintenance script for moving data between shards.
- Work with DBAs to deploy the system to the Wikimedia cluster.
- Significance and Motivation
Splitting up storage over multiple servers will allow faster schema changes and possibly cheaper hardware. In some cases, this can increase our capacity. This will enable us to handle more applications like JADE.
- Outcomes
- Schema change speed. This process directly slows down any feature that requires database changes. The amount of improvement we get is dependent on the number of shards we introduce.
- Performance. We should have parity with current MYSQL access. We should have no performance degradation.
- Baseline Metrics
None given
- Target Metrics
None given
- Stakeholders
- Extension developers
- Scoring
- WMDE (Wikibase)
- Known Dependencies/Blockers
Actor and Comment Table Schema migrations
MCR (Slot and Content Table) Schema migrations