Proposed Database Schema Changes/October 2004
Have now checked this into CVS HEAD for upcoming 1.5. --brion 08:08, 19 Dec 2004 (UTC)
The current cur/old table split is kind of ugly; it is hard to work with and slows down aggregate operations by putting large amounts of bulk text into a table that is frequently being looked over for the smaller metadata fields.
Here's a color-coded diagram of the proposed split and how the old fields (on the left) map to the new ones (on the right):
(page_latest points to the rev_id/old_id of the page's current revision.)
Advantages of new structure:
- Renames don't require changing the revision or text table, only 'page' and link updates.
- Aggregate operations don't need to slog through text-laden tables.
- Don't need to merge cur and old manually in things like contribs
- rev_id will be a unique id for both current and old revisions (and will retain existing old_id values after conversion)
- and more!
Disadvantages of new structure:
- Building and maintaining an index for fulltext search is now harder
- Not particularly; it's been a separate table for a long time. Anyway the MySQL search is untenable and will be replaced at some point soon. --brion 21:08, 19 Mar 2005 (UTC)
Potential changes:
- If text.old_id and rev_id are decoupled we could avoid storing a second copy of the text on reverts. Not sure if this is worth the trouble.
- They are now officially decoupled but this isn't used yet.
Conversion code is in maintenance/updaters.inc. There's also a pure-SQL updater file in maintenance/archives somewhere, but which may not be safe in a replicated environment due to use of a variable.