Proposed Database Schema Changes/October 2004

Have now checked this into CVS HEAD for upcoming 1.5. --brion 08:08, 19 Dec 2004 (UTC)


The current cur/old table split is kind of ugly; it is hard to work with and slows down aggregate operations by putting large amounts of bulk text into a table that is frequently being looked over for the smaller metadata fields.

Here's a color-coded diagram of the proposed split and how the old fields (on the left) map to the new ones (on the right):

(page_latest points to the rev_id/old_id of the page's current revision.)

Advantages of new structure:

  • Renames don't require changing the revision or text table, only 'page' and link updates.
  • Aggregate operations don't need to slog through text-laden tables.
  • Don't need to merge cur and old manually in things like contribs
  • rev_id will be a unique id for both current and old revisions (and will retain existing old_id values after conversion)
  • and more!

Disadvantages of new structure:

  • Building and maintaining an index for fulltext search is now harder
    • Not particularly; it's been a separate table for a long time. Anyway the MySQL search is untenable and will be replaced at some point soon. --brion 21:08, 19 Mar 2005 (UTC)

Potential changes:

  • If text.old_id and rev_id are decoupled we could avoid storing a second copy of the text on reverts. Not sure if this is worth the trouble.
    • They are now officially decoupled but this isn't used yet.

Conversion code is in maintenance/updaters.inc. There's also a pure-SQL updater file in maintenance/archives somewhere, but which may not be safe in a replicated environment due to use of a variable.