Deployment tooling/Notes/l10nupdate dataflow

Two scripts update the l10n cache files: l10nupdate (run nightly by a cronjob) and mw-update-l10n (which is also run by scap). l10nupdate calls extensions/LocalisationUpdate/update.php, rebuildLocalisationCache.php, and extensions/WikimediaMaintenance/refreshMessageBlobs.php; mw-update-l10n calls mergeMessageFileList.php and rebuildLocalisationCache.php.

Data flow

mergeMessageFileList.php

This generates the ExtensionMessages.php file.

Input:

  • Currently deployed tree
  • wmf-config/extension-list
  • Files listed in $wgExtensionEntryPointListFiles (currently wmf-config/extension-list-$version)

Output:

  • File eventually copied to wmf-config/ExtensionMessages-$version.php

Timing:

  • Negligible, less than 1s.

extensions/LocalisationUpdate/update.php

This automatically "backports" non-English translations from master and writes them to cache files that rebuildLocalisationCache.php then reads.

Input:

  • Currently deployed tree
  • Checkout of mediawiki/core master
  • Checkout of mediawiki/extensions master
  • (optional) $wgLocalisationUpdateDirectory/l10nupdate-hashes.cache
    • If present, this allows for skipping reprocessing of i18n files that haven't changed (by md5) since the last run
  • (optional) $wgLocalisationUpdateDirectory/l10nupdate-$lang.cache
    • When the English text of a message changes, its non-English translations are no longer backported. Any translation that was previously backported (and so is already in this file) is kept.
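The backporting rule described above can be sketched in a few lines. This is an illustration only, not the LocalisationUpdate code: the function name merge_backports and the dict-based data model (message key mapped to message text) are assumptions made for the example.

```python
def merge_backports(deployed_en, master_en, master_lang, cached_lang):
    """Return the new backport cache contents for one language.

    Hypothetical data model: every argument is a dict mapping a message
    key to its text. cached_lang is what an earlier run stored.
    """
    result = {}
    for key, deployed_text in deployed_en.items():
        if master_en.get(key) == deployed_text and key in master_lang:
            # English is unchanged, so master's translation still fits.
            result[key] = master_lang[key]
        elif key in cached_lang:
            # Otherwise (for example, English changed on master) nothing new
            # is backported, but an earlier backport is kept.
            result[key] = cached_lang[key]
    return result


deployed_en = {"greet": "Hello", "bye": "Goodbye"}
master_en = {"greet": "Hello", "bye": "Farewell"}   # "bye" changed on master
master_de = {"greet": "Hallo", "bye": "Lebewohl"}
cached_de = {"bye": "Auf Wiedersehen"}              # backported earlier

print(merge_backports(deployed_en, master_en, master_de, cached_de))
# {'greet': 'Hallo', 'bye': 'Auf Wiedersehen'}
```

In this sketch "greet" picks up the translation from master, while "bye" keeps the older backport because its English text no longer matches the deployed tree.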

Output:

  • $wgLocalisationUpdateDirectory/l10nupdate-hashes.cache
  • $wgLocalisationUpdateDirectory/l10nupdate-$lang.cache
    • These currently range from 37 bytes to 1.7M each, 109M in total.

Timing:

  • Initial run: 370s
  • Run with no changes: 18s
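The gap between the initial run and a run with no changes comes from the md5 hash cache. A minimal sketch of that kind of skip logic follows. The function name files_needing_update and the JSON storage format are assumptions for the example; the real l10nupdate-hashes.cache format is not described here.

```python
import hashlib
import json
from pathlib import Path


def files_needing_update(paths, hash_cache_path):
    """Return the files whose md5 differs from the previous run.

    The hash cache is stored as JSON here purely for illustration.
    """
    cache_file = Path(hash_cache_path)
    old = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    new = {}
    changed = []
    for path in paths:
        digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
        new[str(path)] = digest
        if old.get(str(path)) != digest:
            changed.append(str(path))
    # Remember the hashes so the next run can skip unchanged files.
    cache_file.write_text(json.dumps(new))
    return changed
```

On a first run the hash cache is missing, so every file is reported as changed. On a later run with identical files the returned list is empty and only the hashing cost remains.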

rebuildLocalisationCache.php

This constructs the l10n cache CDB files, which MediaWiki uses for faster access to messages.

Input:

  • Currently deployed tree
  • (indirectly) ExtensionMessages-$version.php
  • (optional) The cdb files from the last rebuildLocalisationCache.php run, plus the source files (by full path) that run used
    • If present, this allows for skipping reprocessing of i18n files that haven't changed (by mtime) since the last run.
  • $wgLocalisationUpdateDirectory/l10nupdate-$lang.cache

Output:

  • cdb files, currently 2.0M to 2.9M each and 779M in total.

Timing:

  • Initial run: 270s (4.5 minutes)
  • Run with no changes: about 3.5s
  • Run with one language deleted: about 5.5s.
  • Run with 10 languages deleted: about 11.5s.
  • Run with 20 languages deleted: about 18-19s.
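The timings above are consistent with a per-language staleness check: a language is rebuilt only when its cdb file is missing or older than its sources. A simplified sketch of such an mtime check is below. It is not MediaWiki's actual dependency logic, and the function name is_cache_stale is hypothetical.

```python
import os


def is_cache_stale(cdb_path, source_paths):
    """True if the cdb file is missing or older than any source file."""
    try:
        cache_mtime = os.path.getmtime(cdb_path)
    except OSError:
        # A deleted (or never built) cache file must always be rebuilt.
        return True
    for path in source_paths:
        try:
            if os.path.getmtime(path) > cache_mtime:
                return True
        except OSError:
            # A source file that disappeared also invalidates the cache.
            return True
    return False
```

Deleting the cdb file for a language forces a rebuild for that language only, which matches the way the run time above grows with the number of deleted languages.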

extensions/WikimediaMaintenance/refreshMessageBlobs.php

This updates the ResourceLoader message cache; we can't just flush the cache, because that results in cache stampedes that can bring down the site temporarily.

Input:

  • Cache cdb files (mtime checked).
  • Database.

Output:

  • Database is updated.

Timing:

Analysis

More timing tests could be run. In particular, better statistics on how long rebuildLocalisationCache.php takes at varying degrees of staleness would be useful. Still, the measured worst case for l10nupdate's update.php plus rebuildLocalisationCache.php (the sum of their initial-run timings above) is about 11 minutes, while the Server admin logs indicate worst-case times of 20 minutes or more.
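The arithmetic behind that comparison, using only the timings recorded in the sections above, looks like this. The 20-minute figure is taken as a lower bound, and refreshMessageBlobs.php is left out because no timing is recorded for it, so the unexplained remainder includes its run time.

```python
UPDATE_INITIAL_S = 370        # update.php, initial run
REBUILD_INITIAL_S = 270       # rebuildLocalisationCache.php, initial run
OBSERVED_WORST_S = 20 * 60    # "20 minutes or more" from the logs

measured_worst_s = UPDATE_INITIAL_S + REBUILD_INITIAL_S
unexplained_s = OBSERVED_WORST_S - measured_worst_s

print(f"measured worst case: {measured_worst_s} s ({measured_worst_s / 60:.1f} min)")
print(f"unexplained: at least {unexplained_s} s ({unexplained_s / 60:.1f} min)")
# measured worst case: 640 s (10.7 min)
# unexplained: at least 560 s (9.3 min)
```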

Assuming that copying the cdb files to the apaches accounts for a large part of the l10n-related time during scap, we could likely realize a speedup by reworking things as follows:

  • l10nupdate: (nightly cronjob)
    1. Run extensions/LocalisationUpdate/update.php
    2. Copy cache files (109M) to apaches
    3. Copy deployment tree to apaches
    4. Run rebuildLocalisationCache.php on all apaches (and tin, etc.)
    5. Once #4 is complete on all apaches, run extensions/WikimediaMaintenance/refreshMessageBlobs.php
  • mw-update-l10n / scap:
    1. Run mergeMessageFileList.php
    2. Copy ExtensionMessages-$version.php to apaches
    3. Copy deployment tree to apaches (which scap does anyway, currently after the l10n sync)
    4. Run rebuildLocalisationCache.php on all apaches (and tin, etc.)
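To give a rough sense of the data volume involved in the nightly l10nupdate case, the sizes recorded earlier can be compared directly. This sketch assumes the current process copies the full set of cdb files to each apache on every run and that the reworked process would copy the full set of l10nupdate cache files instead. Incremental copying, which would shrink both numbers, is not modelled.

```python
CDB_TOTAL_M = 779          # all cdb files (rebuildLocalisationCache.php output)
L10N_CACHE_TOTAL_M = 109   # all l10nupdate-*.cache files (update.php output)

saved_m = CDB_TOTAL_M - L10N_CACHE_TOTAL_M
ratio = CDB_TOTAL_M / L10N_CACHE_TOTAL_M

print(f"copied per apache: {CDB_TOTAL_M}M now, {L10N_CACHE_TOTAL_M}M after the rework")
print(f"saved per apache: {saved_m}M (about {ratio:.1f}x less data)")
```

For the mw-update-l10n / scap case, only ExtensionMessages-$version.php would be copied for l10n purposes. Its size is not recorded here, so it is left out of the comparison.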

Items to investigate related to the above:

  1. Do the apaches have sufficient space to store the cache files for deployed versions of MediaWiki?
  2. Do the apaches have sufficient free CPU to run rebuildLocalisationCache.php without unduly impacting site performance?
  3. Is it ok to have out-of-date or missing messages for the time it takes rebuildLocalisationCache.php to run on the apaches? Could these missing messages be cached inappropriately?
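The first item could be checked on each host with a small script along these lines. It is a sketch only: the function name has_room, the 10% headroom, and the reading of "109M" as MiB are assumptions, and the number of deployed versions has to be supplied by whoever runs it.

```python
import shutil

L10N_CACHE_BYTES = 109 * 1024 ** 2   # "109M" of l10nupdate cache files, read as MiB


def has_room(path, needed_bytes, headroom_fraction=0.10):
    """True if the filesystem holding path can take needed_bytes
    while still keeping headroom_fraction of its capacity free."""
    usage = shutil.disk_usage(path)
    reserve = usage.total * headroom_fraction
    return usage.free - reserve >= needed_bytes


# Example: room for the cache files of two deployed versions (hypothetical count).
print(has_room(".", 2 * L10N_CACHE_BYTES))
```

The same function can be called with the cdb total instead if the question is whether the hosts can hold the rebuilt caches as well.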