Requests for comment/Localisation format
This is an RfC on a JSON-based localisation format and structure for MediaWiki core and extensions, based on the work to create jQuery.i18n. There will be no loss of internationalisation or localisation functionality in this change.
Localisation format | |
---|---|
Component | Localisation |
Creation date | |
Author(s) | Siebrand, James F. |
Document status | implemented |
Rationale
The current MediaWiki localisation files are based on PHP arrays. Because they are PHP arrays, the localisation files are not directly usable in JavaScript; making them usable there will be a follow-up stage to this change. Having JSON as the MediaWiki file format will make it easier to integrate modularly-developed functionality into MediaWiki, while preserving all the rich internationalisation features MediaWiki currently enjoys.
The largest i18n files in extensions currently contain localisations for 200 or more languages, are megabytes in size, and run to tens of thousands of lines (for example, WikimediaCCLicenseTexts.i18n.php is 3.8 MiB and over 18,000 lines, and UploadWizard.i18n.php is 3.4 MiB and over 32,000 lines). These large files are hard to edit for developers altering existing English messages and adding new ones, and slow to process for MediaWiki and other tools. After this change, timestamps can be tracked per language, so a change to one translation in an extension would, for example, no longer trigger a re-cache of all languages. This benefit applies to MediaWiki core as well as to translatewiki.net. While handling JSON may be slower than using PHP (we have no benchmarks on this), overall this change can improve performance by avoiding loading unnecessary data, as data for each language is held in a separate file.
Additionally, using executable PHP files is a potential security risk, and loading the messages for translation at translatewiki.net, or manipulating them in any other way, is awkward.
Out of scope
- Representation in ResourceLoader for i18n files; a separate RfC will be submitted for this.
Requirements
Requirements are at least the following:
- The scope of this RfC is limited to $messages.
- The messages should be stored in a non-executable UTF-8 file format.
- MediaWiki core and individual extensions should be able to have multiple groups of messages, located in an advised standard structure (for example, i18n/[modulename]/langcode.json).
- All current MediaWiki i18n features must keep working.
- The current extension localisation format ($wgExtensionMessagesFiles) should keep working for backward compatibility purposes.
- The new format will be mandatory for all Wikimedia deployed code.
- Conversion scripts should be made available and run on all Wikimedia-deployed extensions; this is expected to be complete by the time MediaWiki 1.24 is branched.
Proposal
The proposal is as follows:
- We will use UTF-8 JSON files using the same syntax that is used by jQuery.i18n, https://github.com/wikimedia/jquery.i18n. This provides complete support for $messages and basic authoring information, but does not cover everything in the MessagesXx.php files, which will remain (replacing them is out of scope for this RfC).
  - The JSON translation files will be stored with actual human-readable Unicode characters and not Unicode character numbers. If this is a problem, it should be the job of, for example, ResourceLoader to resolve it.
  - Each locale will be stored in its own file, even though the jquery.i18n file format specifies that it is also possible to put all locales in the same file.
- Extension messages files will in general be expected to be in a directory called i18n in the root of the extension's repo, split by language (en.json, qqq.json, fr.json, etc.), with support for multiple sub-directories for splitting up into keyed groups, auto-loaded.
- Groups' keys are locally-scoped, allowing only [a-z0-9\-_], and auto-generated from the name of the sub-directory (if appropriate); the primary group will be automatically named the same as the extension. If you want to refer to the default group distinctly from the extension's whole set of modules, name it explicitly (with a key other than '').
- We will introduce a new configuration variable, $wgMessagesDirs, which specifies the directory or directories from which an extension's messages are loaded. $wgExtensionMessagesFiles will continue to work, but an entry in $wgExtensionMessagesFiles will be ignored if the same key is specified in $wgMessagesDirs.
- Implementation will be completed in advance of the release of MediaWiki 1.23, running in dual-support mode. As MW 1.23 will be an LTS release, this will ensure that users of existing extensions will be able to use them unaltered for several years.
- Support for the old PHP based file format will be retired in MediaWiki 1.24.
- The old PHP based localisation format for $messages will be considered deprecated starting with MediaWiki version 1.23.0.
- A conversion script will be written to convert PHP arrays to JSON files, and will be run on all Wikimedia-deployed extensions. Complex splits (e.g. for MediaWiki core and VisualEditor) will be done manually by the Language Engineering and VisualEditor teams.
- maintenance/language/messages.inc will become obsolete.
- maintenance/language/messageTypes.inc will be converted to translatewiki.net configuration.
- The $messages of the languages/messages/MessagesXx.php core files will be converted into i18n/xx.json. Conversion of other properties maintained in MessagesXx.php is out of scope of this RfC.
  - The installer i18n will be migrated to i18n/installer/xx.json.
- jquery.i18n JSON could support fuzzy tags, but it does not at the moment. In the future, this could be implemented using an array in the @metadata.
- The current extension localisation format ($wgExtensionMessagesFiles) should keep working for backward compatibility purposes, and converted extensions would get backward compatibility code. For an extension that has $wgMessagesDirs defined, running a conversion script would suggest code like this for Extension.i18n.php:
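<?php
// Backward-compatibility shim: rebuild $messages from the JSON files.
$messages = array();
array_map( function ( $dir ) use ( &$messages ) {
    $files = glob( __DIR__ . "/$dir/*.json" );
    foreach ( $files as $file ) {
        // Strip the ".json" suffix to get the language code.
        $langcode = substr( basename( $file ), 0, -5 );
        $data = (array)json_decode( file_get_contents( $file ) );
        // The authoring metadata is not a message.
        unset( $data['@metadata'] );
        // Merge with messages already loaded for this language, if any
        // (avoids an undefined-index notice on the first directory).
        $existing = isset( $messages[$langcode] ) ? $messages[$langcode] : array();
        $messages[$langcode] = array_merge( $existing, $data );
    }
}, array(
    'modules/oojs-ui/i18n',
    'modules/ve/i18n',
    'modules/ve-mw/i18n',
    'modules/ve-wmf/i18n'
) );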
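To make the file format more concrete, here is a minimal sketch of reading the contents of one per-language file. The helper name loadMessagesFromJson, the message keys and the author name are invented for illustration; only the @metadata key and the one-file-per-locale layout come from the proposal above, and the real loader in MediaWiki may well differ.

```php
<?php
// Hypothetical helper (not a MediaWiki function): decode the contents of one
// per-language JSON file into a key => message array, dropping "@metadata".
function loadMessagesFromJson( $json ) {
    $data = json_decode( $json, true );
    if ( !is_array( $data ) ) {
        // Invalid JSON: this sketch simply treats it as "no messages".
        return array();
    }
    unset( $data['@metadata'] );
    return $data;
}

// Example contents of a hypothetical i18n/fr.json. Keys and author are
// invented; note the human-readable Unicode characters stored as-is.
$json = <<<'JSON'
{
    "@metadata": {
        "authors": [ "Example Translator" ]
    },
    "myextension-title": "Titre de l’exemple",
    "myextension-greeting": "Bonjour, $1 !"
}
JSON;

$messages = loadMessagesFromJson( $json );
echo implode( ', ', array_keys( $messages ) ), "\n";
```

Because each language lives in its own file, a loader along these lines only ever has to decode the file for the language it needs.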
Examples
An extension with multiple groups of messages might choose to split its files up as follows:
WikimediaMessages/i18n/en.json
- This will be loaded as wikimediamessages
WikimediaMessages/i18n/CCLicenseTexts/en.json
- This will be loaded as wikimediamessages/cclicensetexts
WikimediaMessages/i18n/WikimediaTemporaryMessages/en.json
- This will be loaded as wikimediamessages/wikimediatemporarymessages
The extension would load its values using:
$wgMessagesDirs['WikimediaMessages'] = __DIR__ . '/i18n';
This is short for:
$wgMessagesDirs['WikimediaMessages'] = array( '' => __DIR__ . '/i18n' );
It would load all message files in the above example automatically, including the namespacing.
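The group names in this example could be derived along the following lines. This is only a sketch: deriveGroupName is a hypothetical helper, and the lower-casing and the "/" separator are inferred from the example names above rather than taken from any actual implementation.

```php
<?php
// Hypothetical helper: build a group name from the extension name and an
// optional sub-directory name, lower-casing both as in the example above.
function deriveGroupName( $extension, $subDir = '' ) {
    $parts = array( strtolower( $extension ) );
    if ( $subDir !== '' ) {
        $parts[] = strtolower( $subDir );
    }
    foreach ( $parts as $part ) {
        // The proposal allows only [a-z0-9\-_] in group keys.
        if ( !preg_match( '/^[a-z0-9\-_]+$/', $part ) ) {
            throw new InvalidArgumentException( "Invalid group key: $part" );
        }
    }
    return implode( '/', $parts );
}

echo deriveGroupName( 'WikimediaMessages' ), "\n";
// prints "wikimediamessages"
echo deriveGroupName( 'WikimediaMessages', 'CCLicenseTexts' ), "\n";
// prints "wikimediamessages/cclicensetexts"
```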
A more complex use case is for extensions that use libraries, where moving the internationalisation files into the root of the extension would split the import and make things more complicated. A (slightly artificial) example might be:
VisualEditor/i18n/en.json
VisualEditor/modules/ve-core/i18n/en.json
VisualEditor/modules/qunit/localisation/en.json
VisualEditor/modules/oojs-ui/messages/en.json
For this example, the messages files would be loaded using:
$wgMessagesDirs['VisualEditor'] = array(
    '' => __DIR__ . '/i18n',
    've-core' => __DIR__ . '/modules/ve-core/i18n',
    'qunit' => __DIR__ . '/modules/qunit/localisation',
    'oojs-ui' => __DIR__ . '/modules/oojs-ui/messages',
);
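As a rough illustration of how such an entry could be consumed, the sketch below maps one $wgMessagesDirs entry to the per-language files it implies. messageFilesFor is a hypothetical helper and the /ext/... paths are placeholders; the sketch covers only the explicitly listed directories, not the automatic loading of sub-directories described earlier.

```php
<?php
// Hypothetical helper: list the JSON files that would hold the messages for
// one language, given a single $wgMessagesDirs entry (a string or an array).
function messageFilesFor( $dirs, $langcode ) {
    // A plain string is short for array( '' => $dir ).
    if ( is_string( $dirs ) ) {
        $dirs = array( '' => $dirs );
    }
    $files = array();
    foreach ( $dirs as $group => $dir ) {
        // One file per language per group, e.g. .../i18n/fr.json
        $files[$group] = "$dir/$langcode.json";
    }
    return $files;
}

// Placeholder paths standing in for __DIR__-based ones.
$entry = array(
    '' => '/ext/VisualEditor/i18n',
    've-core' => '/ext/VisualEditor/modules/ve-core/i18n',
);
print_r( messageFilesFor( $entry, 'fr' ) );
```

Only the files for the requested language are touched, which is the property the rationale relies on for avoiding unnecessary loading.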
Implementation
- Done Support in MediaWiki for using this for extensions [1] and documentation at Manual:$wgMessagesDirs, Manual:$wgExtensionMessagesFiles
- Done Support in MediaWiki for using this for core.
- Done Use JSON instead of .i18n.php in various extensions:
  - Done the script for conversion and PHP shim, documented at generateJsonI18n.php.
  - Done for VisualEditor (and OOjs UI) – conversion, use.
  - Done for the Wikimania Scholarships application, as of 2013-12-18 – [2].
  - Done for 8 more extensions as of March 2014 (search "format = json" in [3] for complete list).
- Feature parity for translation and localisation updates
- Done Improvements to Translate/translatewiki.net workflow.[4] [5]
- Done Support JSON i18n in LocalisationUpdate (v1) AKA json-rewrite.
- Done Use JSON instead of Messages*.php in MediaWiki core.
- Done Use JSON instead of .i18n.php in all extensions in Gerrit.