Translatable modules/Proposed solutions
The Translatable modules project is trying to build a new framework for module localization.
Several storage solutions are proposed here for discussion.
One of the main goals of this consultation is to decide which of these solutions to implement and recommend to all module developers.
Translatable page
Description
Use something similar to Module:ModuleMsg on Meta, but standardized:
- Put all the messages on a usual wikitext page marked for translation.
- Have standard Lua functions to load them (rather than a module like Module:ModuleMsg on each wiki).
Advantages
- Marking pages for translation is familiar to translation administrators.
- Can also work for templates.
- Every translation is a wiki page. It’s good for credit, history, separate processing, etc.
Disadvantages
- By default, translation unit markers are numbers. Using numbers as message keys makes code unreadable. It’s possible to replace the numbers with strings, but, as noted above, the way to do it in the Translate extension is not currently well documented. This can probably be addressed by proper documentation and standardization of this Translate extension feature.
- It’s unclear how parameters ($1, etc.) and other wiki syntax i18n features (GENDER, PLURAL, etc.) will work. They are not necessarily compatible with wikitext content pages.
- This may work for module localization within one wiki, but will not necessarily work when modules become global.
- A solution will be needed for looking up pages for translation. Currently, the message group selector just shows all the translatable pages.
- In the global modules and templates age, it’s unclear how it will be locally customized on wikis.
- Possible performance issues if each message translation is loaded individually, and in the handling of fallbacks. There is the messagecollection Action API, but possibly no nice Lua or wikitext API for loading the translations.
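The concern about parameters in the list above can be made concrete with a minimal sketch of `$1`-style substitution. It is written in Python only so that it can be run outside a wiki; the function name `substitute` is hypothetical and is not part of Translate or Scribunto. It handles positional placeholders only. GENDER and PLURAL need language-specific rules and are deliberately left out.

```python
import re

def substitute(message, *params):
    """Replace $1, $2, ... with the corresponding positional parameter."""
    def repl(match):
        index = int(match.group(1)) - 1
        # Leave the placeholder untouched if no parameter was supplied for it.
        if 0 <= index < len(params):
            return str(params[index])
        return match.group(0)
    return re.sub(r"\$(\d+)", repl, message)

print(substitute("Hello, $1! You have $2 new messages.", "Amir", 3))
```

Even this simple case shows the problem: on a translatable wikitext page, a literal `$1` has no defined meaning, so the substitution would have to happen in the loading function.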
JSON .tab file in the Data namespace on Commons
Description
Use something similar to what Module:TNT is doing, but formalized:
- Store all the source messages in a JSON file in the Data namespace on Commons in the “banana” format, like in MediaWiki extensions, including qqq for documentation.
- The same syntax can be used as for core and extension messages.
- Enhance the Translate extension to load the source messages and write the translations to JSON files by language.
- Add Lua functions to the standard Scribunto library to load the messages.
- Use another JSON file to organize the translatable file for convenient display in Translate’s message group selector.
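As a rough illustration of the description above, the following sketch shows per-language files in the “banana” format, including `qqq` documentation, and a loader that walks a fallback chain. It is written in Python only to keep it runnable outside a wiki; the real loader would be a Lua function in Scribunto. The message keys, the `FALLBACKS` table, and the functions `load_language` and `get_message` are all hypothetical, and the files are represented as in-memory JSON strings rather than pages in the Data namespace.

```python
import json

# Hypothetical per-language files: one flat JSON object per language,
# with an optional "@metadata" entry, as in MediaWiki extensions.
FILES = {
    "en": '{"@metadata": {"authors": []}, "mymodule-greeting": "Hello, $1!", "mymodule-bye": "Goodbye"}',
    "qqq": '{"mymodule-greeting": "Greeting. $1 is the user name.", "mymodule-bye": "Farewell text."}',
    "de": '{"mymodule-greeting": "Hallo, $1!"}',
}

# Assumed fallback chains; a real implementation would reuse MediaWiki's own.
FALLBACKS = {"de-at": ["de", "en"], "de": ["en"]}

def load_language(code):
    """Return the messages for one language, without the metadata entry."""
    raw = FILES.get(code)
    if raw is None:
        return {}
    data = json.loads(raw)
    data.pop("@metadata", None)
    return data

def get_message(key, lang):
    """Look the key up in lang, then in each fallback language in order."""
    for code in [lang] + FALLBACKS.get(lang, ["en"]):
        messages = load_language(code)
        if key in messages:
            return messages[key]
    return "<" + key + ">"  # placeholder for a key that exists nowhere

print(get_message("mymodule-greeting", "de-at"))
print(get_message("mymodule-bye", "de"))
```

Loading a whole language file at once, as `load_language` does here, is one way to avoid fetching each message individually.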
Advantages
- The file format is the same as in extensions.
- These pages are already globally accessible from modules.
- The raw pages are easily available to JavaScript gadgets, which can convert the JSON into a native object containing all translations if desired. An API might be needed to retrieve a single language or its fallbacks, which would reduce network traffic. JavaScript gadgets prefer a single query for all messages, stored in the client cache.
Disadvantages
- A new file format support (FFS) will have to be developed and maintained in the Translate extension. We will need a new type of MessageGroup, as well as MessageLoading.
- In the global modules and templates age, it’s unclear how it will be locally customized on wikis.
JSON file in an MCR slot
Description
Similar to “JSON .tab file in the Data namespace”, but:
- Store all the source messages in a JSON file in the Data namespace on Commons in the “banana” format, like in MediaWiki extensions, including qqq for documentation.
- (Need to decide whether to store all the languages in one JSON structure, or a slot per language.)
- The JSON file is stored as an MCR slot with the wiki page that stores the module’s code.
- The same syntax can be used as for core and extension messages.
- Enhance the Translate extension to load the source messages and write the translations to JSON files by language.
- Add Lua functions to the standard Scribunto library to load the messages.
- Use another JSON file to organize the translatable file for convenient display in Translate’s message group selector.
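The open question above, one JSON structure for all languages or one slot per language, can be illustrated by showing that the two layouts carry the same information. This is a sketch in Python rather than a description of MCR itself; the slot naming scheme `i18n-<language code>` and the functions `split_into_slots` and `merge_slots` are assumptions made for the example.

```python
import json

PREFIX = "i18n-"  # hypothetical slot name prefix

def split_into_slots(combined):
    """One combined {lang: {key: text}} structure -> one JSON document per slot."""
    return {
        PREFIX + lang: json.dumps(messages, ensure_ascii=False, sort_keys=True)
        for lang, messages in combined.items()
    }

def merge_slots(slots):
    """Inverse: per-language slot documents -> one combined structure."""
    return {
        name[len(PREFIX):]: json.loads(text)
        for name, text in slots.items()
        if name.startswith(PREFIX)
    }

combined = {
    "en": {"mymodule-greeting": "Hello"},
    "fr": {"mymodule-greeting": "Bonjour"},
}
slots = split_into_slots(combined)
print(sorted(slots))
print(merge_slots(slots) == combined)
```

Since the conversion is lossless in both directions, the decision can rest on practical factors such as how much data is loaded per page view and how edits appear in the page history.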
Advantages
- Stored elegantly with the module.
- If the module becomes global, the data becomes global with it.
- Creating MCR slots may require some privileges, but that’s probably OK because creating new message files is not for total newbies anyway, while editing is still accessible to most editors.
Disadvantages
- Will require some development to create slot support.
- A new FFS will have to be developed and maintained in Translate. We will need a new type of MessageGroup, as well as MessageLoading.
- In the global modules and templates age, it’s unclear how it will be locally customized on wikis.
- A common disadvantage of the MCR slot approach: permissions and history are mixed together. The code has the same protection level as the translations, so every translator is permitted to modify the effective code. Changes to the global code have no history of their own; they are drowned among translation edits. If protection and history were separated, these would be individual pages, not MCR slots.
TemplateData
Description
Similar to the JSON proposals above, but with the JSON stored inside TemplateData:
- Store all the source messages as a JSON value in TemplateData associated with a template that uses the module. Apart from being part of a larger JSON structure, the format is the same as the “banana” format used in MediaWiki extensions, including qqq for documentation.
- The same syntax can be used as for core and extension messages.
- Enhance the Translate extension to load the source messages and write the translations to JSON files by language.
- Add Lua functions to the standard Scribunto library to load TemplateData and the messages.
- Use another JSON file to organize the translatable file for convenient display in Translate’s message group selector.
Advantages
- Continuity with existing TemplateData technology.
- In particular, TemplateData already has some support for internationalization, e.g. template description can be in several languages.
- Keys can be managed through the TemplateData editor (this will require updates to the editor UI, however).
- The technology can be later shared with templates.
- If TemplateData ever moves to MCR (task T56140), it will move there, too.
Disadvantages
- There is a hard-coded limit of 64 KiB (gzipped) in the TemplateData extension’s code. This is enough room for something like 700 messages, but we have about 400 languages to manage. At the rate at which MediaWiki core messages are localized, there is room for only 20 messages.
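The size estimate above can be checked with simple arithmetic. The three input figures are taken from the text; the derived values are only what those figures imply and are not stated in the source.

```python
capacity = 700        # message strings that fit under the limit (from the text)
languages = 400       # languages to manage (from the text)
usable_messages = 20  # messages that fit at the core localization rate (from the text)

# If every message were translated into every language:
fully_translated = capacity / languages

# Average number of translations per message implied by the "20 messages" estimate:
implied_translations = capacity / usable_messages

print(fully_translated)      # 1.75
print(implied_translations)  # 35.0
```

In other words, with complete translations into all 400 languages, fewer than two messages would fit, and the estimate of 20 messages corresponds to an average of about 35 translations per message.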
- Requires adding TemplateData support to Scribunto (task T107119).
- A new file format support (FFS) will have to be developed and maintained in Translate.
- In the global modules and templates age, it’s unclear how it will be locally customized on wikis.
Messages as pages in the MediaWiki space
Description
- Store the translatable messages in the MediaWiki namespace, like core and extension messages.
- Create message groups for Translate using a JSON or YAML file stored as a wiki page. This is already supported (WikiMessageGroup) as whitespace-separated lists; however, there is no mechanism exposed to define groups inside the wiki itself.
Advantages
- Mostly natural for Translate to process (but support for the message group organizer will probably have to be developed).
- Mostly natural for Scribunto to process: functions for loading and parsing messages already exist.
- Can be customized on local wikis when modules become global the same way that messages from core and extensions are customized.
Disadvantages
- Double duty of listing messages as well as creating them separately.
- Creating the messages will require sysop or edit-interface permissions, making comprehensive module development and bug fixing accessible to far fewer people.
- Lack of packaging. Many distributed development teams will create and maintain packages consisting of a module, a global template with its TemplateData, or a JavaScript gadget. Message names must not conflict between packages with similar purposes, so messages should be bundled per package.
Lua table
Description
Do it similarly to existing solutions in Module:I18n on Commons and Module:Wikidades/i18n on the Catalan Wikipedia, but:
- Standardize the Lua table format: Decide whether it is one message key pointing to many translations indexed by language, or language codes pointing to many message keys, etc.
- Add functions to the Scribunto standard library to load these messages.
- Add support for reading and writing this file format to Translate.
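The format choice in the first point above, message keys pointing to translations or language codes pointing to message keys, is a pivot of the same data. The sketch below uses Python dictionaries as stand-ins for Lua tables; the function name `pivot` and the sample messages are hypothetical.

```python
def pivot(outer):
    """Swap the two nesting levels: {a: {b: text}} -> {b: {a: text}}."""
    result = {}
    for a, inner in outer.items():
        for b, text in inner.items():
            result.setdefault(b, {})[a] = text
    return result

# Layout 1: one message key pointing to many translations.
by_key = {
    "greeting": {"en": "Hello", "pt-br": "Oi"},
    "farewell": {"en": "Goodbye"},
}

# Layout 2: language codes pointing to many message keys.
by_language = pivot(by_key)
print(by_language["pt-br"])
print(pivot(by_language) == by_key)
```

The language-keyed layout lets a loader read only the languages it needs, while the key-keyed layout keeps all translations of one message side by side. Because either can be derived from the other, the standard only needs to fix one of them.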
Advantages
- Natural for Lua.
- Continuity with at least some existing solutions.
Disadvantages
- This is actual code, which is error-prone and less safe. (We used to have messages in PHP arrays, and moved away from that.)
- It’s natural for Lua, but what if Scribunto acquires support for other programming languages? There are recurring requests to support JavaScript, Python, Rexx, etc.
- Language codes that have hyphens have to be written with square brackets, which is non-obvious and error-prone.
Proposed solutions comparison table
Feature | Translatable page | JSON .tab file in the Data namespace on Commons | JSON file in an MCR slot | TemplateData | Messages as pages in the MediaWiki space | Lua table |
---|---|---|---|---|---|---|
Translate changes (see details in “Engineering considerations”) | minor | major | major | major | minor | major |
Needs permission to edit source messages | To mark for translation | No | To create slots | No | Yes (sysop or edit-interface permissions) | No |
Translate FFS | None | Major | Major | Major | None | Major |
Customize on-wiki | Unclear | Unclear | Unclear | Unclear | Probably easy, but may have performance issues | Unclear |
Similar to core and extensions | No | Very similar | Very similar | Mostly similar | Yes, but only for on-wiki editors | No |
Readable message keys | Brittle, needs fixing in Translate | Yes | Yes | Yes | Yes | Yes |
Importing and exporting | Not easy | Not easy | Easy | Easy | Not easy | Probably easy |
Can also be used in templates on the same wiki | Directly | Through a module | Through a module | Through a module | Directly | Through a module |
Handling of fuzzying | Probably already done | Needs non-trivial work | Needs non-trivial work | Needs non-trivial work | Probably already done | Needs non-trivial work |