Content translation/Documentation/Comparison with the Translate extension

The Language team works on two major MediaWiki extensions, both of which are used for translation: Content Translation and Translate. This document explains the differences between them and why both extensions are needed.

Summary table

Content Translation | Translate
Translate any Wikipedia article, without any technical preparation | Translate wiki pages prepared for translation on multilingual sites, such as Commons, Meta, and mediawiki.org, and software user interface on translatewiki.net
Content adaptation encouraged | Content structure enforced
No support for updating already-translated content | Status tracking and surfacing of outdated content needing update
No collaboration | Collaboration possible, and explicit proofreading (review)
Side-by-side editor with automatic adaptation | Unit-focused editor with translation memory and message documentation
Visual editing | Wiki syntax editing

Content Translation extension

Content Translation is used for creating Wikipedia articles[1] by translating them from an article about the same topic in another language. See a short video about Content Translation.

 

In the screenshot you can see the translation of the article Beit Hakerem, Jerusalem from English into Japanese. Machine translation using Google Translate is enabled, but the result is not yet edited, and probably contains errors in grammar and vocabulary. The paragraph is marked in yellow to attract the translator’s attention to the need to correct these errors.

Translate extension

Translate is used for three kinds of tasks:

  1. Translating the user interface of software: This is used on the translatewiki.net website for translating the user interface of several free software projects: MediaWiki, MediaWiki extensions, Wikimedia mobile apps, the Pageviews tool, and also several projects not related to Wikipedia, such as Etherpad and OpenStreetMap. In this capacity, it is integrated with the Gerrit source code management system: all new and updated translatable user interface messages (strings) in English are semi-automatically copied from Gerrit to translatewiki.net, and all the translated strings are semi-automatically copied back from translatewiki.net to Gerrit.
  2. Translating wiki pages on multilingual wikis and community wikis: Commons, Wikidata, Meta, mediawiki.org, and several others.
  3. Translating CentralNotice banners for fundraising, article writing campaigns, and other purposes.
 

The screenshot shows the weekly Tech News on meta.wikimedia.org being translated from English to Hebrew. The text is divided into short sentences. Some technical parts that don’t need to be translated are packed as variables (marked with <tvar> tags): $list, $contribute, and $feedback. The Suggestions panel in the right-hand sidebar shows translation memory: a past translation of a similar sentence.
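The way variables hide technical parts from the translator can be illustrated with a short Python sketch. This is not the Translate extension's own code (the extension is written in PHP); the function name `hide_tvars` and the regular expression are hypothetical and only mimic the behaviour described above, in which each <tvar> element is shown to the translator as a `$name` placeholder.

```python
import re

# Matches <tvar name=list>...</tvar>, the syntax used on the Tech News page.
# Assumption: variable names consist of word characters and hyphens.
TVAR = re.compile(r"<tvar name=([\w-]+)>(.*?)</tvar>", re.DOTALL)


def hide_tvars(unit_text):
    """Return (text shown to the translator, mapping of variable name -> hidden value)."""
    values = {}

    def replace(match):
        name, value = match.group(1), match.group(2)
        values[name] = value  # remember the hidden value so it can be restored later
        return "$" + name     # the translator only sees the placeholder

    return TVAR.sub(replace, unit_text), values


source = (
    "[[<tvar name=list>Global message delivery/Targets/Tech ambassadors</tvar>|Subscribe]] "
    "and [[<tvar name=feedback>Talk:Tech/News</tvar>|give feedback]]."
)
shown, values = hide_tvars(source)
print(shown)  # [[$list|Subscribe]] and [[$feedback|give feedback]].
```

The translator works only on the text with placeholders, so link targets and similar technical values cannot be changed by accident.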

The second task of Translate, “translating wiki pages”, may sound very similar to translating Wikipedia articles. However, there is a significant difference between the two. The pages for which the Translate extension is used tend to be tightly structured and stable. Some examples:

Pages that are translatable using the Translate extension have to be prepared for translation: all the parts that have to be translated must be marked with XML-like <translate>...</translate> tags. This marking indicates which pages can be translated, divides long text into small units that are easy to translate one by one, and helps translators skip the parts that don’t have to be translated (images, templates, tables, numbers, code examples, etc.). Because each unit is tracked separately, the software can detect which units of the source page were updated and surface the outdated translations, so that translators can update them without touching the parts that weren’t modified.
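The idea of dividing a page into units and flagging changed ones can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the extension's actual implementation, which is written in PHP and is considerably more involved. The helper names `extract_units` and `outdated_units` are hypothetical, and the sketch assumes that every unit inside a <translate> block starts with a marker of the form `<!--T:n-->`.

```python
import re

TRANSLATE_BLOCK = re.compile(r"<translate>(.*?)</translate>", re.DOTALL)
UNIT_MARKER = re.compile(r"<!--T:(\d+)-->")


def extract_units(wikitext):
    """Map unit id -> source text for every marked unit inside <translate> tags."""
    units = {}
    for block in TRANSLATE_BLOCK.findall(wikitext):
        # Splitting on a pattern with one capture group yields
        # [prefix, id1, text1, id2, text2, ...].
        parts = UNIT_MARKER.split(block)
        for unit_id, text in zip(parts[1::2], parts[2::2]):
            units[int(unit_id)] = text.strip()
    return units


def outdated_units(old_source, new_source):
    """Return the ids of units whose source text changed between two revisions."""
    old, new = extract_units(old_source), extract_units(new_source)
    return sorted(uid for uid, text in new.items() if uid in old and old[uid] != text)


old = (
    "{{Header}}<translate><!--T:1--> Hello world.</translate>\n"
    "<languages/>\n"
    "<translate><!--T:2--> Please tell other users.</translate>"
)
new = old.replace("Hello world.", "Hello, world!")

print(extract_units(old))        # {1: 'Hello world.', 2: 'Please tell other users.'}
print(outdated_units(old, new))  # [1]
```

Only unit 1 is reported, so a translator would need to revisit just that one sentence; everything outside the <translate> tags is ignored entirely.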

Adding such tags to Wikipedia articles, however, would be extremely uncomfortable for Wikipedia editors. Unlike user manuals or legal documents, Wikipedia articles change frequently and unexpectedly, both in their text and their structure. For editors who mostly edit in one language, seeing <translate>...</translate> tags everywhere in the text would get in the way of editing.

To demonstrate this, here is what the wikitext source of the same section looks like:

{{Tech header|1=<translate><!--T:1--> The Tech News weekly summaries help you monitor recent software changes likely to impact you and your fellow Wikimedians. [[<tvar name=list>Global message delivery/Targets/Tech ambassadors</tvar>|Subscribe]], [[<tvar name=contribute>Special:MyLanguage/Tech/News#contribute</tvar>|contribute]] and [[<tvar name=feedback>Talk:Tech/News</tvar>|give feedback]].</translate>}}
{{Deadline|timeanddate=https://www.timeanddate.com/countdown/generic?iso=20200629T09&msg={{URLENCODE:Publication of Wikimedia Tech News}}}}
{{Tech news nav}}
<languages/>
<section begin="tech-newsletter-content"/><div class="plainlinks">
<translate><!--T:2--> Latest '''[[<tvar name=technews>m:Special:MyLanguage/Tech/News</tvar>|tech news]]''' from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. [[<tvar name=more-transl>m:Special:MyLanguage/Tech/News/2020/27</tvar>|Translations]] are available.</translate>

This source contains a large amount of markup: <translate>...</translate>, <languages/>, <tvar>...</tvar>, <!--T:1-->, and more, all of which gets in the way of editing the text. Therefore, the Content Translation extension was developed as a distinct product for translating Wikipedia articles. No technical preparation is needed to translate an article using this extension, and the workflow of the editors who write the source article is not affected in any way. Content Translation also focuses on creating the first version of a translated article rather than keeping the article’s translation up to date with the source article. This is a trade-off: it would be quite useful to see how a Wikipedia article has changed since it was first translated, but tracking changes the way the Translate extension does would not scale well.[5]

In addition, Content Translation recognizes that Wikipedias in different languages have different writing styles. It doesn’t force translators to stick strictly to the content and the structure of the source article. This is different from the Translate extension, which strongly encourages precise translation and enforces an identical page structure.

Finally, Content Translation offers only rich-text WYSIWYG editing, using VisualEditor as a component. Editing in wiki syntax is not allowed.[6] This is done for two reasons: to make translation easier for new Wikipedia users who are not familiar with wikitext, Wikipedia’s markup language, and to make it easy to adapt content such as images, links, and templates semi-automatically from the source article to the target article. In contrast, the Translate extension uses only wikitext source editing, because it is targeted at more experienced editors and needs very precise and fine-grained formatting.

Footnotes

  1. In theory, it could also be used for pages in wiki sites other than Wikipedia, for example Wikivoyage, but this is not done at the moment.
  2. Most notably by community relations specialists (liaisons) and product managers.
  3. Sometimes written by product managers, designers, community relations specialists, or technical writers, but often also by volunteer editors.
  4. Written by volunteer community members.
  5. The “Section translation” feature in Content Translation, which is in development as of June 2020, is trying to help update or extend articles that were already translated.
  6. Although it is occasionally requested.