i18n docs

Internationali[sz]ation/locali[sz]ation (i18n/l10n) tutorial. Please also see a presentation from late 2012.

For use in Pune Hackathon Feb 2012

First, we give a general introduction to the i18n problem and explain where we are. We then go over each of the major extensions that the i18n team has developed, breaking down where functionality lives for input, output, and searching. Finally, we give the participants an exercise, probably "add a new keymapping to Narayam".

General principles and goals

  • We use the Unicode standard, encoded as UTF-8, for everything. (Wikipedia was one of the first major websites to adopt Unicode for everything.)
  • We are using other standards whenever possible: CLDR, ISO 639, etc.
  • We always use open source software and open source fonts.
  • We're building a software stack which is open source and reusable for the web.
  • We need to localise the MediaWiki experience for all the ways people edit and read Wikimedia content. So we have to support the desktop web and the mobile web (tablets, feature phones, smartphones), and should also be ready for offline use (PDF, Kiwix) and print.

How we do it:

Localising

Now we'll show you Wikimedia localisation from a tools perspective and from a user perspective, including overviews of the major extensions.

Input

(open source keymaps for different languages)

Extension: Narayam -- Extension:Narayam

  • also see Help:Extension:Narayam
  • If typing in a language is not well supported in common operating systems, we try to provide an input method right in the website.
  • For the key mapping we usually choose the standard keyboard layout for a language, but if there is demand, we find other keymaps that are in use and add support for those too. The approach is to identify the three most used keymaps and map each to an in-browser (on-screen) keymap.
  • Currently we have: 14 Indic Languages (Lohit family), Hebrew, Arabic (Urdu), Cyrillic (minority language of Russia), some European languages, Berber and others.
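
As a rough illustration of the idea behind an in-page input method, the sketch below maps typed Latin keys to characters of another script with a plain lookup table. This is not Narayam's actual code: the function name `mapKeys` and the four-entry table are assumptions made up for this example, and the entries only loosely follow an InScript-style Devanagari layout, so check a real layout table before relying on them.

```javascript
// Hypothetical keymap: Latin key -> Devanagari character.
// Illustrative entries only, not a complete or authoritative layout.
const keymap = {
	k: '\u0915', // क
	j: '\u0930', // र
	h: '\u092A', // प
	l: '\u0924'  // त
};

// Replace each typed key with its mapped character;
// keys without an entry pass through unchanged.
function mapKeys( typed, map ) {
	return Array.from( typed )
		.map( ( key ) => (
			Object.prototype.hasOwnProperty.call( map, key ) ? map[ key ] : key
		) )
		.join( '' );
}

console.log( mapKeys( 'kj hl', keymap ) );
```

A real input method also has to hook into the browser's key events and handle multi-key sequences, which this lookup does not attempt.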

Future directions / external ideas:

(Comment from Amir: it doesn't really belong in the tutorial, it's probably the meeting summary. I kept the links as bookmarks.)
  • Arabic script (for Persian and Arabic) On-screen keyboard: http://behdad.org/editor/
  • Korean: Hangul is phonetic, and fonts contain composed glyphs; two or three characters combine to create one glyph.

Output

Extension: Webfonts

This CSS3-based technology allows us to deliver the required fonts along with an HTML page. This is important because, for many non-Latin languages, we cannot assume that users have the required fonts installed, or that they know how to get and install them.
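
The CSS3 mechanism in question is the `@font-face` rule. As a minimal sketch (not the Webfonts extension's own code), the helper below builds such a rule as a string; the function name `buildFontFaceRule` and the font URLs are hypothetical and only serve the example.

```javascript
// Build a CSS @font-face rule that tells the browser where to download a font.
// The family name and URLs used below are hypothetical examples.
function buildFontFaceRule( family, sources ) {
	const src = sources
		.map( ( { url, format } ) => `url('${ url }') format('${ format }')` )
		.join( ', ' );
	return `@font-face { font-family: '${ family }'; src: ${ src }; }`;
}

const rule = buildFontFaceRule( 'Lohit Devanagari', [
	{ url: '/fonts/lohit-devanagari.woff', format: 'woff' },
	{ url: '/fonts/lohit-devanagari.ttf', format: 'truetype' }
] );

console.log( rule );
```

Once a rule like this is part of the page's stylesheet, any element whose `font-family` names that family is rendered with the downloaded font, even if the reader never installed it.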

Searching

Extension:CirrusSearch is often used. (See also the category Category:Search_extensions.)

Translation tools

Translate - for translation

If you want to help translate MediaWiki and its extensions, see https://translatewiki.net for more documentation and our TODO lists: http://translatewiki.net/wiki/Issues_and_features

Babel

See http://www.mediawiki.org/wiki/Extension:Babel . We initially used this to indicate the languages that the user knows. Now it also tries to include CLDR data from the Unicode Common Locale Data Repository.

Exercises

Narayam

Developing a key mapping for Narayam, per Extension:Narayam#Developing_a_key_mapping

Step-by-step instructions on how someone develops and adds a key mapping/scheme:

  1. Find a key mapping for the language you want to add. This is a keyboard layout or list of Latin characters with their equivalent in the script of your language. For a lot of Indian scripts, an InScript key mapping is available. See for example these pages: https://fedoraproject.org/wiki/Special:PrefixIndex/I18N/Indic
  2. Add it to Narayam.php:
    1. In the array $wgNarayamSchemes, add your mapping similar to the other ones:
      'xyz' => array( 'ext.narayam.rules.xyz', 'beta' ),
      Once you know the language well and have tested the mapping thoroughly, use the plain form instead:
      'xyz' => 'ext.narayam.rules.xyz',
    2. If there are existing key mappings for the language, call it 'ext.narayam.rules.xyz-name' where 'name' is e.g. inscript or phonetic
    3. Register the interface message that will show up in the menu, in the 'messages' array in $wgResourceModules['ext.narayam.core']. It's best to use the same message key as the mapping name, 'narayam-xyz' or 'narayam-xyz-name'.
      You can add the message key in Narayam.i18n.php
    4. Now add a $wgResourceModules array, similarly to the other arrays
  3. If you have set it as a beta mapping and are testing it on your own wiki, make sure to set $wgNarayamUseBetaMapping to true.
  4. Now create the file resources/ext.narayam.rules.xyz.js (adjust the file name to match your mapping name).
  5. If you have a key mapping, you can paste it there and convert it to a JavaScript array. At the bottom of the file you need to call jQuery.narayam.addScheme(). Both steps are explained on Extension:Narayam#Developing_a_key_mapping.
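
The idea of turning a key mapping into a JavaScript array of rules can be sketched as follows. This is a self-contained illustration, not the format Narayam expects: the `[pattern, replacement]` rule shape, the helper names `typeKey` and `typeString`, and the tiny Devanagari phonetic table are all assumptions for this example. The real rule format and the arguments to jQuery.narayam.addScheme() are documented on Extension:Narayam#Developing_a_key_mapping.

```javascript
// Hypothetical rule format: [ regex matched against the end of the text, replacement ].
// Rules are tried in order; the first match wins.
const rules = [
	// Consonant + virama followed by "a": drop the virama (inherent vowel).
	[ /([\u0915-\u0939])\u094Da$/, '$1' ],
	// Consonant + virama followed by "i": replace the virama with the vowel sign i.
	[ /([\u0915-\u0939])\u094Di$/, '$1\u093F' ],
	// Standalone vowels.
	[ /a$/, '\u0905' ], // अ
	[ /i$/, '\u0907' ], // इ
	// Consonants are entered as consonant + virama until a vowel follows.
	[ /k$/, '\u0915\u094D' ], // क्
	[ /m$/, '\u092E\u094D' ], // म्
	[ /l$/, '\u0932\u094D' ]  // ल्
];

// Append one typed key to the text and apply the first matching rule, if any.
function typeKey( text, key, ruleList ) {
	const candidate = text + key;
	for ( const [ pattern, replacement ] of ruleList ) {
		if ( pattern.test( candidate ) ) {
			return candidate.replace( pattern, replacement );
		}
	}
	return candidate;
}

// Simulate typing a whole string, one key at a time.
function typeString( keys, ruleList ) {
	return Array.from( keys ).reduce(
		( text, key ) => typeKey( text, key, ruleList ),
		''
	);
}

console.log( typeString( 'kamala', rules ) );
```

Because each rule looks at the text already produced as well as the new key, the same key can yield different output depending on context, which is what distinguishes a phonetic scheme from a one-to-one keyboard layout.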

WebFonts for a specific language

Idea: something involving looking at Narayam, taking the example of a specific language, and walking through Webfonts and/or Translate? Example: look at what Korean fonts are available, show how you would add them to WebFonts and Translate.

Less likely exercises

Antoine's exercise idea: i18n messages

Basic API documentation is in the Message class docblock. Our documentation is regenerated daily from trunk at https://doc.wikimedia.org/, and the general documentation for the Message class is available there under Message. Though it needs improvement, it is a good starting point for an overview of the messaging system. More examples with the old wfMsg*() functions, e.g. wfMsgWikiHtml()/wfMsgNoTrans(), could be added, along with their expected behavior.

That is reference documentation; another idea is to write some documentation at https://www.mediawiki.org/wiki/WfMessage%28%29 that helps the reader understand how to use this class to solve various problems.

This is a less likely exercise because it's ill-defined and because it doesn't directly help the student learn how to use important i18n tools.

Unit test for an i18n extension

Follow https://www.mediawiki.org/wiki/Manual:Unit_testing and write a unit test for an i18n extension. This is a less likely exercise because it requires that the student additionally learn how to use Selenium and the workshop may not have time for that.

Presentation history

  • Alolita originally gave this on Jan 21, 2012 to Alolita Sharma, Chulki Lee, Shervin Afshar in San Francisco. All in all, this took about 75 minutes, including Q&A.
  • To be presented in Pune on 10 Feb 2012. Edited by Siebrand, Sumana, SPQRobin, Antoine, and Niklas.