Localisation/Tutorial
This page is outdated.
Internationali[sz]ation/locali[sz]ation (i18n/l10n) tutorial. Please also see a presentation from late 2012.
- For use in Pune Hackathon Feb 2012
First, we give a general intro to the i18n problem and explain where we are. We'll go over each of the major extensions that the i18n team has developed, breaking down where functionality lives for input, output, and searching. Then we will give the participants an exercise, probably "add a new keymapping to Narayam".
General principles and goals
- We use Unicode, encoded as UTF-8, throughout. (Wikipedia was one of the first major websites to adopt Unicode for everything.)
- We are using other standards whenever possible: CLDR, ISO 639, etc.
- We always use open source software and open source fonts.
- We're building a software stack which is open source and reusable for the web.
- We need to localise the MediaWiki experience for all the ways people edit and read Wikimedia content. So we have to support the desktop web and the mobile web (tablets, feature phones, smartphones), and should also be ready for offline use (PDF, Kiwix) and print.
How we do it:
- A technical overview of the different i18n/l10n capabilities of MediaWiki: http://www.mediawiki.org/wiki/Localisation
- The concept of a message is central to MediaWiki. It matters not just for localisation work, but for understanding much of MediaWiki in general. See the Message class general documentation, available at Message.
- The l10n tools team: Internationalization and localization tools
- The translation team: https://translatewiki.net
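To make the message concept above more concrete, here is a minimal sketch of the idea: interface text is looked up by a message key and a language code, with English as the fallback and $1-style placeholders for parameters. The message store, the key 'tutorial-greeting', and the getMessage() helper are all hypothetical and exist only for this illustration; this is not MediaWiki's Message class.

```javascript
// Hypothetical message store: message key -> language code -> text.
const messages = {
  'tutorial-greeting': {
    en: 'Hello, $1!',
    de: 'Hallo, $1!',
  },
};

// Look up a message by key and language, falling back to English,
// then substitute $1, $2, ... with the given parameters.
function getMessage(key, lang, ...params) {
  const translations = messages[key];
  if (!translations) {
    // Unknown key: return a visible marker (a choice made for this sketch).
    return '<' + key + '>';
  }
  const text = translations[lang] || translations.en;
  return text.replace(/\$(\d+)/g, (match, n) => {
    const value = params[Number(n) - 1];
    // Leave the placeholder in place if no parameter was supplied.
    return value === undefined ? match : String(value);
  });
}

console.log(getMessage('tutorial-greeting', 'de', 'Pune')); // Hallo, Pune!
console.log(getMessage('tutorial-greeting', 'mr', 'Pune')); // Hello, Pune! (fallback)
```

The point of the design is that code never contains user-visible text, only keys, so translators can work on the text without touching the code.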
Localising
Now we'll show you Wikimedia localisation from a tools perspective and from a user perspective, including overviews of the major extensions.
Input
(open source keymaps for different languages)
Extension: Narayam -- Extension:Narayam
- also see Help:Extension:Narayam
- If typing in a language is not well supported by common operating systems, we try to provide an input method directly on the website.
- For the key mapping we usually choose the standard keyboard layout for a language, but if there is demand, we find other keymaps that are in use and add support for those too. In practice this means identifying the three most-used keymaps and mapping each to a browser (on-screen) keymap.
- Currently we have: 14 Indic Languages (Lohit family), Hebrew, Arabic (Urdu), Cyrillic (minority language of Russia), some European languages, Berber and others.
Future directions / external ideas:
- (Comment from Amir: it doesn't really belong in the tutorial, it's probably the meeting summary. I kept the links as bookmarks.)
- Arabic script (for Persian and Arabic) On-screen keyboard: http://behdad.org/editor/
- Korean: Hangul is phonetic; fonts contain glyphs, and 2 or 3 characters combine to create one glyph
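The Korean note above (2 or 3 characters forming one glyph) can be illustrated with the syllable composition arithmetic from the Unicode Standard. This is only a sketch of that arithmetic, not of any Narayam key mapping; the function name composeHangul() and its index-based parameters are assumptions made for this example.

```javascript
// Constants from the Unicode Hangul syllable composition algorithm.
const S_BASE = 0xAC00; // first precomposed syllable, U+AC00
const V_COUNT = 21;    // number of vowels
const T_COUNT = 28;    // number of trailing consonants, including "none"

// lead: index 0..18, vowel: index 0..20, tail: index 0..27 (0 = no final consonant)
function composeHangul(lead, vowel, tail = 0) {
  const codePoint = S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail;
  return String.fromCharCode(codePoint);
}

// Lead index 18, vowel index 0, tail index 4 give the syllable U+D55C.
console.log(composeHangul(18, 0, 4)); // 한
// Lead index 0, vowel index 0, no tail: the first syllable U+AC00.
console.log(composeHangul(0, 0)); // 가
```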
Output
Extension: WebFonts
This CSS3-based technology allows us to deliver the required fonts along with an HTML page. This is important because, for many non-Latin languages, we cannot assume that users have the required fonts installed, or that they know how to obtain and install them.
- also see user documentation at https://www.mediawiki.org/wiki/Help:Extension:WebFonts
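As a rough sketch of the CSS3 mechanism described above, the helper below builds an @font-face rule that tells the browser to download a font along with the page. This is not how the WebFonts extension itself is implemented; the function name fontFaceRule() and the font URL are hypothetical and only serve to illustrate the idea.

```javascript
// Build a CSS @font-face rule for a font delivered by the web server.
// family: name used in font-family declarations
// url:    where the browser can download the font (hypothetical path below)
// format: font format hint, e.g. 'woff'
function fontFaceRule(family, url, format) {
  return [
    '@font-face {',
    "  font-family: '" + family + "';",
    "  src: url('" + url + "') format('" + format + "');",
    '}',
  ].join('\n');
}

// Hypothetical URL; a real wiki would serve the font file from its own path.
const rule = fontFaceRule('Lohit Devanagari', '/fonts/LohitDevanagari.woff', 'woff');
console.log(rule);
```

In a browser, a rule like this would be placed in a stylesheet, after which text styled with that font family renders correctly even when the user has not installed the font.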
Searching
Extension:CirrusSearch is often used. (here's the category: Category:Search_extensions)
Translation tools
Translate - for translation
If you want to help translate MediaWiki and its extensions, see https://translatewiki.net for more documentation and our TODO lists: http://translatewiki.net/wiki/Issues_and_features
Babel
See http://www.mediawiki.org/wiki/Extension:Babel . We initially used this to indicate the languages that the user knows. Now it also tries to include CLDR data from the Unicode Common Locale Data Repository.
Exercises
Narayam
Developing a key mapping for Narayam, per Extension:Narayam#Developing_a_key_mapping
Step-by-step instructions on how someone develops and adds a key mapping/scheme:
- Find a key mapping for the language you want to add. This is a keyboard layout or list of Latin characters with their equivalent in the script of your language. For a lot of Indian scripts, an InScript key mapping is available. See for example these pages: https://fedoraproject.org/wiki/Special:PrefixIndex/I18N/Indic
- Add it to Narayam.php:
  - In the array $wgNarayamSchemes, add your mapping similar to the other ones:
    - 'xyz' => array( 'ext.narayam.rules.xyz', 'beta' ),
    - 'xyz' => 'ext.narayam.rules.xyz', once you know the language well and have tested the mapping thoroughly
    - If there are existing key mappings for the language, call it 'ext.narayam.rules.xyz-name' where 'name' is e.g. inscript or phonetic
  - Register the interface message that will show up in the menu, in the 'messages' array in $wgResourceModules['ext.narayam.core']. It's best to use the same message key as the mapping name, 'narayam-xyz' or 'narayam-xyz-name'.
    - You can add the message key in Narayam.i18n.php
  - Now add a $wgResourceModules array, similarly to the other arrays
- If you have set it as a beta mapping and are testing it on your own wiki, make sure to set $wgNarayamUseBetaMapping to true
- Now create the file resources/ext.narayam.rules.xyz.js (the file name depends on the name of your mapping)
- If you have a key mapping, you can paste it there and convert it to a JavaScript array. At the bottom of the file you need to add a call to jQuery.narayam.addScheme(). Both are explained on Extension:Narayam#Developing_a_key_mapping
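As a rough illustration of what "converting a key mapping to a JavaScript array" means, the sketch below stores a mapping as an array of rules and applies it to typed text. The [keys, output] pair format, the sample Latin-to-Cyrillic rules, and the transliterate() helper are assumptions made for this example only; the actual rule format and the arguments expected by jQuery.narayam.addScheme() are described on Extension:Narayam#Developing_a_key_mapping.

```javascript
// Hypothetical, simplified key mapping: each rule is [typed Latin keys, output].
// This is NOT the real Narayam rule format; see the extension documentation.
const rules = [
  ['sh', 'ш'],
  ['s', 'с'],
  ['h', 'х'],
];

// Apply the rules to an input string, preferring the longest matching key
// sequence at each position. Characters without a rule pass through unchanged.
function transliterate(input, mapping) {
  // Sort a copy so longer key sequences are tried first ('sh' before 's').
  const sorted = [...mapping].sort((a, b) => b[0].length - a[0].length);
  let output = '';
  let i = 0;
  while (i < input.length) {
    const rule = sorted.find(([keys]) => input.startsWith(keys, i));
    if (rule) {
      output += rule[1];
      i += rule[0].length;
    } else {
      output += input[i];
      i += 1;
    }
  }
  return output;
}

console.log(transliterate('shs', rules)); // шс
// In a real rules file, the mapping would be registered at the bottom of the
// file with jQuery.narayam.addScheme(), using the arguments given in the docs.
```

Trying the longest key sequence first matters for phonetic mappings, where a multi-key sequence such as 'sh' must not be split into its single-key parts.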
WebFonts for a specific language
Idea: something involving looking at Narayam, taking the example of a specific language, and walking through WebFonts and/or Translate? Example: look at what Korean fonts are available, then show how you would add them to WebFonts and Translate.
Less likely exercises
Antoine's exercise idea: i18n messages
Basic API documentation is in the Message class docblock. Our documentation is regenerated daily from trunk at https://doc.wikimedia.org/. The Message class general documentation is available at Message. Though it needs improvement, it is a good starting point for an overview of the messaging system. More examples with the old wfMsg*() functions, e.g. wfMsgWikiHtml()/wfMsgNoTrans(), could be added, along with their expected behavior.
That is reference documentation; another idea is to write some documentation at https://www.mediawiki.org/wiki/WfMessage%28%29 that helps the reader understand how to use this class to solve various problems.
This is a less likely exercise because it's ill-defined and because it doesn't directly help the student learn how to use important i18n tools.
Unit test for an i18n extension
Follow https://www.mediawiki.org/wiki/Manual:Unit_testing and write a unit test for an i18n extension. This is a less likely exercise because it requires that the student additionally learn how to use Selenium and the workshop may not have time for that.
Presentation history
- Alolita originally gave this on Jan 21, 2012 to Alolita Sharma, Chulki Lee, Shervin Afshar in San Francisco. All in all, this took about 75 minutes, including Q&A.
- To be presented in Pune on 10 Feb 2012. Edited by Siebrand, Sumana, SPQRobin, Antoine, and Niklas.