Wikimedia Language engineering/Reports/2015/May

Wikimedia Language engineering monthly report for May 2015

Monthly updates from WMF Editing - Language team

Project Updates

Content Translation

Updates from the ongoing development and deployment for Content Translation

Development Update

  • Reference tool card can copy a reference to source even if the translation is from scratch
  • Proper handling of wikis whose content language code differs from their domain name code.
  • Many RTL fixes to get the tool ready for RTL wiki deployments
  • The links system is being rewritten to meet upcoming complex use cases
  • Improvements to the translation source selector: a page selector widget was developed that performs prefix search and lists results with thumbnail images and brief descriptions
  • One-click beta feature enablement campaigns in new article creation workflows, at both wikitext and VE based article creation points.
  • Entry points for beta feature users - contributions menu, contribution page links, article creation, interlanguage links
  • The stats page is getting a new design, since the number of languages is too large to represent in a table
  • Switched to RESTBase to fetch HTML pages. The publishing API switched to use ParsoidVirtualRESTService
  • Automatic linking of source and target articles in Wikidata when a translation is published is coming.
  • Echo integration to notify translators of milestones and various other events is in progress
  • CXServer logging improved to log to logstash
  • Continuing collaboration with Apertium: more Apertium language pairs are packaged and enabled in the WMF Apertium instance
  • Niklas, Santhosh and Pau wrote a paper on a computer-assisted translation (CAT) system for Wikipedia, which was selected for the European Association of Machine Translation conference. They presented it in Antalya, Turkey.
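
The switch to RESTBase for fetching page HTML, noted in the list above, can be illustrated with a small sketch. This is not cxserver code (cxserver is a separate service); it only shows how a page HTML URL might be built, assuming the public REST route `/api/rest_v1/page/html/{title}`. The function name `restbase_html_url` and the example domains are hypothetical and chosen for illustration.

```python
from urllib.parse import quote


def restbase_html_url(domain: str, title: str) -> str:
    """Build a REST API URL for the HTML of a wiki page.

    Assumption: the wiki exposes the /api/rest_v1/page/html/{title} route.
    """
    # MediaWiki titles use underscores in place of spaces.
    normalized = title.replace(" ", "_")
    # Percent-encode everything else, including "/" inside titles.
    encoded = quote(normalized, safe="")
    return f"https://{domain}/api/rest_v1/page/html/{encoded}"


if __name__ == "__main__":
    print(restbase_html_url("es.wikipedia.org", "Antalya"))
    print(restbase_html_url("en.wikipedia.org", "New York"))
```

Encoding the slash matters because a title such as "AC/DC" would otherwise be read as two path segments.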

Usage Data

  • Articles created - 2317
  • Number of new translators - 710
  • Highest number of articles created by one user - 299 (from the very beginning)
  • Highest number of translators for a Wikipedia - 346 (Spanish)
  • Number of translators with 1 article edit - 937
  • Number of translators with 2 article edits - 176
  • Number of translators with 3 article edits - 78
  • Number of translators with 4 article edits - 39
  • Number of translators with >= 5 and <= 20 article edits - 136
  • Number of translators with >20 edits - 30
  • Articles deleted - 189 (from all namespaces)


Comparative figures for the months of April and May 2015


Updates from Translate, Universal Language Selector, MLEB and other projects

Development Update

  • We became aware of pressing issues in the Translate translation memory performance (in addition to known issues of missing suggestions at Wikimedia sites). Niklas will propose fixes, based on help from David Chan during the Lyon hackathon, which could fix both issues.
  • Many bugs in the Translate extension and MediaWiki core i18n were fixed during the Lyon hackathon, in addition to other cleanups during the month.
  • The Translate web service framework was improved to support querying multiple services in parallel to reduce response times
  • Some core Translate classes have been slightly refactored to be more test friendly, to stop Translate unit tests failing intermittently.
  • TwnMainPage now shows "powered by" items.
  • The message group workflow selector was briefly broken on Special:Translate until the code was fixed.
  • Universal Language Selector received a couple of fixes for Input Methods.
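
The idea behind querying multiple services in parallel, mentioned in the list above, can be sketched as follows. This is a minimal Python illustration and not the Translate extension's own code (Translate is written in PHP); the function `query_all` and the two stand-in services are hypothetical. The point is that when requests run concurrently, the total wait is closer to the slowest service than to the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_all(services, text):
    """Call every service concurrently and return {name: result}."""
    with ThreadPoolExecutor(max_workers=len(services) or 1) as pool:
        # Submit all requests first so they run at the same time.
        futures = {name: pool.submit(fn, text) for name, fn in services.items()}
        # Then collect results; each .result() blocks until that call is done.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for remote translation aid services.
def slow_memory_lookup(text):
    time.sleep(0.2)  # simulate network latency
    return f"memory suggestion for {text!r}"


def slow_machine_translation(text):
    time.sleep(0.2)  # simulate network latency
    return f"machine translation of {text!r}"


if __name__ == "__main__":
    services = {
        "memory": slow_memory_lookup,
        "mt": slow_machine_translation,
    }
    start = time.perf_counter()
    results = query_all(services, "Hello")
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Threads suit this case because the work is dominated by waiting on the network rather than by computation.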

Usage Data

  • Translation rally increased MediaWiki language coverage (covered in Niklas's blog and Wikimedia blog)
  • MLEB was not released this month. The latest release, 2015.04, has been downloaded 92 times so far, possibly indicating a drop from the usual 150 downloads per release.

Deployments and other site related updates

  • Content Translation was deployed in the following Wikipedias during May:
    • Armenian (hy), Turkish (tr), Albanian (sq), Aromanian (roa-rup), Avar (av), Azerbaijani (az), Gagauz (gag), Kabardian (kbd), Karachay-Balkar (krc), Karakalpak (kaa), Maltese (mt), Ossetian (os), Abkhazian (ab), Ladino (lad), Mirandese (mwl), Romani (rmy), Crimean Tatar (crh), Tagalog (tl), Cebuano (ceb), Waray-Waray (war), Ilokano (ilo), Kapampangan (pam), Zamboanga Chavacano (cbk-zam), Central Bicolano (bcl), Pangasinan (pag), Georgian (ka), Kashubian (csb), Rusyn (rue), Belarusian (be), Belarusian Taraškievica (be-x-old), Latvian (lv), Lithuanian (li), Latgalian (ltg), Bhojpuri (bh), Polish (pl), Hindi (hi), Aymara (ay), Guarani (gn), Extremaduran (ext), Papiamento (pap), Swahili (sw), Somali (so), Shona (sn), Yoruba (yo), Amharic (am), Kabyle (kab), Wolof (wo), Igbo (ig), Northern Sotho (nso), Quechua (qu), Nahuatl (nah), Lithuanian (lt), Slovak (sk), Estonian (et), Finnish (fi), Romanian (ro), Hungarian (hu), Serbian (sr), Croatian (hr), Bosnian (bs), Northern Sami (se), Samogitian (bat-smg), Veps (vep), Silesian (szl), Võro (fiu-vro), West Frisian (fy), Dutch Low Saxon (nds-nl) and Dutch (nl).
  • Apertium Machine Translation support added for language pairs:
    • Basque -> Spanish
    • Catalan -> Occitan
    • English -> Galician
    • Portuguese -> Galician
    • Spanish <-> Aragonese
    • Spanish <-> Asturian
    • Spanish <-> French
    • Spanish <-> Galician
    • Spanish -> Occitan
    • Kazakh <-> Tatar
  • cxserver was updated to use the RESTBase API for page fetch.
  • Campaigns: the newarticle and cxstats campaigns are enabled in all wikis where Content Translation is deployed.
  • All languages are now available in the 'source' selector.

Cross team work/requirements

  • Pywikibot finished converting their i18n file format to JSON during the Lyon hackathon with some assistance from
  • Need information on feature graduation process: beta-feature to fully supported production feature

Team status