Wikimedia Language engineering/Roadmap 2014-15

Draft version, June 30, 2014

Summary

This initial roadmap itemizes the features and functionality that the Language Engineering team has selected as key areas of focus and development for fiscal year 2014-15.

The features fall into internationalization and localization categories, with cross-platform support for mobile and web.

Internationalization categories include Language Selection, UI Language Components, Input Methods, Fonts, Mapping and Geo-Location, and Wikimedia i18n support.

Localization categories include UI Translation, Content Translation, Machine Translation, CX Translation tools, Language APIs, and extended development tools.

This roadmap is a draft proposed by the Language Engineering team and is pending prioritization by WMF’s product team.

ULS - Language selection

  1. Use Cases
  2. Language Selection
  3. Compact Links (CLS)
  4. Wikidata
  5. VE language selection use cases
  6. Translate language selection
    1. Other language selections in MediaWiki and extensions
  7. Removing Map from Language Selection
  8. Architecture updates:
    1. Extracting langdb and its utils to a separate jQuery Milkshake library
  9. Extracting algorithms for identifying common languages to a separate library (geolocation, Accept-Language, UI language, etc.)
  10. Update UI to meet Agora (styles and grid)
  11. Anonymous language selection for WMF
  12. Caching architecture improvements
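As an illustration of the "identifying common languages" item above, the following is a minimal sketch of how several signals might be merged into one suggestion list. It is not the ULS implementation: the function names, the priority order (UI language, then Accept-Language, then geolocation) and the `limit` parameter are all assumptions made for this example.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language tags of an Accept-Language header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue  # skip empty entries and the wildcard
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending weight, then by original position.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def common_languages(accept_language: str, ui_language: str,
                     geo_languages: list[str], limit: int = 5) -> list[str]:
    """Merge the signals into one de-duplicated list (assumed priority order)."""
    candidates = [ui_language.lower()]
    candidates += parse_accept_language(accept_language)
    candidates += [code.lower() for code in geo_languages]
    result: list[str] = []
    for code in candidates:
        if code and code not in result:
            result.append(code)
    return result[:limit]
```

Keeping this logic free of any UI or jQuery dependency is what would make it reusable outside ULS, which is the point of extracting it into a separate library.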

UI language components

  1. Length-sensitive containers (count characters, show X lines of text...)
  2. HTML substring extraction library
  3. User Interface
  4. Content-aware components (detect the language the user is writing in and what the text is about)
  5. Multilingual input support (e.g. tagging an image with a Wikidata entity)
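To make the "HTML substring extraction" item above more concrete, here is a minimal sketch of the underlying idea: cut rich text after a given number of visible characters while keeping the markup balanced. It uses Python's standard `html.parser` module. The names `html_substring`, `_Truncator` and `VOID_ELEMENTS` are hypothetical, and the sketch counts Unicode code points rather than grapheme clusters, which a production library for complex scripts would probably need to handle.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (illustrative subset).
VOID_ELEMENTS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class _Truncator(HTMLParser):
    def __init__(self, limit: int):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # visible characters still allowed
        self.out = []            # output fragments
        self.open_tags = []      # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        if self.remaining <= 0:
            return  # limit reached: ignore everything that follows
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.remaining <= 0 or tag not in self.open_tags:
            return
        # Close any elements left open inside the one being closed.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        piece = data[: self.remaining]
        self.remaining -= len(piece)
        self.out.append(escape(piece, quote=False))


def html_substring(html: str, limit: int) -> str:
    """Return the first `limit` text characters of `html`, tags balanced."""
    parser = _Truncator(limit)
    parser.feed(html)
    parser.close()
    while parser.open_tags:  # close whatever is still open at the cut
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)
```

For example, `html_substring('<p>Hello <b>brave new</b> world</p>', 9)` yields `<p>Hello <b>bra</b></p>`: the cut lands inside the bold run, and both open elements are closed afterwards.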

Input methods

  1. Architecture enhancement: jQuery.IME: revise the event model for easy interaction with VE and other JavaScript
  2. Visual IMEs - Candidate selection
    1. On-screen keyboards
  3. Tablet-compatible (layout design and development)
  4. Improving IME support by preparing data for mobile vendors to integrate
  5. Input field integration
  6. Use EventLogging results to identify popular IMEs

Fonts

  1. Efficient compression - WOFF2
  2. Font distribution: build, collaborate on, or reuse a CDN; use a font delivery service
  3. Analytics - Make font usage analytics visible as a dashboard
  4. "Upstream fonts": Integrating with mobile OS vendors
  5. Delivering webfonts for mobile web

Mapping and geo-location

  1. Map labels - multilingual data localization leveraging Wikidata (and possibly Translate)
  2. OSM language script rendering bugs and inconsistencies
  3. Multilingual search: as a Wikipedia editor, I want to add a map to an article, and to search for it right from the editing interface, like image searching in VisualEditor. (Not all of it is necessarily in our scope, but some of it may be.)

Improve i18n in Wikimedia universe

  1. Outreach within WMF
  2. Readers: Qualitative reader surveys about i18n features
  3. Support for WMF features teams on i18n review and problem solving
  4. MediaWiki.org multilingual documentation (Internal outreach across WMF engineering teams)
  5. Complex message parameters
  6. Handling locales and variants
    1. Review variants in MediaWiki (variants for Chinese, and also Serbian, Kazakh, Uzbek, Kashmiri, Konkani)
    2. Review list(s) of supported UI languages in the MediaWiki codebase (including "related languages" like en-US/en-GB and zh-TW/zh-HK)
    3. Language vs. locale
    4. Matching our codes to standards like BCP-47
      1. Language Converter
      2. Existing RFC
    5. Better support for formal-informal variants including fallback language
    6. nl, hu, de, Tagalog/Filipino
  7. MediaWiki language support maintenance: timely update of CLDR changes in MediaWiki (possibly within a 4-week window after each release), and documentation of the process.
  8. Selectable page content language (with the language selection being stored outside of the wikitext)
  9. Add MW Core support to handle translated page titles: Localised page listings (especially categories: allow defining the title to display on category pages)
  10. Localisation update v2 (currently a GSoC project, but needs further development and maintenance)
  11. Frontend i18n
  12. Creating independent components (e.g. moment.js for human readable time)
  13. Collaborate with other libraries (maintain cldr.js)
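The "Matching our codes to standards like BCP-47" item above can be illustrated with a small sketch. It maps a few legacy wiki language codes to standard tags and then applies the casing that BCP 47 recommends (lowercase language, title-case script, uppercase region). The function name `to_bcp47` is hypothetical and the mapping table is an illustrative subset only; any real list should be checked against the MediaWiki codebase.

```python
# Illustrative subset of legacy codes and their standard equivalents.
# Assumption: verify against MediaWiki before relying on any entry.
LEGACY_TO_BCP47 = {
    "als": "gsw",
    "bat-smg": "sgs",
    "be-x-old": "be-tarask",
    "fiu-vro": "vro",
    "roa-rup": "rup",
    "zh-classical": "lzh",
    "zh-min-nan": "nan",
    "zh-yue": "yue",
}


def to_bcp47(code: str) -> str:
    """Map a wiki language code to a BCP 47 style tag (sketch only)."""
    code = code.strip().lower()
    code = LEGACY_TO_BCP47.get(code, code)
    subtags = code.split("-")
    out = [subtags[0]]            # primary language subtag stays lowercase
    after_singleton = False
    for subtag in subtags[1:]:
        if after_singleton or len(subtag) == 1:
            # Extensions and private use (e.g. "x") stay lowercase.
            after_singleton = True
            out.append(subtag)
        elif len(subtag) == 4 and subtag.isalpha():
            out.append(subtag.title())   # script, e.g. "Hant"
        elif len(subtag) == 2 and subtag.isalpha():
            out.append(subtag.upper())   # region, e.g. "TW"
        else:
            out.append(subtag)           # variants and anything else
    return "-".join(out)
```

This only normalizes form; it does not validate subtags against the IANA registry, and codes that carry extra meaning (for example formal/informal variants) would need their own mapping decisions.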

Translate features

  1. Notifications
    1. For example: "There is new content to translate"
    2. For example: "Hey X, your translation is now live at Y"
    3. Should be implemented using Echo
  2. TUXification of stats (modernization of the current stat views)
  3. Planning improvements for page translation:
    1. Page translation with VE (Visual translation: integration of page translation with VisualEditor)
    2. Evaluate CX results when applied to page translation scenarios (e.g., A/B test with users using Translate & CX)
    3. Allow handling Translate-related annotations in VE (representing and adding them).
  4. GSoC (translation improvements - ongoing project)
  5. Minor and consequential changes to support language fallback/variants features: see "Handling locales and variants" above
  6. Allow attaching UI designs (SVG with message codes) that show a schematic version of how the UI will look with translated messages (e.g., a mockup for search that includes buttons such as "search" and "I'm feeling lucky" gives translators more context)
  7. Multilingual Commons: Support translating images, video (subtitles) and other media
  8. Subtitles
  9. Deployment of TranslateSVG
  10. Translation hub
  11. Feed of exports from translatewiki.net
    1. For example, a Twitter account

Content Translation features

Content Translation Roadmap

  1. Usable Moses test instance with HTTP API
  2. Better plan for rich text handling (styles, links, references, etc.)
  3. Wider use of LinearDoc for programmatic HTML string manipulation
  4. Design for parallel translation update merging
  5. Design for reference/citation template translation
  6. Analytics - better understanding of how much translation is happening
  7. Content translation infrastructure development, maintenance & monitoring

Translation tools (more related to CX)

  1. Integrate glossaries and dictionaries into the Translate extension (and VE?) (once they are available for CX)
  2. Glossary building tools
  3. Tool to create a draft glossary from existing parallel translations
  4. Glossary service
  5. Develop for CX, in a way that Translate can use it as well
  6. MT API
  7. Translation memory improvement
  8. Rich text awareness: WMF-internal TM needed?
  9. Alignment: (sentence and sub-sentence)
  10. Morphological analysis
  11. "Error-prone lemmatizers" for languages where nothing exists (sufficient for aligners, dictionary/glossary matching, etc.)

Language service APIs to meet Translate, CX needs

(service = public HTTP interface or equivalent)

  1. Glossary service
  2. Dictionary service
  3. Spellchecker/proofreading service
  4. MT service
  5. TM service

Development Support Tools

  1. Language Coverage Matrix Dashboard (LCMD)
  2. Documentation and code review - get it into maintainable mode
  3. Features - analytics views, especially for ULS IMEs, Webfonts, CX, etc.
  4. Testing Dashboard (possible integration with LCMD)
    1. A unified dashboard that captures all available tests, their frequency, condition, changes and results would help analyse test coverage and effectiveness for each component
    2. A webhook to pull in the data that the dashboard displays
    3. Possible hook into the cross-browser testing dashboard by hashar (not exactly cross-browser, https://integration.wikimedia.org/ci/view/BrowserTestsDashboard/)
  5. Understanding use cases for TCMS and manual testing, and potential integration of views (get more feedback from VE)
  6. Performance tracking
    1. For example, views on timings of our API modules
    2. For example, how long translation memory queries take
  7. Evaluate dev-ops tools for language features
  8. Evaluate Chrome dev tools for language features