Language goals and wishlist

This document is based on the previous internationalisation wishlists (2013, 2014, 2017). This page is a living document used to collect ideas that the Language team can use for planning purposes. The purpose is to have a ready list of idea candidates for annual planning and for mentoring programs. Each idea should have a title and a description that covers the issue being solved, the possible impact, possible approaches, community demand and the expected effort needed to implement the idea. Optionally, each idea has a line about its current status and a link to a relevant Phabricator task.

Planned goals: January - June 2018

Compact Language Links are enabled by default on Wikipedia in all languages. The percentage of clicks out of pageviews does not go down on any wiki.

Content Translation version 2

Restart work as per roadmap to complete the second version of CX on the new codebase (including VE integration for the translation interface), and initiate appropriate community interactions for release of CX as non-beta on selected wikis.

Translation support on mobile devices

Research and design a mobile translation experience

Tags: #translators #contenttranslation

Description:

New editors using mobile can participate in a low-barrier contribution process. Design solutions for translation experiences on mobile. Ideas will be evaluated with user research, and specifications will be ready to be implemented in upcoming steps.

Impact:

Increase in translated content created on mobile. Mobile contribution enables participation by users whose only platform is mobile.

Backstage translatewiki.net process enhancements

Better insertables

Tags: #translators #translate-extension #rtl #php #javascript #gsoc-outreachy

Description:

Various i18n libraries use different ways to mark variables. Some examples:

$1 // MediaWiki
$var
%1$s, %2 // Many C/Gettext projects
${var}

With insertables (buttons that insert these variables when activated), we have made it easier to add variables and to avoid spelling mistakes in them. However, some of these formats, namely those with Latin letters, are difficult and confusing to use in right-to-left language translations.

Possible approach:

One possible approach is to unify all these formats, so that translators only see one of them, even though the underlying code will see whatever syntax they use. We can also make it so that in right-to-left languages we use syntax that does not cause issues.

Another aspect of unification is that for translation memory, we should replace all variables with a similar placeholder, so that translation memory matching is more accurate.

To take this even further, insertables should perhaps be easier to control, so that project contacts have more visibility into them without having to write PHP code to support them. We can do this by 1) supporting the most common formats out of the box and 2) allowing regular expressions to be specified directly in the YAML configuration.

See also Phabricator:T64099 and Phabricator:T114101.
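
The unification idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern table, the function names and the neutral `{1}` placeholder are hypothetical and not part of the Translate extension; in a real setup the regular expressions could come from the YAML configuration rather than a hard-coded list.

```python
import re

# Hypothetical table: one regular expression per variable syntax listed above.
# Order matters: more specific patterns come first.
PLACEHOLDER_PATTERNS = [
    r"%\d+\$[sd]",        # %1$s (positional printf style)
    r"%\d+",              # %2
    r"\$\{\w+\}",         # ${var}
    r"\$\d+",             # $1 (MediaWiki)
    r"\$[A-Za-z_]\w*",    # $var
]
COMBINED = re.compile("|".join(PLACEHOLDER_PATTERNS))


def find_insertables(message):
    """Return the distinct variables in a message, in order of appearance."""
    seen = []
    for match in COMBINED.finditer(message):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


def normalize_for_tm(message):
    """Replace every variable with a numbered, syntax-neutral placeholder."""
    numbering = {}

    def replace(match):
        # The same variable always maps to the same number within a message.
        index = numbering.setdefault(match.group(0), len(numbering) + 1)
        return "{%d}" % index

    return COMBINED.sub(replace, message)
```

With this sketch, `Hello $1, you have %2$s messages` and `Hello $name, you have ${count} messages` both normalise to `Hello {1}, you have {2} messages`, so a translation memory lookup would treat them as the same text.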

Extensive and robust localisation file format coverage

Tags: #translate-extension #php #back-end

Description:

The Translate extension supports multiple file formats. The formats have been developed on an "as needed" basis, and many formats are not yet supported or their support is incomplete.

Possible approach:

In this project the aim would be to make existing file formats (for example Android XML) more robust, so that they meet the following properties:

  • the code does not crash on unexpected input,
  • there is a validator for the file format,
  • the code can handle the full file format specification,
  • the code is secure (does not execute any code in the files nor have known exploits).
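
As a rough illustration of the first two properties, the Python sketch below reads Android-style string resources and reports problems as a list of errors instead of failing on unexpected input. The function name and the error wording are hypothetical, it covers only plain `<string>` elements (no plurals, string arrays or Android escaping rules), and it makes no claim about the security property.

```python
import xml.etree.ElementTree as ET


def parse_android_strings(text):
    """Parse <string name="..."> entries from an Android-style resource file.

    Returns (messages, errors). Malformed input is reported in errors
    rather than raised as a parse exception.
    """
    messages, errors = {}, []
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return messages, ["not well-formed XML: %s" % exc]

    if root.tag != "resources":
        return messages, ["unexpected root element <%s>" % root.tag]

    for element in root:
        if element.tag != "string":
            # Known gap in this sketch: other element types are only reported.
            errors.append("unsupported element <%s>" % element.tag)
            continue
        name = element.get("name")
        if not name:
            errors.append("<string> without a name attribute")
            continue
        if name in messages:
            errors.append("duplicate key %s" % name)
            continue
        messages[name] = "".join(element.itertext())
    return messages, errors
```

The same function can double as a simple validator: a file is acceptable when the returned error list is empty.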

Example known bugs are bugzilla:31331, bugzilla:36584, bugzilla:38479, bugzilla:40712, bugzilla:31300, bugzilla:57964, bugzilla:49412.

In addition new file formats can be implemented: in particular Apache Cocoon (bug 56276) and AndroidXml string arrays have interest and patches to work on, but we'd also like TMX, for example. (More example formats other platforms support: OpenOffice.org SDF/GSI, Desktop, Joomla INI, Magento CSV, Maker Interchange Format (MIF), .plist, Qt Linguist (TS), Subtitle formats, Windows .rc, Windows resource (.resx), HTML/XHTML, Mac OS X strings, WordFast TXT, ical.)

Impact:

Adding new formats is a good chance to learn how to write parsers and generators with simple data but complicated file formats. For some formats, it might be possible to take advantage of existing PHP libraries for parsing and file generation.

This project paves the way for future improvements, like automatic file format detection, support for more software projects and extension of the ability to add files for translation by normal users via a web interface.

Transparent and fast language addition process for translatewiki.net

Tags: #documentation #translatewiki.net #translationadmins

Description:

Adding a new language on translatewiki.net (translatewiki:Translatewiki.net languages) requires many decisions and checks (e.g. ISO status, names in Wikipedia/CLDR/request, jquery.uls) and changes in various repositories. It is also not clear to translators what the status of their request is, and sometimes data is forgotten. Only core staff (in practice a single person) can help, since full access to configuration and repositories is needed.

The suggestion is to write good documentation for the process, with clear criteria that anyone can apply, leaving only +2 and oversight to admins. Thanks to more active code review tracking, patches there are slightly less likely to get stuck.

Transparent and fast project addition process for translatewiki.net

Tags: #documentation #translatewiki.net #translationadmins

Description:

Adding a new project on translatewiki.net requires many decisions and checks (e.g. file format, access rights, license, string quality) and changes in various places. People are asked to join an IRC channel to discuss this. Progress is loosely tracked (with support requests and sometimes Phabricator tasks).

The suggestion is to write good documentation for the process, with clear criteria that anyone can apply, leaving only +2 and oversight to admins. Thanks to more active code review tracking, patches there are slightly less likely to get stuck.

Better export thresholds

Tags: #translatewiki.net #yaml #front-end

Description:

It would be helpful to alert users when translations are not being exported from translatewiki.net due to not meeting the export threshold. This information should be accessible to the Translate extension. Currently this is specified in the repository management.

Possible approach:

It would also be good to reconsider whether the current export levels make sense. One should check how many translations we currently have that are not being exported because they do not meet the export level. We should also consider lowering our thresholds, in the hope of increased translator motivation, faster deployment of translations and less wasted work.
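
The check described above could look roughly like the following Python sketch. The data layout (a mapping from language code to translated and total message counts) and the function name are assumptions made for illustration; the real numbers would have to come from the message group statistics.

```python
def unexported_translations(stats, threshold_percent):
    """Return {language: translated count} for languages below the threshold.

    stats maps a language code to a (translated, total) pair.
    """
    held_back = {}
    for language, (translated, total) in stats.items():
        if total == 0:
            continue  # nothing to translate, so nothing is held back
        # Integer comparison avoids floating point rounding at the boundary.
        if translated * 100 < threshold_percent * total:
            held_back[language] = translated
    return held_back


# Example with made-up numbers: a 50 % threshold holds back 50 translations.
example = {"fi": (95, 100), "nn": (40, 100), "sq": (10, 100)}
wasted = sum(unexported_translations(example, 50).values())
```

Running the same calculation with several candidate thresholds would show how much translated work each threshold keeps out of the exports.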

Impact:

If this information is moved to the message group configuration, we would avoid duplication, and simplify repository management for exports.

Support renames when importing message changes

Tags: #translate-extension #translatewiki.net #php #translationadmins

Description:

Currently translation administrators of translatewiki.net need to check all changes manually using Special:ManageMessageGroups. One of the reasons is that messages are sometimes renamed. If this happens, the admin must use Special:ReplaceText or equivalent to move translations in the wiki to retain histories. It would be better if this could be done directly from the review interface.

Possible approach:

Automatic detection of renames (the same content deleted in one message and added in another) would be best, with a manual override for cases where a rename is not detected automatically or is detected incorrectly.
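
A minimal sketch of such detection in Python, assuming the old and new source messages are available as plain key-to-text mappings (the function and variable names are hypothetical). Ambiguous cases, where several deleted or added keys share the same text, are returned separately so that they can go to the manual override.

```python
def detect_renames(old_messages, new_messages):
    """Pair deleted keys with added keys that carry identical source text.

    Returns (renames, ambiguous): renames maps old key -> new key,
    ambiguous lists (old keys, new keys) groups that need manual review.
    """
    deleted = {k: v for k, v in old_messages.items() if k not in new_messages}
    added = {k: v for k, v in new_messages.items() if k not in old_messages}

    # Group keys by their text so one-to-one matches can be told apart
    # from cases where several keys share the same content.
    deleted_by_text, added_by_text = {}, {}
    for key, text in deleted.items():
        deleted_by_text.setdefault(text, []).append(key)
    for key, text in added.items():
        added_by_text.setdefault(text, []).append(key)

    renames, ambiguous = {}, []
    for text, old_keys in deleted_by_text.items():
        new_keys = added_by_text.get(text, [])
        if len(old_keys) == 1 and len(new_keys) == 1:
            renames[old_keys[0]] = new_keys[0]
        elif new_keys:
            ambiguous.append((old_keys, new_keys))
    return renames, ambiguous
```

This only catches renames where the text is unchanged; a rename combined with a content edit would still need the manual path.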

Impact: This would save a considerable amount of time and bring us closer to fully automated imports as well.

Advanced Translate features

Visual page translation

Tags: #difficult #visualeditor #parsoid #translate-extension #translationadmins

Description:

The wiki page translation feature of the Translate extension does not currently work with Visual Editor due to the special tags it uses. More specifically, this is about editing the source pages that are used as the source for translations, not the translation process itself.

Possible approach:

The work can be divided into three steps:

  1. Migrate the special tag handling to a more standard way of handling tags in the parser. This needs some changes to the PHP parser so that it can produce the wanted output.
  2. Add support to Parsoid and Visual Editor so that editing page contents preserves the structures that page translation adds to keep track of the content.
  3. Add to Visual Editor some visual aid for marking the parts of the page that can (or cannot) be translated.

This is a difficult project due to complexities of wikitext parsing and intersecting multiple different products: Translate, MediaWiki core parser, Parsoid, Visual Editor.

See also: T143327

Support creation and use of glossaries

Tags: #translate-extension #research #gsoc-outreachy #translators

Description:

There must be a lot of glossaries and terminologies out there. Some of them must be useful to integrate in Translate.

Provide technical support for building glossaries with the Translate extension. These should integrate directly into the translation editor. The solution can range from simple glossaries defined on a wiki page to much more complex solutions. Currently we have almost nothing, so it would be good to get things rolling.

Improve translation memory

Tags: #translate-extension #research #translators

Let's narrow the gap between plain translation memory (TM) and machine translation (MT) and do something more useful than edit distance. Try to get the ElasticSearch-as-TM project to fly and promote it. It could be *the* open source TM solution to plug into your software, given that there doesn't seem to be much competition at the moment. Use sentence alignment to increase the recall of the translation memory. Or at least fix the known issues mentioned in the task that are making it more annoying than useful for e.g. Tech News translators.

Translation features for new Wikipedia editors

Translation lists and campaigns

Tags: #contenttranslation #neweditors

Description:

Providing tools for our users not only to contribute directly, but also to organise campaigns that encourage others to contribute, has a multiplying effect and reaches more, and more diverse, users. Individuals and groups often follow some kind of list of articles to translate. Those lists are often informal (personal notes, a list on a user page or just an idea in the user's head) and lack integration with the rest of the translation process. By providing integrated support for them, we can improve the whole translation experience.

MediaWiki core i18n

Integrate continuous translation updates to MediaWiki core

See also Generic, efficient Localisation Update service, Extension:LocalisationUpdate/LUv2

Tags: #mediawiki-core #php #back-end

Description:

Nobody updates MediaWiki regularly. LocalisationUpdate-type functionality should be available and functional on all MediaWiki installations. It should not need extra configuration such as cron jobs. This requires that we can serve the updates to the wikis safely and very efficiently. Naturally, it should be very easy for those who don't want this feature to not use it.

For safety, resolving Phabricator:T2212 is important.

See also #Continuous translation update service provided by translatewiki.net for the service side of this.

Better support for formal and informal variants

Tags: #php #back-end #mediawiki-core #translatewiki.net

Description:

Currently the formal and informal variants are a bit hit and miss. The fact that not every message needs translation (compare with variants of English) makes them problematic. For languages that want to take formality seriously, we could make formality an inline feature in addition to the existing PLURAL, GENDER and GRAMMAR features.

Possible approach:

The formality could be an additional option in the user preferences (like gender) or driven by the language codes directly. Languages could choose their own number of formalities, not limited to two.

Example (current solution):

(es)        ¿Estás {{GENDER:$1|seguro|segura}}?
(es-formal) ¿Está {{GENDER:$1|seguro|segura}}?

Example (proposed solution):

(es) {{FORMAL:¿Estás|¿Está}} {{GENDER:$1|seguro|segura}}?

The underlying assumption here is that only MediaWiki uses these formal/informal variants. If other projects use them, they could keep them as separate languages still until a better solution for them is described.
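
To make the proposal concrete, here is a small Python sketch of how a `{{FORMAL:...}}` block could be expanded. It is not MediaWiki code: the function name, the zero-based formality index and the fallback to the last listed form are all assumptions made for illustration.

```python
import re

# Matches {{FORMAL:form0|form1|...}} without nested braces.
FORMAL_RE = re.compile(r"\{\{FORMAL:([^{}]*)\}\}")


def expand_formal(message, formality=0):
    """Pick one alternative from each {{FORMAL:...}} block.

    formality is a zero-based index (0 = first form, 1 = second form, ...).
    If a block has fewer forms than the index, its last form is used.
    """
    def replace(match):
        forms = match.group(1).split("|")
        return forms[min(formality, len(forms) - 1)]

    return FORMAL_RE.sub(replace, message)


message = "{{FORMAL:¿Estás|¿Está}} {{GENDER:$1|seguro|segura}}?"
informal = expand_formal(message, 0)  # "¿Estás {{GENDER:$1|seguro|segura}}?"
formal = expand_formal(message, 1)    # "¿Está {{GENDER:$1|seguro|segura}}?"
```

The two results match the separate es and es-formal messages of the current solution, while other constructs such as GENDER are left untouched for later processing.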

See for example Phabricator:54957.

Typed message parameters

Tags: #difficult #php #javascript #back-end

Description:

The MediaWiki message library is very versatile, but some limitations have become apparent over time. The main one is the inability to embed structures that themselves contain linguistic content in sentences. This is best illustrated with the case of links. None of the current alternatives is nice:

# Alternative 1: lego
msg1: Please see our $1 for more information
msg2: terms of service
call: $this->msg( "msg1" )->rawParams( Html::element( 'a', [ 'href' => '...' ], $this->msg( "msg2" )->text() ) )->escaped();

# Alternative 2: markup 1
msg1: Please see our <a href="$1">terms of service</a> for more information
call: $this->msg( "msg1", '...' )->text(); // Lacks proper escaping!!!

# Alternative 3: markup 2
msg1: Please see our $1terms of service$2 for more information
call: $this->msg( "msg1" )->rawParams( '<a href="...">', '</a>' )->escaped();

Instead, if we could do embedding, things would be quite simple for translators and developers:

# Suggested solution
msg1: Please see our {{#embed:$1|terms of service}} for more information
call: $this->msg( "msg1" )->rawParams( Html::element( 'a', [ 'href' => '...' ], '$1' ) )->escaped();
// The $1 inside the link gets replaced with "terms of service" from the translation with same escaping as the rest of the message.

It is also possible to devise a custom syntax to make it shorter, but that is probably not necessary, as translators already encounter this kind of syntax a lot with PLURAL, GRAMMAR, GENDER and some others.
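
The embedding behaviour described above can be illustrated with a short Python sketch. It is not the MediaWiki message API: the function name, the `wrappers` mapping and the use of `html.escape` are assumptions chosen to show the idea that the translated text is escaped like the rest of the message while the wrapper supplied by the developer is trusted HTML.

```python
import html
import re

# Matches {{#embed:$N|translated text}} without nested braces.
EMBED_RE = re.compile(r"\{\{#embed:\$(\d+)\|([^{}]*)\}\}")


def render_with_embeds(message, wrappers):
    """Render a message to HTML, filling {{#embed:...}} blocks.

    wrappers maps a parameter number to trusted HTML that contains "$1"
    at the place where the translated text should be inserted.
    """
    parts, position = [], 0
    for match in EMBED_RE.finditer(message):
        # Text outside embed blocks is escaped as ordinary message text.
        parts.append(html.escape(message[position:match.start()]))
        wrapper = wrappers[int(match.group(1))]
        # The embedded text gets the same escaping before insertion.
        parts.append(wrapper.replace("$1", html.escape(match.group(2))))
        position = match.end()
    parts.append(html.escape(message[position:]))
    return "".join(parts)
```

For example, rendering `Please see our {{#embed:$1|terms of service}} for more information` with the wrapper `<a href="/tos">$1</a>` for parameter 1 yields a sentence in which only the link markup is raw HTML.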

Possible approach:

See https://github.com/Nikerabbit/monkey-i18n for proof of concept for this idea. It also supports typed parameters, so that GENDER, PLURAL etc can validate that they are really getting a user or number, and even format it automatically without the need to use numParams().

Improve MediaWiki support for grammar

Tags: #mediawiki-core #php #backend

Description:

Grammar is complicated. Amir Aharoni started a project to move grammar rules from code into data (regular expressions in this case) that can be reused both in JavaScript and PHP, and more easily tested. However, this has only been done for a few languages.

Regular expressions are likely not flexible enough for all languages. There exist libraries, such as Open morphology for Finnish, that do a better job. The task for this project would be to investigate how these kinds of libraries can be integrated into MediaWiki message processing. Speed is a concern here, as is how to manage the dependencies and services, as these libraries for different languages are developed by different people using different programming languages etc.

Impact: A way to replace our home-grown grammar support with better solutions and to collaborate with the projects behind them.

Make collation support more robust

Description:

MediaWiki is able to support ICU collation, and also has some support for custom-built collation (e.g. in Bashkir). However, it must be enabled manually on each site. It makes sense to have MediaWiki core specify a default collation for languages that are supported in ICU or inside MediaWiki (T164985, T47611). Without this, even languages that could already have good collation may be missing it. This affects all non-Wikimedia users of MediaWiki as well.

Synergies to reduce maintenance costs and friction

Librarization of MediaWiki i18n

Tags: #php #javascript #mediawiki-core

Description:

We should have a reference library which embeds all our learnings and best practices on i18n handling and l10n formats, to promote and use it widely in PHP and JavaScript projects. The library should also try to unify the custom/diverse formats like those for dates from moment.js or others (compare phabricator:T31235).

Does simplei18n do any of these?

Possible approach:

Collaborate with Globalize.js, cldr.js and other projects and ensure our jquery.i18n project works well with them, with minimal overlap. A complete solution includes everything like time and number formatting, localisation file formats, message delivery, message formatting, input methods, web fonts, language selection. It also makes no sense for us to maintain two i18n libraries written in JavaScript. Make jquery.i18n good enough and make MediaWiki use it.

Currently, we have a sort of conflict between our own PHP and JavaScript libraries, and even many Wikimedia projects in PHP end up using custom solutions. At translatewiki.net we have multiple PHP projects that would benefit from a high-quality i18n library they could just plug in. Licensing issues might be a problem if we want to reuse code from MediaWiki.

We don't have recommendations for important languages like Python, which are "stuck" with Gettext (or custom formats like pywikibot?).

Extract our PHP message parsing code to a library

Tags: #php #javascript #mediawiki-core

Description:

There are many PHP projects that would benefit from a high-quality i18n library. MediaWiki has many excellent features, such as extensive handling of parameters, parameter types etc. It has some drawbacks, though, such as not being able to support nested constructions.

See also https://github.com/Nikerabbit/monkey-i18n

At translatewiki.net we have multiple PHP projects. The licence (GPL-2.0+) might be a problem if they want to reuse code from MediaWiki.

Translator hub

Tags: #php #translate-extension #gsoc-outreachy

Originally proposed as: Translate Roll

Description:

The number of wikis using the Translate extension has increased significantly. At translatewiki.net, in some rare cases people run out of things to translate. It would be beneficial to have some kind of central place to see translation status across the Translate universe. It would facilitate cross-project collaboration and raise awareness of different wikis having different kinds of content to translate.

Possible approach:

Various ideas have been floated for implementation, from one special page just listing overall translation coverage in each wiki for a given language, to a "blog roll" type of links across wikis as well as single sign-on systems to ease moving between wikis.

Increase impact on projects beyond Wikimedia

Providing our translation resources to additional projects, which are not directly or entirely under the Wikimedia umbrella, can often give back a lot: translatewiki.net is an example of a success, where for instance "offering" our community of translators to additional software projects has helped expand the number of translators also for MediaWiki.

Continuous translation update service provided by translatewiki.net

Tags: #back-end

Description:

Provide an efficient service/API for any product to automatically update their translations live. It is not necessary to implement as part of MediaWiki with PHP. Could also be Node.js/Go or something else.

See also Display "where are my translations deployed" information at translatewiki.net, because some tracking of what is exported where is needed as well.

Impact: Increase in the number of software projects that adopt fast translation updates. Increased translator motivation and quicker fixes thanks to seeing translations in action faster.

Open source machine translation corpus and translation memory

Tags: #difficult #translate-extension

Description:

There are many open source software products out there with translations. It would be great if those translations could be harvested to provide an open source translation corpus and translation memory. See also https://intense.wmflabs.org for an attempt that has since stalled.

See also http://okapiframework.org/wiki/index.php?title=OpenTran_Translation_Repository_Connector

Release translatewiki.net translation memory data

Tags: #translatewiki.net #puppet #php

Description:

We can either provide periodic dumps in TMX or some other format or provide it as a service. The service could be registration-only to make sure we can handle the load.

Impact:

The data can be useful for machine translator developers or researchers due to the very high number of parallel languages.

Automated exports from translatewiki.net

Tags: #php #puppet #back-end #security #git #translatewiki.net

Description:

Translatewiki.net exports are currently semi-automated to the level that one needs to run one script and watch the output. Ideally, it should be fully automated, run by a cron job.

The issues we currently consider blocking this are:

  1. migration away from personal ssh keys to a key used by translatewiki.net
  2. secure handling of this service ssh key
  3. migration of all projects to our new repository management tool (repong)
  4. reliability of exports
  5. automating the process of addition for new languages
  6. defining the automation via puppet

Possible approach:

Step 1) can be completed by poking existing products to add access with the new key, preferably using an account for translatewiki.net. See for example https://github.com/translatewiki.

Step 2) would need advice from people with experience on this kind of thing to make sure it is secure. Obviously the automated exports would need access to this key, which is currently password protected.

Step 3) requires adding support for non-Git version control systems to repong (written in PHP).

Step 4) would entail adding more checks on our end to verify that we are not creating broken files. This rarely happens at the syntactic level, as we use pretty standard libraries and battle-tested code, but it can happen at a higher level (e.g. not outputting authorship info). We should also handle failures better (logging, making sure admins can easily see and act on them). One issue is that the project might commit changes between our last import and the following export. At a minimum we should abort the export if this happens, by checking that we are exporting to the same revision that we imported from.

Step 5) needs more thought. Many projects need to add a code map or register a new language in a separate file. Perhaps we can devise a safe way to run scripts that these projects create, or just not export new languages automatically, falling back to humans to add those manually.

Step 6) is just making sure this all happens automatically by having cron or similar execute exports periodically.
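
The revision check mentioned under Step 4) can be sketched in Python as follows. The function name and the two mappings (project name to the revision last imported, and project name to the revision currently at the head of the upstream repository) are hypothetical; how those revisions are obtained from repong or the version control system is left out.

```python
def plan_exports(imported_revisions, upstream_revisions):
    """Split projects into those safe to export and those to skip.

    A project is exported only if the upstream revision still equals the
    revision we last imported from. Returns (safe, skipped), where skipped
    maps a project name to a human-readable reason for the admin log.
    """
    safe, skipped = [], {}
    for project, imported in sorted(imported_revisions.items()):
        upstream = upstream_revisions.get(project)
        if upstream is None:
            skipped[project] = "upstream revision unknown"
        elif upstream != imported:
            skipped[project] = (
                "upstream moved from %s to %s since last import"
                % (imported, upstream)
            )
        else:
            safe.append(project)
    return safe, skipped
```

Returning the reasons instead of silently skipping makes it straightforward to surface failures to admins, which the same step asks for.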

Support translation of multiple software branches

Tags: #translate-extension #translatewiki.net #php #yaml

Description:

Software translated in translatewiki.net usually uses the master branch for both import and export. This means that once a stable branch is created, it stops receiving translation updates. It should be possible to translate, import and export multiple branches simultaneously. When translating, messages that are the same across branches should only be translated once.

Branch support has two benefits:

  1. software that is branched but not yet released can receive translation updates
  2. software that is already released, can release minor updates with latest translations

In the past, for MediaWiki core, minor releases were kept up to date with a great deal of manual effort, see translatewiki:Repository management#Backporting to stable.
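
One way to picture "translate once across branches" is to group messages by key and source text, as in the Python sketch below. The data layout and function name are assumptions for illustration, and the branch names in the example are only sample values.

```python
def group_across_branches(branches):
    """Group identical messages from several branches into one unit each.

    branches maps a branch name to {message key: source text}.
    Returns {(key, text): [branch names]}; each entry needs one translation,
    which can then be exported to every branch in its list.
    """
    units = {}
    for branch, messages in branches.items():
        for key, text in messages.items():
            units.setdefault((key, text), []).append(branch)
    return units


example = {
    "master": {"greeting": "Hello", "bye": "Goodbye!"},
    "REL1_30": {"greeting": "Hello", "bye": "Goodbye"},
}
units = group_across_branches(example)
# "greeting" is shared by both branches; "bye" differs and needs two units.
```

Here four messages in two branches collapse into three translation units, because the unchanged `greeting` message is shared.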

Impact: Users get more and better translations faster.

Invite big FLOSS projects to join translatewiki.net

Tags: #translatewiki.net

Lately the number of active translators at translatewiki.net has not grown. We should try to attract big projects like KDE to grow the number of translators.

Alternatively, convince organisations like the Free Software Foundation to adopt MediaWiki+Translate for the translation of their software, with as few quirks as possible.

With increased number of translators, it is expected that proofreading will increase, providing more consistent translations. It also justifies increasing resources to support the development of the platform. Would also help with: Promote i18n best practices.

Increase coverage of our translation tools on Wikimedia and MediaWiki translation

Translation of non-prose MediaWiki strings

Tags: #php #javascript #full-stack #translate-extension #structured-data #translatewiki.net

Description:

Magic words, special page aliases and namespaces should be translatable with a web interface to:

  • allow translators to change or update translations easily and quickly, without having to know about order of precedence, allowed characters and so on, while also reporting mistakes;
  • keep translations in a data format which is resilient to mistakes (no fatals due to data errors) and can be easily exported to the repositories (without worrying about removing translations which should be kept for backwards compatibility), like some JSON format on ContentHandler pages on translatewiki.net;
  • ideally, export such updates as part of the usual scripts to follow the usual continuous translation model and reduce breakage.

Translate SVG files in Commons

Tags: #php #javascript #translate-extension #full-stack #wmf-production

Description:

The TranslateSVG extension has been developed as part of a Google Summer of Code project. It needs further development to bring it to the level where it can be deployed to the Wikimedia cluster. The community also needs to be involved in testing this tool to make sure they want it.

Impact: Much more translated illustrations available for use in Wikipedias and other sites to make people understand better.

Subtitle translation

Tags: #translate-extension #wmf-deployment

Description:

Commons supports subtitles on videos. Those are translated by editing a wiki page containing a special syntax. For multiple reasons this is not ideal. We should look into the possibilities of integrating this with our existing translation tools or with some other free software tools that already exist (perhaps by integrating those into our tools).

Possible approach:

The goals should be: discoverability of things to translate, easy translation, modification tracking and no extra steps to have translations become available.

See also T44790 TimedText integration with Translate.

Would help with: #Multilingual Commons.

Wikidata translation

Tags: #php #wmf-deployment #translate-extension

Description:

The Translate extension could help people translate properties and other content at Wikidata in the following ways:

  • Easy access to content which still needs translation
  • Statistics about translation coverage
  • Providing a familiar interface for translators

This will be even more important if commons:Commons:Structured data gets real.

Translatable PageForms

Tags: #translate-extension #pageforms #translationadmins #php #gsoc-outreachy

Description:

PageForms is an extension previously known as SemanticForms. It allows creating forms for inputting data in a structured manner. It would be great if, when creating a form with a form, one could ask for the form to be made translatable.

Possible approach:

This can be done manually after the form page has been created, but it is a laborious process to extract all the strings manually, create dozens of pages and manual configuration in LocalSettings.php (only available to wiki administrators). This all could be simplified with a better integration to something like one checkbox to check during form creation. After that one could go to Special:Translate to translate the form.

Gadget localisation

Tags: #javascript #php #translate-extension

Description:

Provide localisation facilities for gadgets. Very difficult or impossible if we rely on gadgets 2.0 or gadgets 3.0?

Impact: Would help #Multilingual Commons.

MediaWiki multilingual documentation

Tags: #documentation

Description:

As seen from many empty or semi-empty pages at Help:Contents, most of the help pages and manuals for MediaWiki core (and certain extensions) still live on Meta-Wiki or rather on local Wikipedias: this leaves most small Wikipedias and other projects in "small languages" with no documentation or outdated documentation. Everything should be central, translatable, easily and equally available from all wikis.

Multilingual Commons

Description:

Help Commons become truly multilingual. Content (like SVG and srt), categories, templates, gadgets etc. should be translatable.

Possible approach:

File descriptions should be shown in the chosen language with a method supported by MediaWiki, rather than a JavaScript hack (requested e.g. at T10287).

Once translations are available, true language selection for everyone is needed.

«Jon Liechty [...] indicated that half of Wikimedia [Commons] uses the English language template, but the rest of the languages fall off logarithmically. He is concerned about the "exponential hole" separating the languages on each side of the curve.» [1]

Wiki page translation advanced issues

Having a fully multilingual MediaWiki with Translate is possible, but sometimes tedious, especially for large wikis with a lot of history. Certain tasks should require less manual work than they do now.

The known pain points include the translation of categories (T31928, T31975, T46867) and templates (T47096).

Would help with: #Multilingual Commons, #MediaWiki multilingual documentation.

Language for users in general

Captchas suitable for all languages

Tags: #confirmedit

Description:

Wikimedia CAPTCHAs are broken: they don't stop any bot or spammer, but they harm real users and are harder for non-English speaking users, impairing the growth of non-English wikis. The CAPTCHAs should be localised or made language-agnostic. Research is needed to identify different CAPTCHA options designed for multilingualism. See related bug 32695 (mostly focused on the reCAPTCHA-like solution with Wikisource integration). Some prototypes were designed a while ago.

Impact:

Over three million CAPTCHAs are filled in every month on Wikimedia projects. The risk of failure is high, but if it succeeds, the rewards may be huge.

Possible approach:

Preliminary discussion and general questions to mentors should happen on Talk:CAPTCHA; please create specific proposals/applications as subpages of the page CAPTCHA and discuss them on talk.

Machine translation of discussions

Tags: #community #javascript #gadget?

Description:

Wikimedia now has infrastructure for providing machine translations via a service (partially based on unfree software). These services are now in use by the Content Translation tool and the Translate extension for page translation. We could also use these services in wiki discussions, to request translation of a comment or a whole thread, to help non-speakers understand what is being discussed without having to copy-paste the text manually into a translation service. This could be integrated into Flow, for example.

See also: T98728

Possible approach:

One component that is required for this is to detect the source language. Often we can assume it is the default language of the wiki or page, but in multilingual wikis such as Meta and MediaWiki.org, it is necessary to use a library or service that identifies the language.

Language selection for anonymous users for Wikimedia sites

Tags: #universal-language-selector #wmf-production

Description:

Multilingual Wikimedia sites such as Commons, Meta, Wikidata and MediaWiki.org require users to register an account before they can change the interface language. It should be possible to change the language without registering and logging in, and without relying on the traditional uselang hack and uselang links created via JavaScript.

Description:

The portal of each project, like wikipedia.org, should have a multilingual search that automatically returns results in the most relevant language; see also bug 1837. Such multilingual search could then be expanded to Special:Search of each wiki and triggered either automatically, or as a fallback (see also above), or as another option or search profile.

Localised Wikimedia shop

Description:

The Wikimedia Shop should be translatable: at least the interface, and ideally the product descriptions as well.

Good internal search engine for all languages

Description:

A lot of work is needed; see for instance User:TJones (WMF)/Notes/Language Analysis Morphological Libraries.

Motivation and understanding

Activity reporting and engagement

Description:

Project administrators/coordinators (project contacts on translatewiki.net) should be able to get not only a clear sense of what work is going on (#Relevant statistics), but also of which translators or languages may need additional effort (or, conversely, are doing especially well), so that they can contact translators where needed. Detailed reporting may be needed, if not an interface to semi-automatically send notifications in certain cases (such as translators who have greatly reduced their activity in a language which needs more translations).

Thanking translators is still best done manually, but project contacts need to know whom to thank. Knowing about newly exported languages may also help them tweak their configuration to actually use those languages.
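
The "reduced activity" case mentioned above could be detected by comparing edit counts between two periods. The sketch below is only an illustration under stated assumptions: the function name `reduced_activity`, the input dictionaries and both thresholds are hypothetical and do not correspond to any existing Translate feature.

```python
from typing import Dict, List


def reduced_activity(
    previous: Dict[str, int],
    recent: Dict[str, int],
    drop_ratio: float = 0.5,
    min_previous: int = 10,
) -> List[str]:
    """Return translators whose edit count dropped by at least `drop_ratio`.

    `previous` and `recent` map translator names to the number of
    translations made in two consecutive periods (for example, two months).
    """
    flagged = []
    for name, before in previous.items():
        # Ignore people who were barely active to begin with.
        if before < min_previous:
            continue
        after = recent.get(name, 0)
        if after <= before * (1 - drop_ratio):
            flagged.append(name)
    return sorted(flagged)
```

A project contact would still review the resulting list by hand before contacting anyone; the function only narrows down whom to look at.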

Impact:

The ability to easily communicate with "your own" translators can help project administrators build a sense of community and make them feel they're still in charge of the project even though they've merged with a larger wiki/community.

Possible approach:

Translators should be able to stay on top of new translation work easily, e.g. by subscribing to feeds and notifications in the projects of their interest when there are new messages in the source language or requests for translation updates (which no longer trigger edits and hence escape enotifwatchlist). They are also interested in knowing how they rank against others, but our tools for this purpose may need improvement: currently we have a monthly rank on the main page, a contribution count with a babel template and total "ranks" with language statistics.

Allow to solicit translations for specific subsets of messages

Description:

People interested in the localisation of a project often have a time-bound need for a specific subset of its messages to be translated quickly. For instance, developers or project coordinators may want a group of recent messages to be translated into as many languages as possible before a release; or interested users may want to increase translations of a handful of messages which they find especially important.

The typical way to solicit translations is to make a clickable list of messages and advertise it by public or private messages (in Wikimedia, translators-l is often used).

Minimal solution: Special:SearchTranslations and/or Special:Translate allow searching by message key (message name), ideally with regular expression patterns too, and linking to such a filtered view. Related tasks: phabricator:T93527#1254537, phabricator:T90144#3139578.
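
As a rough illustration of the minimal solution, the following sketch filters message keys with a regular expression and builds a link to the filtered view. Everything here is an assumption made for the example: the helper names, the message keys, the example.org URL and in particular the `keypattern` URL parameter, which does not exist in Translate today.

```python
import re
from typing import Iterable, List
from urllib.parse import urlencode


def filter_message_keys(keys: Iterable[str], pattern: str) -> List[str]:
    """Return the keys that match the regular expression, in input order."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.search(key)]


def filtered_view_url(base_url: str, group: str, pattern: str) -> str:
    """Build a shareable link; the 'keypattern' parameter is hypothetical."""
    query = urlencode({"group": group, "keypattern": pattern})
    return f"{base_url}?{query}"


keys = ["example-save", "example-cancel", "other-title"]
print(filter_message_keys(keys, r"^example-"))
print(filtered_view_url("https://example.org/wiki/Special:Translate",
                        "example-group", r"^example-"))
```

Because the pattern travels in the URL, the same link can be posted to a mailing list and every translator sees the same subset of messages.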

Current workarounds include:

  • a wiki page containing wikilinks to each message (tedious, requires handling subpages; translators have to open many tabs and use the bare wiki editor);
  • using pre-TUX Special:Translate to list all messages in the group (and link to an anchor) or listing each message individually but in all languages with Special:Translations (tedious, slow to load, unsupported, uses old interface);
  • making a custom message group (requires intervention of developers and/or sysadmins).

Possible approach:

In an ideal workflow, Translate (or TranslationNotifications) would also be able to handle the delivery of such notifications/requests, with two possible levels of integration:

  1. allow sending a manually composed message (with said URL) to a list of languages/projects/users,
  2. also allow building such a message and list of recipients.

The focus should probably be on an interface to query message groups (to identify the needed messages) and make a list of users depending on their language (and its translation level for the needed messages), activity and interest in the project. The actual delivery could be delegated to existing systems such as Extension:MassMessage.
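
The recipient selection described above could look roughly like the sketch below. It is a simplified illustration under stated assumptions: the `Translator` record, the `completion` mapping (fraction of the needed messages already translated per language) and both thresholds are hypothetical, and delivery is left out entirely, since it could be delegated to Extension:MassMessage.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Translator:
    name: str
    language: str             # language code the translator works in
    last_active: date         # date of the most recent translation
    projects: FrozenSet[str]  # projects the translator has contributed to


def pick_recipients(
    translators: Iterable[Translator],
    completion: Dict[str, float],
    project: str,
    today: date,
    threshold: float = 0.9,
    max_idle_days: int = 180,
) -> List[str]:
    """Return names of translators worth notifying, sorted alphabetically."""
    cutoff = today - timedelta(days=max_idle_days)
    recipients = []
    for person in translators:
        # Languages missing from `completion` are treated as untranslated.
        needs_work = completion.get(person.language, 0.0) < threshold
        interested = project in person.projects
        active = person.last_active >= cutoff
        if needs_work and interested and active:
            recipients.append(person.name)
    return sorted(recipients)
```

Keeping the three criteria (translation level, interest, activity) as separate booleans makes it easy to relax one of them, for example to also reach inactive translators before a major release.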

Relevant and reliable translation statistics

Tags: #php #javascript #full-stack #gsoc-outreachy

Description:

Sometimes projects want to know more about the workload for translators and so on. Translate offers a lot of reporting, but one simple feature we're currently lacking is the ability to count translations by number of words rather than by number of messages. Additionally, our statistics pages are currently missing information about proofread progress.

Possible approach:

Needed components:

  1. a way to compute the number of words (should work in any language)
  2. storing the number of words somewhere in the database for quick access
  3. updating the statistics pages to use words instead of messages
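
For the first component, one possible starting point is sketched below. It is a rough approximation rather than a definitive algorithm: text is split on whitespace, and Han ideographs and kana, which are normally written without spaces, are counted one character at a time. The function name `count_words` and the chosen character ranges are assumptions made for this example; other scripts written without spaces (such as Thai) would need a real word segmenter. Placeholders such as $1 are counted like ordinary words here.

```python
import re

# Characters from scripts usually written without spaces between words.
# Each one is counted as a unit, which is only a rough approximation.
_UNSPACED = re.compile(
    "["
    "\u3040-\u30ff"  # Hiragana and Katakana
    "\u4e00-\u9fff"  # CJK Unified Ideographs
    "]"
)


def count_words(text: str) -> int:
    """Approximate the number of words in a message."""
    unspaced = len(_UNSPACED.findall(text))
    # Replace the counted characters with spaces and split the rest.
    rest = _UNSPACED.sub(" ", text)
    spaced = sum(
        1
        for token in rest.split()
        # Ignore tokens made only of punctuation or symbols.
        if any(ch.isalnum() for ch in token)
    )
    return unspaced + spaced
```

Splitting on whitespace instead of matching `\w+` keeps words intact in scripts that use combining vowel signs, such as Devanagari.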

Special:LanguageStats and Special:MessageGroupStats look outdated compared to the TUX editor and need a face-lift too. In addition, we could make them Web 2.0 compliant and make them faster with AJAX by not loading all information immediately. When the number of message groups grows into the thousands, the page gets slow to load and use.

A reliable way for system administrators or wiki administrators to force hard updates of statistics and all caches may also be welcome, to easily overcome any problem with the cache, the job queue or other components (compare phabricator:T145295#2916209).

See also https://phabricator.wikimedia.org/tag/mediawiki-extensions-translate/ column "statistics".

Semantic Translate

Tags: #semantic-mediawiki #translate-extension #php

Description:

Lots of information could be made available via semantic properties or other means. Some examples:

  • How many/who have reviewed a translation
  • Who has translated a message
  • Language of the translation

Impact:

Such a semantic system would make it simple to build queries like "Top translators for MediaWiki in Finnish in 2013".
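
To make the example query concrete, the sketch below answers it over plain in-memory records. This is not Semantic MediaWiki query syntax; the `TranslationRecord` fields simply mirror the properties listed above, and all names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TranslationRecord:
    translator: str
    project: str
    language: str
    year: int


def top_translators(
    records: Iterable[TranslationRecord],
    project: str,
    language: str,
    year: int,
    limit: int = 3,
) -> List[Tuple[str, int]]:
    """Return (translator, translation count) pairs, most active first."""
    counts = Counter(
        record.translator
        for record in records
        if record.project == project
        and record.language == language
        and record.year == year
    )
    return counts.most_common(limit)
```

With semantic properties in place, the same filter-and-count query could be expressed on the wiki itself instead of in a separate script.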

See also https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/2300

Display "where are my translations deployed" information at translatewiki.net

Tags: #git #php #javascript #full-stack #gsoc-outreachy #translate-extension #translatewiki.net #translators

Description:

New translators in particular want easy feedback on their translations and on what happened to them. On Translate wikis in general, the watchlist (for the accept log and modifications) and contributions (for a mere list) are not enough, especially for unlogged/separate actions like setting workflow state, pushing to CentralNotice, copying to another wiki, or exporting to a version control system. Credit: Gloria_S.

It is especially unclear to people when their translations will appear in the software. With some more integration of repository scripts, it should be possible to add metadata to translation revisions recording which commits or branches include them. Different kinds of summaries can then be built on this data, such as "these translations of yours are still waiting to be exported".

Possible approach:

The implementation would consist of two mostly independent parts:

  1. a tool that reads git repositories to check, for each translation, which branches contain it,
  2. an interface that displays relevant information to the translators.
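
The first part could build on `git branch --contains`, as in the sketch below. It assumes that the commit which exported a given translation is already known (recording that mapping is the metadata work described above); the helper names `parse_branch_list` and `branches_containing` are hypothetical.

```python
import subprocess
from typing import List


def parse_branch_list(output: str) -> List[str]:
    """Turn the output of `git branch --all --contains` into branch names."""
    branches = []
    for line in output.splitlines():
        # Each line starts with a two-character marker such as "* " or "  ".
        name = line[2:].strip()
        # Skip blank lines, detached-HEAD entries and symbolic refs.
        if not name or name.startswith("(") or " -> " in name:
            continue
        branches.append(name)
    return branches


def branches_containing(repo_path: str, commit: str) -> List[str]:
    """List local and remote-tracking branches that include `commit`."""
    result = subprocess.run(
        ["git", "-C", repo_path, "branch", "--all", "--contains", commit],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_branch_list(result.stdout)
```

Keeping the parsing separate from the `git` call makes it possible to test the logic without a repository, and to feed it output collected by an existing export script.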

The interface could also be RSS, Twitter, IRC, whatever makes sense (perhaps also for imports): the main benefit would be transparency about what and how much we do. Since 2016, rakkaus sends some messages to IRC for autosyncs, but these are cryptic. Going further, we could send out notices to translators such as "Your translations are now visible to users".

As an extension, we could try to hook up into the Wikimedia LocalisationUpdate process and the release processes of different projects to also record the information when they are deployed. This is likely much more complicated.

Near-real time translation collaboration

Tags: #translate-extension #javascript #front-end #translators

Description:

When multiple people are translating the same group (say, a translatable page), it would be easier if they could see each other's updates live, something akin to Etherpad. It doesn't need to support multiple people editing the same message at the same time. Even seeing which messages are open (and their content) would help. Credit: neverendingo.

Tags: #translate-extension #front-end #javascript #translators

Description:

Wiki editing and discussion do not work well when many users disagree on what is best. A quick list of alternative translations (past, proposed, or used elsewhere) with relevant information would help find the consensus.

See for instance fundraising messages, which in theory should gradually approach perfection but in reality only see things changed back and forth continuously, with the worst translations usually surviving. The situation is made worse by: a) history not being easy to reach from the translation editor, b) discussion even less so, c) activity and discussion around the same translations in other messages being completely lost, d) people never finding where the message is (especially from other wikis, as with banners).

Impact:

See also T47831 Discussion about translation.

Would help with: #Display "where are my translations deployed" information at translatewiki.net

Improved metrics and data representation

Migrate Language team statistics to Reportupdater

Tags: #translate-extension #contenttranslation #analytics

Description:

Currently, the Language team's statistics are collected using a lot of scripts, manual work, and Google Sheets.

This should be more transparent, more automatic, and less dependent on external sites, using the Reportupdater tools.

Impact: Relevant data about language tools and their use is available and transparent.

Compact Language Links (CLL) are enabled by default in most projects as of March 2018. They proved to be more effective for most users, because they are clicked more. They allow people to search for languages, and allow analyzing which pages people cannot find in their language (there should be a proper report about this some time soon).

However, the deployment of Compact Language Links is not the last step in improving the design of interlanguage links. There are several more things to do:

  • Complete the deployment of CLL to Wikivoyage and Wikisource. (https://phabricator.wikimedia.org/T136677)
  • Fix the main pages design both for people who use CLL and for people who don't. https://phabricator.wikimedia.org/T179140 This is generally desirable for design consistency, and the analysis of failed searches in the Compact Language Links' search box hints that the main page could be much better designed for getting traffic to Wikipedias in languages that people really need. (The details of the analysis will be available as part of the failed searches report when it's ready.)
  • The design of the interlanguage links on desktop, mobile web, and mobile apps could be unified where it makes sense. The comparison at the page Interlanguage links/Implementation comparison shows that at least some effort was invested in making the design on mobile platforms common, though it's still not exactly the same, but there was never any attempt to make it unified with the desktop design. At least some aspects could be shared:
    • Should the article name be shown or not? (This is a question that requires design research.)
    • The language name search box should use the same search API or at least provide the same features.
    • The location of the element to access the languages list should be common and, more importantly, measurably efficient. On mobile it is sometimes at the bottom and sometimes at the top. On the desktop it is on the sidebar and usually far down (see a separate item).
  • The whole of the languages list on the desktop could be moved higher up on the page. This was already attempted in the Winter project, and in the Timeless skin, but never massively tested in production. The guiding principles behind this design update must be:
    • Make other languages more accessible to people who need them.
    • Make the element not too intrusive for people who don't need other languages.
    • Don't make other tools in the sidebar and the personal bar (log in / out, contributions, recent changes, etc.) less accessible than they should be. This will require analyzing how much they are actually used and by whom.
  • We should consider making all languages accessible from all pages. Currently we show only those languages in which the current page is available. There are several arguments for showing all the languages: it creates more chances to raise awareness that a Wikipedia in a language exists, and more chances to attract a potential editor to translate the article if it doesn't exist. However, this raises legitimate concerns about the appearance of the links to pages that don't exist: they must not overload the selector, and must be somehow distinguishable from links to articles that do exist. There's also the question of where they will lead: ArticlePlaceholder, ContentTranslation, Wikidata, Main page, or something else? In any case, this must be carefully examined and properly designed.
  • There are many elements around the languages list: CLL's "X more", ULS's cog, and Wikidata's Add links / Edit links. Their placement and functionality could be redesigned. See https://phabricator.wikimedia.org/T54383