Language and Product Localization Technical Support/Background
Summary
There are over 800 Wikimedia wikis across various projects. Wikimedia’s language communities have a wide range of needs around language technology. Their requests include adding languages to and removing them from projects (e.g., MediaWiki core), adding font and keyboard support for languages, help with Translatewiki, and other miscellaneous language-related configuration changes such as RTL fixes and spelling and grammar adjustments. In research conducted by the Language Diversity Hub project in 2022, 13 language communities across Europe, Africa, Asia, and Latin America were interviewed to understand their challenges related to technology, education, economy, and social conditions. The top challenges identified were few contributors, language technology challenges, and the need for more training. Small and new language communities need the most technical support, since they usually have only a few tech-savvy volunteers, who can be overwhelmed by having to tackle all of their communities' technical needs.
New Language Communities
The Wikimedia Foundation's vision is: “Imagine a world in which every single human being can freely share in the sum of all knowledge”. There are existing Wikipedia projects in a little over 300 languages. About 100 of them are active, and the others less so, even though some of them are in languages spoken by many millions of people. There are also thousands of other languages in the world that have no content at all on Wikimedia projects and, to varying degrees, little or none online or in offline written publishing. Some of the languages in which there is little or no activity on Wikimedia projects are spoken by millions of people, many of whom don’t know any other language in which there is a successful Wikimedia project. The reasons why these underrepresented languages have a low Wikimedia presence, and a generally low presence in online and offline written publishing, are systemic, complicated, and diverse. There are some things that we can do to make the development of Wikimedia content in these languages more accessible. Some of them are in the space of product design and infrastructure, and some are more in the space of human-to-human community support.
The Wikimedia Incubator is where potential Wikimedia project wikis in new-language versions can be arranged, written, tested and proven worthy of being hosted by the Wikimedia Foundation. The Wikimedia Incubator was launched in 2006 with the assumption that its users would have prior wiki editing knowledge. That assumption is a problem, and it is exacerbated by the fact that incubation is supposed to be performed mostly by the people who are the newest and the least experienced in our movement. While editing on Wikimedia wikis has significantly improved since then, the Incubator hasn't received these updates due to technical limitations.
Existing research and materials reveal technical challenges in every phase of language onboarding: adding new languages to the Incubator, developing and reviewing content, and creating a wiki site when a language graduates from the Incubator. Each phase is slow, manual, and complex, indicating the need for improvement. Addressing this problem will make it quicker and easier to create wikis in new languages, and allow more humans to share knowledge. Various stakeholders, existing research, and resources have proposed recommendations, both social and technical. The current process of language onboarding is divided into phases, each involving a series of complex processes:
Before Incubation: The process of creating a request for a language in the Incubator involves various manual steps, including understanding project principles, creating Meta and Translatewiki.net accounts, confirming language eligibility, and translating essential messages.
During Incubation: Incubator faces technical limitations and lacks many of the modern features found in other Wikipedia wikis. This deficiency leads to a poor editing experience for contributors, as highlighted in numerous Wikimedia convenings and previous research. Language wikis often remain in the Incubator for several years before graduating, primarily due to the poor editing experience, fewer community contributors, and a shortage of native speakers. According to April 2024 statistics, the average duration for a language wiki to graduate from the Incubator is 4.4 years (e.g., Fon Wikipedia).
After Incubation: Upon approval of a language by the Language Committee, the setup of the wiki site, content importing, and ongoing maintenance involve a series of manual steps that are carried out through collaboration among community members, the Language Committee, and server maintainers.
Existing Language Communities
Most languages have similar issues: they are spoken by many people (many of them by several million people), but they are not written or used online much. The work of creating the wikis is done by activists who know and love their languages and want to contribute local knowledge in them (and in other, more established languages), but they face several challenges, including a lack of modern terminology for both localization and encyclopedic article writing, and platform-level issues or the use of old devices on which scripts or fonts are not always up to date. As a support team, we currently help with the challenges below in one way or another; however, these languages face many more challenges that we can look into in the future.
Localisation activity
According to the current policy, initial creation of a wiki requires the translation of the 500+ most basic MediaWiki user interface messages. This serves as an incentive to translate them. After this is done, however, there is practically nothing that incentivizes users to keep localizing the user interface.
In some languages, such as Tyap, localization activity continues even after graduating from the Incubator. However, this happens thanks to excellent volunteers who understand the importance of localization, and it's the exception and not the rule. Most languages have low localization activity after graduation, and they are stuck at about 12% localization completeness in core and even lower in extensions.
The people who are best equipped to make those translations usually also know English, French, or some other fallback language, and don't notice that something is untranslated. It would be good to develop incentives to keep completing the localization, because the number of messages to translate grows literally every day. Currently, we reach out to volunteers on an ad hoc basis and encourage them to translate, for example at community events, where we get a chance to meet community members who speak various languages. At the Wikimedia Hackathon 2024, a Swahili-speaking volunteer did a lot of localisation work on translatewiki.net, among other things completing the translation of the mobile front end.
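As an alternative to ad hoc outreach, completeness figures could be used to decide which language communities to contact first. The following is only a minimal sketch under stated assumptions: the `LocalizationStats` structure, the `outreach_order` helper, the 50% threshold, and the sample numbers are all hypothetical and do not come from translatewiki.net or any existing tool.

```python
from dataclasses import dataclass


@dataclass
class LocalizationStats:
    language_code: str
    translated: int  # messages that have a translation in this language
    total: int       # messages available for translation


def completeness(stats: LocalizationStats) -> float:
    """Return the share of translated messages as a percentage."""
    if stats.total == 0:
        return 0.0
    return 100.0 * stats.translated / stats.total


def outreach_order(all_stats, threshold=50.0):
    """List languages below the threshold, least complete first."""
    below = [s for s in all_stats if completeness(s) < threshold]
    return sorted(below, key=completeness)


# Placeholder numbers for illustration only, not real statistics.
sample = [
    LocalizationStats("lang-a", 120, 1000),
    LocalizationStats("lang-b", 900, 1000),
    LocalizationStats("lang-c", 40, 1000),
]
for s in outreach_order(sample):
    print(f"{s.language_code}: {completeness(s):.1f}%")
```

A ranking like this only suggests where to look; whether a community has volunteers available to translate still has to be established through personal contact.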
Localisation quality
In some languages, such as Igbo, there were complaints that localization was done in the past by people who don't know the language well, use a rare dialect that is unreadable to most other speakers, or made mistakes because of low wiki editing experience or poor understanding of the technical terminology. As a result, tasks related to such problems may be reported to us. Broadly and collectively, we should encourage more localization activity in all languages and, specifically, encourage people to write localization guides and glossaries in each language. Currently, there are localization guides for fewer than twenty languages.
Localisation terminology
"Finding the right translations for some of the technical terms is a challenge in several of the communities. Many of the technical terms have no equivalent in these languages, and good processes for establishing new terminology on the conditions of the language will be of value for the languages in total. Some communities contribute in a language that has no standardized written form. Some languages have developed strategies for negotiating issues of dialect difference, spelling divergence, and the lack of an official language standardization guide (e.g., Scots). Connected to the lack of standardization, is the limited ability to write texts, both short and long, about modern technology-oriented themes, which is caused by generally low availability of such texts even outside the Wikimedia ecosystem. This might lead to challenges when translating terminology, as well as with the general quality of the content. A linguist in the network describes a situation where many languages borrow terms directly from English instead of finding their own terms" (from Barriers_experienced_by_contributors_to_small_language_versions_of_Wikipedia.pdf)
(Basic glossary: https://translatewiki.net/wiki/Translating:MediaWiki/Basic_glossary )
Keyboard support
“A recurring issue among all the smaller language communities is that tools for supporting their language online are lacking; they have insufficient keyboards for typing in their own language, or there might be few or no online dictionaries or spell checking software. There are some on-wiki solutions for keyboards for some languages, and a few dedicated people in the movement have put a lot of time and effort into supporting those languages. However, some of the solutions only work on wiki platforms, and they require maintenance. Besides, these languages deserve to have keyboards available everywhere, on all devices and platforms. The lack of digital tools is not only a barrier for Wikimedia activities.” (from Barriers_experienced_by_contributors_to_small_language_versions_of_Wikipedia.pdf)
RTL
RTL languages are generally usable on the MediaWiki platform and technical support cost for them is relatively low thanks to CSSJanus.
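CSSJanus converts left-to-right stylesheets into right-to-left ones by flipping directional properties and values, so most styles need to be written only once. The sketch below illustrates only the core idea of swapping `left` and `right`; it is a deliberate simplification, not CSSJanus's actual implementation, and the function name `flip_ltr_to_rtl` is hypothetical. The real tool handles many more cases that are not covered here.

```python
import re

# Match "left" or "right" only as whole words, so that hyphenated names
# such as "margin-left" are flipped but words like "copyright" are not.
_DIRECTION = re.compile(r"\b(left|right)\b")

_SWAP = {"left": "right", "right": "left"}


def flip_ltr_to_rtl(css: str) -> str:
    """Swap every standalone 'left' and 'right' token in a stylesheet."""
    return _DIRECTION.sub(lambda match: _SWAP[match.group(1)], css)


ltr_rule = ".icon { float: left; margin-right: 8px; }"
print(flip_ltr_to_rtl(ltr_rule))
```

Because the swap is symmetric, applying the function twice returns the original stylesheet. A simple text swap like this would also flip class names or comments that contain those words, which is one reason a dedicated tool is needed in practice.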
In most features of day-to-day reading and editing, minor RTL issues do occur, for example misplaced icons or form labels, misaligned text, or words appearing in the wrong order. However, they are either quickly fixed, or they are deprioritized because they are barely noticeable. Many examples of current open tasks can be found at the RTL board in Phabricator.
However, some important issues and concerns remain, including wikitext editing, the transition to Vue, Wikifunctions, and many more.
Incomplete configuration
Some languages have incomplete configuration: no namespace and magic word translations (or incorrect translations), no date formats, no grammar rules (even though they are necessary), no digit conversion, and incorrect autonyms. This particularly affects languages that were added to the system long ago, in the mid-2000s: their codes and names were added, but some configuration details were not. This is slowly improving through personal outreach to people who speak those languages and through patches that complete the configuration (recent examples include Twi, Kinyarwanda, Hausa, Igbo, Xhosa, and Swazi). Many languages, however, are still far from complete. In the future, we can consider performing a systematic mapping of the missing configuration.
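A systematic mapping of missing configuration could start from a simple per-language checklist. The following is a rough sketch under stated assumptions: the setting names, the shape of the input data, and the `audit` helper are hypothetical and do not correspond to actual MediaWiki configuration keys or to any existing tool.

```python
# Hypothetical setting names; they mirror the gaps listed above and are
# not actual MediaWiki configuration keys.
EXPECTED_SETTINGS = (
    "namespace_names",
    "magic_words",
    "date_formats",
    "grammar_rules",
    "digit_conversion",
    "autonym",
)


def missing_settings(config, not_applicable=()):
    """Return expected settings that are absent or empty for one language."""
    return [
        name
        for name in EXPECTED_SETTINGS
        if name not in not_applicable and not config.get(name)
    ]


def audit(configs, not_applicable=None):
    """Map each language code to its missing settings, skipping complete ones."""
    not_applicable = not_applicable or {}
    report = {}
    for code in sorted(configs):
        gaps = missing_settings(configs[code], not_applicable.get(code, ()))
        if gaps:
            report[code] = gaps
    return report


# Placeholder data for illustration only, not real language configuration.
sample = {
    "lang-a": {"autonym": "Example", "date_formats": ["dmy"]},
    "lang-b": {name: "set" for name in EXPECTED_SETTINGS},
}
# lang-a is assumed, for this example, not to need digit conversion.
report = audit(sample, not_applicable={"lang-a": {"digit_conversion"}})
for code, gaps in report.items():
    print(code, "is missing:", ", ".join(gaps))
```

A checklist like this can only detect settings that are absent. Incorrect translations or incorrect autonyms would still need review by people who speak the language.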
Missing features implemented as templates and modules
Thousands of features that are available in wikis in larger languages are implemented as templates and cannot be conveniently used in any other language. This includes infoboxes, formatted references, navigation boxes, and many others (see examples at https://www.mediawiki.org/wiki/Global_templates/Taxonomy). To be able to use them in their languages, editors have to implement them from scratch or manually copy each of them into their wiki, which is unsustainable because the number of those features grows constantly, and most of the smaller wikis have no people who are able to continuously do this technical work. This has been discussed for years in various forums (Movement strategy, Community wishlist, etc.). In 2024, it was confirmed again in the "Connecting Wikifunctions to Wikipedia Opportunities and Challenges" report.
Complaints about those issues come up very frequently in the context of Content Translation and general editing. The support team can, at most, point the requesters to documentation about importing templates. The problem should be addressed more systematically.
Issues related to language support, but handled in practice by other teams (if at all)
- Search - handled by Search Platform team
- Language Converter - occasionally handled by Content Transform team
Volunteer Engagement for Technical Support
Engaging volunteer developers in Language technical tasks
Language communities often lack enough tech-savvy volunteers to address technical challenges for themselves and their communities, particularly smaller ones. There are only a few contributors who know how to handle this work, and it can be overwhelming for them to address all the technical needs of a community. For example, technical requests from the Indic language communities are sometimes single-handedly managed by User:Jayprakash12345, one of the experienced technical contributors who effectively tackles these issues. Here is another quote from a contributor from Bangladesh:
“From my experience, I've seen that small Wikimedia communities struggle a lot with dev works or even MediaWiki configuration. My idea was to create a team having that capacity and directly receive requests from different Wikimedia communities and work as a one-stop solution for light technical support on a rolling basis”.
Given the Foundation’s pillars of nurturing volunteers with a long-term vision to become a multigenerational project, bringing volunteers together to collaboratively solve these tasks and helping build a network of volunteers who can support each other is the way forward.
A strategy for volunteer engagement for technical support in language technology is essential. Some new or existing approaches include:
Community Spaces: Create spaces (both online and offline) where community members can share knowledge, ask questions, work on projects, and provide mutual support. An example of this is organizing monthly language community meetings and participating in hackathons to bring people together for collaborative problem-solving and technical tasks. In addition to these usual venues, onboarding people to work on tasks year-round through new approaches—such as collaboration with external and internal developer groups, meetups, promoting language technical support area, and providing onboarding support to volunteers—will be a focus for the upcoming year.
Outreach: Utilize the Language and Internationalization newsletter and Diff blog posts to promote language-specific technical tools, events, and updates.
Advocacy: Secure dedicated support from experienced developers and reviewers in Language & Product localization to assist volunteers and advocate for the needs of technical users and contributors from smaller language communities to relevant stakeholders (e.g., WMF Product/Tech teams).
Partnerships: Collaborate with relevant stakeholders to support language technology needs and related program design and implementation, e.g., Language Diversity Hub, Unicode, etc.