The following discussion has been transferred from Meta-Wiki.
Any user names refer to users of that site, who are not necessarily users of MediaWiki.org (even if they share the same username).

LTR, RTL, BiDi issues

  • Hello! Sorry to start with this, but directionality is an issue we should think about from the beginning.
  • It is easy to see how MediaWiki evolved from a single-language project using ISO/IEC 8859-1 titles to a multilingual project using UTF-8. It has been a long road, and I am happy to see how much the software has improved during the short time I have been contributing here.
  • When it comes to directionality, we can see that the original assumptions no longer cover the actual situation. Contributors to LTR wikis do not use exclusively LTR user names, do not write exclusively LTR titles, and do not write their edit summaries exclusively in LTR. At RTL wikis they do not want every textarea box to be RTL (bugzilla:04011). This is even more evident at RTL wikis, where the majority of logged-in users have LTR user names, where a significant number of LTR pages exist, and where talk page posts, like edit summaries, are written in LTR languages.
  • As the wikis come together, the result will become more and more of a BiDi environment, and the user interface will need some redesign. User:Tietew/RTL problem shows what many of the special pages currently look like and what improvements are possible with minimal effort.
  • However, there are limits to making global assumptions about the directionality of titles and/or "text blocks" from certain languages or wikis, as shown in bugzilla:02118#21 and the links that follow it.
  • I foresee that, for each title and generally for each "object", MediaWiki will need to know its directionality; the values should be LTR, RTL, BiDi and unknown (the last to be resolved later). This will enable correct rendering with "minimal" influence from the context.
  • At present, "uselang=xx" does not change the directionality of (special) pages. In the future, MediaWiki will need not only to support changing the directionality of a (special) page, but also to mix blocks of different directionality within one page and to pass this directionality in variables that can be used in templates.
  • Typical examples of such bidirectional pages are:
    • the extension made by ThomasV (see bugzilla:04104, "render a text and its translation, on two columns")
    • many talk pages at RTL wikis and at the multilingual projects
    • BiDi projects such as ku:, lad:, pa:, yi:
  • Some changes to the user interface will show that we are leaving the old way of thinking behind. See bugzilla:04125, or identify other items at etc.
  • Many further developments, configurations and tests will need to become "BiDi compliant" (see bugzilla:04547). This is no big deal if the contributors to the RTL wikis can come together, identify the problems and fix them one by one. I am very confident that we can do this job together.
best regards Gangleri | Th | T 04:55, 18 January 2006 (UTC)
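A minimal sketch of the per-object directionality proposed above, assuming every title or text block carries one of the four values (LTR, RTL, BiDi, unknown) next to its language code. The names `Direction`, `TextBlock` and `render_block` are hypothetical, not MediaWiki code, and mapping BiDi and unknown to HTML's `dir="auto"` is an assumption of this sketch rather than part of the proposal.

```python
from dataclasses import dataclass
from enum import Enum
from html import escape


class Direction(Enum):
    LTR = "ltr"
    RTL = "rtl"
    BIDI = "bidi"
    UNKNOWN = "unknown"


@dataclass
class TextBlock:
    """A title or text block that carries its own language and direction."""
    text: str
    lang: str
    direction: Direction


def render_block(block: TextBlock) -> str:
    """Wrap a block in a span with its own lang and dir attributes."""
    if block.direction in (Direction.LTR, Direction.RTL):
        dir_attr = block.direction.value
    else:
        # Assumption: BiDi or unknown content is handed to the browser's
        # bidi algorithm via HTML's dir="auto".
        dir_attr = "auto"
    return f'<span lang="{escape(block.lang)}" dir="{dir_attr}">{escape(block.text)}</span>'
```

Rendering each block this way lets, say, an LTR user name sit inside an RTL page (or the reverse) without inheriting the direction of its context.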
Directionality is definitely an issue that is not addressed by the current proposal. Frankly, it gives me a headache even to think about it, given that I personally find it extremely difficult to test RTL user interfaces: I never know what is a bug and what is expected behavior, and the very oddness of it tends to frustrate and irritate me immensely. Nevertheless, I'll try to wrap my head around some of the problems involved, as we clearly cannot avoid the issue. I'm not sure we'll be able to solve all of the problems in the first implementation of this, though.--Eloquence 05:43, 18 January 2006 (UTC)

Thanks for the feedback, Eloquence. I started with BiDi issues simply because they puzzled me last year. Dealing with these topics requires a lot of patience because, as you said, you never know what is wrong: your action, the operating system, the browser, MediaWiki, etc.
At the end of last year, some contributors at various projects started discussing the idea of a BiDi workgroup. Finally I had some time to write a few lines. I hope that together we will be able to solve these issues. best regards Gangleri | Th | T 14:56, 18 January 2006 (UTC)

Opened bugzilla:04742 with references to 04126 (and some minor issues 04050, 04039). Gangleri | Th | T 11:17, 24 January 2006 (UTC)

Language preferences


Comments moved from the proposal

I would recommend simply putting this on Special:Preferences. Adding a separate interface point for non-account preferences would be unnecessarily confusing, I think. --brion 19:29, 17 January 2006 (UTC)

Yes, that is probably a more flexible approach. Changed.--Eloquence 05:52, 18 January 2006 (UTC)

For users without language preferences, and for unauthenticated users, I think MediaWiki should use the Accept-Language HTTP header provided by the client. BernardM 20:48, 17 January 2006 (UTC)

As noted in the proposal, these headers can be used to detect which content language, if any, the user is presented with by default (i.e. when viewing the Main Page). However, I do believe that when explicitly navigating to an English page, I should see an English UI by default, when going to a German one, I should see a German UI, etc. I want to avoid the situation where I'm in an Internet cafe in the Netherlands and suddenly an article I read in an English wiki has a Dutch UI by default (but I don't speak Dutch). I find the behavior of Google to always redirect me to localized, national versions immensely annoying, for example.
Incidentally, doing it this way also makes it much easier to put pages in a Squid cache for anonymous users - no need to have separate copies for all available UI languages. Essentially, this is the way it works on the Wikimedia projects that are split into language databases, and seems to be generally well received.--Eloquence 05:52, 18 January 2006 (UTC)
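A minimal sketch of the rule described above: the language of an explicitly requested page decides the UI language, and Accept-Language is consulted only when no page language applies (e.g. the Main Page). The function names, the `supported` set and the `"en"` fallback are assumptions for illustration; the q-value parsing is simplified and ignores the `*` wildcard.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for index, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, keeping header order for ties.
            weighted.append((-quality, index, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_ui_language(page_language, accept_language, supported, default="en"):
    # An explicitly requested page in a known language fixes the UI language.
    if page_language is not None:
        return page_language
    # Otherwise fall back to the browser's stated preferences.
    for tag in parse_accept_language(accept_language):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "nl-nl" -> "nl"
        if primary in supported:
            return primary
    return default
```

In the Internet cafe example, an English article keeps its English UI even though the browser prefers Dutch; only a page without its own language picks up the Dutch preference. Because the UI then depends on the page rather than on the request, anonymous page views stay cacheable as a single copy.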
I would prefer it to use the wiki language for anonymous users too.
But there is a big difference with Google: Google has one common site (there are local sites too, but a lot of people just use google.com, and the local sites are geographical rather than linguistic; look at google.be), whereas Wikipedia has a different site for each language. So by going to a site, the anonymous user is implicitly asking for a given language. With Google that isn't the case, so using the browser defaults is better there. (Note that Google does have something similar to the per-language wiki sites: google.com/intl/xx/, with xx a language code.)
Srtxg 13:14, 23 May 2006 (UTC)
The problem with having the interface language match the subject language is, of course, that the unwitting user following a link to another language will suddenly be confronted with an interface in that language as well. Aliter 18:20, 23 May 2006 (UTC)
Which is what happens already when you click an interwiki link. I don't think this is a problem; people can always push the "Back" button on their browser once they get over the first shock of seeing squiggly Chinese characters everywhere ;) --Mkill 23:58, 29 July 2006 (UTC)

Comment moved from proposal

It seems to me that it would be difficult to track down where the join markers are to do cleanup. --brion 19:32, 17 January 2006 (UTC)

I've tried to address this: since there can only be one join per page, it should be as simple as navigating to the page and removing the join. This can break a set if other pages are joined to the member you remove, but when that happens, it will be immediately visible, and if the joins were used correctly, it may very well be the intended outcome. I think the biggest challenges are the backend operations that have to be performed on saves, such as merging and splitting sets.--Eloquence 06:11, 18 January 2006 (UTC)

It would be nice to have a feature that shows all joined pages at once, a "what joins here". This page should offer a checkbox to disjoin individual articles. Another question: how is the issue of two articles being joined to one page solved? What if, for example, the German articles "Recht" and "Gesetz" both carry a join tag to English "law"? From my experience in cleaning up interwikis, this would happen in a huge number of articles. --Mkill 00:03, 30 July 2006 (UTC)

As to converging join tags, I agree that this happens and that it poses a problem. Imho, quite often the problem can be solved by splitting pages, such as making en:"law" a disambiguation page pointing to en:"a law" (as passed by parliament) and en:"the law" (standing for a certain spirit, eventually laid down as a body of texts). I know that a language community may oppose splitting when the required precision of concepts, or their division into separable subconcepts, is unpopular. Yet, imho, this will more often than not help, or educate, people towards more clarity, better intercultural understanding, and more successful dialogue. --Purodha Blissenbach 10:06, 29 December 2006 (UTC)

Innerlanguage links: why one would want them


One of the things these innerLangLinks would be great for is new wikis. Most of us simply start by translating good articles to build up an informational skeleton that contains a huge number of empty links (we just leave them in place), in the hope that later users will want to fill in an article for them. If a single database were used, the system could immediately propose existing information in other languages for the empty link, asking users to help with translations. The more chances for co-operation we can suggest to our users, the more we are going to get them involved. Many people who are afraid of writing an article from scratch won't be afraid of simply making a translation. But most of them use a wiki for one-time reading and never imagine they could be helpful. Think about it --Bertodsera 20:13, 3 July 2006 (UTC)

Is this that "store it all in one database" thought?


Erik, is this your idea of storing all the data in one huge single database for all languages again? Don't even think of doing it - it'll fail disastrously because of contention issues within the database server. It's already bad enough with things split by language, and splitting by namespace may be necessary for the biggest wikis. Jamesday 04:32, 8 February 2006 (UTC)

This should really be an issue of using the correct abstraction. Whether the data is held in a single database or split across several should be transparent to clients of the "database services" layer. IMHO we should be striving for further abstraction so that alterations in one part of Mediawiki have minimal impact on other parts. HTH HAND —Phil | Talk 10:46, 8 February 2006 (UTC)
Yes, the expression "one database" can only mean the way it's seen from the client's POV. Otherwise your architecture will never be flexible enough to meet growing content size (no matter how you strive to further divide namespaces, you are going to find out that one of them carries most of the weight/activity anyway). If you want to balance the load, you can only investigate which queries are most frequently used and make sure the data involved resides on different "inner bases". That is, you need a mid-layer of abstraction that will look like a single database to external users, while breaking the data into usable segments. It would be a nightmare if you had lots of joined queries, but here you basically access an article from a given index. So once you have an "index of indexes" (metaindex?), the rest is quite easy --Bertodsera 20:29, 3 July 2006 (UTC)
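A minimal sketch of the mid-layer described above, assuming articles are addressed by (language, title). Hash-based placement is a technique swapped in here for illustration, not something proposed in the discussion; the `metaindex` dictionary plays the role of the "index of indexes", letting a hot article be pinned to a chosen inner database. `ShardRouter`, `shard_for` and `pin` are hypothetical names.

```python
import hashlib


class ShardRouter:
    """Hypothetical mid-layer presenting many inner databases as one."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        # The "index of indexes": explicit overrides (language, title) -> shard.
        self.metaindex = {}

    def shard_for(self, language, title):
        key = (language, title)
        if key in self.metaindex:
            return self.metaindex[key]
        # Default placement: a stable hash spreads articles across shards.
        digest = hashlib.sha1(f"{language}:{title}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % len(self.shard_names)
        return self.shard_names[bucket]

    def pin(self, language, title, shard):
        # Move a frequently queried article to a chosen shard to balance load.
        self.metaindex[(language, title)] = shard
```

Clients only ever call `shard_for`, so whether the data sits in one database or many stays invisible to them, which is the abstraction argued for above.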

ISO 639

  1. This page appears to assume that ISO 639-2 is the two-letter code. It isn't: 639-1 is the two-letter code; 639-2 is the three-letter code, augmented by 639-3 for additional codes.
  2. ISO 639 is a standard. Please don't define the MediaWiki code as the code currently used by MediaWiki, but rather as the code defined by ISO 639 wherever such a code exists. Aliter 01:07, 28 February 2006 (UTC)
ISO 639 is the basis of much of what we do with languages. ISO 639 is imperfect. The languages/dialects that we use will be defined in a table. We will include references to standards that define languages. When we define something as a language where ISO 639 does not, it will be functionally exactly the same; it will just not have an ISO 639 code. GerardM 14:20, 28 February 2006 (UTC)
ISO 639-3 is meant to include at least all living languages, as well as dialects and 'so-called dialects'. Thus we can, and should, request any code we need. Using area subdivision codes, as in en-GB, en-US or en-US-TX, makes it possible to some extent to specify dialectal or pronunciation varieties. -- Purodha Blissenbach 17:53, 26 March 2006 (UTC)
Note also that ISO 639-2 includes the codes:
  • mul (multiple languages)
  • zxx (no linguistic content)
  • und (language undetermined)
So the MediaWiki-private extension Mult: is likely to be unnecessary. It could be replaced by mul: for truly multilingual content, by zxx: for templates that define e.g. colors or borders but do not display text of their own, or by und: for templates taking parameters that may contain text. Note that for the latter, if they are fed linguistic content via parameters, it has to be accompanied by a language code, and probably a direction. The final HTML output should be something like this simple sample scheme:
<div lang="mul">
<ul>
  <li><span lang="en">this is English</span></li>
  <li><span lang="la">ecce lingua latina</span></li>
</ul>
</div>

-- Purodha Blissenbach 17:53, 26 March 2006 (UTC)

Sounds like a very good idea to me. —Nightstallion (?) 07:14, 27 March 2006 (UTC)

Thanks, I only meant to go as far as this project not defining a different set of choices as standard, but since ISO 639 should now encompass all languages, I agree that it would be preferable to define ISO 639 as the authority here. Aliter 18:20, 23 May 2006 (UTC)
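A minimal sketch of treating ISO 639 as the authority, as suggested in this section: prefer the two-letter ISO 639-1 code where one exists (the same convention IETF language tags, BCP 47, follow), otherwise keep the three-letter ISO 639-2/639-3 code, and map the private Mult: prefix onto mul, zxx or und as proposed above. The lookup table is a tiny illustrative subset, not the official list, and both function names are hypothetical.

```python
# Illustrative subset only; a real table would be generated from the
# official ISO 639-2 / ISO 639-3 code lists.
ISO_639_2_TO_1 = {
    "eng": "en",
    "deu": "de", "ger": "de",  # terminology and bibliographic variants
    "fra": "fr", "fre": "fr",
    "lat": "la",
    "yid": "yi",
}

# ISO 639-2 special-purpose codes; none has a two-letter equivalent.
SPECIAL_CODES = {
    "mul": "multiple languages",
    "zxx": "no linguistic content",
    "und": "language undetermined",
}


def preferred_code(code: str) -> str:
    """Return the ISO 639-1 code where one exists, else the 3-letter code."""
    code = code.strip().lower()
    if len(code) == 2 or code in SPECIAL_CODES:
        return code
    return ISO_639_2_TO_1.get(code, code)


def replacement_for_mult(is_multilingual: bool, takes_text_params: bool) -> str:
    """Hypothetical mapping of the private Mult: prefix onto ISO codes."""
    if is_multilingual:
        return "mul"
    # Templates without text of their own: zxx, unless they pass text through.
    return "und" if takes_text_params else "zxx"
```

A code such as ksh (Kölsch), which has no two-letter form, is simply kept as is.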

File descriptions

  1. Making languages share filenames means an English-language page will have to refer to "Simba.png" for a wildlife photograph of a lion if the uploader happened to speak Swahili, and similarly for other languages.
  2. Mult: descriptions would only make sense if they can be limited to the parts that are likely to be translated, e.g. a small set of templates. Aliter 01:07, 28 February 2006 (UTC)
When a file has a name like "Simba.png", its name is Simba.png in any language. That is a given. When you assume that a small set of templates will give you translations, then I think you are barking up the wrong tree. Using Ultimate Wiktionary for Commons gives you more functionality, and it allows any keyword to be localised. Furthermore, it would create a powerful feedback mechanism for WiktionaryZ. GerardM 14:25, 28 February 2006 (UTC)
  • Filenames: I know this is a design issue, but in the abstract model there's no reason for the user to see the names the files are stored under at all. The files can be 1.ext, 2.ext, 3.ext. What the user needs to see is the File:Title that will help him link it in the correct spot. That, after all, is the function of the "filename": it serves as a title for a file page so that pages can link to that file. Hence it should be no problem if each language gives the file its own title that can be understood in that language, just as long as the titles are linked to the same actual object.
  • Freedom of speech is a right that should be used wisely, but I agree that a small set of templates will not give US translations. I expect that's why I wrote that translations would only give us a small set of templates.

Aliter 18:20, 23 May 2006 (UTC)

No translations

  1. Meaning-tags have the same atom-meaning problem that combined wiktionaries have. Since you're not going for direct translation of pages/categories, adding them here doesn't seem to make sense either.
  2. "Create a new language for a page" should probably be "Edit page in language", and then include the existing pages, marked e.g. in blue versus red. Speaking only of "new" pages means that information about a page's languages gets distributed over multiple lists.
  3. While translation assignment might seem a nice idea, it's probably impossible to implement in a way that will not overwhelm editors of small languages. Under the title used here, it also doesn't fit the "language versions are not translations" scheme. Aliter 01:07, 28 February 2006 (UTC)
Meaning-tags have a problem when they are only in one language. When a tag is used that refers to a DefinedMeaning in OmegaWiki, it is a matter of translating THAT DefinedMeaning, and everyone can use it wherever this tag is used. This will not overwhelm editors of smaller languages; they have to do it only once. GerardM 14:28, 28 February 2006 (UTC)
  • I'm sorry, I should have written out "atom-meaning problem" for those with less understanding of the concept. The reason we gave up on the idea of an all-language translator, is that to do that we needed to establish the meanings that for each language would always be translated in the same way, meanings that are not divided by one of the languages - a-tomos - atom meanings. Unfortunately, every language added will divide quite a large number of the meanings into even smaller ones, as concepts in each language are defined slightly differently (or "represented differently" depending on your approach). That is the atom-meaning problem when defining a meaning-tag: What you conceive as one meaning in your language will not be atomic over all languages. The tags you add to all pages of that meaning, will not fit on some of those pages in other languages.
  • Of course, the "flooding" referred to the #translation assignment, now on this page, but applying it now to meaning-tags: do you realize the English Wikipedia is probably generating categories at a higher rate than a lot of smaller wikis generate pages? Smaller languages would indeed be flooded if they even tried to keep up with that. How will the meaning-tags be less time-consuming? Aliter 18:20, 23 May 2006 (UTC)
It appears to me that this sort of flooding will be only a temporary problem (though possibly one lasting a decade or two). The English Wikipedia will reach, earlier than others, the point where the English language has been (nearly) exhausted as far as categories are concerned, and instead of creating new articles on not-yet-covered concepts that require additional categories, editors will to a larger extent be enhancing existing articles. This should allow smaller language communities to catch up. Catching up is usually a bit quicker than starting from scratch, so one can expect the gap to close. OK, this approach is somewhat Wikipedia-centric, and matters may differ in other contexts. --Purodha Blissenbach 10:24, 29 December 2006 (UTC)

Enhanced Translation


I worked on an article with a friend about six months ago. I hope the article below can be of some help to this project.

m:Enhanced_Translation

--CryptoQuick 21:55, 28 March 2006 (UTC)

Translation assignment


One feature that would make a lot of sense in many applications is a translation assignment manager (TASMAN). A TASMAN would allow each user to define from which language(s) they are willing to translate content. Individual pages could be flagged as to be translated. Based on their language proficiency setting, translators could be notified (by e-mail or wiki messaging) exactly when a page in a language they can translate from is to be translated. In order to manage assignments, "in use" templates could be used, though a task management system such as Magnus Manske's Tasks extensions [1] might be useful for this purpose. (from the original draft by Eloquence)
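A minimal sketch of the notification and "in use" steps of a TASMAN, assuming each user records only the set of languages they are willing to translate from and a flagged page is identified by its own language. `Translator`, `translators_to_notify` and `claim` are hypothetical names; a real implementation would sit on top of the wiki's user preferences and messaging.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translator:
    name: str
    source_languages: frozenset  # languages this user will translate *from*


def translators_to_notify(page_language, translators):
    """Names of users who can translate a page flagged in page_language."""
    return sorted(t.name for t in translators if page_language in t.source_languages)


def claim(assignments, page, translator_name):
    """Record an 'in use' claim; returns False if someone else holds the page."""
    holder = assignments.setdefault(page, translator_name)
    return holder == translator_name
```

The claim dictionary stands in for the "in use" templates mentioned above: the first translator to claim a page keeps it until the entry is removed.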

This would be excellent; even if simply noted as metadata, and done at first by hand. +sj | Translate the Quarto |+ 16:39, 6 May 2006 (UTC)
Note comment above (which predates moving this proposal to the talk page). Aliter 18:20, 23 May 2006 (UTC)

Language set proposal


I have looked at the innerlanguage links proposal and I am not very pleased with it, for a number of reasons, so I wish to propose an alternative. I suppose that, even now, as long as the new proposal has not gone live, there could still be time to consider alternatives.

First, the problems I see with the proposed solution.

  1. The big problem with this "join:en:Main Page" type of syntax is that it is basically creating some sort of linked set; you are in the set if you have a link to any other member of the set. This is really messy. A little glance at the example Multilingual_MediaWiki#Innerlanguage links with two separate merged sets shows just how nasty this can become.
  2. There is a problem of sets becoming orphaned if a page within a linked-list part of the set is deleted. For example, suppose we have a set with links fr->en, de->en, en->gr, en->ru, and the en page is deleted: we are left with fr and de pointing nowhere, and gr and ru pointed to from nowhere. Eek! This is why linked lists are usually maintained as doubly linked lists (i.e. with links in both directions), but that is not what is being proposed. I don't see how you recover the language set in this instance. Certainly, I hope I never have to work on the code that copes with this!
  3. What we wanted to create, it seems, was a system where instead of an all-to-all linking we set up some sort of all-to-one linking, but what is proposed is NOT an all-to-one solution. Sure, if ALL the pages link to the same original page (suggested to be the English one), then this is that, but as the Multilingual_MediaWiki#Innerlanguage links example shows, this is not always going to be the case. The syntax of the solution ensures that this is not the case. And in practice it will not often be the case - a user will find any other language which the page is in almost at random, and link to that, knowing that MediaWiki will do all the hard work in sorting out who is in and who is out of the language set. Yuk!
  4. I think the fact that the language that we choose on each page is arbitrary shows this is not the right solution.
  5. I think any programming involved with the linked sets will be really nasty.
  6. A further really horrible yukiness I foresee is that, once this is out there, it will not be clear from looking at the markup which pages are in the set and which are not. In the markup you will see, say, "join:fr:Le Page Principal", but you don't see the list of languages. Sure, by viewing the rendered page you will see the complete list, but at least today's system gives the user a nice, simple way to see all the languages that form part of the list.
  7. Another thing I don't like is the conceptual yukiness for the end user. Sure, we as techies understand in an absolute flash that this forms a linked language set which can be traversed, and that as long as there is a link to another language within the set then all the languages will be covered. I dread trying to explain this to an end user.
  8. Also from a syntax point of view, I think it is confusing to the user that to link to a set of languages, you link to a single, completely arbitrary language within that set. I can foresee a situation where many confused users try to link to all the languages in the new syntax (I know the system will cope, but the user is clearly struggling). The syntax should somehow say simply that you are linking to a set of languages.
  9. Finally, it is not easy for a user to work with the complete set of language pages. Suppose he wanted to split the set for a particular page into two, languages a, b, c and languages d, e, f (I'm not sure why, but let's imagine that he wanted to): he would have to figure out how the two sets are linked together, since there is no way to work on the complete set of languages! Imagine 100 languages where the links were just made to random languages! Nightmare!
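To make points 1 and 2 concrete, here is a minimal sketch assuming a join set is simply whatever can be reached by following join links in either direction, i.e. a connected component of the join graph. The function name and the language-prefix page names are illustrative only. Deleting the hub page "en" from the example in point 2 splits one set into four singletons, with nothing in the remaining markup to say they ever belonged together.

```python
from collections import defaultdict


def language_sets(joins, deleted=frozenset()):
    """Group pages into sets: connected components of the join graph."""
    graph = defaultdict(set)
    for source, target in joins:
        if source in deleted or target in deleted:
            continue  # a deleted page takes its join edges with it
        graph[source].add(target)
        graph[target].add(source)
    pages = {page for edge in joins for page in edge} - set(deleted)
    seen, components = set(), []
    for start in sorted(pages):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        stack, component = [start], set()
        while stack:
            page = stack.pop()
            if page in component:
                continue
            component.add(page)
            stack.extend(graph[page] - component)
        seen |= component
        components.append(component)
    return components
```

Every save or delete would have to recompute something like this to know which set a page now belongs to, which is the programming burden point 5 refers to.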

So, to summarise first the problems I see with the proposed solution:

  • syntactically messy
  • creates linked set, rather than all-to-one solution
  • complex linked set will create complex programming
  • arbitrary choice of language implies this is not the right solution
  • not clear to end user which languages are in the multi-language list
  • difficult for users to understand that innerlanguage link is building a linked set
  • linking to single arbitrary language when conceptually you want to link to a set of languages is confusing to user
  • impossible to easily work on set of language pages for a given article

Now I will give my alternative suggestion.

I propose that we create a new namespace, called, say, Lang (or Language, or LanguageSet, or Set, or ls), and that within this namespace a new page is created whenever we wish to link two or more pages together by language; the page would form a complete list of the pages within the language set.

For example, for our "Dog" page example, we have a page "Lang:dog" which contains simply the links to each of the pages which are part of this language set:

The first thing you will notice is that the page name does not have to be in any particular language, but the page could be called "Lang:doggiewoggie", with the same content.

On each of the individual pages, there would be a single link to the language set page. For example, on the English page, if the language set page is called "Lang:doggiewoggie", there would be the link:

The user should be able to add/change languages either by changing the language set page (the "Lang:dog" page) or by changing any of the individual articles. The change would propagate in either direction.

The advantages I see of this system are:

  • syntactically clean
  • creates a single well-defined all-to-one set
  • easy for programmers
  • the link on each page links to the set rather than to a language
  • extremely easy for end user to see which pages are in the multi-language list
  • conceptually very easy for end-users
  • extremely easy to work on the set of language pages for a given article

I'm sorry if this suggestion comes very late in the day. Nonetheless, this is such a major change to the way that multi-lingual wikis work, I think that even now, it is extremely important to get the fundamental platform correct. Mediashrek 10:29, 15 May 2006 (UTC)

I expect the reason the scheme on the page does not use a separate page is that several earlier suggestions, on interwiki links among other topics, were implemented in terms of separate pages, and they all failed because handling the information in two places proved too cumbersome. As an example of this problem, consider how people can include an image from Commons on a text page, but are expected to go to the image page at Commons to indicate that they're using it.
Still, you have some points. Can you revise them on the assumption that this Multilingual MediaWiki will actually offer you the tools to manage this stuff? After all, the wiki doesn't use a separate page to create links, but as a tool, there is a page to find what links here. How many of the problems you see could be solved by creating a similar page for innerlanguage links, or maybe by other tools built into the wiki? Aliter 18:20, 23 May 2006 (UTC)
I have been thinking some more about the scheme proposed above. I think the idea of a single link from each page towards a central page is correct (though the exact syntax may be argued about), but I'm not so sure about a write-enabled page. I think the language set page should be read-only. Why? For a few reasons:
  • you then have duplication of data. We know that duplication is a bad thing, and so if we stored the information only on each of the component pages, as LangSet:doggiewoggie type links, and yet made the language set page read-only then the data would only ever be in one place
  • I am also concerned about, if the language set page was changeable directly, how the original page would be modified. Say a language set page had on it en:dog, fr:chien and de:hond, and someone deleted the fr:chien line. Then, the software would have to not only save this page but modify the actual fr:chien page. Oooch. And what if someone was currently modifying the fr:chien page at the same time? Double ouch!
No, it seems that for both these reasons it would be better to have the LangSet:doggiewoggie page read-only. Indeed, it could even be a dynamically created page that contains only a transclusion of what-links-here. That last sentence sounds a bit confusing so I will explain what I mean. Instead of any of the language set pages actually physically existing, they would be rather like the special pages whose content is generated dynamically. The content of the language set page would simply display a list of each of the pages which have a link to that page.
Mediashrek 09:25, 27 May 2006 (UTC)
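A minimal sketch of the read-only, dynamically generated language set page described above, assuming each article stores exactly one LangSet link and nothing else is stored. Redirect handling (for merging two sets, as discussed further down this page) is an added assumption, and `langset_members` with its parameters is a hypothetical name.

```python
def langset_members(langset_links, set_name, redirects=None):
    """Compute a LangSet page's content on demand; nothing is stored.

    langset_links maps each article to the single LangSet page it links to.
    redirects maps a merged LangSet name to the name it now points at.
    """
    redirects = redirects or {}

    def resolve(name):
        seen = set()
        while name in redirects and name not in seen:  # guard against loops
            seen.add(name)
            name = redirects[name]
        return name

    target = resolve(set_name)
    # Equivalent to a "what links here" query on the resolved set page.
    return sorted(page for page, linked in langset_links.items()
                  if resolve(linked) == target)
```

Because the list is derived only from the links on the articles, the data lives in one place and the set page can never disagree with them.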
I like your proposal better than the (complicated) original that you want to replace. I also agree that the "LangSet:doggiewoggie" page should never be stored, but should rather be a database query result (in whichever form it is presented to the user). I'd like it to be editable (given an appropriate access right for the editor) simply as a shortcut to editing the set of individual pages that hold the real links. Of course, changing them must be wrapped in a transaction, which may fail in its entirety if a single edit conflict occurs, or on several other errors. Then the editor needs to be presented with an appropriate explanation and two textareas, as usual.
I also strongly suggest allowing a page to be in multiple innerlanguage sets in general, and letting individual installations disallow that. Think of a wiki wanting to store (simplistic) translations along with short dictionary-like explanations, as almost all bilingual or n-lingual translation dictionaries do. Then you may have a page:
en:law
  1. a binding rule set forth in writing by a parliament, a king, a dictator, etc.
  2. all the codified binding rules of a society, or a state
  3. justice
  4. colloquial: collective term for "law enforcement agencies"
  5. physics, natural sciences: a known, scientifically proven, certain way of how nature behaves, most often put in a formula
  6. religion: ...
  7. ...
While adding translation links to individual meanings is of course desirable, there is no apparent reason why this page should not be in several innerlanguage collections at a time, named e.g. "one law", "law (general)", "justice", "law enforcement agency", "law of physics", "religious law", etc. Of course, this is only possible with the altered scheme of noting such links, not with the original proposal. This is yet another reason why, imho, this one is superior. --Purodha Blissenbach 11:41, 29 December 2006 (UTC)

Regarding "Language set proposal"...


I like this method, and its general cleanness. The only hitch I can immediately see with this is the question of how a user adding a new article might be able to check for the existence of the "Lang:dog" set page. I could search for English translations of the title for which I'd like to write an article, but I wouldn't find "doggiewoggie" or if someone decided to use their own language when writing the first article on a topic. If, for example, I wanted to add fr:chien and had a look for Lang:dog and the page was called doggiewoggie, I wouldn't have found it, and I'd go ahead and maybe try and create Lang:dog or maybe just create Lang:chien. --Domi 17:34, 22 May 2006 (UTC)

I don't think it would be any harder than today's method. If you decided to write an fr article called "chien" you would look for the article "dog" in English, and on that page you would either find no language page link (so a language page does not yet exist, or exists but maybe does not include the English page, if say, only pl, de and zh are grouped), or you would find a language page link of the form "Lang:doggiewoggie" on the English page.
Today, if you create an article "chien" (in FR), you have to know somehow that "dog" is the English word likely to be used for it. In the proposed scheme, if you create the article "chien" and want to know whether there is a translation page, you would go to the "dog" (en) article and see there either no language page link or a named language page link, "Lang: doglangpages", which would tell you the name. It is possible that, say, fr and en both link to "Lang: doglangpages" while de and pl link to "Lang: hond"; in that case, as soon as someone spotted that these 4 pages all belonged in the same language set, "Lang: hond" could be made a redirect to "Lang: doglangpages", and ideally the de and pl pages would be changed to point to "Lang: doglangpages" rather than "Lang: hond". I think the problem you describe already exists with the current system: if a de and a pl page contain language links to each other, and a fr and an en page contain language links to each other, it still takes a human to realise that they are actually all part of a single language set (and then they need to edit all 4 pages, requiring 4 different signons).
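The redirect-merging step described above can be sketched in Python. This is a minimal model under assumptions of our own: each article records the name of its Lang: set page, and a set page found to duplicate another becomes a redirect to it. The function names and the `links`/`redirects` dictionaries are hypothetical and not part of MediaWiki.

```python
def resolve_set(set_name: str, redirects: dict[str, str]) -> str:
    """Follow Lang: redirects until reaching a set page that is not a redirect."""
    seen: set[str] = set()
    while set_name in redirects:
        if set_name in seen:
            raise ValueError(f"redirect loop at {set_name}")
        seen.add(set_name)
        set_name = redirects[set_name]
    return set_name


def group_by_set(page_links: dict[str, str], redirects: dict[str, str]) -> dict[str, list[str]]:
    """Group articles (e.g. 'fr:chien') under their canonical Lang: set page."""
    groups: dict[str, list[str]] = {}
    for page, set_name in page_links.items():
        groups.setdefault(resolve_set(set_name, redirects), []).append(page)
    return groups


# Two sets created independently, then "Lang:hond" turned into a redirect.
links = {
    "en:dog": "Lang:doglangpages",
    "fr:chien": "Lang:doglangpages",
    "de:Hund": "Lang:hond",
    "pl:pies": "Lang:hond",
}
redirects = {"Lang:hond": "Lang:doglangpages"}
print(group_by_set(links, redirects))
# {'Lang:doglangpages': ['en:dog', 'fr:chien', 'de:Hund', 'pl:pies']}
```

Because the de and pl pages are resolved through the redirect, they join the merged set even before anyone edits them to point at "Lang:doglangpages" directly.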
A few thoughts
  1. In which language would the new language page exist? That is, would it live in a language-specific part of the site, e.g. "en.wikipedia.org/Lang:doglangpages", or in a generic part, "www.wikipedia.org/Lang:doglangpages"? The latter is more consistent with the idea that the language page is independent of any particular language, but the mess of multiple signons would be a real pain in the ****.
  2. Is it REALLY possible for the Lang page to be updated whenever one of the individual pages changes, AND vice versa? That would be ideal, but MediaWiki tends to run on a one-page-one-change system, so would that mean having some robot propagate changes between the two ends?
Mediashrek 09:01, 24 May 2006 (UTC)

Keep It Simple

(Also submitted as bug 6107 feature request)

Yes, a multi-lingual wiki is a right pain in the ****. Setting up a single-language wiki is simplicity itself (well done, folks). Setting up a multi-lingual wiki is a nightmare (maybe if Jimbo and others worked in a multilingual environment like Belgium, life would be different).

For example:

  • for every language you have to set up a new database (even if it is part of the same single database, I still have to go through the setup)
  • I have to have a separate copy of the MediaWiki code for every language or figure out the wizardry to use a single version with different LocalSettings per language
  • I have to figure out how I'm going to sort out the problem of multiple signons (each language will have separate signons, which is just so NOT what you want when all you want is a site in, say, 2 languages).

I am in Belgium where every site is always in French and Dutch and I am gradually figuring out each of the gotchas for my site www.treekee.com.

Next I want to figure out how to do it simply for a truly multinational site (70+ languages). At the very least I know I'm going to have to do everything 70+ times.

It should ideally be as easy as saying, in LocalSettings.php, that I want to have a multi-lingual wiki: $multiLang = true;

It's not. It's just a nightmare.

It seems to me that there are only 2 things a MediaWiki admin really cares about here. I know there are already discussions of many multi-lingual aspects at m:Multilingual MediaWiki, but many of them are irrelevant as far as this request is concerned. The 2 things are: 1) everything in one single database, working as one cohesive whole, with one signon; 2) each page, when viewed, has all messages/menus/text in the correct language.

I will look at each of these in turn.

Firstly, everything in one single database, working as one cohesive whole, with one signon.

I don't think this problem is very difficult either.

There needs to be simply one extra column in the page table specifying the language. This column would simply contain the wiki code, "en", "fr" etc.

Then the URL needs some standard way for MediaWiki to extract the language of the current page. I would have thought that a system putting the language code in the URL would work well, e.g. http://www.mysite.com/index.php?title=Dog&lang=en and http://www.mysite.com/index.php?title=Chien&lang=fr

These could, I would have thought, be transformed fairly easily, in the same way that URLs are currently rewritten to be more user-friendly: http://www.mysite.com/en/wiki/Dog and http://www.mysite.com/fr/wiki/Chien
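A minimal sketch of extracting the language from either URL form proposed above. The `/<lang>/wiki/<title>` layout and the `lang` query parameter come from the example URLs; the function name and the `DEFAULT_LANG` fallback are hypothetical, and this is not how MediaWiki itself routes requests.

```python
from urllib.parse import parse_qs, unquote, urlsplit

DEFAULT_LANG = "en"  # hypothetical stand-in for the proposed $multiLangDefaultLang


def parse_page_url(url: str) -> tuple[str, str]:
    """Return (lang, title) from either proposed URL form."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "title" in query:
        # Query-string form: index.php?title=Dog&lang=en
        lang = query.get("lang", [DEFAULT_LANG])[0]
        return lang, query["title"][0]
    # Short form: /<lang>/wiki/<title>; maxsplit=2 keeps any slashes inside the title
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) == 3 and segments[1] == "wiki":
        return segments[0], unquote(segments[2])
    raise ValueError(f"unrecognised page URL: {url}")


print(parse_page_url("http://www.mysite.com/index.php?title=Chien&lang=fr"))  # ('fr', 'Chien')
print(parse_page_url("http://www.mysite.com/en/wiki/Dog"))                    # ('en', 'Dog')
```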

The interwiki table could (if you really wanted) have entries added automatically during installation like: "fr" => "http://www.mysite.com/fr/wiki/$1" and "de" => "http://www.mysite.com/de/wiki/$1" though the observant reader will realise that even these are superfluous, since the only thing the MediaWiki software needs is a list of the wiki language codes and it can create these easily.
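The point that the interwiki entries are superfluous can be illustrated with a short sketch: given only the list of language codes, the prefix-to-URL map is derived rather than stored. The base URL and the `/<code>/wiki/$1` pattern follow the example URLs above; the function names are hypothetical.

```python
def build_interwiki_map(lang_codes: list[str], base: str = "http://www.mysite.com") -> dict[str, str]:
    """Derive interwiki-style entries ('fr' -> '.../fr/wiki/$1') from language codes alone."""
    return {code: f"{base}/{code}/wiki/$1" for code in lang_codes}


def interwiki_url(prefix_map: dict[str, str], code: str, title: str) -> str:
    """Expand a link such as [[fr:Chien]]; spaces become underscores, as in MediaWiki URLs."""
    return prefix_map[code].replace("$1", title.replace(" ", "_"))


prefixes = build_interwiki_map(["en", "fr", "de"])
print(interwiki_url(prefixes, "fr", "Chien"))  # http://www.mysite.com/fr/wiki/Chien
```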

Now, what does the MediaWiki software do with this extra info? Either the $title could be passed as, say, "en:dog", or (better, I think, since 2 parameters are then treated as 2 parameters) an extra parameter $lang could be passed to any routine that cares, with routines taking an optional language parameter, e.g. myFunc($title, $lang="").

There would probably need to be a default language for any case where a language is not passed. e.g. in LocalSettings.php $multiLangDefaultLang = "en";

This could always be retrieved via a new global, $gblMultiLangDefaultLang.

Where the code finally cared about getting the right page from the now single db for ALL languages, it would be as simple as adding "AND langCode = $lang" to the SQL query.
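The single-database idea can be sketched with Python's built-in sqlite3. The table and column names (`page`, `page_title`, `page_lang`, `page_text`) are hypothetical and do not reproduce MediaWiki's real schema. One deliberate difference from the text: the language is passed as a bound parameter instead of being pasted into the SQL string as `$lang`, which avoids SQL injection. The composite key lets the same title, e.g. "Paris", exist once per language.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory page table with one extra language column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page ("
        " page_title TEXT NOT NULL,"
        " page_lang TEXT NOT NULL,"  # hypothetical column holding 'en', 'fr', ...
        " page_text TEXT NOT NULL,"
        " PRIMARY KEY (page_title, page_lang))"  # same title may exist once per language
    )
    return conn


def get_page(conn: sqlite3.Connection, title: str, lang: str = "", default_lang: str = "en") -> Optional[str]:
    """Fetch a page's text, falling back to the wiki's default language."""
    lang = lang or default_lang  # mirrors the optional $lang="" parameter
    row = conn.execute(
        "SELECT page_text FROM page WHERE page_title = ? AND page_lang = ?",
        (title, lang),  # bound parameters rather than string interpolation
    ).fetchone()
    return row[0] if row else None


conn = make_db()
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        ("Dog", "en", "A dog is..."),
        ("Chien", "fr", "Le chien est..."),
        ("Paris", "en", "Capital of France"),
        ("Paris", "fr", "Capitale de la France"),
    ],
)
print(get_page(conn, "Chien", "fr"))  # Le chien est...
print(get_page(conn, "Dog"))          # A dog is...
```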

OK, that very small change has now given us the ability to have a SINGLE database with ALL languages in it. One signon comes as a freebie, since the user table has no language component.

The other thing which a MediaWiki admin wants of course is for each of the pages to appear with all the messages/menus/text in the correct language. This is the 2nd of the requirements.

For this, I don't understand at all how the current messaging is done and I will have to leave it to someone else to comment. I know that you have to run a tool once at setup to get the messages in their correct language.

I would have thought though that with a little bit of thought this 2nd process would also be fairly easy.

http://xp.c2.com/DoTheSimplestThingThatCouldPossiblyWork.html

With these 2 things working 90% of users would have the multilingual wiki abilities they want. And I, facing setting up 70+ subwikis/databases etc. will be a VERY happy man.

Mediashrek 10:06, 27 May 2006 (UTC)

language table data

The language data currently in the MediaWiki PHP script is outdated.

A more current list of data is available here on MediaWiki. See m:Template:N en and the related templates that switch on the language code (ISO 639-based, with a few exceptions where an ISO 639-1 language code is followed by a variant code instead of using an ISO 639-2 or -3 code). A few test wikis currently use no ISO 639 code at all, but a more or less abbreviated form of the English language name.

These templates are easily convertible to tabular data (using "|" and "=" as field separators in a row).

The data for each code currently include:

  • English language name
  • Local language name (may include <span>s of text with differing directionality for alternate names)
  • Alternate names separated by "&nbsp;/ "
  • Local language directionality
  • Wikilink to an article describing the language, preferably on Wikipedia (or possibly on some other project, like Wiktionary).

(The template data include languages that are currently in tests, in addition to all active Wikipedias.) 86.221.32.107 23:33, 29 May 2006 (UTC)
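A sketch of converting one template row into a record with the fields listed above. The exact row layout is an assumption of ours (`code=English|Local|Alternates|direction|article`, with alternates separated by "&nbsp;/ "); the real templates may order or delimit fields differently, and the sample rows are illustrative only. Splitting on the first "=" alone keeps a local name such as `<span dir="rtl">…</span>` intact.

```python
from dataclasses import dataclass


@dataclass
class LanguageRow:
    code: str
    english_name: str
    local_name: str
    alternates: list[str]
    direction: str  # "ltr" or "rtl"
    article: str


def parse_row(line: str) -> LanguageRow:
    """Parse 'code=English|Local|Alt1&nbsp;/ Alt2|dir|article' (assumed layout)."""
    code, rest = line.split("=", 1)  # only the first '=' separates the code
    english, local, alternates, direction, article = rest.split("|")
    alt_names = [a.strip() for a in alternates.split("&nbsp;/ ") if a.strip()]
    return LanguageRow(code.strip(), english.strip(), local.strip(),
                       alt_names, direction.strip().lower(), article.strip())


rows = [
    "ar=Arabic|العربية||rtl|w:en:Arabic language",
    "nds=Low German|Plattdüütsch|Plattdeutsch&nbsp;/ Niederdeutsch|ltr|w:en:Low German",
]
table = {r.code: r for r in map(parse_row, rows)}
print(table["nds"].alternates)  # ['Plattdeutsch', 'Niederdeutsch']
```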

Collation

w:en:Collation is important for proper alphabetization of categories. For example, in languages like French and Greek, accented characters should be sorted together with unaccented ones. Presently, this is not the case; see for example s:el:Κατηγορία:Θρησκευτικά_κείμενα. Every language has a different collating scheme, so collation should follow the language of the page. Andreas 15:12, 6 June 2006 (UTC)

This is true and it is wrong. Collation is also part of the locale data and consequently there can be variations in it on a lower level than the language. But I agree with the sentiment. It is not only for categories that the sort order needs to be considered. GerardM 06:42, 7 June 2006 (UTC)
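A minimal sketch of the behaviour Andreas asks for: a sort key that groups accented and unaccented letters by decomposing characters (Unicode NFD) and dropping combining marks. This is a language-agnostic approximation, not a locale-aware collation; as GerardM notes, real rules vary by locale, and a production system would use CLDR/ICU collation data instead. The function name is hypothetical.

```python
import unicodedata


def accent_insensitive_key(title: str) -> tuple[str, str]:
    """Primary key ignores accents and case; the original title breaks ties."""
    decomposed = unicodedata.normalize("NFD", title)  # 'é' -> 'e' + U+0301
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), title)


french = ["Zèbre", "Éclair", "Abricot", "Ecole"]
greek = ["έλα", "δέμα"]
print(sorted(french))                              # code-point order puts 'Éclair' after 'Zèbre'
print(sorted(french, key=accent_insensitive_key))  # ['Abricot', 'Éclair', 'Ecole', 'Zèbre']
print(sorted(greek, key=accent_insensitive_key))   # ['δέμα', 'έλα']
```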

The innerlanguage links proposal could better be replaced by a system like Wikidata DefinedMeaning and translations as on WiktionaryZ. HenkvD 13:06, 9 June 2006 (UTC)

yet another proposal for interlanguage linking

I agree with other proposals that it is too messy to include the interlanguage link (join tag) in the normal text. This solution will most likely create not linked lists but bowls of spaghetti.

So my suggestion is to save the interlanguage link within the page's metadata, that is, to add "joined to" and "joined by" entries to the Languages table. Note that a page can only either be joined to a single other page, or be joined by a number of pages, each in a different language.

So, how are these metadata edited?

Put a link reading "edit" under the interwiki-link box. Clicking this link opens a special page, which displays the current interlanguage linking of the page. So, for de:London, this might look like:

en:London
fr:Londres
ja:ロンドン
pl:Londyn
ru:Лондон

Meaning: the German page is joined to en:London, which is not joined to any other page but is joined by a number of pages in other languages. Note that the German page's metadata only includes the link to en; the other information comes from the metadata of the English page.

To join another page, say fi:Lontoo, to this ring, open the page, click edit under its interwiki box in the sidebar, and click "join to other page" there. Select a language and a page name. If you tried to join fi:Lontoo to de:London, the software would automatically detect that de:London is joined to en:London, and add "joined to:en:London" to the Finnish page and "joined by:fi:Lontoo" to the English page.

Thus, the software would keep the rings clean.
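The joining rule above can be sketched as a small hub-and-spoke model: every page is either a hub or joined to exactly one hub, and joining to a spoke is silently redirected to its hub. The class, method names and in-memory dictionaries are hypothetical stand-ins for the proposed "joined to"/"joined by" metadata; the same-language check stands in for the "replace?" prompt described under the special cases.

```python
class LanguageRings:
    """Hub-and-spoke interlanguage rings (hypothetical in-memory model)."""

    def __init__(self) -> None:
        self.joined_to: dict[str, str] = {}        # spoke page -> its hub
        self.joined_by: dict[str, list[str]] = {}  # hub page -> spokes, in join order

    @staticmethod
    def lang(page: str) -> str:
        return page.split(":", 1)[0]  # 'de:London' -> 'de'

    def hub_of(self, page: str) -> str:
        return self.joined_to.get(page, page)

    def ring(self, page: str) -> list[str]:
        hub = self.hub_of(page)
        return [hub] + self.joined_by.get(hub, [])

    def join(self, page: str, target: str) -> str:
        """Join page to target's ring; always attaches to the hub, never to a spoke."""
        if page in self.joined_to or page in self.joined_by:
            raise ValueError(f"{page} already belongs to a ring")
        hub = self.hub_of(target)
        if any(self.lang(p) == self.lang(page) for p in self.ring(hub)):
            raise ValueError(f"ring of {hub} already has a {self.lang(page)} page")
        self.joined_to[page] = hub
        self.joined_by.setdefault(hub, []).append(page)
        return hub


rings = LanguageRings()
for page in ["de:London", "fr:Londres", "ja:ロンドン", "pl:Londyn", "ru:Лондон"]:
    rings.join(page, "en:London")
print(rings.join("fi:Lontoo", "de:London"))  # en:London (redirected to the hub)
```

Keeping every spoke one step from its hub is what stops the "bowls of spaghetti": the whole ring is always the hub plus its "joined by" list.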

Special cases

Moving a page: If a page is moved, the metadata of pages joined to/by it need to be updated.

Deleting a page: If a page that is joined to a different page is deleted, the "joined by" entry in the corresponding page needs to be deleted, too. If a page that is joined by other pages is deleted, all the pages it was joined by will instead be joined to the page at the top of its "joined by" list.

Example: If en:London was deleted, all other pages about London would be joined to "fr:Londres" instead.

Joining a second page in an existing language: Imagine a user tried, for example, to join de:Paris to en:London. The software should display a message:

en:London is already joined by de:London. Do you wish to replace that article by de:Paris? Yes / Cancel

Adding rings: Assume there were two rings, a page en:London with fr, de, ru and ja linking to it, and a page fi:Lontoo with no and sv linking to it. If a user joined fi:Lontoo to en:London in this case, another message should be displayed:

Join all pages that are linked here to en:London too? Yes / No / Cancel
If the answer is Yes, fi, no and sv will all be joined to en. If the answer is No, only fi will be joined to en, and no (Norwegian) will become the new hub, with sv linking to it.
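The deletion and ring-merging cases can be sketched with the same kind of hub-and-spoke model. Everything here is a hypothetical illustration: `delete` promotes the first "joined by" page to hub, and `merge(..., take_all=False)` corresponds to answering No (the merged hub's first spoke becomes the new hub for the rest). The Norwegian and Swedish page names are placeholders, since the proposal does not give them.

```python
class LanguageRings:
    """Hub-and-spoke interlanguage rings (hypothetical in-memory model)."""

    def __init__(self) -> None:
        self.joined_to: dict[str, str] = {}        # spoke page -> its hub
        self.joined_by: dict[str, list[str]] = {}  # hub page -> spokes, in join order

    def join(self, page: str, hub: str) -> None:
        """Attach page to hub (hub is assumed to be a hub already)."""
        self.joined_to[page] = hub
        self.joined_by.setdefault(hub, []).append(page)

    def _promote(self, spokes: list[str]) -> None:
        """Make the first of an orphaned spoke list the new hub for the rest."""
        if not spokes:
            return
        new_hub, rest = spokes[0], spokes[1:]
        self.joined_to.pop(new_hub, None)
        for page in rest:
            self.join(page, new_hub)  # overwrites the stale joined_to entry

    def delete(self, page: str) -> None:
        if page in self.joined_to:  # a spoke: drop it from its hub's list
            hub = self.joined_to.pop(page)
            self.joined_by[hub].remove(page)
            if not self.joined_by[hub]:
                del self.joined_by[hub]
        else:  # a hub: the top of its "joined by" list takes over
            self._promote(self.joined_by.pop(page, []))

    def merge(self, other_hub: str, hub: str, take_all: bool) -> None:
        """Join other_hub to hub; take_all=True is 'Yes', False is 'No'."""
        spokes = self.joined_by.pop(other_hub, [])
        self.join(other_hub, hub)
        if take_all:
            for page in spokes:
                self.join(page, hub)
        else:
            self._promote(spokes)


rings = LanguageRings()
for page in ["fr:Londres", "de:London", "ru:Лондон", "ja:ロンドン"]:
    rings.join(page, "en:London")
rings.delete("en:London")  # fr:Londres is first in the list and becomes the hub
print(rings.joined_by["fr:Londres"])  # ['de:London', 'ru:Лондон', 'ja:ロンドン']

rings.join("no:London", "fi:Lontoo")
rings.join("sv:London", "fi:Lontoo")
rings.merge("fi:Lontoo", "fr:Londres", take_all=False)  # the "No" answer
print(rings.joined_by["no:London"])  # ['sv:London']
```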

--Mkill 12:35, 30 July 2006 (UTC)

Specs updated

The specs now accurately reflect the implementation strategy, which is to use a simple set-based system for joining languages.--Eloquence 08:47, 27 September 2006 (UTC)

commons

Congratulations to the Wikidata devs; you are doing a wonderful job. For this reason I hope WiktionaryZ will soon be recognized as a truly revolutionary (in a good way) project. But does a real (started) project exist to fully integrate Wikidata (or the multilingual part of it) into MediaWiki? In my opinion, Wikicommons urgently needs this feature: it's currently a media library for English speakers, but for everyone else it's only an image bank. Are there plans to make Wikicommons multilingual? Kelson 10:46, 31 October 2006 (UTC)

We have sent a proposal to internationalize Commons using WZ (now renamed OmegaWiki) to a potential funding organization; we'll see how it goes. MLMW as described here is well underway, I will post an update to mediawiki-l soon specifically related to multilingual image pages.--Eloquence 00:55, 13 December 2006 (UTC)

proposal for some wikis

Wiktionary interwiki links are like files on Commons: they (must) link to exactly the same UTF-8 name (not a translated one). So there is no need to repeat the page name at the end of each link, only the language of the target Wiktionary.

[[en:start]]
[[el:start]]
[[pl:start]]
must be
[[en:el:pl]]

This is partly to save ("page working") space, but mostly to avoid typos.
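A sketch of how the compact form could be expanded back into ordinary interwiki links, since on Wiktionary every target shares the current page's title. The function name is hypothetical. Note an ambiguity the proposal would have to resolve: `[[en:el]]` could also be an ordinary link to a page titled "el", so this sketch only treats a link as compact when every colon-separated part is a known language code.

```python
import re

LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")  # a plain [[...]] link without a pipe


def expand_compact_links(wikitext: str, title: str, known_codes: set[str]) -> str:
    """Rewrite [[en:el:pl]] as one [[code:title]] link per language code."""
    def expand(match: re.Match) -> str:
        parts = match.group(1).split(":")
        # Only treat the link as compact if every part is a known language code
        if len(parts) > 1 and all(p in known_codes for p in parts):
            return "\n".join(f"[[{code}:{title}]]" for code in parts)
        return match.group(0)  # ordinary link: leave untouched
    return LINK.sub(expand, wikitext)


print(expand_compact_links("[[en:el:pl]]", "start", {"en", "el", "pl"}))
```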

Wikisource also needs a single source, since (let's say) the Greek national anthem in modern Greek (monotoniko or polytoniko) "cannot" and "must not" be changed in other wikis; like images or sounds, it must be preserved between wikis. So the page may differ, but the main "source" must be one, just as the page on Commons may be changed while the jpg and its name stay "statically linked". Maybe Commons needs something like odt or rtf or .somenewwikiextension in order to upload and/or edit a single source, which local Wikisources then differentiate by adding translations, transliterations, local notes etc. --Xoristzatziki 16:31, 1 March 2010 (UTC)

Enabling subpages for main namespace to allow multilingual versions

For example, it is possible to enable subpages in the main namespace:

$wgNamespacesWithSubpages[NS_MAIN] = 1;

Is it reasonable to create such a multilingual wiki?

It's what's done on this wiki, so I'd say it's reasonable. :) It's not necessarily the best way, though. —Emufarmers(T|C) 19:18, 29 December 2010 (UTC)
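With subpages enabled, a translation can live at a title such as "Page/de". A minimal sketch of mapping such titles back to a base page and language, assuming (our assumption, not a MediaWiki rule) that a final path segment counts as a language only if it is in a known list of codes; the function name is hypothetical.

```python
def split_language_subpage(title: str, known_codes: set[str], default_lang: str = "en") -> tuple[str, str]:
    """Split 'Base/xx' into ('Base', 'xx') when xx is a known language code."""
    base, sep, suffix = title.rpartition("/")
    if sep and base and suffix in known_codes:
        return base, suffix
    # No language suffix: treat the page as the default-language original
    return title, default_lang


codes = {"de", "fr", "ja", "zh-hans"}
print(split_language_subpage("Help:Contents/de", codes))  # ('Help:Contents', 'de')
print(split_language_subpage("Manual:Hooks/Foo", codes))  # ('Manual:Hooks/Foo', 'en')
```

Checking against a known-code list matters because ordinary subpages (like "Manual:Hooks/Foo") share the same slash syntax.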