Topic on Talk:Content translation

Overuse must be stopped: low-quality translation

Alphama (talkcontribs)

Hello, I am from the Vietnamese Wikipedia (viwiki). I am not sure where else I can raise my concerns about this tool. In some cases, the tool cannot prevent overuse or abuse by editors.

When you translate a short article, '''this tool allows editors to publish low-quality content even when they have made no effort to translate it''', only using automatically translated content from Google Translate. That is a '''nightmare''' for small and medium projects where there are not enough administrators to patrol new content. On viwiki, we have thousands of articles like this, and the community cannot keep up and correct the writing style. I ask for a method to prevent this.

I suggest 2 main things:

  1. An option that prevents editors from publishing the content if it has even a single problem, or more problems than a set threshold (a rough sketch of what I mean follows below).
  2. A configuration where local admins can restrict the usage right to certain users or user groups.
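
Just to make the idea concrete, here is a rough sketch in Python-like pseudocode; all names and values are invented for illustration, and this is not how the extension is actually configured:

<syntaxhighlight lang="python">
# Hypothetical per-wiki settings for publishing from Content Translation.
# Field names and values are made up for illustration only.
VIWIKI_SETTINGS = {
    "max_remaining_issues": 0,  # block publishing if any flagged issue remains
    "allowed_groups": {"autoconfirmed", "extendedconfirmed", "sysop"},
}

def may_publish(user_groups: set, issue_count: int, settings: dict) -> bool:
    """Allow publishing only for users in an allowed group whose translation
    has no more flagged issues than the local threshold."""
    if not user_groups & settings["allowed_groups"]:
        return False
    return issue_count <= settings["max_remaining_issues"]

# Example: a brand-new user with two flagged issues would be blocked.
print(may_publish({"user"}, 2, VIWIKI_SETTINGS))  # False
</syntaxhighlight>

Local admins could then tune both the threshold and the allowed groups for their own wiki.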

Thank you!

~~~~

Omotecho (talkcontribs)

Hello, @Alphama. Personally, it is encouraging to know viwiki editors are interested in CX2. I am based on jawiki, and we, too, have been trying to avoid low-quality articles produced via CX2.

By the way, two help pages, Help:Content translation/Translating/Initial machine translation and Help:Extension:ContentTranslation, are handy for seeing what criteria each local language community has used to start or expand this discussion, and to what degree.

As for how strictly each wiki decides the quality level of CX2 output: jawiki has watched the Indonesian wiki settle on 70% accuracy, while zhwiki tried out a similar threshold but raised the accuracy requirement after an experimental phase. Jawiki sees its own discussion hanging mid-air, as diverse content could require different accuracy rates, and finding a “standard” is not easy. Cheers,

Alphama (talkcontribs)

I hope the tool can offer a configuration where local admins can set the features I mentioned. This would help it adapt to each wiki's criteria.

Omotecho (talkcontribs)

Yes, @Alphama, you have a point. Still, since it involves centralized system changes by the Foundation’s Tech/Editing teams, I wonder what workflow would be transparent and smooth both for local admins and for those maintaining the system day and night. I am thankful you pointed my thoughts to the subject. Cheers,

YTRK (talkcontribs)

The English Wikipedia may be a role model for this. It a) forbids the use of machine translation and b) restricts the publishing of articles from Content Translation to extended confirmed users. I admit I have strong feelings against machine translation, but as Omotecho has mentioned, thresholds don't always work very well. As for the latter, if this is possible for the Vietnamese Wikipedia, it would stop new editors from immediately using the tool, and by the time they gain the status (30 days and 500 edits), we can hope that a fair portion of them will be familiar with the rules. For those who seem to have the proper ability but not the required time period or number of edits, administrators can grant the status manually.

P.S. To be honest, I think the default setting for all languages should be not to allow machine translations. If a particular Wikipedia wants to allow machine translations, that is fine, but there's no need whatsoever for other Wikipedias to be affected negatively. Changing something requires effort, which smaller Wikipedias (and some larger ones) may not be able to afford. Surely eliminating risks should be made easier than introducing them.

Omotecho (talkcontribs)

Machine translation is a double-edged technology, not quite _safe_ to toy with. I have a huge dilemma: CX2 itself has been, and is, based on the goodwill that wikis with smaller editing populations would speedily gain pages, attract visibility in the longer term, and grow their populations, while a lower demography means less people power to clean up somebody else's mistakes.

Even jawiki has not recovered from the quality degradation caused by CX2 output, which is still not very visible volume-wise. We also lack a dashboard to indicate the pros and cons you need to keep in mind when applying CX2, and a coaching place where beginner editors can find the steps to manoeuvre CX2 as an initial editing stage, one that still needs layers of editing effort piled on by the original poster.

I keep using CX2 in the hope of paying back the technicians who created MT in the 1990s, whom I worked with on thesauri. Coupled with speech recognition, MT (or its database) gave us the “interpreter” gadgets you bring on overseas trips; they are handy in a way or two for simple conversation...

Alphama (talkcontribs)

Generally, languages have their natural diversity. If CX2 is used for a long time, I am afraid our wiki projects will become like robot projects where languages converge. Then Google Translate uses wiki content again to train its algorithms. In the end, we have such a "robot" scenario in the future for all wikis.

Omotecho (talkcontribs)

@Alphama, I sense your bad dream could point to how we “discriminate” robot user accounts from applying CX2 to create pages semi-automatically. I had never thought of that before...

RIT RAJARSHI (talkcontribs)

Instead of paragraph-by-paragraph translation, there should be sentence-by-sentence translation. RIT RAJARSHI (talk) 22:35, 12 September 2020 (UTC)

YTRK (talkcontribs)

A scary thought, that, @Alphama. But it does seem probable. I think I read somewhere (The Signpost perhaps?) that the incident over at the Scots Wikipedia had affected machine translations...

@RIT RAJARSHI, why do you suggest that? I personally feel that there are quite a few cases where differences in things like sentence structure necessitate a change in what a single sentence contains.


Not to anyone in particular: speaking of the Scots Wikipedia, it is said that the situation became so bad because no one was there to right the wrongs, which I think can also be said of machine translations in many Wikipedia projects. Yes, the existence of the problem is known in this case, but the thing is, for this problem to occur there is no need for someone with (inadequate but) some knowledge of the language; anyone can cause chaos simply by clicking enough times, until the moment someone spots them and acts to put a stop to it.

RIT RAJARSHI (talkcontribs)

@YTRK:

I thought a sentence-by-sentence translation would be easier to read, and thus would reduce the chances of unmodified clicks. Personally, I feel that when I cannot grasp the scope of a sentence-by-sentence translation, I tend to lose my focus and leave much of the machine-translated text unmodified. I see a wall of text where I cannot distinguish between sentences.

On the other hand, just today I translated a MediaWiki page (https://www.mediawiki.org/wiki/Translator_hub/bn) which provided a sentence-by-sentence translation feature at the top of the page. It was a much less pressurising task, and the translations were of much better quality than several of my previous translations.

However, you are right that one sentence may change the meaning of a nearby sentence. In that case, a preview panel could be used to show the machine-translated paragraph, with bolding or a highlight on the current sentence.

RIT RAJARSHI (talkcontribs)

So basically, what I meant is that a sentence-by-sentence approach may discourage leaving text unmodified, and may make manual reading easier.

YTRK (talkcontribs)

Hmm, I see. It would also increase the number of clicks required, so it might be good in those terms too. However, it would be a hindrance to some editors, including me, and I also feel it isn't technically possible. In the current system, one chunk consists of one row of wikitext, but to enable sentence-by-sentence translation the system would have to cut the chunks wherever there is a full stop (or, in the case of Japanese, a circle), which would be pretty difficult.
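
To illustrate the kind of splitting that would be needed (this is only a naive sketch, not how the tool actually segments text):

<syntaxhighlight lang="python">
import re

# Naive sentence splitter: cut after ".", "!", "?" or the Japanese full stop "。".
# Real text defeats this quickly (abbreviations such as "e.g.", decimal numbers,
# reference markers), which is why per-sentence chunking is harder than it looks.
SENTENCE_END = re.compile(r'(?<=[.!?。])\s*')

def split_sentences(paragraph):
    return [s for s in SENTENCE_END.split(paragraph) if s]

print(split_sentences("I saw the moon tonight. It was red. 満月だった。"))
# ['I saw the moon tonight.', 'It was red.', '満月だった。']
</syntaxhighlight>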

As a (lesser) alternative to deal with the "wall of text", perhaps the lines could be spaced out a bit more? Double spacing would be too much, so about 1.5 perhaps.

RIT RAJARSHI (talkcontribs)

@YTRK: Increasing the line spacing won't be helpful for me, because in spite of the line gap, another sentence starts right where one sentence ends. Especially in English-to-Bengali translation, I do better when I can see one sentence in full (where the words rearrange a lot, it consumes a lot of short-term memory). While doing this, I lose track of the sentence when it is surrounded by other sentences.

I basically use the machine translation to quickly get the hyperlinks, references, templates, tables, etc. But when I click an empty paragraph, the entire paragraph fills with unnatural, not-so-good machine translation. That makes the rearranging process harder, and while rearranging machine-translated terms I tend to mix up neighbouring sentences.

I would prefer any of the following solutions:

  • break each sentence onto a new line or into a new paragraph;
  • or deeply bold the sentence I am selecting for translation;
  • or show a grey preview of the machine-translated paragraph without "adding" the text immediately; the user has to add the text personally, sentence by sentence.

Here is a hypothetical example:


Saw tonight moon I. (grey) Red was it. (grey) Full was it moon. (grey)
(manually corrected)
I saw moon tonight. Red was it. (grey) Full was it moon. (grey)
(manually corrected)
I saw moon tonight. It was red. Full was it moon. (grey)
(manually corrected)
I saw moon tonight. It was red. It was full moon.
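
In other words, the behaviour I imagine is roughly this (purely a hypothetical sketch, not an existing feature of the tool):

<syntaxhighlight lang="python">
# Hypothetical model of the "grey preview": the machine output is shown per
# sentence, and only sentences the editor has explicitly corrected/confirmed
# are added to the article; the rest stay as a grey preview.
def review(machine_sentences, corrections):
    published = []
    for i, draft in enumerate(machine_sentences):
        if i in corrections:              # editor rewrote or confirmed it
            published.append(corrections[i])
        # otherwise the draft stays grey and is not published
    return " ".join(published)

preview = ["Saw tonight moon I.", "Red was it.", "Full was it moon."]
print(review(preview, {0: "I saw the moon tonight.", 1: "It was red."}))
# Only the two confirmed sentences are published; the third remains grey.
</syntaxhighlight>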

Regards.

RIT RAJARSHI (talk) 18:58, 14 September 2020 (UTC)

YTRK (talkcontribs)

All right. But as I've said, it might be technically difficult.

By the way, if the low quality of the machine translations is hindering you, you can choose not to use them and instead have the original text copied (the links etc. would still be modified). You ought to be able to adjust this in the upper-right area.

Omotecho (talkcontribs)

@RIT RAJARSHI, very nice of you to list which elements help you when applying CX2, and I see your frustration that the different sentence structure stretches your short-term memory to its limit.

As pointed out by YTRK, I guess your better option is simply to change the translation method (furthest right, smaller box) from Google translation to “copy original sentences”, if you don't mind. As shown above, links/weblinks are retained, which saves a lot of time compared to the other option, “start from blank”.

Of course, you will need to look up dictionaries to pin down the meaning in Bengali from time to time, but your frustration with the jumbled-up word order would be relieved.

BTW, my small discovery: if you are using an iPad/Mac, long-tap a word and a pop-up gives you a “search” function which will match the term against the default thesauri on the device, or, as you scroll down, jump to a web search. I've found this trick on iPad/iPhone, not on a Windows PC, and find it handier.

RIT RAJARSHI (talkcontribs)

@YTRK: Thank you, and I am sorry that what I am asking is technically difficult. Currently I prefer manual translation in most cases.


However, one feature already exists: when I hover over or click a sentence, it shows a yellow-orange markup. This is somewhat helpful, and this mechanism could be improved to firmly select and highlight individual sentences.


@Omotecho: Thank you. I will try the “copy original sentences” feature. My current setup has an excellent feature: it converts links to en.wikipedia.org articles into the corresponding bn.wikipedia.org articles. Would this conversion still happen if I copy the original sentences?


BTW, I use the Chrome browser on Windows. Although I have an Android touch phone, I scramble things on touchscreens, so I use a very conventional big 'clicky' keyboard.

YTRK (talkcontribs)

Um, “copy original sentences” was precisely what I meant, so yes, the links and the like are changed.

I had noticed the markup, but I don't really know how it works. Perhaps it might be possible without too much of a hassle after all.

Omotecho (talkcontribs)

@YTRK, allow me to interpret what you meant (:

Yes, the yellow background, or highlighting, matches up a sentence set, or one chunk, between the left pane (the translation original) and the right pane (the machine translation or copied original text). As @RIT RAJARSHI told us, it shows us which part of the paragraph we are working on. Yes, I find it handy for saving short-term memory, especially as I resort to using the small screen of an iPhone.

If it is of any help, I usually do my work in two phases; the basic rule is the custom/consensus on jawiki that you don't edit somebody else's sandbox:

  • As far as I am working in CX2, before publishing/output to jawiki, I keep the chunk order and do not venture to move or destroy the yellow highlighting.
  • Before the second phase, I do some preparation: I change the page name in CX2, at the very top in biggish letters, to start with User:Omotecho/sandbox/ plus the file name (in ja for me), before I push the "publish" button. When I push and output, the system automatically creates a sandbox as a subpage of my user page.
  • Then the second phase starts: after publishing, but still in my personal sandbox, I continue the final editing, usually adding a bibliography or more See also items.
  • Finally, I move (rename) my translation to the regular namespace, using the "move" tool at the very top, rightmost tab; though it could depend on each wiki whether a user moves it themselves or asks an admin to arrange it. Cheers, and I wish everybody a safe Thursday!

Omotecho (talkcontribs)

@RIT RAJARSHI, forgive me: the tool we are using is jargoned as CX2, CX standing for the Content Translation extension, and this one is based on version 1 (CX).
