Pginer-WMF


Previous discussion was archived at User talk:Pginer-WMF/Archive 1 on 2015-07-10.

Nguyentrongphu (talkcontribs)

It has been 2 months. What is your solution to this, and what progress has been made on the problem so far? Thank you!

Pginer-WMF (talkcontribs)

Hi @Nguyentrongphu. After receiving input from the Vietnamese community we adjusted the limits on the amount of unmodified Machine Translation allowed. I have also created a ticket to explore how to detect content copied out of Content Translation.

Right now the team is focused on mobile support for Section Translation, but the whole area of Machine Translation limits is something we want to focus on in the near future.

Nguyentrongphu (talkcontribs)

I just came up with a new idea that would be easier to implement. The current limit for Vi Wikipedia is 90%. If unmodified content is higher than 90%, copying is disabled. In other words, one can only copy once the translation meets the threshold required to publish. I don't think this solution would negatively affect good-faith users, or at least its benefit outweighs the small inconvenience (one can always translate manually first before trying to copy things).

Pginer-WMF (talkcontribs)

Thanks for the suggestion. That's an approach we can consider. There are a couple of aspects that we need to consider in more detail:

  • Initial and intermediate states. When adding the first paragraph, users will temporarily be at 100% MT until they start editing the paragraph. Even users making good use of the tool would have copying disabled initially, which can affect their workflow. To reduce this issue, we can apply this limitation only when the translation already has 2 paragraphs added.
  • Feedback. Disabling copying in some cases can make the copy functionality seem to work only intermittently for the user. We may need to provide some feedback to clarify why copying was not working.

In general I think that including a tag in the copied text is a less intrusive measure that can still be effective, but this suggestion is an alternative to consider as part of the exploration.
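The copy restriction discussed in this thread (copying disabled while unmodified MT is above the 90% limit, applied only once the translation has 2 paragraphs) can be sketched roughly as follows. This is a minimal illustration, not how Content Translation actually measures unmodified content: the `Paragraph` class, the word-level comparison with `difflib`, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Paragraph:
    mt_text: str    # raw machine translation output for the paragraph
    user_text: str  # the paragraph as it currently stands in the editor


def unmodified_ratio(paragraphs):
    """Share of the current text whose words still match the MT output."""
    matched = 0
    total = 0
    for p in paragraphs:
        mt_words = p.mt_text.split()
        user_words = p.user_text.split()
        matcher = SequenceMatcher(None, mt_words, user_words, autojunk=False)
        # Sum the sizes of all word runs shared by the MT and the current text.
        matched += sum(block.size for block in matcher.get_matching_blocks())
        total += len(user_words)
    return matched / total if total else 0.0


def copy_allowed(paragraphs, limit=0.90, min_paragraphs=2):
    """Allow copying unless enough paragraphs exist and too much MT is unmodified."""
    if len(paragraphs) < min_paragraphs:
        return True  # exemption for the initial state, when MT is briefly at 100%
    return unmodified_ratio(paragraphs) <= limit
```

Passing `min_paragraphs=1` removes the exemption, which corresponds to the stricter variant where some text must be edited before anything can be copied.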

Nguyentrongphu (talkcontribs)

"Initial and intermediate states. When adding the first paragraph, users will temporarily be at 100% MT until they start editing the paragraph. Even users making good use of the tool would have copying disabled initially, which can affect their workflow. To reduce this issue, we can apply this limitation only when the translation already has 2 paragraphs added." -> Now that I think about it, this exemption is not a good idea because one can bypass it easily by copying the paragraphs one by one.

Like I said, a little inconvenience is a good trade-off to prevent abuse. How hard is it really to translate 10% of the content before starting to copy things? Not really hard, in my opinion.

Nguyentrongphu (talkcontribs)

There are pros and cons to both methods. The tag method is less intrusive, sure, but it also has multiple implications and serious consequences:

First: it leaves the door wide open for abusers to continue abusing the CT system.

Second: it continues to place a heavy workload on patrollers of recent changes and new pages. It's not clear at what percentage users start to cheat (copy and publish right away): at 100%, 94%, or 99%? Determining this is very time-consuming; one has to read each individual article to judge how good the translation is. It becomes impossible when many people are abusing this. Vi Wikipedia has caught many different users who had been abusing this loophole for years, because nobody noticed it until recently. On average, each user has 5k badly translated articles. Let's say there are 10 users like that. That would mean around 50k badly translated articles!

Pginer-WMF (talkcontribs)

I agree there are pros and cons with each approach, and I think it is important to surface them as part of the exploration. Thanks for sharing your thoughts on this.

Regarding your first point, any approach is about adding barriers that make it hard for users to do the wrong thing, but there may always be a workaround. When the barrier is based on a generic mechanism that others may also use, generic workarounds are also more likely to be available, such as this browser extension. However, a more specific tag requires a specific effort to detect.

Regarding your second point, the less intrusive solution allows us to be more flexible about the percentage to catch. Maybe any copied content can result in the page being added to a category, while content with higher percentages of unmodified MT is blocked directly.

This is a complicated space. I think we need to explore different options in detail.
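One possible reading of the tiered idea above (any copied content gets flagged, copied content with a high share of unmodified MT gets blocked) is sketched below. The function name, the return values, and the reuse of the 90% figure as the blocking limit are assumptions for illustration only; the limits that Content Translation already applies when publishing from the tool are outside this sketch.

```python
def review_action(contains_copied_text, unmodified_ratio, block_limit=0.90):
    """Decide what happens to an edit, given whether it carries copied CT text."""
    if not contains_copied_text:
        return "publish"  # nothing copied out of the tool, nothing to flag here
    if unmodified_ratio > block_limit:
        return "block"    # copied text that is almost entirely unmodified MT
    return "tag"          # published, but flagged (tag or category) for patrollers
```

For example, `review_action(True, 0.95)` returns `"block"`, while `review_action(True, 0.5)` returns `"tag"`.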

Nguyentrongphu (talkcontribs)

I doubt many people know about that browser extension. Sure, there are always ways to cheat regardless of the barrier. However, a barrier is good enough when it can stop most of the abusers. We (Vi Wikipedia) can deal with the 1 or 2 remaining abusers who find a way around it. Currently, the barrier we have is not sufficient, so we need a better one.

"Regarding your second point, the less intrusive solution allows us to be more flexible about the percentage to catch. Maybe any copied content can result in the page being added to a category, while content with higher percentages of unmodified MT is blocked directly." -> I like this idea. However, I'm not sure how technically feasible it is. Also, a tag is sufficient; there is no need to add a category. And if this method is technically too hard or impossible, my method is a sound alternative.

Nguyentrongphu (talkcontribs)

Feedback: you guys can put a "warning" somewhere in CT, easy to see, saying: "You have to translate at least 10% of the content before being able to copy".

Reply to "Any news?"

About the enablement of Section translation in Wikipedia

Rodney Araujo (talkcontribs)

Hello Pginer, I have a question: when could Section Translation be enabled for Spanish Wikipedia? Thanks.

Pginer-WMF (talkcontribs)

Thanks @Rodney Araujo for your interest in the tool and your help on the project.

We just enabled Section Translation on Bengali Wikipedia as an early release, and we plan to improve the tool further before considering other wikis. There are several areas of the tool that need improvement and we want to hear from a smaller group of editors first to iterate and make the tool better. So it may still take several weeks until we can move to the next stage.

Hearing about the interest in the tool is very encouraging for us, and it is very useful to know where there are interested users who can help us make it better. So we'd definitely consider Spanish Wikipedia as we plan for enabling more wikis.

Thanks!

Rodney Araujo (talkcontribs)

@Pginer-WMF: Are you going to communicate with the Spanish Wikipedia community about Section Translation?

Pginer-WMF (talkcontribs)

Right now the tool is in a very early stage with several important aspects still missing. Once we complete a few cycles of development we'll be doing wider announcements.

Currently we have just announced the enablement on Bengali Wikipedia. Once we get feedback from this community to improve the tool we'll consider expanding (and communicating) to other wikis.

DRIS92 (talkcontribs)

Hello, I see that translation from es to fr is possible.

Pginer-WMF (talkcontribs)

Our tools integrate multiple translation services. You can get a list of the services and the supported languages in the documentation. Due to limitations of the test instance for Section Translation, only those languages supported by Apertium, Matxin and OpusMT are available. However, all services will be available when the tool is available on a real wiki.

Pginer-WMF (talkcontribs)

@Rodney Araujo Section Translation is now available on Test Wikipedia. This is a test environment better integrated with our infrastructure, where more machine translation services are available. Thus, you can try the tool using Google Translate or Yandex in more languages and without the need to create a separate account. The results are still published in the test instance (not the real wiki), but you can copy the resulting content anywhere else.

I hope this is useful until we enable the tool in more Wikipedias.


A barnstar for you!

RIT RAJARSHI (talkcontribs)
Rosetta Barnstar
For your work on the Content Translation project.
RZuo (talkcontribs)
Reply to "A barnstar for you!"

Please join the discussion on my proposal

HaussmannSaintLazare (talkcontribs)

Hello Pginer-WMF !!!

Please share your impressions of the proposal I introduced the other day.

Thank you.



About the specifications of MediaWiki

HaussmannSaintLazare (talkcontribs)
Charminku (talkcontribs)
Best wishes from my side... from India!!! Charminku (talk) 14:41, 4 April 2020 (UTC)
Reply to "A cupcake for you!"

question about bug report Topic:Virkxk0741k38j8l

HaussmannSaintLazare (talkcontribs)
Pginer-WMF (talkcontribs)

Hi!

If I understand correctly, the problem is that Japanese Wikipedia abuse filter number 13 triggers a warning in Content Translation.

It seems (based on what I understood from automatic translation) that filter 13 is intended to warn about short articles being created. Content Translation tries to check the contents of each paragraph against abuse filters early, in order to surface issues as soon as possible and be able to tell where the problems are. This approach is beneficial in many cases. For example, the source article may have a YouTube link that is fine on the source wiki but not allowed on the target one. By warning early and showing the specific paragraph, we can help the user know where to look for such a link and fix it. However, some abuse filters expect to evaluate the whole article, and checking just a paragraph can result in a false positive. In your case, the filter considers the evaluated text too short for an article, but it was actually just a paragraph. Unfortunately, the edit filter infrastructure does not allow us to differentiate between the two kinds of filters.

I'll bring this up for discussion with the team again to see if there is something actionable we can do to improve the situation.

Thanks!
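The false positive described above can be illustrated with a small sketch. The two filters here are hypothetical Python stand-ins: real AbuseFilter rules are written in the extension's own rule language, and the 300-character minimum and the link check are invented for the example.

```python
def too_short_for_article(text, min_chars=300):
    # Hypothetical whole-article rule: warn when the text is very short.
    return len(text) < min_chars


def has_blocked_link(text):
    # Hypothetical content rule: a link the target wiki does not allow.
    return "youtube.com" in text.lower()


FILTERS = {
    "short article": too_short_for_article,
    "blocked link": has_blocked_link,
}


def check(text):
    """Names of the filters triggered by a piece of text."""
    return [name for name, rule in FILTERS.items() if rule(text)]


paragraphs = [
    "First paragraph of the translation. " * 5,
    "See https://www.youtube.com/ for the video. " * 5,
    "Closing paragraph of the translation. " * 5,
]

# Checking paragraph by paragraph pinpoints the link in the second paragraph,
# but every paragraph also trips the length rule, which is a false positive.
per_paragraph = [check(p) for p in paragraphs]

# Checking the whole article triggers only the link rule.
whole_article = check("\n\n".join(paragraphs))
```

Nothing in either function signals whether it is meant for a paragraph or for a whole article, so in this sketch the caller cannot tell which filters are safe to run per paragraph.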

HaussmannSaintLazare (talkcontribs)

question about Pavanaja's report

HaussmannSaintLazare (talkcontribs)

15:04, 31 March 2020 (UTC)

Hello Pginer-WMF !!!

I have a question about User:Pavanaja's report.

Pavanaja reported in Topic:Vj1ouv741wrgjo64 that a new translation in Content Translation could not be started if another user had already started the same translation.

I ran a follow-up test and saw the same symptom.

Something puzzled me at that point, so I am asking about it.

During the follow-up test, the message "Critical error: Content translation failed to load due to internal error." was displayed and the translation could not be started.

However, this message does not tell whether the translation could not be started because of a bug in the Content Translation tool or because the tool is designed that way.

So, please tell me whether this symptom is a bug or intended behavior.

Thank you !!!

--HaussmannSaintLazare (talk) 15:04, 31 March 2020 (UTC)
