It has been 2 months. What is your solution to this, and what progress has been made on the problem so far? Thank you!
Topic on User talk:Pginer-WMF
Hi @Nguyentrongphu. After receiving input from the Vietnamese community we adjusted the limits to the amount of unmodified Machine Translation allowed. I have also created a ticket to explore how to detect copied content out of Content Translation.
Right now the team is focused on mobile support for Section Translation, but the whole area of Machine Translation limits is something we want to focus on in the near future.
I just came up with a new idea that would be easier to implement. The current limit for Vi Wikipedia is 90%. My proposal: if unmodified content is higher than 90%, copying is disabled. In other words, one can only copy once they meet the threshold required to publish. I don't think this solution would negatively affect good-faith users, or at least its benefit outweighs the minor inconvenience (one can always translate manually first before trying to copy things).
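The proposed rule can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and measuring the unmodified share in characters is an assumption, not a description of how Content Translation actually computes it.

```python
# Hypothetical sketch; none of these names come from Content Translation.
UNMODIFIED_MT_LIMIT = 0.90  # the Vi Wikipedia limit quoted above


def unmodified_ratio(unmodified_chars: int, total_chars: int) -> float:
    """Share of the translation that is still raw machine translation."""
    if total_chars <= 0:
        return 0.0  # nothing translated yet, so nothing is unmodified
    return unmodified_chars / total_chars


def copying_allowed(unmodified_chars: int, total_chars: int,
                    limit: float = UNMODIFIED_MT_LIMIT) -> bool:
    """Copying is disabled when unmodified MT is higher than the limit."""
    return unmodified_ratio(unmodified_chars, total_chars) <= limit
```

Under this sketch, a translation that is still 100% machine output cannot be copied, while one that has been edited down to 90% or less can.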
Thanks for the suggestion. That's an approach we can consider. There are a couple of aspects that we need to consider in more detail:
- Initial and intermediate states. When adding the first paragraph, users will temporarily be at 100% MT until they start editing the paragraph. Even users making good use of the tool would have copying disabled initially, which can affect their workflow. To reduce this issue, we could apply this limitation only once the translation already has 2 paragraphs added.
- Feedback. Disabling copying in some cases can make it appear to work only intermittently for the user. We may need to provide some feedback to clarify why the copy functionality is not working.
In general I think that including a tag in the copied text is a less intrusive measure that can still be effective, but this suggestion is an alternative to consider as part of the exploration.
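The two points above could be combined roughly like this. It is a minimal sketch, assuming a hypothetical helper that returns either nothing (copying allowed) or a message to show the user; the names and the message wording are illustrative only.

```python
from typing import Optional

LIMIT = 0.90          # Vi Wikipedia limit quoted in this thread
GRACE_PARAGRAPHS = 2  # the rule applies once 2 paragraphs have been added


def copy_block_reason(paragraphs_added: int,
                      unmodified_ratio: float) -> Optional[str]:
    """Return None if copying is allowed, else a message explaining why not."""
    if paragraphs_added < GRACE_PARAGRAPHS:
        # Initial state: a freshly added paragraph is always 100% MT,
        # so the rule is not applied yet.
        return None
    if unmodified_ratio > LIMIT:
        edited_needed = 1 - LIMIT
        return (f"Copying is disabled: edit at least {edited_needed:.0%} "
                "of the machine translation first.")
    return None
```

Note that, as written, the grace window leaves copying enabled while fewer than 2 paragraphs have been added, regardless of how much of the text is unmodified.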
"Initial and intermediate states. When adding the first paragraph users will be at a 100% Mt temporarily until they start editing the paragraph. Even users making a good use o the tool would have copying disabled initially which can affect their workflow. To reduce this issue we can apply this limitation only when the translation has already 2 paragraphs added." -> Now that I think about it, it's not a good idea because one can bypass it easily by copying each paragraph one by one.
Like I said, a little inconvenience is a good trade-off to prevent abuse. How hard is it really to translate 10% of contents before starting to copy things? Not really hard in my opinion.
There are pros and cons to both methods. The tag method is less intrusive, sure, but it also has serious consequences:
First: it leaves the CT system wide open for abusers to keep abusing it.
Second: it continues to place a heavy workload on patrollers of recent changes and new pages. It's not clear at what percentage users start to cheat (copy and publish right away): at 100%, 94%, 99%...? Determining this is very time-consuming; one has to read each individual article to judge how good the translation is. It becomes impossible when many people are abusing this. Vi Wikipedia has caught many different users who have been abusing this loophole for years, because nobody noticed the loophole until recently. On average, each user has 5k badly translated articles. Let's say there are 10 users like that. That would mean around 50k badly translated articles!
I agree there are pros and cons with each approach, and I think it is important to surface them as part of the exploration. Thanks for sharing your thoughts on this.
Regarding your first point, any approach is about adding barriers to make it hard for users to do the wrong thing, but there may always be a workaround. When the barrier is based on a generic mechanism that others may also use, it is more likely that generic workarounds are also available, such as this browser extension. However, including a more specific tag is something that requires a specific effort to detect.
Regarding your second point, the less intrusive solution allows us to be more flexible about the percentage to catch. Maybe any copied content could result in the page being added to a category, while pages with higher percentages of unmodified MT are blocked directly.
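As a rough illustration of that two-tier idea, assuming hypothetical names and reusing the 90% figure from this thread as a placeholder for the blocking threshold:

```python
from enum import Enum


class Action(Enum):
    NONE = "none"    # nothing to flag
    TAG = "tag"      # add the page to a tracking category / tag for review
    BLOCK = "block"  # block directly


def classify(has_copied_content: bool, unmodified_ratio: float,
             block_limit: float = 0.90) -> Action:
    """Pick the strictest action that applies to a page (illustrative only)."""
    if unmodified_ratio > block_limit:
        return Action.BLOCK
    if has_copied_content:
        return Action.TAG
    return Action.NONE
```

The blocking threshold is a parameter here precisely because the percentage to catch is the part that stays flexible.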
This is a complicated space. I think we need to explore different options in detail.
I doubt many people know about that browser extension. Sure, there are always ways to cheat regardless of the barrier. However, a barrier is good enough when it can stop most abusers. We (Vi Wikipedia) can deal with the 1 or 2 remaining abusers who find a way around the barrier. Currently, the barrier we have is not sufficient, so we need a better one.
"Regarding your second point, the less intrusive solution allows to be more flexible about the percentage to catch. Maybe any content copied can result in the page being added to a category but those with higher percentages of unmodified MT are blocked directly." -> I like this idea. However, I'm not sure how feasible this is (technical aspect). Also, a tag is sufficient, no need to add to a category. And if this method is too hard or impossible in technical aspect, my method is a sound alternative.
Feedback: you guys can put a "warning" somewhere in CT, easy to see, saying: "you have to translate at least 10% contents before able to copy".