
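The Moderator Tools team is exploring a project to build an "automoderator" tool for Wikimedia projects. It would allow moderators to configure automated prevention or reversion of bad edits based on scoring from a machine learning model. In simpler terms, we're building software that performs a similar function to anti-vandalism bots such as ClueBot NG, SeroBOT, and Dexbot, but is available to all language communities. A MediaWiki extension, Extension:AutoModerator, is now under development.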

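Our hypothesis is: if we enable communities to automatically prevent or revert obvious vandalism, moderators will have more time to spend on other activities.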

We will research and explore this idea during the rest of 2023, and expect to be able to start engineering work by the start of 2024.

Communities can volunteer to get Automoderator on their Wikipedia.

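  • June 2024: Turkish Wikipedia starts testing Automoderator.
  • February 2024: Designs have been posted for the initial version of the landing and configuration pages. Thoughts and suggestions welcome!
  • February 2024: We have posted the initial results of our testing.
  • October 2023: We are collecting input and feedback on our measurement plan to decide what data we should use to evaluate this project, and we have made test data available to gather input on Automoderator's decision-making.
  • August 2023: We recently presented this project, and other moderator-focused projects, at Wikimania. A recording of the session is available.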

Motivation

Wikimania presentation (13:50)

A substantial number of edits are made to Wikimedia projects which should unambiguously be undone, reverting a page back to its previous state. Patrollers and administrators have to spend a lot of time manually reviewing and reverting these edits, which contributes to a feeling on many larger wikis that there is an overwhelming amount of work requiring attention compared to the number of active moderators. We would like to reduce these burdens, freeing up moderator time to work on other tasks.

Indonesian Wikipedia community call (11:50)

Many online community websites, including Reddit, Twitch, and Discord, provide "automoderation" functionality, whereby community moderators can set up a mix of specific and algorithmic automated moderation actions. On Wikipedia, AbuseFilter provides specific, rules-based functionality, but it can be frustrating when moderators have to, for example, painstakingly define a regular expression for every spelling variation of a swear word. It is also complicated and easy to break, causing many communities to avoid using it. At least a dozen communities have anti-vandalism bots, but these are community-maintained, require local technical expertise, and usually have opaque configurations. These bots are also largely based on the ORES damaging model, which has not been retrained in a long time and has limited language support.

Goals

  • Reduce moderation backlogs by preventing bad edits from entering patroller queues.
  • Give moderators confidence that automoderation is reliable and is not producing significant false positives.
  • Ensure that editors caught in a false positive have clear avenues to flag the error and have their edit reinstated.
  • Are there other goals we should consider?

Design research

 
A PDF of design principles for the Automoderator system
 
Desk research for the Automoderator project

We carried out a comprehensive design research process to establish a strong foundation for Automoderator's configuration tool. Central to this was formulating the design principles that would shape an intuitive, user-friendly configuration interface.

We began with desk research: reviewing existing technologies and best practices. This gave us insight into current trends, potential pitfalls, and successful models in automated content moderation. We prioritized understanding the ethical implications of interaction between humans and machine learning systems, and focused on responsible design practices to ensure a positive and understandable user experience. From this, we homed in on design principles that prioritize transparency, user empowerment, and ethical considerations.

Model

This project will leverage the new revert risk models developed by the Wikimedia Foundation Research team. There are two versions of this model:

  1. A multilingual model, with support for 47 languages.
  2. A language-agnostic model.

These models can calculate a score for every revision denoting the likelihood that the edit should be reverted. We envision providing communities with a way to set a threshold for this score, above which edits would be automatically prevented or reverted.
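A minimal sketch of the threshold rule described above, assuming the model returns a revert-risk score between 0 and 1 for each revision. The function name `should_revert` and the choice of a strict "greater than" comparison are illustrative assumptions, not the actual AutoModerator implementation.

```python
def should_revert(revert_risk: float, threshold: float) -> bool:
    """Return True if an edit's revert-risk score exceeds the community's threshold.

    Hypothetical helper: assumes scores and thresholds are probabilities in [0, 1].
    """
    if not 0.0 <= revert_risk <= 1.0:
        raise ValueError(f"revert_risk must be in [0, 1], got {revert_risk}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    # Only edits scoring strictly above the threshold are acted on.
    return revert_risk > threshold


# Example: with a threshold of 0.97, only the first edit would be reverted.
for score in (0.99, 0.85, 0.40):
    print(score, should_revert(score, threshold=0.97))
```

Whether the tool then prevents or reverts the edit is a separate decision; this rule only decides whether an edit qualifies.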

The models currently only support Wikipedia, but could be trained on other Wikimedia projects. Additionally, they are currently only trained on the main (article) namespace. Once deployed, we could retrain the model on an ongoing basis as false positives are reported by the community.

Before moving forward with this project, we would like to provide opportunities to test the model against recent edits, so that patrollers can understand how accurate the model is and whether they feel confident using it in the way we're proposing.

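  • What concerns do you have about these models?
  • What percentage of false positives would be the maximum that you or your community would accept?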

Potential solution

 
Diagram demonstrating the Automoderator software decision process
 
An illustrative sketch of what the community configuration interface could look like for this software.

We are envisioning a tool that a community's moderators could configure to automatically prevent or revert edits. Reverting edits is the more likely scenario: preventing an edit requires high performance so as not to slow down edit saves, and it also provides less oversight of which edits are being prevented, which may not be desirable, especially with respect to false positives. Moderators should be able to configure whether the tool is active, how strict the model should be, the localised username and edit summary it uses, and more.

 
Example of what Automoderator will look like reverting an edit.

A lower threshold would revert more edits, but with a higher false positive rate, while a higher threshold would revert fewer edits, but with greater confidence.
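To make this trade-off concrete, the sketch below counts how many edits a given threshold would revert and what share of those reverts are false positives, given a sample of edits whose correct outcome is already known (for example, judged by patrollers). The `LabelledEdit` type and the toy scores are hypothetical and invented for illustration; they do not come from the real models. Note that "share of reverts that are false positives" is used here as the measure of interest, which is not the same as the textbook false positive rate.

```python
from dataclasses import dataclass


@dataclass
class LabelledEdit:
    revert_risk: float  # model score in [0, 1]
    is_bad: bool        # ground truth, e.g. as judged by a patroller


def threshold_stats(edits: list[LabelledEdit], threshold: float) -> tuple[int, float]:
    """Return (number of edits reverted, share of those reverts that were false positives)."""
    reverted = [e for e in edits if e.revert_risk > threshold]
    if not reverted:
        return 0, 0.0
    false_positives = sum(1 for e in reverted if not e.is_bad)
    return len(reverted), false_positives / len(reverted)


# Toy sample, invented purely for illustration.
sample = [
    LabelledEdit(0.99, True),
    LabelledEdit(0.95, True),
    LabelledEdit(0.90, False),
    LabelledEdit(0.80, True),
    LabelledEdit(0.60, False),
    LabelledEdit(0.20, False),
]

for t in (0.5, 0.85, 0.97):
    count, fp_share = threshold_stats(sample, t)
    print(f"threshold={t}: {count} reverted, {fp_share:.0%} false positives")
```

On this toy sample, raising the threshold from 0.5 to 0.97 cuts the number of reverts from 5 to 1 while the false-positive share drops from 40% to 0%, which is the pattern described above.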

While the exact form of this project is still being explored, the following are some feature ideas we are considering, beyond the basics of preventing or reverting edits that meet a revert risk threshold.

Testing

If communities have options for how strict they want the automoderator to be, we need to provide a way to test those thresholds in advance. This could look like AbuseFilter’s testing functionality, whereby recent edits can be checked against the tool to understand which edits would have been reverted at a particular threshold.
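A dry run of this kind could be as simple as replaying recently scored edits against a candidate threshold without taking any action. In this sketch, the recent edits are assumed to be available as (revision ID, revert-risk score) pairs; fetching real scores from the revert risk models is out of scope, and the `dry_run` function and revision IDs are hypothetical.

```python
def dry_run(recent_edits: list[tuple[int, float]], threshold: float) -> list[int]:
    """Return the revision IDs that would have been reverted at `threshold`.

    Nothing is reverted; this only reports what the tool would have done.
    """
    return [rev_id for rev_id, score in recent_edits if score > threshold]


# Hypothetical recent edits: (revision ID, revert-risk score).
recent = [(1001, 0.42), (1002, 0.98), (1003, 0.91), (1004, 0.99)]

for threshold in (0.9, 0.95):
    print(threshold, dry_run(recent, threshold))
```

Comparing the lists for several thresholds side by side lets moderators inspect the specific edits that would be affected before switching the tool on.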

  • How important is this kind of testing functionality for you? Are there any testing features you would find particularly useful?

Community configuration

A core aspect of this project will be to give moderators clear configuration options for setting up the automoderator and customising it to their community’s needs. Rather than simply reverting all edits meeting a threshold, we could, for example, provide filters for not operating on editors with certain user groups, or avoiding certain pages.
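As an illustration of such filters, the sketch below skips edits by users in exempt groups and edits to excluded pages before applying the threshold. The `AutomoderatorConfig` and `Edit` types, the chosen group names, and the "Sandbox" page title are hypothetical examples, not options the tool is confirmed to offer.

```python
from dataclasses import dataclass, field


@dataclass
class AutomoderatorConfig:
    threshold: float
    exempt_groups: set[str] = field(default_factory=set)   # user groups never acted on
    excluded_pages: set[str] = field(default_factory=set)  # page titles to leave alone


@dataclass
class Edit:
    page_title: str
    user_groups: set[str]
    revert_risk: float


def should_act(edit: Edit, config: AutomoderatorConfig) -> bool:
    """Apply community filters first, then the revert-risk threshold."""
    if edit.user_groups & config.exempt_groups:
        return False  # editor belongs to at least one exempt group
    if edit.page_title in config.excluded_pages:
        return False  # community has opted this page out
    return edit.revert_risk > config.threshold


config = AutomoderatorConfig(
    threshold=0.95,
    exempt_groups={"sysop", "autoconfirmed"},
    excluded_pages={"Sandbox"},
)
print(should_act(Edit("Example", set(), 0.99), config))              # True
print(should_act(Edit("Example", {"autoconfirmed"}, 0.99), config))  # False
print(should_act(Edit("Sandbox", set(), 0.99), config))              # False
```

Running the filters before the score check means an exempt user or page is never reverted, however high the edit scores.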

  • What configuration options do you think you would need before using this software?

False positive reporting

Machine learning models aren't perfect, so we should expect some number of false positives. There are at least two things we need to consider here: the process for a user to flag that their edit was falsely reverted so it can be reinstated, and a mechanism for communities to give feedback to the model over time so that it can be retrained.

The model is more sensitive to edits from new and unregistered users, as this is where most vandalism comes from. We don't want this tool to negatively impact the experience of good faith new users, so we need to create clear pathways for new users to understand that their edit has been reverted, and be able to reinstate it. This needs to be balanced with not providing easy routes for vandals to undo the tool's work, however.

Although these models have been trained on a large amount of data, false positive reporting by editors can provide a valuable dataset for ongoing re-training of the model. We need to figure out how to enable experienced editors to send false positive data back to the model so that it can improve over time.

  • How could we provide clear information and actions for editors on the receiving end of a false positive, in a way which isn’t abused by vandals?
  • What concerns do you have about false positives?

Designs

Our current plans for Automoderator contain the following components.

Landing page

A landing page with information about Automoderator, a way to appeal the bot’s decisions, and a link to configure the bot.

Configuration page

The configuration page will be generated by Community Configuration. In the MVP, admins will be able to turn Automoderator on or off, configure its threshold (i.e. how strictly it should behave), and customize its default edit summary and username. We anticipate adding more configuration options over time in response to feedback. Once the page is saved, if the user has turned Automoderator on, it will start running immediately.
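A sketch of the four MVP settings and of validating them before they take effect, assuming they are stored as a JSON object. The key names (`enabled`, `threshold`, `edit_summary`, `username`), the `load_settings` function, and the validation rules are hypothetical; the real schema is defined by the extension and Community Configuration.

```python
import json
from dataclasses import dataclass


@dataclass
class MvpSettings:
    enabled: bool
    threshold: float
    edit_summary: str
    username: str


def load_settings(raw: str) -> MvpSettings:
    """Parse and validate settings saved from the configuration page (hypothetical schema)."""
    data = json.loads(raw)
    enabled = data["enabled"]
    if not isinstance(enabled, bool):
        raise ValueError("enabled must be true or false")
    settings = MvpSettings(
        enabled=enabled,
        threshold=float(data["threshold"]),
        edit_summary=str(data["edit_summary"]).strip(),
        username=str(data["username"]).strip(),
    )
    if not 0.0 <= settings.threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if not settings.edit_summary or not settings.username:
        raise ValueError("edit summary and username must not be empty")
    return settings


saved = ('{"enabled": true, "threshold": 0.97, '
         '"edit_summary": "Reverting likely vandalism", "username": "Automoderator"}')
settings = load_settings(saved)
print(settings.enabled, settings.threshold, settings.username)
```

Rejecting invalid values at save time matters because, as described above, an enabled configuration starts running as soon as it is saved.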

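Other open questions

  • If your community uses a volunteer-maintained anti-vandalism bot, what has your experience of that bot been? How would you feel if it stopped working?
  • Do you think your community would use this? How would it fit in with your other workflows and tools?
  • What else should we consider that we haven't documented above?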