Moderator Tools/Automoderator/Testing/ha
To help communities test and evaluate Automoderator's accuracy, we are providing a testing spreadsheet containing data on past edits and whether Automoderator would have reverted them or not.
Automoderator’s decisions result from a mix of a machine learning model score and internal settings. While the model will get better with time through re-training, we’re also looking to enhance its accuracy by defining some additional internal rules. For instance, we’ve observed Automoderator occasionally misidentifying users reverting their own edits as vandalism. To improve, we’re seeking similar examples and appreciate your assistance in identifying them.
Note that this test does not necessarily reflect Automoderator's final form - we will be using the results of this test to make it better!
How to test Automoderator
- If you have a Google account:
  - Use the Google Sheet link below and make a copy of it.
    - You can do this by clicking File > Make a copy... after opening the link.
  - After your copy has loaded, click Share in the top corner, then give any level of access to swaltonwikimedia.org (leaving 'Notify' checked), so that we can aggregate your responses to collect data on Automoderator's accuracy.
    - Alternatively, you can change 'General access' to 'Anyone with the link' and share the link with us directly or on-wiki.
- Alternatively, use the .ods file link to download the file to your computer.
  - After adding your decisions, please send the sheet back to us at swaltonwikimedia.org, so that we can aggregate your responses to collect data on Automoderator's accuracy.
After accessing the spreadsheet...
- Follow the instructions in the sheet to select a random dataset, review 30 edits, and then uncover what decisions Automoderator would make for each edit.
- Feel free to explore the full data in the 'Edit data & scores' tab.
- If you want to review another dataset, please make a new copy of the sheet to avoid conflicting data.
- Join the discussion on the talk page.
Alternatively, you can simply dive in to the individual project tabs and start investigating the data directly.
We welcome translations of this sheet - if you would like to submit a translation please make a copy, translate the strings on the 'String translations' tab, and send it back to us at swaltonwikimedia.org.
If you want us to add data from another Wikipedia please let us know and we would be happy to do so.
About Automoderator
Automoderator’s model is trained exclusively on Wikipedia’s main namespace pages, limiting its dataset to edits made to Wikipedia articles. Further details can be found below:
Internal configuration
In the current version of the spreadsheet, in addition to considering the model score, Automoderator does not take actions on:
- Edits made by administrators
- Edits made by bots
- Edits which are self-reverts
- New page creations
The datasets contain edits which meet these criteria, but Automoderator should never say it will revert them. This behaviour and the list above will be updated as testing progresses if we add new exclusions or configurations.
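As a minimal sketch of how the exclusions above could combine with the model score, the following Python checks the four exclusions first and only then compares the score against a threshold. The `Edit` record, its field names, and the function names are hypothetical and chosen for illustration; this is not Automoderator's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical record for one edit; field names are illustrative only."""
    score: float                   # model's revert-likelihood score, 0.0 to 1.0
    by_admin: bool = False
    by_bot: bool = False
    is_self_revert: bool = False
    is_page_creation: bool = False


def is_excluded(edit: Edit) -> bool:
    """Return True if the edit matches any of the four exclusions listed above."""
    return (edit.by_admin or edit.by_bot
            or edit.is_self_revert or edit.is_page_creation)


def would_revert(edit: Edit, threshold: float) -> bool:
    """Exclusions are checked first; only then is the score compared."""
    if is_excluded(edit):
        return False
    # Assumption: an edit is reverted only when its score is strictly above the threshold.
    return edit.score > threshold
```

For example, `would_revert(Edit(score=0.995, by_bot=True), 0.99)` is `False` even though the score is above the threshold, because bot edits are excluded.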
Caution levels
In this test Automoderator has five 'caution' levels, defining the revert likelihood threshold above which Automoderator will revert an edit.
- At high caution, Automoderator will need to be very confident to revert an edit. This means it will revert fewer edits overall, but do so with a higher accuracy.
- At low caution, Automoderator will be less strict about its confidence level. It will revert more edits, but be less accurate.
The caution levels in this test have been set by the Moderator Tools team based on our observations of the model's accuracy and coverage. To illustrate the number of reverts expected at different caution levels, see below:
Project | Daily edits | Daily edit reverts | Average daily reverts by Automoderator: Very cautious (>0.99) | Cautious (>0.985) | Somewhat cautious (>0.98) | Low caution (>0.975) | Not cautious (>0.97) |
---|---|---|---|---|---|---|---|
English Wikipedia | 140,000 | 14,600 | 152 | 350 | 680 | 1,077 | 1,509 |
French Wikipedia | 23,200 | 1,400 | 24 | 40 | 66 | 98 | 136 |
German Wikipedia | 23,000 | 1,670 | 14 | 25 | 43 | 65 | 89 |
Spanish Wikipedia | 18,500 | 3,100 | 57 | 118 | 215 | 327 | 445 |
Russian Wikipedia | 16,500 | 2,000 | 34 | 57 | 88 | 128 | 175 |
Japanese Wikipedia | 14,500 | 1,000 | 27 | 37 | 48 | 61 | 79 |
Chinese Wikipedia | 13,600 | 890 | 9 | 16 | 25 | 37 | 53 |
Italian Wikipedia | 13,400 | 1,600 | 40 | 61 | 99 | 151 | 211 |
Polish Wikipedia | 5,900 | 530 | 10 | 16 | 25 | 35 | 45 |
Portuguese Wikipedia | 5,700 | 440 | 2 | 7 | 14 | 21 | 30 |
Hebrew Wikipedia | 5,400 | 710 | 16 | 22 | 30 | 38 | 48 |
Persian Wikipedia | 5,200 | 900 | 13 | 26 | 44 | 67 | 92 |
Korean Wikipedia | 4,300 | 430 | 12 | 17 | 23 | 30 | 39 |
Indonesian Wikipedia | 3,900 | 340 | 7 | 11 | 18 | 29 | 42 |
Turkish Wikipedia | 3,800 | 510 | 4 | 7 | 12 | 17 | 24 |
Arabic Wikipedia | 3,600 | 670 | 8 | 12 | 18 | 24 | 31 |
Czech Wikipedia | 2,800 | 250 | 5 | 8 | 11 | 15 | 20 |
Romanian Wikipedia | 1,300 | 110 | 2 | 2 | 4 | 6 | 9 |
Croatian Wikipedia | 500 | 50 | 1 | 2 | 2 | 3 | 4 |
... | ... | ... | ... | ... | ... | ... | ... |
All Wikipedia projects | | | 538 | 984 | 1,683 | 2,533 | 3,483 |
This data can be viewed for other Wikimedia projects here.
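To make the thresholds concrete, here is a small Python sketch that lists which caution levels would revert an edit with a given score, and expresses Automoderator's expected reverts as a share of all daily reverts. The threshold values are copied from the table in this section; the function names are hypothetical, and the share calculation assumes the 'Daily edit reverts' and Automoderator columns are directly comparable.

```python
from typing import List

# Thresholds copied from the table in this section.
# Assumption: an edit is reverted only if its score is strictly greater.
CAUTION_THRESHOLDS = {
    "Very cautious": 0.99,
    "Cautious": 0.985,
    "Somewhat cautious": 0.98,
    "Low caution": 0.975,
    "Not cautious": 0.97,
}


def levels_that_revert(score: float) -> List[str]:
    """Names of the caution levels at which this score would trigger a revert."""
    return [name for name, threshold in CAUTION_THRESHOLDS.items()
            if score > threshold]


def share_of_reverts(automoderator_reverts: int, daily_reverts: int) -> float:
    """Automoderator's reverts as a percentage of all daily reverts."""
    return 100 * automoderator_reverts / daily_reverts


if __name__ == "__main__":
    print(levels_that_revert(0.982))
    # English Wikipedia row: 152 reverts at 'Very cautious', 14,600 daily reverts.
    print(round(share_of_reverts(152, 14600), 1))
```

Using the English Wikipedia row, this works out to roughly 1% of daily reverts at 'Very cautious' (152 of 14,600) and roughly 10% at 'Not cautious' (1,509 of 14,600).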
Score an individual edit
We have created a simple user script to retrieve a Revert Risk score for an individual edit.
Simply import User:JSherman (WMF)/revertrisk.js into your common.js by adding the following line:
mw.loader.load( 'https://en.wikipedia.org/wiki/User:JSherman_(WMF)/revertrisk.js?action=raw&ctype=text/javascript' );
You should then find a 'Get revert risk score' link in the Tools menu in your sidebar. Note that this will only display the model score, and does not take into account Automoderator's internal configurations as detailed above. See the table above for the scores above which we are investigating Automoderator's false positive rate.
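If you would rather fetch a score outside the browser, the following Python is a rough sketch only. It assumes that the score comes from the language-agnostic Revert Risk model served through Wikimedia's Lift Wing inference API; this page names neither the endpoint nor the model, so treat the URL, the request body, and the response shape as assumptions to verify against the API documentation. The User-Agent string is a placeholder.

```python
import json
import urllib.request

# ASSUMPTION: endpoint and model name are not stated on this page.
LIFTWING_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/"
    "models/revertrisk-language-agnostic:predict"
)


def build_request(rev_id: int, lang: str) -> urllib.request.Request:
    """Build a POST request asking for the score of one revision."""
    payload = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        LIFTWING_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Placeholder value; replace with your own tool name and contact.
            "User-Agent": "revertrisk-example/0.1 (placeholder contact)",
        },
        method="POST",
    )


def extract_score(response: dict) -> float:
    """Pull the revert probability out of a response.

    ASSUMPTION: the response nests it as output -> probabilities -> "true".
    """
    return float(response["output"]["probabilities"]["true"])


def fetch_score(rev_id: int, lang: str = "en") -> float:
    """Send the request over the network and return the score."""
    with urllib.request.urlopen(build_request(rev_id, lang), timeout=30) as resp:
        return extract_score(json.load(resp))
```

As with the user script, the value returned is only the model score; it does not account for Automoderator's internal configuration.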
Initial results
Quantitative
22 testing spreadsheets were shared back with us, totalling more than 600 reviewed edits from 6 Wikimedia projects. We have aggregated the data to analyse how accurate Automoderator would be at different caution levels:
Not cautious (0.97) | Low caution (0.975) | Somewhat cautious (0.98) | Cautious (0.985) | Very cautious (0.99) |
---|---|---|---|---|
75% | 82% | 93% | 95% | 100% |
In our measurement plan (Moderator Tools/Automoderator/Measurement plan) we said that we wanted the most permissive option Automoderator could be set at to have an accuracy of 90%. The 'Not cautious' and 'Low caution' levels are clearly below this, which is not surprising, as we did not have clear data from which to select these initial thresholds. We will remove the 'Not cautious' threshold, because a 25% error rate is too high for any community. We will retain 'Low caution' for now, and monitor how its accuracy changes as model and Automoderator improvements occur leading up to deployment. We want to err on the side of Automoderator not removing bad edits, so this is a priority for us to continue reviewing.
When we have real world accuracy data from Automoderator's pilot deployment we can investigate this further and consider changing the available thresholds further.
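The aggregation behind the accuracy table can be sketched roughly as follows, assuming that 'accuracy' at a caution level means the share of edits scoring above that level's threshold which reviewers agreed should be reverted. That definition, the function name, and the sample reviews are illustrative assumptions; the sample values are made up and are not the data collected in this test.

```python
from typing import List, Optional, Tuple

# Each review is (model score, reviewer agreed the edit should be reverted).
Review = Tuple[float, bool]


def accuracy_at(reviews: List[Review], threshold: float) -> Optional[float]:
    """Fraction of edits above the threshold that reviewers would also revert.

    Returns None when no edit scores above the threshold.
    """
    flagged = [agreed for score, agreed in reviews if score > threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


# Made-up reviews for illustration only.
SAMPLE_REVIEWS: List[Review] = [
    (0.995, True),
    (0.992, True),
    (0.987, True),
    (0.983, False),
    (0.978, True),
    (0.972, False),
]

if __name__ == "__main__":
    for threshold in (0.99, 0.985, 0.98, 0.975, 0.97):
        print(threshold, accuracy_at(SAMPLE_REVIEWS, threshold))
```

This sketch ignores Automoderator's internal exclusions; a fuller version would drop excluded edits before counting.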
Qualitative
On the testing talk page and elsewhere we also received qualitative thoughts from patrollers.
Overall feedback about Automoderator’s accuracy was positive, with editors feeling comfortable at various thresholds, including some on the lower end of the scale.
Some editors raised concerns that the volume of edits Automoderator would actually revert would be low. This is something we will continue to discuss with communities. From our analysis (T341857#9054727) we found that Automoderator would be operating at a somewhat similar capacity to existing anti-vandalism bots developed by volunteers, but we’ll continue to investigate ways to increase Automoderator’s coverage while minimising false positives.
Next steps
Based on the results above, we feel confident in the model's accuracy and plan to continue our work on Automoderator. We will now begin technical work on the software, while also exploring designs for the user interface. We expect that the next update we share will include configuration wireframes for feedback.
In the meantime please feel free to continue testing Automoderator via the process above - more data and insights will continue to have a positive impact on this project.