Talk:ORES

Warning: The ORES infrastructure is being deprecated by the Machine Learning team; please check wikitech:ORES for more info.

This talk page is intended to be used to discuss the development of ORES. For the bots/tools that use ORES, please contact their owners/maintainers. See ORES/Applications for a list of tools/bots that use ORES.

Welcome to the new home of ORES

Watch this flow board to get notifications about updates related to ORES. Halfak (WMF) (talk) 22:34, 16 May 2017 (UTC)Reply

Also, check out our team's new home at mw:Wikimedia Scoring Platform team Halfak (WMF) (talk) 22:35, 16 May 2017 (UTC)Reply

Join my Reddit AMA about ORES

Hey folks, I'm doing an experimental Reddit AMA ("ask me anything") in r/IAmA on June 1st at 21:00 UTC. For those who don't know, I create artificial intelligences, like ORES, that support the volunteers who edit Wikipedia. I've been studying the ways that crowds of volunteers build massive, high-quality information resources like Wikipedia for over ten years.

This AMA will allow me to channel that for new audiences in a different (for us) way. I'll be talking about the work I'm doing with the ethics and transparency of the design of AI, how we think about artificial intelligence on Wikipedia, and ways we’re working to counteract vandalism. I'd love to have your feedback, comments, and questions—preferably when the AMA begins, but also on this Flow board.

If you'd like to know more about what I do, see my WMF staff user page, this Wired piece about my work or my paper, "The Rise and Decline of an Open Collaboration System: How Wikipedia’s reaction to popularity is causing its decline". EpochFail (talk) 15:40, 24 May 2017 (UTC)Reply

We're live! https://www.reddit.com/r/IAmA/comments/6epiid/im_the_principal_research_scientist_at_the/ EpochFail (talk) 21:02, 1 June 2017 (UTC)Reply

Ready for translation!

@Amire80

Hi, if you find yourself with a free moment, would you mind marking this page as ready for translation? (Really, I should just request pagetranslate privileges for myself on this wiki, cos I like to dabble in i18n code.) Thank you!

See also https://phabricator.wikimedia.org/T163786 Adamw (talk) 22:35, 29 June 2017 (UTC)Reply

I've attempted to do most of the preparatory work (plus copyedits/fixes), but I'm not sure about a few translation items, such as whether or not to tvar the draftquality and other keywords, and whether to tvar some of those Enwiki links. So, I'll leave it for Amire80 to give a final pass through. Quiddity (WMF) (talk) 00:20, 30 June 2017 (UTC)Reply

Patterns based on sessions from anonymous users?

(This is a question emerging from a discussion in Catalan Wikipedia.)

Imagine this situation: an anonymous (IP) user makes 5 edits during one session. They are all subtle vandalism, introducing wrong words and concepts that require certain knowledge to detect (e.g. knowing that Town X is not on the coast).

Now imagine that, of these five edits, three are detected and reverted, but the other two remain in place. It could be that those two edits are legitimate, but chances are that the other two edits made by the same user in the same session are vandalism too, just not detected.

Is ORES already analyzing this type of situation? If not, is this a pattern that could be considered? QuimGil (talk) 05:38, 6 July 2017 (UTC)Reply

ORES currently doesn't analyze beyond a single edit. This is something we're hoping to look into, though. For Phab:T155756, we're planning to take a whole session of edits and build features from them. Once we have that infra, we can experiment with other models too. EpochFail (talk) 16:45, 22 August 2017 (UTC)Reply
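To make the idea of a "session" concrete, here is a minimal sketch (not the ORES or revscoring implementation) that groups one user's edit timestamps into sessions using an inactivity cutoff; the one-hour cutoff and the data layout are assumptions for illustration only.

from datetime import datetime, timedelta

def group_into_sessions(timestamps, cutoff=timedelta(hours=1)):
    """Split a user's edit timestamps into sessions separated by long gaps."""
    sessions, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > cutoff:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

edits = [datetime(2017, 7, 6, 10, 0), datetime(2017, 7, 6, 10, 20), datetime(2017, 7, 6, 18, 0)]
print(group_into_sessions(edits))  # two sessions: [10:00, 10:20] and [18:00]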

Sharing 'reverted' or 'damaging'/'good faith' across projects with the same language

Hello. I was wondering if we could share the results of campaigns across projects in the same language. Spanish Wikipedia and Wikibooks, for example, both have the 'reverted' model already built; however, the edit labeling is going very slowly. Since we operate in the same language, I assume that we can share some things, right? If not, do we have to request a reverted model for each sister project, and damaging/good faith models for each project as well? Thank you. —MarcoAurelio (talk) 13:19, 16 July 2017 (UTC)Reply

I'm not sure how a model would work cross-project. We've never tried that before. It sounds like we ought to do some research before we try it. EpochFail (talk) 16:43, 22 August 2017 (UTC)Reply

Rebuild 'reverted' model

Is it possible to rebuild a 'reverted' model to fine-tune it, or does it fine-tune itself with the help of new labeled edits? Thanks. —MarcoAurelio (talk) 13:20, 16 July 2017 (UTC)Reply

It gets fine-tuned with new edits. Is there a problem with one of the reverted models? EpochFail (talk) 16:41, 22 August 2017 (UTC)Reply
Thank you for your reply. Not a problem, but some "roughness" sometimes, so to speak (which is to be expected, as the docs say). But I was not aware that it gets fine-tuned with new edits. In any case, I hope we can finish the damaging/good faith labeling on eswiki soon to have better heuristics. Regards. —MarcoAurelio (talk) 17:21, 22 August 2017 (UTC)Reply
So it doesn't automatically get fine-tuned with new edits, but we can always retrain the model with new data. If you'd like us to give that a try, we can add it to the backlog. It's not too difficult to do. EpochFail (talk) 17:26, 22 August 2017 (UTC)Reply
I'm leaving that to @-jem- because his bot, PatruBOT, is using the currently available data from ORES at eswiki. If he thinks it'd be worth requesting a retrain of the reverted model so his bot can be more accurate while the damaging & good faith campaign is not yet active, that's his choice, as long as the labelling campaign does not reset and all our work gets lost. —MarcoAurelio (talk) 17:32, 22 August 2017 (UTC)Reply
Thanks, @MarcoAurelio. Well, as the other campaigns may still be some weeks or months ahead, and we still have some false positives in the reverted model, if the model can be improved with not so much work, I (and the eswiki community) will appreciate it. Or maybe you can give me some hints about fine-tuning the bot with ORES information beyond the use of the reverted probability. I'll keep on reading. Regards. -jem- (talk) 09:15, 24 August 2017 (UTC)Reply
As long as we don't lose the already-done labeling of edits (we're at 70% now), I have no objections to rebuilding the reverted model. Maybe we can exclude es:User:PatruBOT from the rebuild so its false positives do not "contaminate" the results? —MarcoAurelio (talk) 09:18, 24 August 2017 (UTC)Reply
When PatruBOT commits a false-positive, is the good edit generally restored via a second revert? If so, we're likely already excluding them from the model. Whenever a reverted edit is later restored by someone other than the original author, we exclude that edit as a revert example for the model. EpochFail (talk) 10:53, 24 August 2017 (UTC)Reply
I think most of those good edits are restored when people check their watchlists, but pages with no active watchers will remain without the edit. But the same can happen with human mistakes, so I think things are acceptable as is. And if I can help you by explaining (privately and away from vandals) how PatruBOT works, I'll be glad to do it. -jem- (talk) 22:39, 24 August 2017 (UTC)Reply
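For anyone following along, here is a small illustrative sketch of the exclusion rule EpochFail describes above (a reverted edit is not used as a 'revert' training example if someone other than its author later restored it). The field names and dictionary layout are hypothetical, not the actual editquality data model.

def is_revert_training_example(edit):
    """Keep a reverted edit as a training example unless someone other than
    its original author later restored it (suggesting the revert was a mistake)."""
    if not edit["was_reverted"]:
        return False
    restored_by = edit.get("restored_by")  # None if the edit was never restored
    return restored_by is None or restored_by == edit["author"]

# A false positive: the bot reverted, then a patroller restored the edit,
# so the edit is excluded from the 'reverted' examples.
example = {"author": "203.0.113.7", "was_reverted": True, "restored_by": "SomePatroller"}
print(is_revert_training_example(example))  # False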

w:MediaWiki:Eri-rcfilters-beta-description-ores is displayed at w:Special:Preferences#mw-prefsection-betafeatures. It and the various language versions have obsolete links on "ORES" to meta:Objective Revision Evaluation Service instead of mw:ORES. I'm an English Wikipedia admin and could create a local message there with the new link but a central fix for all wikis would be much better. PrimeHunter (talk) 21:33, 3 August 2017 (UTC)Reply

Patch filed, thanks! Quiddity (WMF) (talk) 22:00, 3 August 2017 (UTC)Reply

Is the github.com/wiki-ai page the right place to link? EEggleston (WMF) (talk) 17:46, 9 August 2017 (UTC)Reply

It's not a bad place to link. We keep all of our primary repos within that organization. EpochFail (talk) 16:41, 22 August 2017 (UTC)Reply
Unfortunately, this is out-of-date now. The wiki-ai organization still shows which repos we work in, but the most current code for those projects will be in the `wikimedia` organization.
We need to create a new entry point for developers. Adamw (talk) 18:02, 27 November 2018 (UTC)Reply

No more automatic bots for all pages

Please use them only on finished, serious Wikipedia pages, if necessary.

Please, these very fast reversions are ridiculous and authoritarian. 190.109.240.180 (talk) 23:01, 14 August 2017 (UTC)Reply

Forgot to watch this page!

Sorry for the delayed responses. Working through old questions now. EpochFail (talk) 16:42, 22 August 2017 (UTC)Reply

Is there a correlation between the editing environment and draftquality?

I'd like to know whether draftquality is the same for new accounts that create new articles in the visual editor as it is for new accounts that create new articles in the older wikitext editors (at the English Wikipedia).

@Nettrom, I'm assuming that this is outside the scope of your current projects. @Neil P. Quinn-WMF, is this something that you could do? I'm not sure how much work this would be, but I assume that it's not very difficult. Whatamidoing (WMF) (talk) 02:02, 30 August 2017 (UTC)Reply

I'm very interested in all VE research results. Please ping me if this project goes forwards.
Confounding variables of using different self-selected populations would produce unreliable results, especially because some percentage of those "new accounts" actually represent experienced editors. (Paid editors in particular abuse throw-away accounts for each new article.) However I have a fix. You can do a retroactive controlled study. You have to ignore whether an article was created using VE, and look for any difference in draftquality between the experimental and control groups of the May 2015 study of VisualEditor's effect on newly registered editors.
Comparing control group wikitext articles against experimental group wikitext+VE articles will cut your signal strength in half, but it's the only way to avoid junk data due to skewed population selection. Alsee (talk) 08:11, 7 September 2017 (UTC)Reply
Your idea might measure the added value for VisualEditor's contribution (e.g., if you were trying to show that a new editor using VisualEditor is more likely to properly format a citation), but that's not actually my goal. I'm thinking that the chosen editing environment might be a useful marker. Whatamidoing (WMF) (talk) 15:34, 7 September 2017 (UTC)Reply
There's obviously no value in collecting data showing that experienced editors produce higher quality drafts than new users.
That is likely what we would get if we ran your proposed data-collection without modification. An experienced user with a new-account is more likely to know how to switch to the secondary editor. This can introduce an experienced-user bias in the study's population-selection.
(Collecting reliable data from the wild isn't easy.) Alsee (talk) 15:08, 8 September 2017 (UTC)Reply
My proposal is to study an objective, non-speculative condition: "new accounts that create new articles". Whatamidoing (WMF) (talk) 15:23, 8 September 2017 (UTC)Reply
What value would that have, if it merely establishes that experienced users produce higher quality drafts than new users?
We can get much more valuable results by re-examining data from the controlled study. Alsee (talk) 16:54, 8 September 2017 (UTC)Reply
You are welcome to re-examine that old data if you want to.
You are also welcome to study whether new accounts that you believe to be experienced editors actually produce higher quality drafts. However, it sounds circular to me: How will you divide the brand-new accounts into "experienced" and "new" editors? By looking at the quality of the draft. What are you going to study? Whether the ones that you labeled "experienced", on the basis of their higher quality drafts, produced higher quality drafts than the ones that you labeled "new", on the basis of their lower quality drafts. If you did not find a perfect correlation in such a study, then you would probably want to look for an arithmetic error.
I do not want to discourage you from researching whatever interests you, but your question does not interest me. Whatamidoing (WMF) (talk) 18:25, 8 September 2017 (UTC)Reply
What??? Do you understand why you're going to get junk data?
For new accounts, you can't distinguish experienced editors from new editors. It's a confounding factor. You're proposing to use biased populations.
I also don't understand why you seem actively-averse to looking at high quality data. Alsee (talk) 04:43, 10 September 2017 (UTC)Reply
Again, I'm not trying to distinguish experienced editors from new editors.
I'm trying to find out whether new accounts (=an objective, unbiased, machine-identifiable state that is only partially correlated with the actual experience level of the humans who are using those accounts) and either use, or don't use, the newer editing software, produce the same or different results on the specific measure of ORES draftquality.
As a side note, it sounds like you're assuming that experienced editors are more likely to switch to visual editing than new editors. I don't think that there is any data to support your assumption. Whatamidoing (WMF) (talk) 22:28, 11 September 2017 (UTC)Reply
Hypothesis 1: Experienced users are more likely to know how to switch to VisualEditor. Draftquality for new-accounts using VisualEditor will skew high, because you're measuring more experienced editors in VE vs newbies in wikitext.
Hypothesis 2: Experienced editors overwhelmingly prefer wikitext. Draftquality for new-accounts using VisualEditor will skew towards 'suck', because you're measuring more experienced editors in wikitext vs newbies in VE.
I find it hard to imagine any valid use for the results when you don't know what you're measuring. I can however imagine some invalid uses for a collection of random numbers.
Edit: Perhaps it would aid my understanding if you identified how you wanted to use the data, rather than defining the data to be collected. Knowing the intent will help me understand if I'm mistaken. Alsee (talk) 00:26, 12 September 2017 (UTC)Reply

Labeling gadget diffs in reverse order?

I noticed that recently the labeling gadget http://labels.wmflabs.org/ui/ has reversed the sides of the before and after diffs, except in cases where there is a new page creation. Looks like a bug. Can someone verify? I am doing good/bad rating on lvwiki tasks.

This is vandalism: https://lv.wikipedia.org/w/index.php?diff=2674922 But on the labeling page the "after" version is shown in the first column. Papuass (talk) 09:19, 31 August 2017 (UTC)Reply

This is a huge bug! Thank you for reporting it! EpochFail (talk) 13:22, 31 August 2017 (UTC)Reply
We're working on a bug fix deployment right now. It should be ready in a couple of minutes. It looks like 21 labels have been submitted since the bug was introduced. I'll be removing those. There will be an announcement going out soon. EpochFail (talk) 13:34, 31 August 2017 (UTC)Reply
Glad to help Papuass (talk) 13:49, 31 August 2017 (UTC)Reply
Fixed. See https://phabricator.wikimedia.org/phame/post/view/69/wikilabels_incident_reversed_diffs/
Thanks again for reporting. Your timely notice was invaluable! EpochFail (talk) 14:04, 31 August 2017 (UTC)Reply

Catalan Wikipedia

There was recently a suggestion in the Catalan Wikipedia to start using ORES again after months of inactivity. We have an edit type campaign, but there haven't been any damaging-goodfaith campaigns, which seems unusual according to ORES/Get support. Furthermore, we do not seem to have language support for Catalan. Are we on the right track? What should we do to reactivate the labelling project? Thanks in advance. Townie (talk) 15:34, 15 October 2017 (UTC)Reply

As far as I can see the labeling campaign is 8% done: http://labels.wmflabs.org/stats/cawiki/. In order to label more go to http://labels.wmflabs.org/ui/cawiki/ Ladsgroup (talk) 09:16, 23 October 2017 (UTC)Reply
@Ladsgroup: So should we continue and end this campaign, or ask for language support first? Townie (talk) 18:01, 23 October 2017 (UTC)Reply
Both would be great and are needed for the advanced support. We already have the generated list for Catalan: m:Research:Revision scoring as a service/Word lists/ca, so a review of that would be awesome (see ORES/BWDS review for more info). Ladsgroup (talk) 18:30, 24 October 2017 (UTC)Reply
@Townie: just checked up on this and it seems like we're blocked on you or some other Catalan speaker reviewing the word lists Ladsgroup linked to above. EpochFail (talk) 15:48, 11 December 2017 (UTC)Reply
@EpochFail: Thanks a lot! I will try to fill the lists as soon as possible. Townie (talk) 16:12, 11 December 2017 (UTC)Reply
@EpochFail: I think I'm done with the bad words list and the informal one. Many of the words in the generated list were neither informal nor bad words, simply words which vandals use without being any type of slur. I left them there, feel free to take them out if necessary. Townie (talk) 17:47, 11 December 2017 (UTC)Reply
Townie, that's perfect. Indeed, our "BWDS" script picks up a lot of non-bad words that vandals use. Your help in filtering them out is greatly appreciated. :) EpochFail (talk) 17:57, 11 December 2017 (UTC)Reply
Thank you for your replies. Shall I open a task in Phabricator so that Catalan is integrated in revscoring? Townie (talk) 20:28, 16 December 2017 (UTC)Reply
Sorry for the late reply. We're actively working on Catalan and I filed the tasks: T182612, T182611. Nothing to do but wait on us right now. We'll have more for you soon :) EpochFail (talk) 21:28, 19 December 2017 (UTC)Reply

Ednita Nazario

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I believe that the information about Ednita Nazario (Puerto Rican singer) should be edited. First, the first section should be called "Principios de Carreras" and not "comienzo de su carrera", because it looks more formal. Second, the sections should be divided into five-year periods, since she has many albums and each one is not being given enough attention. Third, the headers should be changed to include the years (for example "1970-1975", followed by the names of the albums). Fourth, the last header should be edited to add her new album "Una Vida" and her autobiography. Fifth, the "Desnuda" section should be edited because the link points to something other than her album. I try to edit it and every time I do, the bot alters my changes. Thank you 40.132.158.86 (talk) 16:44, 25 October 2017 (UTC)Reply

I see this was also posted to the correct page, https://es.wikipedia.org/wiki/Discusión:Ednita_Nazario
Thank you for your interest! Adamw (talk) 21:17, 26 October 2017 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Difference between EndPoints in the Sample Queries

Hi ORES team, what's the difference between the endpoints http://ores.wmflabs.org/v3/scores and https://ores.wikimedia.org/v3/scores in the sample query section?

One is under wmflabs.org, the other is under wikimedia.org Xinbenlv (talk) 23:23, 31 October 2017 (UTC)Reply

Luckily, we're just finishing up an FAQ for ORES. See ORES/FAQ#What deployments of ORES are there? for what you are looking for. :) EpochFail (talk) 20:45, 2 November 2017 (UTC)Reply
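For a quick comparison, both deployments expose the same v3 path and only the host differs; per the PSA further down this page, ores.wikimedia.org is the stable production instance, while ores.wmflabs.org is the experimental Cloud VPS one. A hedged sketch, assuming the `requests` package and reusing the example revision mentioned elsewhere on this page:

import requests

REV, MODEL = 21312312, "damaging"  # example enwiki revision and model
for host in ("https://ores.wikimedia.org",   # production (stable)
             "https://ores.wmflabs.org"):    # Cloud VPS (experimental)
    url = "{0}/v3/scores/enwiki/{1}/{2}".format(host, REV, MODEL)
    print(host, requests.get(url).json())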

Nikki. La Voz Teens (Colombia)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I made an edit inserting the link to Nikki's Wikipedia page. Nikki was part of the program, and the link was not wrong nor made with bad intentions. 181.56.91.4 (talk) 23:23, 6 November 2017 (UTC)Reply

I'm not sure what the question is. EpochFail (talk) 00:54, 7 November 2017 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Setter Motocicleta

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I have been trying to update the page of Setter Motocicletas in Spain, and I've been rejected... I can update and include all new information about the Setter brand, owned by my family.

What can we do? Thanks

Jose Jsanchezsantonja (talk) 00:31, 9 November 2017 (UTC)Reply

I'm sorry to say that this is not the right place to ask. EpochFail (talk) 10:46, 9 November 2017 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

"El Real Colegio Convictorio de Nuestra Señora de Monserrat"

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I made a change to "El Real Colegio Convictorio de Nuestra Señora de Monserrat" regarding the Virreinato del Río de la Plata. The reference to that territory is wrong, because the Virreinato del Río de la Plata dates from 1776 and the school was founded in 1687, when the territories belonged to the Virreinato del Río de la Plata. Paraquaria (talk) 15:27, 10 November 2017 (UTC)Reply

I'm sorry but this isn't the right place to raise your concern. Please consider editing the talk page to comment on content of a specific article. EpochFail (talk) 11:17, 13 November 2017 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Juan de Garay

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I have been trying to update the page to add some names of sons and daughters of María de Garay (daughter of Juan de Garay "el mozo") that were not in the text before, but the changes have been deleted. Arjd1977 (talk) 12:13, 12 November 2017 (UTC)Reply

Hi Arjd1977, I'm sorry but this isn't the right place to raise your concern. Please consider editing the talk page to comment on content of a specific article. EpochFail (talk) 11:16, 13 November 2017 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Does ORES have a recommended way to set a header so that we can identify ourselves when calling this API (and get a higher QPS limit)? Thank you! Xinbenlv (talk) 21:22, 22 November 2017 (UTC)Reply

Yes. Please include an email address and some description of your project in the User-agent string. See API:Main_page#Identifying_your_client for some tips on what to include in a good User-agent. EpochFail (talk) 16:24, 25 November 2017 (UTC)Reply
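As a concrete illustration of that advice, a request with a descriptive User-Agent might look like the sketch below; the client name, URL, and email address are placeholders, not real values.

import requests

# Placeholder identification string: replace with your project's name, URL, and contact email.
headers = {"User-Agent": "MyOresClient/0.1 (https://example.org/my-tool; you@example.org)"}
url = "https://ores.wikimedia.org/v3/scores/enwiki/21312312/damaging"
print(requests.get(url, headers=headers).json())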

Bad

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I don't know why you delete the information that I publish when it is accurate. You should inform yourself, PatruBOT. Please do not delete what I publish. 200.66.41.167 (talk) 17:55, 20 January 2018 (UTC)Reply

Please see es:Usuario discusión:PatruBOT for discussion of its activities. EpochFail (talk) 18:17, 22 January 2018 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

The information is wrong

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


The correct name of the site is Flecha de Nueva Umbría, not El Rompido, and this site lies within the municipality of Lepe. Cartaya has nothing!! Do not take from the people of Lepe what is ours, much less by changing the name. Thank you SergiodelabellaLepe (talk) 17:10, 22 January 2018 (UTC)Reply

Please see the discussion on the relevant talk pages in Spanish Wikipedia. EpochFail (talk) 18:19, 22 January 2018 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Títere

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


I am trying to change one page and I get an error. The page tells me that I am a "títere" (sockpuppet), so I need help. Jogamau (talk) 12:43, 29 January 2018 (UTC)Reply

Where? wargo (talk) 12:45, 29 January 2018 (UTC)Reply
Page is «Juego de bienes públicos» Jogamau (talk) 12:56, 29 January 2018 (UTC)Reply
On which wiki? wargo (talk) 13:00, 29 January 2018 (UTC)Reply
Wikipedia Jogamau (talk) 13:19, 29 January 2018 (UTC)Reply
Please HELP Jogamau (talk) 14:00, 29 January 2018 (UTC)Reply
Contact the filter operators; you are being blocked by filter 36. wargo (talk) 20:16, 29 January 2018 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

Editing an article

I am trying to edit the article called Miguel Castro Reynoso, since it was marked as spam and I was told that I had to make some corrections to that page; however, when I try to edit and correct it, my change is flagged as vandalism and reverted. Fridaonz (talk) 22:14, 7 February 2018 (UTC)Reply

I am trying to edit the Veganismo infobox, but it keeps giving me errors and it is impossible to edit. What is the problem? Pontesalpublicidad (talk) 11:34, 8 February 2018 (UTC)Reply

EUTM Malí

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Good morning.

I am the Deputy Public Affairs Officer of EUTM Mali.

I am trying to change/update the information on Wikipedia concerning this topic, but this is the third time the bot has erased everything I modified and referred me to this page.

Is there any way of doing this, as I have to change the Spanish/English/French pages?

Regards. DPAO EUTMMali (talk) 11:20, 13 February 2018 (UTC)Reply

Hi DPAO EUTMMali. It seems that you might have a conflict of interest with the subject matter of the article since you are directly involved. Please consider posting a message about the changes you'd like to make on the relevant talk pages (e.g. en:Talk:EUTM Mali, fr:Discussion:Mission de formation de l'Union européenne au Mali, es:Discusión:EUTM Malí). EpochFail (talk) 15:28, 13 February 2018 (UTC)Reply
Good morning.
Thanks for the answer.
As I don't know exactly how Wikipedia works... I will try to put your words into practice...
Regards. DPAO EUTMMali (talk) 09:18, 14 February 2018 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

reversal of changes

Hello, I wanted to update some data and I included the references. I do not know if you reverted my changes; I would like you to check. The information is correct and so are the sources. Lorely De Leon (talk) 07:32, 27 February 2018 (UTC)Reply

Automated assessment on en.wiki

See discussion at en:Wikipedia:Village pump (idea lab)#Automated article assessment. The idea is for a bot to use ORES to generate project assessments on article talk pages, flagged as "bot assessed". The bot would periodically reassess articles until a human removed the "bot assessed" flag. This must have been considered before? Any comments there would be welcome. Thanks, ~ Aymatth2 (talk) 12:15, 1 April 2018 (UTC)Reply

It's a good idea IMO. How can we help? EpochFail (talk) 19:13, 9 April 2018 (UTC)Reply
The discussion has moved over to :en:Wikipedia:Village pump (proposals)#Automated article assessment, where some possible improvements (and possible problems) have been identified. Any thoughts you could give there on what would or would not work, and what the practical implementation steps could be, would be more than welcome. Aymatth2 (talk) 01:34, 10 April 2018 (UTC)Reply
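If it helps the discussion, here is a rough sketch (an assumed workflow, not an existing bot) of how such a bot might look up a page's latest revision via the MediaWiki API and ask ORES for an articlequality prediction; the function name and User-Agent string are placeholders.

import requests

def predicted_class(title, ua="assessment-bot-sketch/0.1 (you@example.org)"):
    # Look up the latest revision ID of the page via the MediaWiki API...
    api = "https://en.wikipedia.org/w/api.php"
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvprop": "ids", "format": "json", "formatversion": "2"}
    page = requests.get(api, params=params, headers={"User-Agent": ua}).json()["query"]["pages"][0]
    rev_id = page["revisions"][0]["revid"]
    # ...then ask ORES for the articlequality prediction of that revision.
    ores = "https://ores.wikimedia.org/v3/scores/enwiki/{0}/articlequality".format(rev_id)
    score = requests.get(ores, headers={"User-Agent": ua}).json()
    return score["enwiki"]["scores"][str(rev_id)]["articlequality"]["score"]["prediction"]

print(predicted_class("Coffee"))  # e.g. "B" or "GA", depending on the current model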

Japanese Wikipedia

Why is there no Japanese Wikipedia in the support table? I am not a developer, but this is curious.

If someone points me to a link about this, I would appreciate it and follow up. Netkawai (talk) 07:16, 13 April 2018 (UTC)Reply

Thanks for your note! We're stuck on an issue we can't review with our current support for Japanese Wikipedia. Can you check out Phab:T133405 and see if you can help us? EpochFail (talk) 14:15, 13 April 2018 (UTC)Reply

Coverage of ORES (and hackathon) in a Catalan newspaper

https://www.ara.cat/tecnologia/Wikipedia-reinventa_0_2018798151.html Halfak (WMF) (talk) 13:46, 23 May 2018 (UTC)Reply

ORES was also mentioned here:
among many others. It is certainly an interesting project :) Townie (talk) 18:41, 23 May 2018 (UTC)Reply
\o/ Thanks for sharing Halfak (WMF) (talk) 19:38, 23 May 2018 (UTC)Reply

Help to support ORES at Galician wiki

Hi, I need help to get ORES support at Galician wikipedia, could somebody help me? I created https://phabricator.wikimedia.org/T201142 and https://phabricator.wikimedia.org/T201146, but nothing happens... Elisardojm (talk) 17:44, 3 September 2018 (UTC)Reply

The product teams that deal with this likely have a backlog. It might take a few weeks or months for the project to be picked up, as it is a feature request and not a bug. Daylen (talk) 00:00, 4 September 2018 (UTC)Reply
Ok, thanks! Elisardojm (talk) 01:00, 4 September 2018 (UTC)Reply
Hey! Thanks for the ping! I just got back from a vacation and will be picking up new tasks soon. I just added an update to both. EpochFail (talk) 14:09, 4 September 2018 (UTC)Reply

Question from a student doing independent research

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


The following is an email conversation that I had with Mark Wang about ORES. I'm posting this so that it'll maybe gain some long-term usefulness. As you can see at the end of the thread, Mark agreed to me posting this publicly.

On Sat, Nov 17, 2018 at 8:50 PM Wang, Mark <> wrote:

Hi Scoring Platform Team!

I'm Mark, a CS student at Brown Univ. I'm working on experimenting with applying Snorkel (a framework for leveraging unlabeled data with noisy label proposals) to detect paid Wikipedia edits. I've got a few selfish requests / questions for you guys.

Snorkel code : https://github.com/HazyResearch/snorkel Snorkel paper : https://arxiv.org/pdf/1711.10160.pdf

Some selfish questions:

1) Is it possible for me to have access to edits and page-stats data that you work with? I can scrape them myself (with a reasonable crawl rate), but of course, it's less convenient and I'll end up working with less data.

2) How do you represent revisions? I'm thinking about using character embeddings here. What are some methods that worked well for you guys? And what should I probably not try?

3) What features seem to be strongly informative in your models for detecting low-quality edits?

4) Any additional recommendations /advice?

Thank you in advance for your time, Mark Wang


On Mon, Nov 19, 2018 at 4:49 PM Aaron Halfaker <ahalfaker@wikimedia.org> wrote: Hi Mark!

Thanks for reaching out! Have you seen our recent data release of known paid editors? https://figshare.com/articles/Known_Undisclosed_Paid_Editors_English_Wikipedia_/6176927

1) I'm not sure what page stats you are looking for, but you can see the features we use in making predictions by adding a "?feature" argument to an ORES query. For example, https://ores.wikimedia.org/v3/scores/enwiki/21312312/damaging?features shows the features extracted and a "is this edit damaging" prediction for https://en.wikipedia.org/wiki/Special:Diff/21312312

2) A revision is a vector that we feed into the prediction model. We do a lot of manual feature engineering, but we use vector embeddings for topic modeling. We're actually looking into just using our current word2vec strategies for implementing better damage detection too. See https://phabricator.wikimedia.org/T197007

3) Here's an output of our feature importance weights for the same model. This is estimated by sklearn's GradientBoosting model.

feature.log((temporal.revision.user.seconds_since_registration + 1)) 0.131
feature.revision.user.is_anon 0.036
feature.english.dictionary.revision.diff.dict_word_prop_delta_sum 0.033
feature.revision.parent.markups_per_token 0.029
feature.revision.parent.words_per_token 0.028
feature.revision.parent.chars_per_word 0.027
feature.log((wikitext.revision.parent.ref_tags + 1)) 0.026
feature.revision.diff.chars_change 0.026
feature.revision.user.is_patroller 0.026
feature.english.dictionary.revision.diff.dict_word_prop_delta_increase 0.025
feature.log((wikitext.revision.parent.chars + 1)) 0.023
feature.log((AggregatorsScalar(<datasource.tokenized(datasource.revision.parent.text)>) + 1)) 0.023
feature.log((AggregatorsScalar(<datasource.wikitext.revision.parent.words>) + 1)) 0.023
feature.revision.parent.uppercase_words_per_word 0.022
feature.log((wikitext.revision.parent.wikilinks + 1)) 0.021
feature.log((wikitext.revision.parent.external_links + 1)) 0.02
feature.log((wikitext.revision.parent.templates + 1)) 0.02
feature.wikitext.revision.diff.markup_prop_delta_sum 0.02
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_sum 0.02
feature.log((AggregatorsScalar(<datasource.wikitext.revision.parent.uppercase_words>) + 1)) 0.018
feature.revision.diff.tokens_change 0.018
feature.log((wikitext.revision.parent.headings + 1)) 0.017
feature.wikitext.revision.diff.markup_delta_sum 0.015
feature.revision.diff.words_change 0.015
feature.english.dictionary.revision.diff.dict_word_delta_sum 0.015
feature.english.dictionary.revision.diff.dict_word_prop_delta_decrease 0.015
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_increase 0.015
feature.revision.diff.markups_change 0.014
feature.english.dictionary.revision.diff.dict_word_delta_increase 0.014
feature.wikitext.revision.diff.markup_prop_delta_increase 0.013
feature.wikitext.revision.diff.markup_delta_increase 0.012
feature.wikitext.revision.diff.number_prop_delta_sum 0.011
feature.wikitext.revision.diff.number_prop_delta_increase 0.011
feature.english.dictionary.revision.diff.non_dict_word_delta_sum 0.011
feature.wikitext.revision.diff.number_delta_increase 0.01
feature.revision.diff.wikilinks_change 0.01
feature.revision.comment.has_link 0.01
feature.english.dictionary.revision.diff.dict_word_delta_decrease 0.01
feature.revision.page.is_mainspace 0.009
feature.wikitext.revision.diff.number_delta_sum 0.009
feature.wikitext.revision.diff.markup_prop_delta_decrease 0.008
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_decrease 0.008
feature.revision.page.is_articleish 0.007
feature.revision.diff.external_links_change 0.007
feature.revision.diff.templates_change 0.007
feature.revision.diff.ref_tags_change 0.007
feature.english.informals.revision.diff.match_prop_delta_sum 0.007
feature.english.informals.revision.diff.match_prop_delta_increase 0.007
feature.wikitext.revision.diff.number_prop_delta_decrease 0.006
feature.revision.comment.suggests_section_edit 0.006
feature.english.dictionary.revision.diff.non_dict_word_delta_increase 0.006
feature.wikitext.revision.diff.markup_delta_decrease 0.005
feature.revision.user.is_bot 0.005
feature.revision.user.is_admin 0.005
feature.english.badwords.revision.diff.match_prop_delta_sum 0.005
feature.wikitext.revision.diff.number_delta_decrease 0.004
feature.wikitext.revision.diff.uppercase_word_prop_delta_sum 0.004
feature.revision.diff.headings_change 0.004
feature.revision.diff.longest_new_repeated_char 0.004
feature.english.badwords.revision.diff.match_prop_delta_increase 0.004
feature.english.informals.revision.diff.match_delta_increase 0.004
feature.english.dictionary.revision.diff.non_dict_word_delta_decrease 0.004
feature.wikitext.revision.diff.uppercase_word_delta_sum 0.003
feature.wikitext.revision.diff.uppercase_word_prop_delta_increase 0.003
feature.revision.diff.longest_new_token 0.003
feature.english.informals.revision.diff.match_delta_sum 0.003
feature.wikitext.revision.diff.uppercase_word_delta_increase 0.002
feature.wikitext.revision.diff.uppercase_word_prop_delta_decrease 0.002
feature.english.badwords.revision.diff.match_delta_sum 0.002
feature.english.badwords.revision.diff.match_delta_increase 0.002
feature.wikitext.revision.diff.uppercase_word_delta_decrease 0.001
feature.english.informals.revision.diff.match_prop_delta_decrease 0.001
feature.revision.page.is_draftspace 0.0
feature.revision.user.has_advanced_rights 0.0
feature.revision.user.is_trusted 0.0
feature.revision.user.is_curator 0.0
feature.english.badwords.revision.diff.match_delta_decrease 0.0
feature.english.badwords.revision.diff.match_prop_delta_decrease 0.0
feature.english.informals.revision.diff.match_delta_decrease 0.0

4) You'll note that time since registration and is_anon are strongly predictive. They don't overwhelm the predictions -- we can still differentiate good from bad among newcomers and anonymous editors. But the model generally doesn't predict that an edit by a very experienced editor is bad, regardless of what's actually in the edit. The more we can move away from relying on is_anon and seconds_since_registration, the more we'll be targeting the things that people do -- rather than targeting them for their status. See section 7.4 of our systems paper for a more substantial discussion of this problem.

-Aaron


On Mon, Nov 19, 2018 at 6:47 PM Wang, Mark <> wrote:

Thanks a bunch for your help Aaron! This is all very informative.

One more question from me: May I borrow your features? And if so, is accessing them through the API the preferred method of access for an outsider?

Thanks again, Mark


On Tue, Nov 20, 2018 at 11:07 AM Aaron Halfaker <ahalfaker@wikimedia.org> wrote:

Say, I'd like to save this conversation publicly so that others might benefit from it. Would you be OK with me posting our discussion publicly on a wiki?

On Tue, Nov 20, 2018 at 10:06 AM Aaron Halfaker <ahalfaker@wikimedia.org> wrote: Yes. That is a good method for accessing the features. You'll notice that the features that the API reports are actually just the basic reagents for the features the model uses.

For example, we have features like this:

  • words added
  • words removed
  • words add / words removed
  • log(words added)
  • log(words removed)
  • etc.

In all of these features, the basic foundation is "words added" and "words removed" with some mathematical operators on top. So we only report those two via the API. To see the full set of features for our damage detection model, see https://github.com/wikimedia/editquality/blob/master/editquality/feature_lists/enwiki.py See also a quick overview I put together for feature engineering here: https://github.com/wikimedia/revscoring/blob/master/ipython/feature_engineering.ipynb

If I wanted to extract the raw feature values for the English Wikipedia "damaging" model, I'd install the "revscoring" library (pip install revscoring) and then run the following code from the base of the editquality repo:

$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26) 
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from editquality.feature_lists.enwiki import damaging
/home/halfak/venv/3.5/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
>>> from revscoring.extractors import api
>>> import mwapi
>>> extractor = api.Extractor(mwapi.Session("https://en.wikipedia.org"))
Sending requests with default User-Agent.  Set 'user_agent' on mwapi.Session to quiet this message.
>>> list(extractor.extract(123456789, damaging))
[True, True, False, 10.06581896445358, 9.010913347279288, 8.079927770758275, 3.4965075614664802, 2.772588722239781, 5.402677381872279, 2.70805020110221, 1.791759469228055, 2.1972245773362196, 7.287484510532837, 0.3940910755707484, 0.009913258983890954, 0.06543767549749725, 0.0, 2.0, -2.0, 0.04273504273504275, 0.15384615384615385, -0.1111111111111111, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1, 1, False, False, False, False, True, False, False, 11.305126087390619, False, False, 0, 0, 0, 0.0, 0.0, 0.0, 0, 0, 0, 0.0, 0.0, 0.0, 0, 0, 0, 0.0, 0.0, 0.0, 0, 0, 0, 0.0, 0.0, 0.0]

This extracts the features for this edit: https://en.wikipedia.org/w/index.php?diff=123456789

-Aaron


Hi Aaron:

Thank you so much! This is all so helpful. And of course, feel free to publicize any of our conversations.


Mark Halfak (WMF) (talk) 21:54, 20 November 2018 (UTC)Reply

The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

What changes in probabilities are significant?

As part of our work with the community wish list on SVWP, we're going to develop a gadget that gives an editor feedback using the article quality assessment. The idea is to show the quality before and after the edit. A question that has arisen is what changes to show and with what precision. Is it reasonable to show the difference for all probabilities, regardless of how small that difference is? My worry is that small changes to the probabilities may not be significant and could be misleading. Could someone give me some help with what changes are useful to show in a case like this? Sebastian Berlin (WMSE) (talk) 10:37, 2 April 2019 (UTC)Reply

I've developed a Javascript gadget that does something very similar to what you are planning. See https://en.wikipedia.org/wiki/User:EpochFail/ArticleQuality I wonder if we could make a modification to this tool to support what you are working on.
I've been using a "weighted sum" strategy to collapse the probabilities across classes into a single value. See this paper and the following code for an overview of how it works for English Wikipedia.
var WEIGHTED_CLASSES = {FA: 6, GA: 5, B: 4, C: 3, Start: 2, Stub: 1};
var weightedSum = function(score){
  var sum = 0;
  for(var qualityClass in score.probability){
    // Guard against inherited properties on the probability map.
    if (!score.probability.hasOwnProperty(qualityClass)) continue;
    var proba = score.probability[qualityClass];
    sum += proba * WEIGHTED_CLASSES[qualityClass];
  }
  return sum;
};
This function returns a number between 1 and 6 that represents the model's prediction projected on a continuous scale.
Now, how big of a change matters? That's a good question and it's a hard one to answer. I think we'll learn in practice quite quickly once we have the model for svwiki. EpochFail (talk) 14:53, 2 April 2019 (UTC)Reply
That looks very interesting. I found (part of?) your script earlier, but I haven't had time to go figure out exactly what's going on there. I'll have a look and see what bits are reusable for this. I'd guess that the backend stuff (API interaction, weighting etc.) should be fairly similar.
I like the idea of just having one number to present to the user, along with the quality. From what I've understood, the quality levels aren't as evenly spaced on SVWP as on ENWP; it goes directly from Stub to equivalent to B. I don't know if and how this would impact the weighting algorithm, but maybe that will become apparent once it's in use. Sebastian Berlin (WMSE) (talk) 08:24, 3 April 2019 (UTC)Reply
We can have non-linear jumps in the scale. E.g. {Stub: 1, B: 4, GA: 5, FA: 6} EpochFail (talk) 13:10, 3 April 2019 (UTC)Reply
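For what it's worth, here is the same weighted-sum projection as a small Python sketch, using the non-linear svwiki-style scale suggested just above; the weights are simply the values from this thread, not a fixed standard.

WEIGHTS = {"Stub": 1, "B": 4, "GA": 5, "FA": 6}

def weighted_sum(probabilities):
    """Collapse per-class probabilities from ORES into one continuous quality score."""
    return sum(p * WEIGHTS[cls] for cls, p in probabilities.items())

print(weighted_sum({"Stub": 0.1, "B": 0.5, "GA": 0.3, "FA": 0.1}))  # 4.2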
Dear all. I am not sure if this thread is still active; however, I have a student working on an interface for representing the quality of Wikidata items. I would be happy to meet and talk about it. We are based in Berlin :) Claudiamuellerbirn (talk) 11:13, 2 November 2019 (UTC)Reply

It seems Phabricator is calling ORES JADE now?

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.


Per https://phabricator.wikimedia.org/project/profile/2872/, for a time JADE was named "Meta-ORES". JADE stands for Judgement and Dialog Engine. Should we change the title of this? Xinbenlv (talk) 22:34, 8 April 2019 (UTC)Reply

ORES is not Jade, and Jade is not ORES. They are related but not the same thing.
There is also a page for Jade.
But there is a request phab:T153143 that ORES query results should include JADE refutations. 94rain Talk 07:53, 9 April 2019 (UTC)Reply
OK, I see. I was trying to find some documentation of JADE, where can I find it? thank you! Xinbenlv (talk) 15:50, 9 April 2019 (UTC)Reply
Jade#Subpages and Extension:JADE are all that I can find. 94rain Talk 13:50, 10 April 2019 (UTC)Reply
The discussion above is closed. Please do not modify it. No further edits should be made to this discussion.

ORES downtime on July 16th @ 1500 UTC

Hey folks,

We expect a couple minutes of downtime while we restart a machine tomorrow (Tuesday, July 16th @ 1500 UTC). Halfak (WMF) (talk) 21:45, 15 July 2019 (UTC)Reply

The maintenance is done and it doesn't appear that there was any downtime. Halfak (WMF) (talk) 15:03, 16 July 2019 (UTC)Reply

PSA: Switch from using ores.wmflabs.org to ores.wikimedia.org

Hey folks! I've been debugging some issues with our experimental installation of ORES in Cloud VPS recently. It looks like we're getting *a lot* of traffic there. I just want to make sure that everyone knows that the production instance of ORES is stable and available at ores.wikimedia.org and that ores.wmflabs.org will be going up and down as we use it to experiment with ORES and new models we'd like to bring to production. EpochFail (talk) 14:10, 8 October 2019 (UTC)Reply

Edit summaries classifying edits as damaging or good-faith?

Is it suggested to add keywords such as "damaging", "vandalism", or "good-faith" to your edit summaries when undoing other people's edits, to help train ORES? Enervation (talk) 19:22, 9 August 2020 (UTC)Reply

We don't process edit summaries to look for such annotations. Instead, the team is developing mw:Jade, a system for explicitly saying what was going on in an edit while undoing it. They are pretty close to deploying a pilot. EpochFail (talk) 14:19, 10 August 2020 (UTC)Reply

ORES on FANDOM wikis?

I edited both ORES/FAQ and ORES/Get support pages to mention FANDOM wikis as well.

The ORES AI service is used by Wikimedia Foundation projects, most notably the English Wikipedia, but what about FANDOM's Unified Community Platform (UCP) wikis?

I'm curious about machine learning assisting local wiki moderators and admins, and even SOAP (formerly VSTF) members, in finding spam and vandalism as well as disruptive edits. 36.74.43.247 (talk) 05:40, 3 September 2020 (UTC)Reply

ORES could totally be used on any wiki. Are the FANDOM wikis running MediaWiki? If not, there would need to be some engineering work to develop the API connectors that would allow ORES to pull data from whatever wiki platform FANDOM is running on. Either way, I think the hardest part is going to be getting a server to run it, as we couldn't host a model for FANDOM in Wikimedia's production installation. Depending on the number of edits, ORES can run on relatively minimal hardware. EpochFail (talk) 14:53, 10 September 2020 (UTC)Reply

ORES template not refreshing

I made a minor edit to the ORES template to update the team name that maintains ORES. Despite refreshing the page, the changes are not reflected. Can someone look into this? Chtnnh (talk) 13:00, 19 March 2021 (UTC)Reply

Should ORES be aggressive to catch vandalism or should ORES be less aggressive to be nice to newcomers?

Imagine you’ve just spent 10 minutes working on what you earnestly thought would be a helpful edit to your favorite article. You click that bright blue “Publish changes” button for the very first time, and you see your edit go live! Weeee! But 10 seconds later, you refresh the page and discover that your edit has been reverted.

Actually, an AI system called ORES has contributed to the judgement of hundreds of thousands of edits on Wikipedia. ORES is a machine learning system that automatically predicts edit and article quality to support editing tools in Wikipedia.

I'm exploring strategies for tuning ORES predictions about quality and vandalism to your needs, and I'd like to work with you. I am looking for editors to discuss the values of Wikipedia as they relate to ORES.

If you are interested in participating, please fill out the short survey below. Thanks! https://docs.google.com/forms/d/e/1FAIpQLSe7itK8GM6Y7vgWdtcFXXnsJ8iWe9ysjQI8S1KVtomfonbkxw/viewform EpochFail (talk) 19:56, 19 March 2021 (UTC)Reply

We have to stay at least as supportive to beginners as we were in the beginning, when we had to grow from small to gigantic like we are now. Practically everyone has good will. Klaas `Z4␟` V 13:40, 20 March 2021 (UTC)Reply

Outdated study

w:Wikipedia talk:IPs are human too#Outdated study might be of interest to you? (P.S.: No preview button here?) --143.176.30.65 21:23, 5 June 2021 (UTC)Reply

ORES on EverybodyWikis?

Our wikis are available in 25+ languages and have thousands of articles.

We are using MediaWiki and many extensions used by the foundation.

Our most used wiki, the English version, has hundreds of edits per day. WikiMaster (EverybodyWiki) (talk) 20:27, 13 September 2023 (UTC)Reply

This is a great reading spot!! Gotta love the WiKi, I read from WiKi just about every night!! Tiffatk3 (talk) 04:36, 19 December 2023 (UTC)Reply
Return to "ORES" page.