I'm looking into the feasibility of designing an anti-vandal bot for Wikidata (potentially using OpenAI tech if I can secure funding). I understand that you have built tools for handling revscoring in the general case, but Wikidata items are a special snowflake in some senses. Are there tools being built that would work for Wikidata? [[Machine learning models]] lists "wikidatawiki-goodfaith", which I assume is what I want. But is that publicly usable? Thanks for any direction.
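For context, here's roughly how I'd expect to query it via the public ORES API if it is in fact usable (an untested sketch; the revision ID is made up):

```python
import requests

# Untested sketch: ask the public ORES API for a goodfaith score on a
# Wikidata revision. The revision ID below is made up.
rev_id = 1234567890
url = f"https://ores.wikimedia.org/v3/scores/wikidatawiki/?models=goodfaith&revids={rev_id}"
resp = requests.get(url, headers={"User-Agent": "anti-vandal-bot-feasibility-test"})
resp.raise_for_status()
print(resp.json()["wikidatawiki"]["scores"][str(rev_id)]["goodfaith"]["score"])
```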
Howdy folks. 1) Was just wondering how ORES is trained? For example, the "Vandalism" tags in PageTriage's Special:NewPagesFeed. Is there software somewhere where volunteers are presented with various pages and click a "vandalism yes/no?" button? If so, where is the software? I'd like to check it out. 2) What are the PageTriage ORES configuration settings, such as the false positive target rate? I assume this is a setting that can be adjusted up/down, which I assume is how the anti-vandalism bot ClueBot NG achieves such a good false positive rate. I assume it's a tradeoff between false positives and letting stuff slip through the cracks. 3) Any other hints about how ORES works in relation to PageTriage? I'll probably write some documentation about it. Thanks.
I believe this is where edits are labeled, per wiki and on a fairly old selection of edits. IDK if other training options are in place. Would love to learn too!
Hey @Novem Linguae and @Ponor, some researchers and I are currently working on a project building a system that facilitates curating up-to-date data for training and evaluating ML models used in Wikipedia, including but not limited to ORES. We plan to recruit a small group of people for pilot testing around June. Please let me know if you're interested in participating or learning more about the project. Thanks!
Hi @Novem Linguae, there is some general information here. The models are sometimes trained using human-curated training data (like @Ponor mentioned). Other times, data such as whether an edit was reverted is used. The models just output probabilities; the thresholds are hardcoded in the MediaWiki extension itself.
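To illustrate that last point, here is a minimal sketch (the threshold value is made up for illustration; the real ones live in the extension's configuration):

```python
# Illustrative sketch only: the model returns a probability and the consumer
# applies a hardcoded threshold. This number is made up; the real thresholds
# live in the MediaWiki extension configuration.
DAMAGING_THRESHOLD = 0.7  # hypothetical value

def flag_as_likely_damaging(damaging_probability: float) -> bool:
    """Turn a raw model probability into a yes/no flag."""
    return damaging_probability >= DAMAGING_THRESHOLD

print(flag_as_likely_damaging(0.85))  # True
print(flag_as_likely_damaging(0.40))  # False
```

Raising the threshold reduces false positives at the cost of letting more bad edits slip through, which is exactly the tradeoff you described.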
Additionally, we are planning on deprecating the current ORES/Revscoring models in favor of more modern models such as RevertRisk and the Outlink Topic Model, which cover multiple languages and take advantage of tools such as BERT. The ORES models will still be available for legacy reasons, but we won't be updating them.
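For anyone who wants to try the newer models, they are callable over HTTP on Lift Wing; here is a rough sketch for the language-agnostic RevertRisk model (endpoint path as currently documented; the revision ID is just an example):

```python
import requests

# Rough sketch: POST a language code and revision ID to the language-agnostic
# RevertRisk model on Lift Wing. The revision ID is an example.
url = "https://api.wikimedia.org/service/lw/inference/v1/models/revertrisk-language-agnostic:predict"
resp = requests.post(url, json={"lang": "en", "rev_id": 12345},
                     headers={"User-Agent": "lift-wing-example"})
print(resp.json())  # includes the probability that the edit will be reverted
```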
Hey 👋
we at the WMDE Wikidata team will soonish (this year?) introduce a new feature that will affect how we evaluate the quality of Items. That means we will probably need to retrain / update the articlequality model for Wikidata, and maybe others.
For that reason in particular, I want to get somewhat deeper machine learning knowledge. I did Andrew Ng's Introduction to Machine Learning course a few years ago, before we last retrained and extended that model for Wikidata.
So I was wondering if you had any courses you would recommend? Especially for someone like me, who wants to upskill in order to work with / contribute to the articlequality model in particular and WMF/Wikimedia ML infrastructure in general.
Thanks!
Hey Michael! I don't know about courses, but Sebastian Raschka is probably the best ML educator out there right now. He has a course and a book I believe.
Thank you! I will look him up :)
I noticed that the damaging filter disappeared from the Recent Changes page on the English Wikipedia this morning (Eastern Time). Only the user intent prediction remains. Is there a rationale or announcement behind this change? Thanks!
Hi Tzusheng, my apologies for the late reply, I was out of the office. I haven't heard anything about a change, and on our end the ML team hasn't changed anything. That said, I also don't see the damaging filter on the Recent Changes page. Let me investigate and get back to you.
Here is the ticket: https://phabricator.wikimedia.org/T331045
Hi All! I'm back from vacation!
After far too long we just published a page on our machine learning modernization work, which includes modern serving, model cards, and other plans. I hope this sparks some interesting discussions with you all about what we are working on and how we can work together.
My apologies but no update this week because I have been out sick.
Hope you get fully well soon!
- We have hit another milestone for Lift Wing. Our ORES model infrastructure hosts ~110 machine learning models, and for the first time all of those models are also publicly available on Lift Wing through the API Gateway. We still need to work out the details of reasonable rate limiting and optimizing performance (maybe use the API Gateway as a simple cache?), but you can access all the models right now without any internal WMF permissions. We will have some tutorials up soon for the community, along with a call for testers to help us out.
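- To make "publicly available" concrete, here is a hedged sketch of calling one of the hosted models through the API Gateway (the model name, request shape, and revision ID are examples; exact paths may still shift before launch):

```python
import requests

# Hedged example: call a revscoring-era model hosted on Lift Wing through the
# public API Gateway. Model name and rev_id are examples; exact paths may
# shift before launch.
url = "https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict"
resp = requests.post(url, json={"rev_id": 12345},
                     headers={"User-Agent": "lift-wing-tester"})
resp.raise_for_status()
print(resp.json())
```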
- To be clear: if folks are currently using ORES, nothing will change with ORES for at least an entire year. We are working on a complete year-long migration plan for ORES users to Lift Wing that includes outreach, tutorials, and technical support, and that plan will only begin once Lift Wing is officially launched in a few months.
- After talking to AI ethics experts, wiki community members, WMF staff members, and frankly anyone else who would talk to us, we are working on generating model cards for all models on Lift Wing. The goal is for the model cards to be the main point of contact for questions, discussions, and ultimate governance around machine learning models hosted by WMF. The model cards will be individual articles on Mediawiki.org to make it easy for the community to use the tools they are familiar with. We should have some things to show everyone very soon.
- NLLB-200 deployment
- Major progress continues on getting the NLLB-200 deployment live and running for the Content Translation Tool. I am confident we will make the January 1st deadline.
- API Gateway
- Tobias and Hugh made some major progress over the last two days and they are working on a patch that will allow the API Gateway to be used with Lift Wing. Specifically, when the patch is tested and rolled out, Lift Wing will effectively be silently soft-launched on the API Gateway, making over 100 machine learning models available to everyone. The timeline for the patch being pushed to production is a few days.
- After the patch is released I will start publishing some tutorials on getting started using Lift Wing and will ask folks both inside WMF and the community to start experimenting to help find bad user experiences and technical bugs.
- Add-A-Link
- Steady progress on the Add-A-Link models. Kevin continues to train and deploy new models while evaluating their performance.
- Model Cards
- We are having weekly standup meetings on model cards as we start to make them. We should have the first of the model cards published in the next two weeks.
- Lift Wing
- The current focus of the Lift Wing work is on model performance and the k8s 1.23 upgrade.
- Model Performance: Some of the larger models we are currently working on with the Research team are ~4GB loaded, which is causing prediction times of over ten seconds. This is obviously too slow for any real-world use case, and we are exploring a wide variety of strategies for improvement: breaking the large model into smaller ones, optimizing the structure of the models, increasing the number of pods, etc.
- K8s 1.23 update: Luca and Yannis are working through a list of tasks as part of the 1.23 update. We are making solid progress but it is also a major task.
- DSE Cluster
- Work has now started on tackling what we have called the “Kerbarrier”: the fact that Kubernetes and Hadoop use very different security models. Kubernetes uses a certificate-based approach while Hadoop uses a symmetric-key cryptographic approach (called Kerberos). Building a way to bridge the gap between these two approaches so nodes on clusters can access HDFS is a major challenge we have known we would need to solve eventually, and one of the reasons for starting the DSE Cluster experiment.
It is Thanksgiving week for me, so this is a smaller ML team weekly update.
- Benthos work is on hold. We’ve been experimenting with Benthos as a lightweight tool to stream model prediction scores to the larger event stream. It looks like it would work great, but we are putting the work on hold. The Data Engineering team is working on Flink, which would solve everything we were thinking of using Benthos for. Flink isn’t ready yet; however, based on our timelines we can hold off on a streaming solution while the Data Engineering team gets Flink ready, and then use that. If that doesn’t work we can always fall back to Benthos.
- We are deploying some brand new models into production as part of an agile development process, specifically Revert Risk (a language-agnostic prediction of whether an edit will be reverted) and Outlink Topic (a language-agnostic prediction of an article’s topic). I’ll talk about these models more in the future, but for now I wanted to use them to highlight the improvements Lift Wing has created in model deployment.
Previously, deploying models like these on ORES would take a few days for each model. Okay, but lots of room for improvement. With Lift Wing, deploying these models takes less than an hour:
- Upload new model to Thanos Swift. (~10min)
- File a patch to deployment charts to update STORAGE_URI, wait for ML SRE +2 and merge (~10min if ML SRE is available)
- Deploy to staging (ml-staging-codfw) and test the model. (~10min)
- Deploy to production (ml-serve-eqiad & ml-serve-codfw) and test the model (~20min; a test sketch follows below)
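- For the "test the model" steps, the smoke test is just an HTTP request against the newly deployed service. A hedged sketch (the URL and model name below are illustrative, not the real internal endpoints):

```python
import requests

# Hedged smoke-test sketch for the "test the model" steps above. The URL is
# illustrative, not the real internal staging endpoint.
STAGING_URL = "https://inference-staging.example/v1/models/my-new-model:predict"

resp = requests.post(STAGING_URL, json={"lang": "en", "rev_id": 12345}, timeout=30)
assert resp.status_code == 200, resp.text
print("prediction:", resp.json())
```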
I have let writing this weekly update slide, and my apologies for that. Here is this week's weekly update, and I will make an effort to keep posting these since folks seem to find them useful.
- NLLB-200 model
- Context: Currently the Content Translation Tool uses Meta’s NLLB-200 model for translation between smaller languages. The model is already in production. However, the model is currently hosted on Meta’s AWS account and we have been informed that hosting will end Jan 1st. The goal has been to migrate the NLLB-200 model onto WMF’s AWS account before Jan 1st to prevent any loss of service in the Content Translation Tool.
- We have the model working on WMF’s AWS Sagemaker. We can hit it and get a prediction. It isn’t MVP yet, but it proves we can do it. Now it is about configuring things correctly and connecting the Content Translation Tool to it. We are on track for making the deadline of Jan 1st.
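- As a hedged illustration of what "hit it" means, invoking a SageMaker endpoint looks roughly like this (the endpoint name and payload shape are assumptions, not our production configuration):

```python
import json
import boto3

# Hedged sketch of "hitting" a SageMaker-hosted model. The endpoint name and
# payload shape are assumptions, not the production configuration.
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")
resp = runtime.invoke_endpoint(
    EndpointName="nllb-200-example",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"text": "Hello, world!",
                     "source": "eng_Latn", "target": "fra_Latn"}),
)
print(json.loads(resp["Body"].read()))
```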
- I have purposely separated the work on the NLLB-200 model into two parts: 1) resolving the crisis of having the model on Meta’s AWS account when that account is going to end in 45 days, and 2) the ongoing maintenance, development, and support of both the AWS instance and NLLB specifically. The benefit is that we were able to move fast and are currently on track to resolve the crisis before the deadline. The cost is that there are important conversations about supporting this model and supporting AWS that we aren’t having yet, but that will eventually have to happen.
- There is also the issue of cost: since we don't use AWS in the regular course of work, we don't have a large budget for it. I'm monitoring the cost and we will see how things go.
- It is worth noting that moving the NLLB-200 model off AWS will spark a larger conversation around the WMF’s policy towards open source. WMF has long used AMD GPUs because their drivers and software are open source. However, many large models, including NLLB-200, at best assume and at worst require NVIDIA GPUs, which are only partially open source.
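- As a small illustration of where that portability question first shows up (assuming PyTorch as the framework; just a sketch):

```python
import torch

# Sketch: PyTorch builds report which GPU stack they were compiled against,
# which is where the AMD-vs-NVIDIA portability question shows up first.
# Note: ROCm builds reuse the torch.cuda namespace, so checking only
# torch.cuda.is_available() does not tell you which vendor you are on.
if torch.version.hip is not None:
    print("ROCm (AMD) build:", torch.version.hip)
elif torch.version.cuda is not None:
    print("CUDA (NVIDIA) build:", torch.version.cuda)
else:
    print("CPU-only build")
```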
- API Gateway
- Context: There has been a plan for over two years for an API Gateway (api.wikimedia.org) where the public and other users can have access to all the APIs provided by WMF. We are working on connecting Lift Wing to that API Gateway as a prerequisite for an MVP launch.
- This is moving forward, but slowly. Notably, Tobias had to divert 50% of the time he was spending with Hugh on the API Gateway to work on the NLLB-200 model.
- Add-A-Link
- Context: As part of the Structured Tasks project in the Product Department, Add-A-Link uses machine learning to recommend easy edits to new editors to make the onboarding process easier and more mobile-friendly.
- The 6th round of model training has started, making it about ~120 models trained out of ~300. The models are live and in production, but not on Lift Wing, because the project started before Lift Wing was active. However, migrating the models to Lift Wing and decommissioning the current model-serving system is in our plans for the future.
- Model Cards
- Context: As the first step in our efforts to be a best-practice public example of applied ethical ML, we are creating a wiki model card for every model hosted on Lift Wing. We have been working on a proof of concept for a few months and are now starting to roll out model cards into production.
- We had a kickoff meeting for working on production model cards last week. Currently, the team (Chris, Hal, Isaac, and Kevin) is discussing some of the practicalities of the model card design (i.e. do we even need programmatic content before models are trained on Train Wing?)
- Following agile, the current step is for Kevin to try to make one model card and we’ll discuss and iterate on the card next week.
- Lift Wing
- Context: Lift Wing is the Kubernetes cluster for hosting and serving production machine learning models at WMF. It is close to an MVP launch.
- Work continues on experimenting with Benthos for streaming Lift Wing model predictions to EventGate. Specifically, we are working with the Observability team on monitoring the application.
- DSE Cluster
- Context: The DSE Cluster is an experiment with a cross-team shared cluster between Machine Learning and Data Engineering. The goal is to benefit from economies of scale and cross-team experience by building a single cluster that both hosts machine learning and data engineering tasks.
- Two weeks ago we were at a decision point: does the DSE cluster fork from the greater SRE k8s processes and systems so it can upgrade from 1.16 to 1.23? We’ve made that decision: the DSE Cluster is not going to fork. The cost of the fork is too high for any real benefit. Instead, the DSE Cluster will work with WMF’s other teams with clusters to upgrade to 1.23 together. Luca and Janis are leading this effort.
- Importantly and more broadly, a k8s special interest group (SIG) has been created with the goal of coordinating and organizing cross-team k8s efforts across the Foundation. This group met yesterday and made a number of decisions on the structure of the group and its scope.