ORES

From Wikitech

The ORES infrastructure is being deprecated in favor of Machine Learning/LiftWing, as described in Machine_Learning/Modernization. The Machine Learning team has the following high-level timeline:

  1. Deploy all ORES/Revscoring models on Lift Wing.
    • Please check Machine Learning/LiftWing/Usage to learn how to query the new models (from the internal WMF network, from the outside Internet, and from the Wikimedia Cloud Infra).
    • All ORES models are currently being served by Lift Wing!
  2. Build a Kubernetes service called ores-legacy that offers the same interface as https://ores.wikimedia.org/ but calls Lift Wing in the background (more details in https://phabricator.wikimedia.org/T330414).
  3. Move the ORES Mediawiki Extension to Lift Wing (same interface and configuration, but a different backend is called).
  4. Deprecate the revision-score stream from https://stream.wikimedia.org/. We are going to provide one stream per model, rather than a single stream carrying all scores. If you need a stream for a specific model, please ping the Machine Learning team.
    • DONE
  5. Move https://ores.wikimedia.org/ to https://ores-legacy.wikimedia.org (via DNS CNAME) so that users who have not yet migrated to Lift Wing are transparently migrated.
    • DONE
  6. Migrate all clients to the Lift Wing API. This is a long-term process, and it will likely take several months.
    • IN PROGRESS
  7. Decommission https://ores-legacy.wikimedia.org. After this deadline all clients will need to use Lift Wing, and no ORES-like APIs will be available.
    • Scheduled for 2024 (we don't have a clear timeline yet).

Machine Learning contacts

For any questions or doubts, please reach out to us:

Bot/tool owners

If you own a tool that uses ORES (bots, dashboards, etc.), or if you have a service that depends on ORES and you have concerns, doubts, or questions, we invite you to:

Example: migrating a Bot from ORES to Lift Wing

Let's imagine that AwesomeBot is a tool that runs on Toolforge, listens to revision-create events, and calls ORES for the goodfaith and damaging scores of every revision id (taking appropriate actions afterwards).

The first thing to do is to read the link about differences between Lift Wing and ORES: Machine Learning/LiftWing/Usage#Differences using Lift Wing instead of ORES. We discover that our bot indeed uses a single call to retrieve multiple scores from ORES, so we'll have to adjust to retrieve the goodfaith and damaging scores separately (in two HTTP calls).

We then read the second link and discover that the WMF Research Team released a new model called "Revert Risk language agnostic", a modern replacement for goodfaith and damaging combined (that is, a single score is enough to build our anti-vandalism tool). The model was trained with better and fresher data, and it is more efficient than the older ones. We suggest that you consider moving away from goodfaith and damaging if possible (they will be deprecated in the future).

At this point, we know that our bot can make either two calls to Lift Wing (to retrieve a goodfaith and a damaging score for a rev-id) or a single one (a revertrisk score), but we still have to figure out how to call Lift Wing. Machine Learning/LiftWing/Usage contains all the information needed, which in our case is:

  • Since the bot runs on Toolforge, we'll have to use the external endpoint (API Gateway). The docs are at https://api.wikimedia.org/wiki/Lift_Wing_API
  • Our bot is a low-traffic one, so the limit of 50k requests/hour for anonymous access fits our use case. If we want to make more than 50k requests/hour, we'll have to request an authentication token following Machine Learning/LiftWing/Usage#Authentication
  • We are almost done! We can just change our HTTP calls in the bot code to call api.wikimedia.org instead of ores.wikimedia.org and the work is completed.
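
The change described in the steps above could look roughly like the following minimal sketch. It assumes Python with only the standard library. The helper names (build_liftwing_request, fetch_score, fetch_editquality_scores), the User-Agent string, and the "Authorization: Bearer" header format are assumptions made for illustration, not taken from this page; the URL pattern and the JSON body follow the curl examples in the migration guide further down.

```python
import json
import urllib.request

# External endpoint (API Gateway), as used in the curl examples on this page.
LIFTWING_BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def build_liftwing_request(wiki, model, rev_id, token=None):
    """Build (but do not send) a POST request for one model and one revision."""
    url = f"{LIFTWING_BASE}/{wiki}-{model}:predict"
    body = json.dumps({"rev_id": rev_id}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Hypothetical identifier; use one that describes your own tool.
        "User-Agent": "AwesomeBot/0.1 (hypothetical example)",
    }
    if token is not None:
        # Assumption: the authentication token is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_score(wiki, model, rev_id, token=None, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_liftwing_request(wiki, model, rev_id, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_editquality_scores(wiki, rev_id, token=None):
    """One HTTP call per model, replacing the single multi-model ORES call."""
    return {
        model: fetch_score(wiki, model, rev_id, token)
        for model in ("goodfaith", "damaging")
    }
```

Keeping request construction separate from sending makes it possible to check the URL and body without touching the network, which may be handy while the bot is being migrated.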

Streams deprecation

The revision-score stream (available via https://stream.wikimedia.org/) contains, for each revision-id of most of the wikis, a list of scores from multiple models. For example, rev-id 123456 from enwiki will be associated with the scores from goodfaith, damaging, reverted, etc. This approach is hard to maintain, since it makes it very hard to disentangle or deprecate individual models in the stream. It also makes it harder to track down users of specific models, since the consumers are very generic and don't carry any indication of which data they are interested in (for example, a client could consume the whole revision-score stream only to get goodfaith scores).

The Machine Learning team is planning to deprecate the revision-score stream, and to create smaller streams (one for each model) as requested by the community. Please reach out to us if you use the revision-score stream so we can figure out how to proceed!

Model deprecation

The following ORES revscoring-based models are also being deprecated:

  • editquality goodfaith
  • editquality damaging
  • editquality reverted

By "deprecated" we mean that the models will remain available on Lift Wing for the time being, but the ML team is not going to improve them further (re-training, adding support for new wikis, new data labeling, etc.). We are not going to remove them from Lift Wing without a community consultation/approval.

As stated above, these models have been deployed to Lift Wing, so they are available for the time being, but if you rely on them we suggest following up with the ML team in a Phabricator task (with the #Machine-Learning-Team project tag).

We'd like to move clients to more modern models (see below) as soon as possible. Tentative deadline: January 2025.

The Research team created a new family of models, called Revert Risk, to replace the functionality of the editquality ones (goodfaith, damaging and reverted). The idea is to have a single score instead of multiple ones, and there are several reasons for that. For example, ORES relies on (relatively) small manually annotated datasets, which is good in terms of precision but makes it very difficult to retrain the models, add new languages and capture data/behavior drifts. The idea with Revert Risk models is to use revision reverts as "implicit annotations", allowing us to train on large data and for all languages. Also, we noticed a strong (inverse) correlation between the goodfaith and damaging models, so people basically tend to think about goodfaith as 1 - damaging. The Revert Risk models capture different signals, but the final intent is the same, so you can assume similar usage.

If we consider the damaging model as a predictor of reverts (which makes sense, because damaging revisions must be reverted), Revert Risk outperforms ORES in almost all scenarios. That being said, if we compare revision by revision, there may be cases where ORES captures certain vandalism that Revert Risk is still missing. We have built a version that solves that issue (the Multilingual Revert Risk), which relies on Large Language Models (same family of models used by ChatGPT), but its serving time is still slow (1s as median), so we are working on making it faster and on making the simpler model (called Language Agnostic Revert Risk) better at catching certain types of vandalism that we are currently missing.

To summarize: the ML and Research teams suggest using the aforementioned Revert Risk models instead of the revscoring editquality ones for any current or future project in the Wikimedia community. If you want to migrate over to the new models, please create a task in Phabricator with the #Machine-Learning-Team project tag and we'll help you!
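
As a rough illustration of what moving to a single score could look like in client code, the sketch below builds a request for the Language Agnostic Revert Risk model and reads one probability from the reply. The model name in the URL (revertrisk-language-agnostic), the "lang" field in the request body, the output.probabilities.true path in the response, and the 0.9 threshold are all assumptions that this page does not state; please check Machine Learning/LiftWing/Usage before relying on any of them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: the model name and URL pattern mirror the revscoring examples on this page.
REVERTRISK_URL = (
    "https://api.wikimedia.org/service/lw/inference/v1/models/"
    "revertrisk-language-agnostic:predict"
)


def build_revertrisk_request(rev_id, lang):
    """Build (but do not send) a POST request for a single Revert Risk score."""
    # Assumption: the body carries the revision id plus a language code such as "en".
    body = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        REVERTRISK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def revert_probability(response):
    """Extract the probability that the revision will be reverted.

    Assumption: the decoded response looks like
    {"output": {"prediction": ..., "probabilities": {"true": ..., "false": ...}}}.
    """
    return float(response["output"]["probabilities"]["true"])


def needs_review(response, threshold=0.9):
    """Hypothetical policy: flag the revision when the score reaches the threshold."""
    return revert_probability(response) >= threshold
```

A tool that previously combined goodfaith and damaging would, under these assumptions, only need to tune one threshold for its own tolerance of false positives.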

Where can I find revscoring model binaries?

The ML team collects all models deployed on Lift Wing in https://analytics.wikimedia.org/published/wmf-ml-models/

Guide to migrate from ORES to Lift Wing

If you are migrating your client/application from ORES to Lift Wing, the following may help you.

We could describe ORES as a service aggregator, while Lift Wing hosts Machine Learning microservices. This means that a single ORES query may require multiple Lift Wing requests.
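
To make the aggregator versus microservice difference concrete, the sketch below merges several per-model replies into one combined structure, which is roughly the work a client has to do itself after the migration. It assumes that each Lift Wing revscoring reply keeps an ORES-like nested shape (wiki, then "scores", then rev id, then model); that shape and the sample values are assumptions for illustration, and deep_merge and merge_responses are hypothetical helpers.

```python
import copy


def deep_merge(target, source):
    """Recursively merge `source` into `target` and return `target`."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            # Copy so that the merged result never shares state with the inputs.
            target[key] = copy.deepcopy(value)
    return target


def merge_responses(responses):
    """Combine per-model replies into a single dictionary."""
    merged = {}
    for response in responses:
        deep_merge(merged, response)
    return merged


# Made-up sample replies, one per model, in the assumed ORES-like shape.
goodfaith = {"enwiki": {"scores": {"12345": {"goodfaith": {"score": {"prediction": True}}}}}}
damaging = {"enwiki": {"scores": {"12345": {"damaging": {"score": {"prediction": False}}}}}}

combined = merge_responses([goodfaith, damaging])
print(sorted(combined["enwiki"]["scores"]["12345"]))
```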

The examples below pair some requests made to ORES with the way the same data would be retrieved from Lift Wing. All of them use curl. It is important to note that requests to ORES are made using the GET method, while all Lift Wing requests use POST, with the post data including the revision id and any other additional data.

  1. A single model:
    • ORES: curl 'https://ores.wikimedia.org/v3/scores/enwiki/12345?models=goodfaith'
    • Lift Wing: curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict -X POST -d '{"rev_id": 12345}'
  2. Multiple models (one Lift Wing request per model):
    • ORES: curl 'https://ores.wikimedia.org/v3/scores/enwiki/12345?models=goodfaith|damaging'
    • Lift Wing: curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict -X POST -d '{"rev_id": 12345}'
    • Lift Wing: curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-damaging:predict -X POST -d '{"rev_id": 12345}'
  3. A single model, including features:
    • ORES: curl 'https://ores.wikimedia.org/v3/scores/enwiki/12345/goodfaith?features=True'
    • Lift Wing: curl https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-goodfaith:predict -X POST -d '{"rev_id": 12345, "extended_output": "True"}'
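
The mapping shown in the curl examples above can also be expressed as a small translation function, which may help when a client builds ORES URLs in many places. This is a minimal sketch covering only the URL shapes listed on this page; ores_to_liftwing is a hypothetical helper, and treating any "features" query parameter as a request for extended output is a simplifying assumption.

```python
from urllib.parse import parse_qs, urlsplit

LIFTWING_BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def ores_to_liftwing(ores_url):
    """Translate one ORES v3 scores URL into a list of (url, payload) pairs."""
    parts = urlsplit(ores_url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected path: v3/scores/<wiki>/<rev_id>[/<model>]
    if len(segments) < 4 or segments[:2] != ["v3", "scores"]:
        raise ValueError(f"not an ORES v3 scores URL: {ores_url}")
    wiki, rev_id = segments[2], int(segments[3])
    query = parse_qs(parts.query, keep_blank_values=True)

    if len(segments) > 4:
        # The model is part of the path.
        models = [segments[4]]
    else:
        # The models are given as a "|"-separated query parameter.
        models = [m for m in query.get("models", [""])[0].split("|") if m]
    if not models:
        raise ValueError("no model given; Lift Wing needs one request per model")

    payload = {"rev_id": rev_id}
    if "features" in query:
        # Simplification: any "features" parameter maps to extended output.
        payload["extended_output"] = "True"

    # One POST per model, each with its own copy of the payload.
    return [(f"{LIFTWING_BASE}/{wiki}-{model}:predict", dict(payload)) for model in models]
```

Each returned pair is meant to be sent as a POST request with the payload serialized as JSON.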
