Wikilabels

From Wikitech

Wikilabels is a stand-alone service used to gather data from users to build AI models for ORES. It is maintained by the Wikimedia Scoring Platform team and is currently hosted on Nova_Resource:wikilabels (Cloud VPS).

Technical details

  • There are several instances:
    • wikilabels-03.wikilabels.eqiad1.wikimedia.cloud: The main node. It uses PostgreSQL (wikilabels-database-02) and is accessible at labels.wmflabs.org
    • wikilabels-staging-02.wikilabels.eqiad1.wikimedia.cloud: The staging node. It uses a similar setup and is accessible at labels-staging.wmflabs.org
    • wikilabels-backups.wikilabels.eqiad1.wikimedia.cloud: The node that keeps daily database backups of the main node. Accessible at wikilabels-dumps.wmflabs.org
    • wikilabels-database-02.wikilabels.eqiad1.wikimedia.cloud: Postgres database node that is the backing store for the uwsgi applications.

Initialize a VM

From your local laptop or workstation, check out the deploy repository and make sure that you can ssh to the target Cloud VPS instance. Then create a Python venv and install fabric3 in it. This will allow you to run the following:

fab initialize_server:hosts="wikilabels-03.wikilabels.eqiad1.wikimedia.cloud"
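The venv setup described above can be sketched as a small shell helper. This is a minimal sketch, not the team's actual tooling: the function names `run` and `setup_fabric_venv`, the `DRY_RUN` switch, and the default venv directory `venv` are all assumptions made for illustration.

```shell
# Sketch: create a virtualenv and install fabric3 into it.
# Assumptions: the venv directory defaults to ./venv, and DRY_RUN=1
# prints the commands instead of running them.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_fabric_venv() {
    venv_dir="${1:-venv}"
    run python3 -m venv "$venv_dir" &&
    run "$venv_dir/bin/pip" install fabric3
}
```

With the venv in place, running `setup_fabric_venv` once and then invoking `venv/bin/fab` with the `initialize_server` task shown above avoids having to activate the venv in every shell.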

You also need to place OAuth keys in a specific file (a random key is good):

elukey@wikilabels-03:~$ cat /srv/wikilabels/config/config/99-oauth.yaml
# These credentials are intended to be used when testing the local, development
# version of Wiki Labels.  Do not use these credentials in a production
# environment.  They will redirect users to localhost:8080 expecting to find
# Wiki Labels there.
oauth:
  key: xxx
  secret: xxxx
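Since the text says a random key is good enough here, one way to produce such placeholder values is sketched below. The helper names `random_hex` and `render_oauth_yaml` and the chosen key lengths are assumptions, not part of the Wikilabels tooling; the output only mirrors the `oauth:` structure shown above.

```shell
# Sketch: print a 99-oauth.yaml body filled with random placeholder values.
# Assumption: 16 and 20 random bytes are arbitrary lengths chosen for the example.

random_hex() {
    # $1 = number of random bytes; prints 2*$1 lowercase hex characters
    head -c "$1" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

render_oauth_yaml() {
    printf 'oauth:\n  key: %s\n  secret: %s\n' "$(random_hex 16)" "$(random_hex 20)"
}
```

The output of `render_oauth_yaml` can be reviewed and then written to /srv/wikilabels/config/config/99-oauth.yaml on the target host.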

You'll also need to create a file named 98-database.yaml in the same directory (/srv/wikilabels/config/config/) with the following content:

# These credentials are intended to be used on labels.wmflabs.org.  They are
# sensitive and should never be committed to a public repository.
database:
  user: u_wikilabels
  dbname: u_wikilabels
  password: REDACTED

Deployment guide

After changes are merged into the main repo, you need to update the deploy repo:

cd wikilabels-wmflabs-deploy/
git pull
cd submodules/wikilabels
git pull
cd ../..
git add submodules/wikilabels
git commit

Use a commit message such as "Bumping wikilabels to HEAD", then push and deploy to staging:

git push
fab stage

The change is now on the staging node. Log it (using !log wikilabels in the #wikimedia-cloud IRC channel), test it, and if it works, move it to production:

git checkout deploy
git rebase origin/master
git push -f origin deploy
fab deploy

And log it!

A new labeling campaign

You need to first introduce a new campaign:

$ ssh wikilabels-03.wikilabels.eqiad1.wikimedia.cloud
ladsgroup@wikilabels-03:~$ cd /srv/wikilabels/config
ladsgroup@wikilabels-03:/srv/wikilabels/config$ sudo -u www-data /srv/wikilabels/venv/bin/wikilabels new_campaign wikidatawiki "Edit quality (5k, 2018)" damaging_and_goodfaith DiffToPrevious 1 50
{'form': 'damaging_and_goodfaith', 'id': 38, 'view': 'DiffToPrevious', 'active': True, 'name': 'Edit quality (5k, 2018)', 'tasks_per_assignment': 50, 'labels_per_task': 1, 'wiki': 'wikidatawiki', 'info_url': None, 'created': datetime.datetime(2018, 7, 11, 13, 39, 54, 282569)}

Note the id (38 in this case). Next, load the data into the campaign. Download the task file to your home directory, then pipe it into task_inserts with the campaign id:

ladsgroup@wikilabels-03:/srv/wikilabels/config$ cat ~/wikidatawiki.autolabeled_revisions.125k_2018.review.json | sudo -u www-data ../venv/bin/wikilabels task_inserts 38

Restarting the service

Any time the connection to PostgreSQL is broken, the wikilabels service needs to be restarted:

service uwsgi-wikilabels-web restart
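Restarting only helps once the database is reachable again, so it can be useful to check that first. The following is a minimal sketch using `pg_isready`, which ships with the PostgreSQL client tools. The function name `restart_if_db_up`, the use of `sudo`, and the default host are assumptions for illustration.

```shell
# Sketch: restart the uwsgi service only if PostgreSQL accepts connections.
# Assumptions: pg_isready is installed on the app node, the default port is
# used, and restarting the service requires sudo.

restart_if_db_up() {
    db_host="${1:-wikilabels-database-02}"
    if pg_isready -h "$db_host" -q; then
        sudo service uwsgi-wikilabels-web restart
    else
        echo "PostgreSQL on $db_host is not accepting connections; not restarting" >&2
        return 1
    fi
}
```

Calling `restart_if_db_up` without arguments checks wikilabels-database-02; a non-zero exit status means the service was left untouched.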

Dumping, restoring, or migrating the database

The uwsgi app on wikilabels-03 uses a Postgres database as a backing store. This used to be a clouddb instance, but as of November 2022 it is a separate VM, wikilabels-database-02.

Database credentials

The uwsgi app keeps its database configuration in two files, /srv/wikilabels/config/default-db-config.yaml and /srv/wikilabels/config/config/98-database.yaml.

The first file contains host information and user credentials, though the latter are unused. The actual username, password and database (inside of Postgres) to use are in the second file, 98-database.yaml.

Dumping data

To dump the data in an easily restored format, use the pg_dump tool. You can run this on either wikilabels-03 or wikilabels-database-02:

$ pg_dump -U u_wikilabels -h wikilabels-database-02 u_wikilabels -f pg_dump-$(date -Is).sql

Note that the database name is u_wikilabels, just like the user name.

The above command will prompt you for the password of the u_wikilabels user, and then write the database content as a series of SQL commands to a timestamped file (the -f option names the output file, so nothing is printed to stdout). The total amount of data is about 100MB.
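The dump step can be wrapped so that an empty or missing output file is treated as a failure. This is a minimal sketch built around the same `pg_dump` invocation as above; the function name `dump_wikilabels` and the optional output directory argument are assumptions.

```shell
# Sketch: dump the u_wikilabels database to a timestamped file and
# fail if the resulting file is missing or empty.
# Assumption: the first argument is an output directory (default: current dir).

dump_wikilabels() {
    out_dir="${1:-.}"
    out_file="$out_dir/pg_dump-$(date -Is).sql"
    pg_dump -U u_wikilabels -h wikilabels-database-02 u_wikilabels -f "$out_file" || return 1
    # A dump that is missing or empty almost certainly failed.
    [ -s "$out_file" ] || return 1
    printf '%s\n' "$out_file"
}
```

On success the function prints the path of the dump file, which makes it easy to copy the file to another host afterwards.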

Restoring data

To restore the saved data, copy the file to a convenient host (the database host itself is usually easiest). If necessary, create the user and database on the new Postgres instance (as root):

$ sudo -u postgres createuser u_wikilabels
$ sudo -u postgres psql
psql (13.8 (Debian 13.8-0+deb11u1))
Type "help" for help.

postgres=# \password u_wikilabels
Enter new password for user "u_wikilabels": 
Enter it again: 
postgres=# exit
$ sudo -u postgres createdb -O u_wikilabels u_wikilabels

This creates the u_wikilabels user, sets their password, and then creates a new database also called u_wikilabels owned by the just-created user.

You can then restore the saved data by piping the dump file into an appropriate psql connection command:

psql -h localhost -W -d u_wikilabels  -U u_wikilabels  < pg_dump-[timestamp].sql

This will display some messages on stdout about what SQL commands are run (SET, CREATE TABLE etc).
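After the restore finishes, a quick sanity check is to count the tables that now exist. The sketch below does this with a query against `information_schema.tables`; the function names and the assumption that the Wikilabels tables live in the `public` schema are illustrative and not confirmed by this page.

```shell
# Sketch: check that the restored database contains at least one table.
# Assumptions: tables are in the "public" schema, and psql can authenticate
# as u_wikilabels (it will prompt for the password if needed).

restored_table_count() {
    db_host="${1:-localhost}"
    psql -h "$db_host" -U u_wikilabels -d u_wikilabels -At \
        -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'"
}

check_restore() {
    count="$(restored_table_count "${1:-}")" || return 1
    if [ "${count:-0}" -gt 0 ]; then
        echo "restore looks plausible: $count tables in schema public"
    else
        echo "no tables found in schema public; the restore may have failed" >&2
        return 1
    fi
}
```

This only confirms that tables were created, not that every row arrived; comparing row counts between the old and new database would be a stricter check.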

Once this is done, the uwsgi application can be pointed at the new DB by editing the host setting in /srv/wikilabels/config/default-db-config.yaml and restarting the application.

Incidents

See also