
ores-beta.wmflabs.org is unreachable
Closed, Resolved (Public)

Description

This URL should load the beta deployment of ORES. Currently the page spins and never finishes loading. It could be that the WSGI service is not responding.
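One way to tell a hanging service from a refused connection is to send a request with a short timeout. The snippet below is only a sketch using the Python standard library; the `probe` helper and its host, port, and path arguments are illustrative and not part of ORES.

```python
import http.client
import socket


def probe(host, port, path="/", timeout=5.0):
    """Send one GET request and report a status, a timeout, or another error."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return "status {}".format(conn.getresponse().status)
    except socket.timeout:
        # The connection was accepted but no response arrived in time,
        # which matches the "it spins" symptom.
        return "timeout"
    except (OSError, http.client.HTTPException) as exc:
        # Connection refused, reset, or a malformed response.
        return "error: {}".format(exc)
    finally:
        conn.close()
```

A result of "timeout" suggests the process is listening but not answering, while an "error" result (for example a refused connection) suggests nothing is listening at all.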

Event Timeline

Mentioned in SAL (#wikimedia-releng) [2021-04-17T07:23:56Z] <Majavah> restart uwsgi-ores on deployment-ores01 for T280420

I tried to restart it, without success:

Apr 17 07:24:18 deployment-ores01 systemd[1]: uwsgi-ores.service: Failed with result 'timeout'.

I killed the uwsgi process and restarted it, but any HTTP query to port 8081 seems to hang. I see the following in the logs:

2021-04-17 07:24:20,082 WARNING ores.scoring_context: Loading model arwiki_goodfaith with sub-process
2021-04-17 07:24:20,194 WARNING revscoring.scoring.environment: Differences between the current environment and the environment in which the model was constructed environment were detected:
 - revscoring_version '2.8.0' mismatch with original environment '2.8.2'
 - python_build ('default', 'Sep 27 2018 17:25:39') mismatch with original environment ('default', 'Apr  5 2021 09:00:41')
 - version '#1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20)' mismatch with original environment '#1 SMP Debian 4.19.171-2~deb9u1 (2021-02-08)'
 - release '4.9.0-11-amd64' mismatch with original environment '4.19.0-0.bpo.14-amd64'
 - platform 'Linux-4.9.0-11-amd64-x86_64-with-debian-9.12' mismatch with original environment 'Linux-4.19.0-0.bpo.14-amd64-x86_64-with-debian-9.3'

Maybe not related, but worth noting :)
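For readers unfamiliar with this kind of warning, the idea is a key-by-key comparison of two environment descriptions. The snippet below is a minimal sketch of that comparison, assuming both environments are plain dicts; `diff_environment` is a hypothetical helper and not revscoring's actual implementation. The `machine` key is made up to show a matching entry; the other values are copied from the log above.

```python
def diff_environment(current, original):
    """Return one message per key whose value differs from the original environment."""
    messages = []
    for key in sorted(original):
        if current.get(key) != original[key]:
            messages.append(
                "{0} {1!r} mismatch with original environment {2!r}".format(
                    key, current.get(key), original[key]
                )
            )
    return messages


# Environment recorded when the model was built (values from the log above).
original = {
    "revscoring_version": "2.8.2",
    "release": "4.19.0-0.bpo.14-amd64",
    "machine": "x86_64",  # illustrative key, not taken from the log
}
# Environment on the host that loads the model.
current = {
    "revscoring_version": "2.8.0",
    "release": "4.9.0-11-amd64",
    "machine": "x86_64",
}

for message in diff_environment(current, original):
    print(" - " + message)
```

A mismatch like this is only a warning; the failure that actually stops the workers shows up in the Celery traceback.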

Celery fails with:

Apr 13 17:31:19 deployment-ores01 systemd[1]: Started Celery workers.
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]: Process Process-30:
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]: Traceback (most recent call last):
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:   File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:     self.run()
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:   File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:     self._target(*self._args, **self._kwargs)
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/ores/scoring_context.py", line 278, in load_model_and_queue
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:     model = Model.from_config(config, key)
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/venv/lib/python3.5/site-packages/revscoring/scoring/models/model.py", line 131, in from_config
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:     return Class.load(stream)
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/venv/lib/python3.5/site-packages/revscoring/scoring/models/model.py", line 104, in load
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:     model = pickle.load(f)
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/drafttopic/feature_lists/euwiki.py", line 7, in <module>
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:     filename="euwiki-20200501-learned_vectors.50_cell.10k.kv", mmap='r')
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/venv/lib/python3.5/site-packages/revscoring/datasources/meta/vectorizers.py", line 80, in load_gensim_kv
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]:     speficies file path of the binary")
Apr 13 17:31:43 deployment-ores01 celery-ores-worker[23526]: FileNotFoundError: Please make sure that 'filename'                                     specifies the word vector binary name                                     in default search paths or 'path'                                     speficies file path of the binary
Apr 17 10:46:58 deployment-ores01 systemd[1]: Stopping Celery workers...
Apr 17 10:46:58 deployment-ores01 systemd[1]: Stopped Celery workers.
Apr 17 10:46:58 deployment-ores01 systemd[1]: Started Celery workers.
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]: Process Process-31:
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]: Traceback (most recent call last):
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:   File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:     self.run()
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:   File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:     self._target(*self._args, **self._kwargs)
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/ores/scoring_context.py", line 278, in load_model_and_queue
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:     model = Model.from_config(config, key)
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/venv/lib/python3.5/site-packages/revscoring/scoring/models/model.py", line 131, in from_config
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:     return Class.load(stream)
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/venv/lib/python3.5/site-packages/revscoring/scoring/models/model.py", line 104, in load
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:     model = pickle.load(f)
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/drafttopic/feature_lists/euwiki.py", line 7, in <module>
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:     filename="euwiki-20200501-learned_vectors.50_cell.10k.kv", mmap='r')
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:   File "/srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/venv/lib/python3.5/site-packages/revscoring/datasources/meta/vectorizers.py", line 80, in load_gensim_kv
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]:     speficies file path of the binary")
Apr 17 10:47:24 deployment-ores01 celery-ores-worker[10152]: FileNotFoundError: Please make sure that 'filename'                                     specifies the word vector binary name                                     in default search paths or 'path'

Also this is the status of uwsgi and celery after the restart:

www-data 10152  1.9 11.3 1585260 923940 ?      Ss   10:46   0:10 /srv/deployment/ores/deploy-cache/revs/257a349d02347537c1cbb5d6a4a367ccaf08a3cb/venv/bin/python3 /srv/deployment/ores/deploy/venv/bin/celery worker --app ores_celery.application --loglevel ERROR
www-data 10237  0.2  0.0      0     0 ?        Z    10:47   0:01  \_ [celery] <defunct>
www-data 10338  1.0  3.4 937220 283452 ?       Ss   10:49   0:04 /usr/bin/uwsgi --die-on-term --ini /etc/uwsgi/apps-enabled/ores.ini
www-data 10416  0.3  0.0      0     0 ?        Z    10:49   0:01  \_ [uwsgi] <defunct>
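The `<defunct>` children are the interesting part of this listing: each parent has a child that exited but was never reaped. Below is a rough sketch for picking such entries out of `ps` output, assuming the usual `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND). `find_defunct` is a hypothetical helper, and the sample is shortened from the listing above.

```python
SAMPLE = r"""
www-data 10338  1.0  3.4 937220 283452 ?       Ss   10:49   0:04 /usr/bin/uwsgi --die-on-term --ini /etc/uwsgi/apps-enabled/ores.ini
www-data 10416  0.3  0.0      0     0 ?        Z    10:49   0:01  \_ [uwsgi] <defunct>
www-data 10237  0.2  0.0      0     0 ?        Z    10:47   0:01  \_ [celery] <defunct>
"""


def find_defunct(ps_output):
    """Return (pid, command) for every line whose STAT column starts with Z."""
    zombies = []
    for line in ps_output.splitlines():
        # Split into the 10 fixed columns plus the free-form command.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # blank line or header
        if fields[7].startswith("Z"):
            zombies.append((int(fields[1]), fields[10]))
    return zombies


for pid, command in find_defunct(SAMPLE):
    print(pid, command)
```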

@Halfak could this be related to your last change? Maybe something is missing?

elukey@deployment-ores01:/srv/deployment/ores/deploy$ sudo find -name euwiki*
./submodules/articlequality/articlequality/feature_lists/euwiki.py
./submodules/articlequality/model_info/euwiki.wp10.md
./submodules/articlequality/tuning_reports/euwiki.wp10.md
./submodules/articlequality/models/euwiki.wp10.random_forest.model
./submodules/drafttopic/model_info/euwiki.articletopic.md
./submodules/drafttopic/model_info/euwiki.drafttopic.md
./submodules/drafttopic/drafttopic/feature_lists/euwiki.py
./submodules/drafttopic/models/euwiki.drafttopic.gradient_boosting.model
./submodules/drafttopic/models/euwiki.articletopic.gradient_boosting.model
./submodules/assets/word2vec/euwiki-20201201-learned_vectors.50_cell.10k.kv   <====== this is not "euwiki-20200501-learned_vectors.50_cell.10k.kv"
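This kind of drift between code and assets could be caught before a deployment with a small check. The sketch below scans Python files for `filename="....kv"` references and reports any that are absent from an assets directory. It assumes a single flat assets directory, which may not match how revscoring actually resolves its search paths; `missing_vectors` and `build_demo_tree` are hypothetical helpers, and the demo tree only imitates the layout seen above.

```python
import os
import re
import tempfile

# Matches filename="something.kv" in Python source (double quotes only).
KV_REFERENCE = re.compile(r'filename\s*=\s*"([^"]+\.kv)"')


def missing_vectors(code_root, assets_dir):
    """Return (source_path, kv_name) pairs whose kv_name is absent from assets_dir."""
    available = set(os.listdir(assets_dir))
    missing = []
    for dirpath, _dirnames, filenames in os.walk(code_root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as source_file:
                source = source_file.read()
            for referenced in KV_REFERENCE.findall(source):
                if referenced not in available:
                    missing.append((path, referenced))
    return missing


def build_demo_tree(root):
    """Recreate the mismatch from this task in a scratch directory."""
    code_root = os.path.join(root, "feature_lists")
    assets_dir = os.path.join(root, "word2vec")
    os.makedirs(code_root)
    os.makedirs(assets_dir)
    with open(os.path.join(code_root, "euwiki.py"), "w", encoding="utf-8") as f:
        # The code still points at the 20200501 vectors ...
        f.write('load(filename="euwiki-20200501-learned_vectors.50_cell.10k.kv")\n')
    # ... while only the 20201201 vectors exist on disk.
    asset = os.path.join(assets_dir, "euwiki-20201201-learned_vectors.50_cell.10k.kv")
    open(asset, "w").close()
    return code_root, assets_dir


with tempfile.TemporaryDirectory() as root:
    for path, kv_name in missing_vectors(*build_demo_tree(root)):
        print("{} references missing asset {}".format(os.path.basename(path), kv_name))
```

On the real host, the two arguments would be the feature list directory and the word2vec assets directory from the `find` output above.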

Aha! It does seem like there is a mismatch here. I'm not sure why it appears that the submodules are not being updated. That might be a red herring. This code and these assets should be in alignment and they are not. I'll go digging. Thanks @elukey

I confirmed that some code was not updated for these models and that is causing the issue. I have a change in progress that should resolve the issue. I'd like to keep this task open until we can get ores-beta back online.

elukey claimed this task.

Fixed with a deployment, see T278723