Scripts used for the analysis in T280152 (Mitigate breaking changes from the new Wiki Replicas architecture) and for the Wiki_Replicas_Cross-DB_Query_Data report.
Excuse my Python; this is just getting the job done, not a production service.
Some files are excluded from the repository for privacy reasons (such as the original data). Reach out if you need them.
Create a virtual environment and install the dependencies before running the scripts:
python3 -m venv wikireplicas-queries-env
source wikireplicas-queries-env/bin/activate
pip install -r requirements.txt
Scripts, each followed by its output or the file it produces:
- filter_multi_from_distinct_user_queries.py
  18758 (764 multi, 0.04797315691589915s per row)
  Found 764 multi DB queries
- how_many_multi_from_user_queries.py
  Found 2937 multi DB queries from all 60007 queries
- distinct_user_queries_with_stripping.py
  3858 unique out of 18758
- unique_queries_when_removing_literals.py
  169 unique multi DB queries out of 3858 unique queries
- get_users_info.py
  See joaquin/user-data.json
- make_html_report.py
  See joaquin/report.html
- make_wikitext_report.py
  See joaquin/report.wiki
- make_csv_for_public_viewing.py
  See joaquin/multiuserqueriesstrippedpublic.csv
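
For readers without access to the original data, here is a rough sketch of the two ideas behind the numbers above: collapsing queries that differ only in their literals, and flagging queries that touch more than one database. It is not the code from the scripts in this repository. The function names (strip_literals, referenced_databases, is_multi_db), the regular expressions, and the sample queries are all made up for illustration, and it assumes databases are referenced as <dbname>_p.<table> (for example enwiki_p.page). A regex approach like this will misclassify some queries; a real SQL parser would be more reliable.

```python
import re

# Assumption: a database reference looks like enwiki_p.page, optionally with
# backticks around the names. Hypothetical pattern, not taken from the scripts.
DB_REF = re.compile(r"\b([a-z0-9_]+_p)`?\s*\.\s*`?\w+", re.IGNORECASE)
# Assumption: "USE enwiki_p;" statements also select a database.
USE_STMT = re.compile(r"\buse\s+`?([a-z0-9_]+_p)\b", re.IGNORECASE)

# Single- or double-quoted strings, allowing backslash escapes.
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
# Standalone integers and decimals (digits inside identifiers are left alone).
NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def strip_literals(sql):
    """Replace literals with '?', collapse whitespace, and lowercase."""
    sql = STRING_LITERAL.sub("?", sql)
    sql = NUMBER_LITERAL.sub("?", sql)
    return " ".join(sql.split()).lower()


def referenced_databases(sql):
    """Return the set of *_p database names the query appears to reference."""
    names = DB_REF.findall(sql) + USE_STMT.findall(sql)
    return {name.lower() for name in names}


def is_multi_db(sql):
    """True when the query appears to reference more than one database."""
    return len(referenced_databases(sql)) > 1


# Made-up sample queries, only to exercise the functions.
queries = [
    "SELECT page_title FROM enwiki_p.page WHERE page_id = 42",
    "SELECT page_title FROM enwiki_p.page WHERE page_id = 7",
    "SELECT * FROM enwiki_p.page p "
    "JOIN commonswiki_p.image i ON p.page_title = i.img_name",
]

unique_queries = {strip_literals(q) for q in queries}
multi_db_queries = [q for q in unique_queries if is_multi_db(q)]
print(f"{len(unique_queries)} unique out of {len(queries)}")
print(f"{len(multi_db_queries)} unique multi DB queries")
```

Stripping literals before deduplicating means that queries generated from the same template count once, which is presumably why the unique counts above are so much smaller than the raw totals.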