Wikidata:Tools/ItemSubjector

A Python console tool written by User:So9q with help from User:Ainali that helps add main subject (P921) to groups of items semi-automatically. The source is on GitHub. It has since been rewritten as a web app, Wikidata Topic Curator, and use of the original console tool is discouraged.

Installation and setup

Installation instructions can be found on GitHub. The script can be run locally, in the PAWS web shell or, better yet, in the WCS Toolforge Kubernetes console. Authentication uses a bot password, which only needs to be granted the "Edit existing pages" right.

Impact

The total number of additions so far is unknown because the query that counts them times out. Based on the edit counts of So9q and Jsamwrites in Wikiscan, however, the tool had passed 1 million edits as of 2021-10-13.

Scientific articles

As of 2021-10-29 we are approaching 3M edits made with the tool in total; see [1] and [2]. You can also track the number of scholarly articles missing any subject, which as of this writing is 25.7M, down from 27M when the tool was made a few weeks ago. As of 2021-11-15 the number has dropped below 25M.
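A count like the one tracked above could be expressed as a SPARQL query. The following is a minimal sketch, not the query linked from this page. It assumes that scholarly articles are items with instance of (P31) set to scholarly article (Q13442814), and the function names are hypothetical. Nothing is sent over the network; on the public WDQS endpoint a count over this many items may well time out.

```python
from urllib.parse import urlencode

# Assumption: scholarly articles are instances (P31) of Q13442814.
SCHOLARLY_ARTICLE = "Q13442814"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_missing_subject_query(class_qid: str = SCHOLARLY_ARTICLE) -> str:
    """Return a SPARQL query counting items of a class that have no P921."""
    return (
        "SELECT (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        "  FILTER NOT EXISTS { ?item wdt:P921 ?subject }\n"
        "}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL for the query; this function sends nothing."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_missing_subject_query())
```

Passing a different class QID to `build_missing_subject_query` would give the same kind of count for another group of items.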

The improvements are clearly visible in Scholia. After the tool was used to improve common subjects like COVID-19 pandemic (Q81068910) (Scholia), WDQS times out on them because of the number of articles linked to the subject. The author, So9q, is planning to test a faster, more optimized query engine such as QLever in Toolforge during the fall of 2021, as an alternative to the older and less optimized Blazegraph.

As of 2021-02-25 we are at 24,040,800 articles missing a main subject. That means the tool has now been used to improve about 3M items, which corresponds to 11% of the original 27M.
Update: As of 2022-05-31 the number is down to 23.1M scholarly articles without main subject (P921).
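The percentage above can be checked with a few lines of arithmetic. This is a small sketch using only the figures quoted in the text; the function name `improvement` is hypothetical, and the 27M baseline is the rounded figure given above.

```python
def improvement(baseline: int, remaining: int) -> tuple[int, float]:
    """Return (items improved, share of the baseline in percent)."""
    improved = baseline - remaining
    return improved, 100 * improved / baseline


if __name__ == "__main__":
    # Figures from the text: ~27M articles at the start, 24,040,800 remaining.
    improved, share = improvement(27_000_000, 24_040_800)
    print(f"{improved:,} items improved ({share:.1f}% of the baseline)")
```

With the rounded baseline this gives 2,959,200 items, which rounds to the 3M and 11% quoted above.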

The results are clearly visible in Scholia, which now has much better connections, for example between medicines, articles and various subjects.

As of 2021-03-02 all ~350 essential medicine (Q35456) items have been matched to ~900k scientific articles.

Scholarly journals

As of 2021-01-01 ItemSubjector can match journals and a SPARQL subset of items. On that date, 85,175 of a total of 96,945 journals had no P921 at all. Getting this to 0 is important because ItemSubjector could then be improved further to match subjects only to articles from a subset of journals, chosen by the journals' own P921. For example, "API" could be matched to articles from IT-related journals only.
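The journal-based filtering idea could look roughly like the sketch below. This is an illustration under stated assumptions, not ItemSubjector's implementation: the function name, the data layout and all identifiers such as `article-1` or `subject-computing` are hypothetical placeholders, not real Wikidata items.

```python
def articles_for_subject(articles, journal_subjects, label, allowed_subjects):
    """Return ids of articles whose title contains `label` as a word and whose
    journal has at least one main subject in `allowed_subjects`."""
    matches = []
    for article in articles:
        # Journals without any main subject yield an empty set and never match.
        subjects = journal_subjects.get(article["journal"], set())
        title_words = article["title"].lower().split()
        if label.lower() in title_words and subjects & allowed_subjects:
            matches.append(article["id"])
    return matches


if __name__ == "__main__":
    # Hypothetical sample data: placeholder ids, not real items.
    articles = [
        {"id": "article-1", "title": "A REST API for genome data", "journal": "journal-it"},
        {"id": "article-2", "title": "API stability in tablets", "journal": "journal-pharma"},
        {"id": "article-3", "title": "Designing an API gateway", "journal": "journal-unknown"},
    ]
    journal_subjects = {
        "journal-it": {"subject-computing"},
        "journal-pharma": {"subject-pharmacy"},
    }
    print(articles_for_subject(articles, journal_subjects, "API", {"subject-computing"}))
```

In this sample only the article from the IT journal is returned. The third article is skipped because its journal has no main subject at all, which is why the number of journals without P921 matters for this approach.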