Commons:Bots/Requests/SchlurcherBot8

SchlurcherBot (talk · contribs)

Operator: Schlurcher (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Add structured data based on information provided on file description page according to Commons:Structured data/Modeling

Examples
  • Adding P7482 / Q66458942 (own work by original uploader) to files that were uploaded by the author and declared as own work
Out of scope:
  • Any information that cannot be derived from the Commons file description page (like information linked to the picture ID in Wikidata)
  • Copying descriptions to captions (due to the discussion on Village Pump regarding this)

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Batches based on prepared lists

Maximum edit rate (e.g. edits per minute): 30

Bot flag requested: (Y/N): N (Bot has flag already)

Programming language(s): Bash + QuickStatements, later Pywikibot

Schlurcher (talk) 14:22, 6 January 2020 (UTC)

Discussion

This request is motivated by a discussion on User_talk:JarektBot#original_creation_by_uploader. The intended task is to add structured data based on information provided on the file description page. The first task envisioned is to add P7482 / Q66458942 (own work by original uploader) to all files that fulfill the following requirements:

  1. File does not have property P7482 / Q66458942 already
  2. File uses template {{Own}} in the source field of the information template
  3. Username of the uploader is equal to or part of the username given in the author field of the information template
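
The three requirements can be read as a single predicate. The following is a minimal Python sketch for illustration only, not the bash script used for this task; the function name, its arguments and the whitespace-tolerant template match are assumptions made for the example, and the field values are presumed to have been extracted already.

```python
import re

# Matches {{Own}} or {{own}}, allowing inner whitespace (assumed spelling variants).
OWN_TEMPLATE = re.compile(r"\{\{\s*own\s*\}\}", re.IGNORECASE)


def is_own_work_candidate(existing_properties, source_field, author_field, uploader):
    """Return True if a file meets all three requirements listed above.

    All arguments are hypothetical, pre-extracted values:
    existing_properties: property IDs already in the file's structured data
    source_field, author_field: raw wikitext of the information template fields
    uploader: username of the original uploader
    """
    # 1. The file must not already carry P7482.
    if "P7482" in existing_properties:
        return False
    # 2. The source field must use the {{Own}} template.
    if not OWN_TEMPLATE.search(source_field):
        return False
    # 3. The uploader name must equal, or be contained in, the author field.
    uploader = uploader.strip()
    return bool(uploader) and uploader in author_field


print(is_own_work_candidate(set(), "{{Own}}", "[[User:Example|Example]]", "Example"))  # True
print(is_own_work_candidate({"P7482"}, "{{Own}}", "Example", "Example"))  # False
```

A plain substring test is the simplest reading of "equal to or part of"; a stricter comparison could be swapped in without changing the other two checks.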

For now, I will use a bash script that queries the Commons API to check for all 3 conditions. Edits will be added to a list that can be processed through QuickStatements batch runs. A first batch run was performed under my username: Bot Test Run. Moving forward, the edits are planned under the bot username (with flag). Further structured data statements are expected to be added. The task is similar to the recent actions of BotMultichill (talk · contribs) and BotMultichillT (talk · contribs). However, Multichill (talk · contribs) works on files that use both {{Own}} and {{Self}} (without checking for condition number 3 above), so some broader coverage is expected with this task. The task is expected to be broadened over time to include structured data derived from the author, source, date and license information. Once structured data on Commons is properly implemented in Pywikibot, the bot might switch to this framework (as used for the other tasks of the bot). --Schlurcher (talk) 14:22, 6 January 2020 (UTC)

Please make a test run. --EugeneZelenko (talk) 15:45, 6 January 2020 (UTC)
I have completed the test run on my main account. Please see here: Bot Test Run. I need to get autoconfirmed status with my bot on Wikidata in order to be able to use QuickStatements on Commons. I'll soon complete a test run on the bot account. --Schlurcher (talk) 18:56, 6 January 2020 (UTC)
Can you share a link to the source code like this? Multichill (talk) 20:11, 6 January 2020 (UTC)
I do not have a link to the source code. The bash script runs on a single file name and performs the following actions (sequentially, checking that no errors occurred):
  1. Contact Commons API to get file description information in XML format. Extract Uploader, Author and Source.
  2. Contact CommonsEntities API to get structured data and media ID in JSON format. Extract media ID and existing statement information.
  3. Check and add statements for QS:
    1. Check that there is no existing statement.
    2. Check that the conditions described in this request are fulfilled.
    3. Write out the statement to be used in QS.
This way I can repeat the script for each file of concern. The file list was downloaded from: Wikimedia Commons Dumps (all file names). Hope this helps. --Schlurcher (talk) 14:06, 8 January 2020 (UTC)
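
Steps 2 and 3 might look roughly like the following Python sketch. It is offered only as an illustration of the flow, not as the operator's script. It assumes the structured data has already been fetched and parsed into a dictionary keyed by media ID, with a "statements" mapping per entity, and that QuickStatements accepts tab-separated lines of entity, property and value; the function names, media IDs and sample data are hypothetical.

```python
def has_statement(entity_json, media_id, prop):
    """Step 3.1: report whether the entity already has a statement for prop."""
    entity = entity_json.get("entities", {}).get(media_id, {})
    # An entity without statements may carry an empty list instead of a mapping.
    statements = entity.get("statements") or {}
    return prop in statements


def quickstatements_line(media_id, prop, value):
    """Step 3.3: format one tab-separated command line (assumed syntax)."""
    return "\t".join((media_id, prop, value))


def build_batch(files, entity_json, prop="P7482", value="Q66458942"):
    """Collect command lines for the files that pass both checks.

    files: iterable of (media_id, conditions_met) pairs, where conditions_met
    is the outcome of step 3.2 for that file.
    """
    lines = []
    for media_id, conditions_met in files:
        if has_statement(entity_json, media_id, prop):
            continue  # already present, nothing to add
        if not conditions_met:
            continue  # request conditions not fulfilled
        lines.append(quickstatements_line(media_id, prop, value))
    return lines


# Hypothetical parsed response for two files; a third file has no entity yet.
sample = {
    "entities": {
        "M1001": {"id": "M1001", "statements": {"P7482": [{}]}},
        "M1002": {"id": "M1002", "statements": []},
    }
}
batch = build_batch([("M1001", True), ("M1002", True), ("M1003", False)], sample)
print(batch)  # ['M1002\tP7482\tQ66458942']
```

Keeping the existing-statement check separate from the formatting step means the same batch builder can be reused for other property and value pairs.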
In case your bot can do more than 30/min, it would be great to increase the rate, since at 30/minute it would take about 2 years to do 27M files. I would also suggest adding the creator (P170), since you already know who that user is. There is a page Commons:Structured data/Modeling/Author discussing ways to model the author, but there is more activity on the talk page. I proposed a new property, author's Wikimedia username, to make adding author information easier, but the proposal was rejected. Multichill (talk · contribs) favors a different scheme, which we should probably adopt. However, we should probably make sure that the discussion on that page reaches some sort of consensus and that Commons:Structured data/Modeling/Author is updated. When I was adding some P7482s, the discussion was still ongoing. --Jarekt (talk) 20:34, 6 January 2020 (UTC)
@Jarekt: Thanks for highlighting this. I would prefer that the discussion reaches some sort of consensus before we start rolling this out. So, for now, I prioritized a date implementation, which I mapped to inception (P571). --Schlurcher (talk) 15:11, 8 January 2020 (UTC)
Commons:Structured data/Modeling/Date is a bit clearer; the only potential issue is that inception (P571) and Wikidata do not handle dates with hours, minutes and seconds (HH:MM:SS), as the highest precision is day. The string encoding a date looks like "+2020-01-08T00:00:00Z" and one could add HH:MM:SS to it, but since the string always ends with "Z", it suggests that the time is in the UTC timezone (the time in London), while on Commons HH:MM:SS is assumed to be in an unspecified local timezone. So for the time being I would only work with dates that do not specify the time of day. See Help:Dates. --Jarekt (talk) 16:59, 8 January 2020 (UTC)
Thanks. I have implemented dates with a precision of day, month and year. Dates with precision to the hours will be skipped. I have also added Commons:Structured data/Modeling to the task description for clarity. --Schlurcher (talk) 19:27, 8 January 2020 (UTC)
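
The date handling described above could be sketched as follows. This is an illustrative Python example, not the bot's code: the function name, the accepted input formats (YYYY, YYYY-MM, YYYY-MM-DD), the zero-padding of unused month and day parts, and the "/precision" suffix for QuickStatements are assumptions. The precision codes 9, 10 and 11 are the Wikidata values for year, month and day.

```python
import re
from datetime import date

# Year, optionally followed by month, optionally followed by day.
DATE_PATTERN = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def to_time_value(text):
    """Convert 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to a time string with precision.

    Returns None for anything else, including dates that carry a time of day,
    so that such files can be skipped.
    """
    match = DATE_PATTERN.match(text.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    try:
        # Validate the calendar date, substituting 1 for missing parts.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return None
    precision = 11 if day else 10 if month else 9
    return "+{}-{}-{}T00:00:00Z/{}".format(year, month or "00", day or "00", precision)


print(to_time_value("2020-01-08"))           # +2020-01-08T00:00:00Z/11
print(to_time_value("2020-01"))              # +2020-01-00T00:00:00Z/10
print(to_time_value("2020"))                 # +2020-00-00T00:00:00Z/9
print(to_time_value("2020-01-08 14:22:00"))  # None
```

Rejecting every input that does not match one of the three plain formats keeps the local-timezone ambiguity out of the structured data entirely.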

@EugeneZelenko: I have performed a test run on the bot account. Results are here: [1] --Schlurcher (talk) 19:21, 8 January 2020 (UTC)

Looks OK to me. Pity that time from EXIF could not be used. --EugeneZelenko (talk) 15:01, 9 January 2020 (UTC)

If there are no objections, I think the task should be approved. --EugeneZelenko (talk) 16:17, 15 January 2020 (UTC)

@EugeneZelenko: I compared my results with the results from BotMultichillT (talk · contribs) as well as with his code. As far as I can see, we will add the same information, so there should be no conflict between the bots. We use slightly different page generators, so there should be some synergy. My bot will not add author information, as this information is currently not correctly displayed on the structured data tab on Commons. The qualifiers are currently not displayed, which makes this difficult to read. See File:Coucouron - église 01.JPG as an example. The author information on structured data is blank even though BotMultichillT added this in the backend. So I leave this to BotMultichillT until it is fixed for Commons. Generally, I do agree and think that my bot is ready. --Schlurcher (talk) 08:35, 17 January 2020 (UTC)