Question about deletion criteria
Hi! I recently discovered Wikidata:Requests for permissions/Bot/Dexbot 13
and I had a question: how does it handle the case where a user removes the claims and sitelinks from an item, but the former sitelinks still exist on the client project?
Hi, no it doesn't delete unless it's a sitelink removal by deletion. Have you seen the bot doing otherwise?
No. I just learnt about this bot as part of processing a request for undeletion, and I couldn't see any test for that specific condition in the code, just:
That code is extremely old. I just double checked and the new one checks for existence of the page and if it exists, it skips deletion.
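For what it's worth, the existence check described above can be sketched as plain decision logic. This is my own illustration of the described behavior, not the bot's actual code; the function and parameter names are made up:

```python
def should_delete_item(removed_by_deletion: bool, client_page_exists: bool) -> bool:
    """Decide whether an emptied item may be deleted.

    Only delete when the sitelink was removed because the client page was
    deleted, AND the page does not exist anymore on the client wiki.
    """
    if not removed_by_deletion:
        # Manual sitelink removal (e.g. someone blanked the item): skip.
        return False
    if client_page_exists:
        # The page was recreated or undeleted on the client wiki: skip.
        return False
    return True
```

With Pywikibot, the second check would correspond to something like `pywikibot.Page(site, title).exists()` before deletion.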
OK, thanks. Good to know.
I don't know how to find the current code for your bot, so I was relying on the link in the RFP.
You are absolutely right. I just updated the gist.
Thanks. One case that especially concerns me is where an editor finds a duplicate item and, instead of merging, blanks it, transferring the sitelinks to another item. I'm happy to know that such items aren't quietly deleted by a bot, but instead have some chance of being found and fixed.
Correct references through bot
Hi! I remember you run a very efficient bot, and in the past I asked you for some fixes that were handled very efficiently. Nowadays I mostly do fixes through QuickStatements, which is a very good tool, but it still can't fix references while leaving the statements unchanged. I sometimes notice big groups of items (thousands or tens of thousands) with references that are imprecise or wrong, and I don't know who to ask for corrections. Could I gradually report some notable cases of references to be fixed, so that we can slowly deal with them through your bot? I think it is crucial for our data quality that references are exactly correct, which at the moment often isn't the case. Thank you very much in advance!
Hey, sure. I'll try to write something, but I want to know the exact framework first so I don't need to write similar code every time; I'd rather write something general and reuse it each time.
Can you give me a couple of examples?
OK, great! So, here is a detailed overview of the situation. I see three main types of errors to be corrected:
If you have any questions, just ask! When you have the bot ready, please start with some test edits so that I can have a look. Thank you very much in advance!
Thanks. I'll try to tackle it next weekend. This weekend I'm drowning in something personal.
Hi! Any updates? Obviously no urgency, as I said - just a little message so that I don't forget the issue myself :)
Hey, sorry. I have been doing a million things and have been drowning in work but will get to it ASAP. I took some vacation for volunteer work :)
But it's on my radar, always has been. Don't worry.
Again: I have not forgotten about this. One day I will get it done. It's just that there are so many things to do :(
Okay, one part is done: the bot now takes a SPARQL query and removes references that are exact duplicates. Here's an example
. I will write more over the next few weekends.
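A minimal sketch of what exact-duplicate removal could look like, modeling each reference as a mapping from property ID to a tuple of values. This modeling and the function name are my own assumptions for illustration, not the bot's actual code:

```python
def dedupe_references(references):
    """Remove references that are exact duplicates of an earlier one.

    Each reference is modeled as a dict mapping property id -> tuple of
    values. Order of the surviving references is preserved.
    """
    seen = set()
    kept = []
    for ref in references:
        # Canonical, hashable form: sorted (property, values) pairs, so two
        # references with the same snaks in a different order compare equal.
        key = tuple(sorted((prop, tuple(vals)) for prop, vals in ref.items()))
        if key in seen:
            continue
        seen.add(key)
        kept.append(ref)
    return kept
```

In the real bot the references would come from the item's statement JSON rather than plain dicts, but the deduplication idea is the same.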
Very good, thanks!
And the second type is done. Let me know if we want to clean up more. The first type is very similar to the second one, so consider that done as well. Let's do this then.
I'm doing them one by one because there are so many of them; for example, P902 took a day to finish. P863 is underway.
Right now I'm cleaning up the third part of type two () but I will get to the others soon.
Very good P4459!
Done now - gosh, it took days :))) Let me fix type one now.
Can you give me a SPARQL query for the first type? I'm not good at queries involving refs :(
The third type is not that hard. I thought it was done. Let me double-check and clean up the mess.
I re-read what you wrote for the third type a couple of times and now I get what you want, but it's pretty complex. I'll try to see what I can do about it next weekend.
Hi! When you have time, could you have a look at these three?
They are probably less difficult than point 3 above, which I understand is quite difficult. See you soon!
Hey, Sure. Just give me a week or two.
I wrote something that can clean up duplicates and subsets (e.g. when a reference is fully covered by another reference that has more). I already started the bot and it's cleaning. I'll continue, but I don't think I can clean up more than that, as it gets really, really complicated.
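The duplicate-and-subset cleanup can be sketched like this, modeling each reference as a set of (property, value) pairs. This is an illustration under my own assumptions, not Dexbot's actual code:

```python
def remove_redundant_references(references):
    """Drop references whose snaks are fully covered by another reference.

    Each reference is a set of (property, value) pairs. A reference is
    redundant if it is a proper subset of another reference, or an exact
    duplicate of an earlier one.
    """
    kept = []
    for i, ref in enumerate(references):
        redundant = any(
            # Proper subset of any other reference, or a duplicate of
            # an earlier one (j < i keeps the first copy).
            ref <= other and (ref < other or j < i)
            for j, other in enumerate(references)
            if j != i
        )
        if not redundant:
            kept.append(ref)
    return kept
```

For example, a bare "stated in" reference would be dropped when another reference on the same statement has both "stated in" and "retrieved".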
Perfect! When it finishes, could you schedule it as periodic maintenance (e.g. once a month)? That would ensure the quality stays stable.
It works based on SPARQL queries. Which queries do you want me to run regularly?
Maybe after the cleanup Dexbot is doing now it won't be necessary anymore; I think these redundant references were inserted due to an error by Reinheitsgebot, so maybe the error has been fixed and the cases won't come up again. I may, however, give you other queries (of the third type) in the future if I find similar problems with different properties.
Hi, you mean the Czech part? I just fixed it and I'm running it again. Everything else has been running for a really long time now.
Hi, thanks for the talk yesterday on ORES, and I hope you didn't mind my questions/comments. :-)
I've been working on adding interwikis from new articles on some Wikipedias (and also Commons!) to Wikidata items, but I'm wondering if there are better ways of doing it (currently I just auto-search for matches and manually say yes/no to add them, within a Python script). I've just proposed a potential Outreachy project to improve the current code I'm using; see https://phabricator.wikimedia.org/T290718
. As part of that, I'm wondering if machine learning might be applicable here - it feels like there's a great training set in all of the other articles that already have sitelinks, which could be used to assess how good potential matches are; maybe the highest-confidence matches could then be added automatically, so only lower-confidence ones need manual checking. I know of machine learning, though, but not how to actually do it!
If you think this might be possible, would you be interested in being a co-mentor for the Outreachy project, and we can make it a bit more ML-focused?
Thanks, it is a great idea and I added it to my list of work where AI could be used. There are several ways to attack the problem, and building a machine learning system would definitely help. I suggest not adding it to this Outreachy work; instead, part of that Outreachy work could be to make the code pluggable, so that later a service/API can be built and your code can easily use its recommendations. How does that sound? cc @Lydia Pintscher (WMDE)
Thanks for the reply. I can't see a clear way to make the code 'pluggable' - I think that if we're going to use ML, then it has to be built in, or perhaps there has to be a clear way to query it, with a yes/no answer, that maybe could be added as a proceed/stop check.
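One way such a proceed/stop check could look, as a rough illustration: score each candidate match, then map the confidence to an add/ask/skip decision. The thresholds and the title-similarity score here are made-up placeholders, not part of any existing code or recommendation service:

```python
from difflib import SequenceMatcher

# Illustrative thresholds -- a real system would tune these on held-out data.
AUTO_ADD = 0.95
ASK_HUMAN = 0.60

def score_match(article_title: str, item_label: str) -> float:
    """Toy confidence score: plain string similarity between titles."""
    return SequenceMatcher(None, article_title.lower(), item_label.lower()).ratio()

def decide(article_title: str, item_label: str) -> str:
    """Return 'add', 'ask', or 'skip' -- the proceed/stop check."""
    confidence = score_match(article_title, item_label)
    if confidence >= AUTO_ADD:
        return "add"   # high confidence: add the sitelink automatically
    if confidence >= ASK_HUMAN:
        return "ask"   # medium confidence: queue for manual yes/no
    return "skip"      # low confidence: don't suggest at all
```

The point of the pluggable design would be that `score_match` could later be swapped for a call to an ML service without touching the add/ask/skip logic.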
Two questions about archiving
Hi Ladsgroup, I have two questions about archiving sections using ckb:بەکارھێنەر:Dexbot/Archivebot
. First, as far as I understand, the bot archives level-two sections only (or by default). Is there a way to choose the section level ourselves? For example, I want to archive level-four sections on this page
. Second, isn't this template
supposed to make the bot delay the archive time? Thank you in advance!
Deletion on fa.wikipedia ineffective on Wikidata
Hi! I've just noticed in the page history of Theophilos of Athens (Q12874082)
that the article from fa.wikipedia was deleted in 2018 but the sitelink remained present on Wikidata. Is it a known problem, or has it already been solved? Can a bot clean up any similar cases? Thank you as always!
Hey, stuff like that can happen for various reasons: network partition, lag in the database, bugs in the code, permission issues (the IP being blocked on Wikidata, maybe?), etc. If it happens all the time, then we should look into it, but there will always be cases that fail. Maybe we should have a way to spot and clean them up.
Sockpuppets of the sanctioned user (Mr,p_balçi) with both an account and an IP
Hi, the mentioned user, who was banned from Wikipedia, has once again attempted to vandalize articles and evade scrutiny with multiple sockpuppet IPs and accounts.
Some of their socks were blocked in the same month and many others are not. They have bypassed the block more than ten times. I request that their main account be blocked indefinitely and globally.
My guess is that if a checkuser investigation is run, a lot more will be discovered.
In several cases, they have been blocked for obscenity.
Several accounts belonging to them have been blocked globally (for obscenity):
Please handle this, thanks.
Hello, please mention these cases in fawiki's وپ:دبک. It doesn't belong here.
Hello, you are a sockpuppet. @ZEP55
is the user @Quakewoody
whom you blocked. Please close your sockpuppet account.
Hello, what? I joined recently and I haven't edited here.
Does this user have such permission?
While this sockpuppet itself has an open checkuser case
and it has been confirmed, at least in the cases below, that these were their socks?
And why, after three months, has their vandalism not been dealt with? Just today several of their new sockpuppets were blocked indefinitely.
This is one example.
Is there no resolve to put an end to their disruption?
Honestly, I have not seen any user as hardworking and trustworthy as you; the others don't care about users' frustration.
This person has a large number of globally and locally blocked troll accounts. They repeatedly insult northern, Kurdish, Armenian and other users, disrupt with IPs, and insult users on the smaller wikis.
Would it be possible for you to start a checkuser investigation into them? Thanks.
Solomon Hill (Ilam Province) (Q15975213)
Thanks. It looks complicated. It's Iran's first registered national heritage site and it used to be within Iran's borders (or parts of it still are? I'll check), but now it's mostly in Iraq. I'll ask people who know better than I do.
People say its country should be Iraq, but technically it is still an Iranian national heritage site that ended up in Iraq after border changes in the mid-20th century.
The ID seems to be invalid? It should be at least two digits according to the constraint on Iranian National Heritage registration number (P1369)
. If it used to be in Iran you should add the Iran info too, qualify both statements with start/end time and make the current one preferred.
Done. I think the constraint regex is wrong; I changed it to [0-9] so it accepts 01 too (or maybe it should accept one digit instead? I don't know).
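To illustrate the difference the regex change makes - note the patterns here are my guess at the shape of the constraint, not necessarily the real P1369 pattern:

```python
import re

# Hypothetical patterns for illustration only.
old_pattern = re.compile(r"[1-9]\d+")  # requires a non-zero first digit
new_pattern = re.compile(r"[0-9]\d+")  # also accepts a leading zero like "01"

assert old_pattern.fullmatch("01") is None       # rejected before the change
assert new_pattern.fullmatch("01") is not None   # accepted after the change
assert new_pattern.fullmatch("1") is None        # single digits still rejected
```

Accepting single digits as well would mean something like `\d+` instead, which is the open question above.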
I think you messed up the ranks. Was your intention to state that this item was never in Iran? I currently completely ignore the one-digit entries on Commons because they will mostly be mistakes.
Fixed the rank
Can you explain to me why you gave them an indef block? I seem to have missed a recent discussion that might be considered harassment/intimidating behaviour. But I am probably missing something?
Hey, the one that triggered it was Wikidata:Requests_for_deletions/Archive/2021/02/20#Q105443300
(the anti-LGBT behavior), but there were two other reasons: the user is indef-blocked on five other wikis for harassment, and the user has a history of being blocked for vandalism/edit warring here. If you feel it's too much, feel free to reduce it to a shorter duration.
Okay, that is malicious indeed. Indef is long, but let them start an unblocking procedure themselves if they want to proceed here.