User talk:Citation bot

This is an old revision of this page, as edited by ClueBot III (talk | contribs) at 07:09, 30 July 2021 (Archiving 2 discussions to User talk:Citation bot/Archive 26. (BOT)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.





Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot. Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter. A 503 error means that the bot is overloaded and you should try again later – wait at least an hour.

Or, for a faster response from the maintainers, submit a pull request with appropriate code fix on GitHub, if you can write the needed code.
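The DUPLICATE_xxx= behaviour described above can be illustrated with a minimal Python sketch. This is not the bot's code (the bot is a separate open-source project); the function name `rename_duplicate_params` is hypothetical, the parser splits naively on `|` and so ignores nested templates and wikilinks, and the choice to keep the first occurrence and rename later ones is an assumption, since the notice only says the bot "renames one".

```python
def rename_duplicate_params(template: str) -> str:
    """Rename repeated |name= parameters to |DUPLICATE_name=.

    Assumptions (not confirmed by the notice): the first occurrence is kept,
    later ones are renamed, and the template has no nested {{...}} or [[...|...]].
    """
    inner = template.strip()[2:-2]          # drop the outer {{ and }}
    name, *params = inner.split("|")
    seen = set()
    out = [name]
    for param in params:
        key, sep, value = param.partition("=")
        norm = key.strip()
        if sep and norm in seen:
            # Second or later use of the same parameter name: flag it.
            param = f"DUPLICATE_{norm}={value}"
        elif sep:
            seen.add(norm)
        out.append(param)
    return "{{" + "|".join(out) + "}}"


print(rename_duplicate_params("{{cite web|title=A|url=http://x|title=B}}"))
```

An editor seeing `DUPLICATE_title=B` in the result would then pick one of the two titles and delete the other, as the notice asks.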

Performance improvements needed

They are, and it seems to be related to Lighttpd parameters. AManWithNoPlan (talk) 19:12, 5 May 2021 (UTC)Reply

https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web/Lighttpd Look at PHP and the "Default_configuration" area that starts collapsed. AManWithNoPlan (talk) 19:16, 5 May 2021 (UTC)Reply

Template type

Status
new bug
Reported by
ATS (talk) 15:44, 29 May 2021 (UTC)Reply
We can't proceed until
Feedback from maintainers


Bot is incorrectly changing cite newspaper to cite news at Bianca Ryan and Cami Bradley, et al.

{{cite newspaper}} is just a typo catcher for {{cite news}}. newspaper is on 6500 pages and news is on 1.2 million pages. AManWithNoPlan (talk) 16:48, 29 May 2021 (UTC)Reply
Understood; however, if one of the params is |newspaper=, then the bot should not be changing the type. ATS (talk) 19:20, 29 May 2021 (UTC)Reply
I'm pretty sure that there is a general consensus to replace template redirects with canonical template names. I feel like that guidance used to exist at WP:BRINT or somewhere similar, and is the reason that Wikipedia:AutoWikiBrowser/Template redirects exists. Using canonical template names eases maintenance burdens and makes articles more consistent for later editors. – Jonesey95 (talk) 20:11, 29 May 2021 (UTC)Reply
I'm pretty sure WP:NOTBROKEN is relevant here. —David Eppstein (talk) 20:21, 29 May 2021 (UTC)Reply
Agreed, it's pretty damned annoying. Not as bad as accessdate→access-date, but quite irritating. Minor not-even-cosmetic crap like this and this really gets on my nerves. Please, make it stop. — JohnFromPinckney (talk / edits) 21:27, 29 May 2021 (UTC)Reply
Precisely. 🤪 ATS (talk) 22:20, 29 May 2021 (UTC)Reply
The bot was set to consider the edits to be non-cosmetic if there was a mix of cite news and newspaper; I have changed it to no longer do that going forward with newly submitted runs. Harmonizing citation styles is generally considered to be a good idea. AManWithNoPlan (talk) 00:33, 30 May 2021 (UTC)
WP:COSMETICBOT is clear that harmonizing template names is a cosmetic change, because it (1) does not change the visible rendering, (2) does not change categories or search engine results, (3) does not change the categorization of problems needing the attention of editors, and (4) does not relate to egregiously bad html. Your opinions on what is or is not a good idea are irrelevant. These edits should not be done without other more substantive edits, and maybe not even then. —David Eppstein (talk) 07:19, 30 May 2021 (UTC)Reply

{{cite newspaper}} only exists to take care of pages that still link to the deprecated template. It should not be used for new stuff. https://en.wikipedia.org/wiki/Wikipedia:Templates_for_deletion/Log/2007_October_13#Template%3ACite_newspaper AManWithNoPlan (talk) 14:14, 2 June 2021 (UTC)

[Citation needed]. "Deprecated" when and by whom? The TFD you point to was for a different template, not for the redirect. —David Eppstein (talk) 15:24, 2 June 2021 (UTC)Reply
The redirect was put in soon after the deprecation and removal of the cite newspaper template. AManWithNoPlan (talk) 16:13, 2 June 2021 (UTC)
What deprecation? You still haven't provided any backing for your claim that the use of the redirect, added in 2008, is in any way deprecated. Pointing to a TFD about a different template with the same name, held prior to the creation of the redirect, is irrelevant. —David Eppstein (talk) 16:29, 2 June 2021 (UTC)Reply
A couple months after the deprecation, someone unrelated to the deprecation came along and created the redirect. It does not make the redirect deprecated, but does point to a desire to standardize on as few templates as possible, particularly when they serve no unique purpose. AManWithNoPlan (talk) 18:39, 2 June 2021 (UTC)Reply
As far as I can see, there was no "deprecation" at all. Ever. There was a deletion of a template. That template was not deprecated; it was completely removed from the project. Later on another template reused that name. I don't see why you persist in thinking the decision to remove one template has any relevance for the later use of the same name by a different template. —David Eppstein (talk) 18:46, 2 June 2021 (UTC)Reply
I'm curious as to where this "deprecation" exists. Tyrone Madera (talk) 16:40, 9 July 2021 (UTC)Reply

throttling big category runs

has there been any more thought to adding some capability to throttle big category runs so they don't completely take down the bot for everyone else? not trying to say the category runs aren't producing good edits or anything, just that it would be nice if both category runs and individual page requests could flow together.  — Chris Capoccia 💬 13:03, 5 June 2021 (UTC)Reply

100% agree--Ozzie10aaaa (talk) 13:42, 5 June 2021 (UTC)Reply
There should be multiple instances of the bot (one-or-more for multiple-page runs, one specifically single page runs), or a better scheduler. Headbomb {t · c · p · b} 21:30, 5 June 2021 (UTC)Reply
I agree with Headbomb. There is a lot of cleanup work to be done, and if editors have jobs to feed it with, the bot should have the capacity to handle that work. There is enough work available for several instances of the bot. --BrownHairedGirl (talk) • (contribs) 17:36, 9 July 2021 (UTC)
I wanted to second that. In my opinion priority should be given to individual page runs such as those triggered from the toolbar, and there seems to be a lot of throttling. Are there any technical limitations to increasing the number of instances or changing the scheduler? RoanokeVirginia (talk) 18:24, 23 July 2021 (UTC)Reply
It seems to me that one easy way to reduce the load would be for one or more of the heavy users of this bot to set up a clone of it for their own use.
I would like to do that for my bare-URL chasing jobs, where I have tens of thousands of pages lined up after pre-parsing for bare URLs.
Would anyone be able to help me through the steps? @Headbomb and AManWithNoPlan: could either of you help me with that? --BrownHairedGirl (talk) • (contribs) 21:50, 23 July 2021 (UTC)Reply

Right now, one editor (@Abductive) has the bot running two huge jobs simultaneously. See current bot contribs: the bot is currently processing both Category:1959 deaths (3603 pages) and Category:University of California, Berkeley alumni (3810 articles). That's a total of 7,413 articles. If the bot was doing nothing else, it would take more than a whole day to process that lot.

Surely it should not be possible to lock up the bot like that? --BrownHairedGirl (talk) • (contribs) 11:31, 24 July 2021 (UTC)Reply

How is that possible? The bot blocks a second huge run. Grimes2 (talk) 11:46, 24 July 2021 (UTC)Reply
@Grimes2: I also thought it was impossible. But in this case, the bot has not blocked the second big run. --BrownHairedGirl (talk) • (contribs) 12:10, 24 July 2021 (UTC)Reply
Is the bot locking out new requests? Each requested run gets interleaved with the other runs, so the size of the run shouldn't matter. Abductive (reasoning) 15:18, 24 July 2021 (UTC)Reply
@Abductive: all I know is that the bot locked out new requests from me until one of your runs had finished, several hours later.
And yes, the runs are interleaved, but since there were 2 other jobs running that meant that your two requests were taking 2 out of every 4 slots, instead of one out of three. --BrownHairedGirl (talk) • (contribs) 18:18, 24 July 2021 (UTC)Reply
My guess is that the throttling is accomplished by estimating how long a run will take, then preventing a user from requesting another run in that time interval, rather than somehow checking if the bot is still running a job. Abductive (reasoning) 15:45, 24 July 2021 (UTC)Reply
Not in my experience. It happens whenever you ask it to do a category/page links run. No idea how you could do two runs like that under the current implementation, but maybe there's a corner case or something. Headbomb {t · c · p · b} 16:48, 24 July 2021 (UTC)Reply
If you have no idea, doesn't it make it more likely that my idea is correct? Abductive (reasoning) 17:26, 24 July 2021 (UTC)Reply
However it's being handled by the bot, I would have hoped that the user would refrain from making a second huge request until the first had finished. --BrownHairedGirl (talk) • (contribs) 18:20, 24 July 2021 (UTC)Reply
I made those requests many hours apart, maybe 20 hours? So I guess I expected the bot to block a second request if there was one running. Anyway, if the bot owners want to continue to ignore the second-most prolific Wikipedia editor of all time, I suggest that they reconsider. Abductive (reasoning) 18:29, 24 July 2021 (UTC)Reply
The approach I follow is to not make a second big request until the first one has finished, by checking the bot's contribs. That avoids any overlap. --BrownHairedGirl (talk) • (contribs) 18:38, 24 July 2021 (UTC)Reply
I do that. In this case, there was a substantial delay in when I made the requests, and when they started running. I can't even remember when I made them, it was so long ago and they were so far apart. I am making requests for small categories right now as tests, they do not seem to be taking, but I am also not receiving any 502 errors. Abductive (reasoning) 18:58, 24 July 2021 (UTC)Reply
It would be great if the bot was more informative about its response to all requests, esp batch requests. --BrownHairedGirl (talk) • (contribs) 19:35, 24 July 2021 (UTC)Reply
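The slot arithmetic in this thread ("2 out of every 4 slots, instead of one out of three") can be checked with a small sketch. It assumes the bot interleaves runs in plain round-robin order, one page per run per turn; the thread only says runs are "interleaved", so the scheduler below, and the names `round_robin` and `slot_share`, are illustrative assumptions rather than the bot's implementation.

```python
from collections import Counter, deque
from itertools import islice


def round_robin(jobs):
    """Yield (owner, page) pairs, taking one page from each job in turn."""
    queue = deque((owner, iter(pages)) for owner, pages in jobs)
    while queue:
        owner, pages = queue.popleft()
        page = next(pages, None)
        if page is None:
            continue                      # job finished; drop it from the rotation
        yield owner, page
        queue.append((owner, pages))      # job goes to the back of the queue


def slot_share(jobs, owner, slots):
    """Fraction of the first `slots` page slots that go to `owner`."""
    taken = Counter(o for o, _ in islice(round_robin(jobs), slots))
    return taken[owner] / slots


# Two jobs from user A plus one each from B and C (page counts are placeholders).
four_jobs = [("A", range(100)), ("A", range(100)), ("B", range(100)), ("C", range(100))]
# The same situation if A had only one job running.
three_jobs = [("A", range(100)), ("B", range(100)), ("C", range(100))]

print(slot_share(four_jobs, "A", 12))   # share with two parallel runs
print(slot_share(three_jobs, "A", 12))  # share with a single run
```

Under this assumed scheduler, a per-user limit of one active run (or rotating over users instead of over jobs) would cap each requester at an equal share regardless of how many runs they submit.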

And again, one editor (@Abductive) has the bot running two huge jobs simultaneously. See current bot contribs: the bot is currently processing both Category:1954 deaths (3388 pages) and Category:Webarchive template other archives (2539 articles). I still don't understand how this is possible, but if an editor can't or won't exercise self-restraint, then the bot should apply it. Neither of these categories concentrates articles which have been identified as needing the bot's attention, so if they are in the job queue at all then they should be run serially at some sort of low priority rather than in parallel swamping everything else. --BrownHairedGirl (talk) • (contribs) 08:54, 28 July 2021 (UTC)Reply

The bot failed somehow. It is the bot owners who are exercising incredible restraint in never answering you. Abductive (reasoning) 09:29, 28 July 2021 (UTC)Reply
  • On the contrary, @Abductive:, this is now the second time in a few days that you have managed to swamp the bot with lists of pages which mostly do not need the bot's attention. The fact that this is happening only with one user suggests that this is not simply a bot malfunction.
    I just looked at this set of 500 edits by the bot. They include 172 edits to pages from the set of 3388 pages in Category:1954 deaths, from the start of the set to page 1354/3388. So there were only 12.7% of these pages where the bot found anything at all to do, and some were trivial changes such as this [1] conversion of hyphens to endashes, a job which can be done by many other tools such as WP:AWB.
    With Category:Webarchive template other archives, the score is only marginally higher. The same set of bot contribs shows 43 edits to that set, from 525/2539 to 799/2539 -- so only 15.7% of that set are being edited.
    So the bot is labouring away on huge sets where only a tiny fraction of the pages need any action. Meanwhile, my latest job of pages which do need attention was dropped at 09:12 and was slow to restart. (When it was running, the last set of 500 bot edits shows that it made 214 edits to that set of 2192 articles, from page 581/2192 to 1119/2192. That's 214 edits to 538 pages, a 39.8% edit rate to a set which all need attention).
    The bot has limited capacity, and right now most of that capacity is being wasted on huge lists which don't need attention. --BrownHairedGirl (talk) • (contribs) 10:55, 28 July 2021 (UTC)Reply
  • Share how you find articles needing attention, I'll do a run with a list of them. Abductive (reasoning) 18:50, 28 July 2021 (UTC)Reply
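The edit rates quoted in this thread (12.7%, 15.7% and 39.8%) can be reproduced with the short calculation below. The helper name `edit_rate` is hypothetical; it follows the convention used in the comment above, where the number of pages covered is the difference between the last and first page positions (for example 1119 - 581 = 538).

```python
def edit_rate(edits, first_page, last_page):
    """Percentage of pages edited between two positions in a run.

    Uses last_page - first_page as the page count, matching the
    "214 edits to 538 pages" arithmetic in the discussion.
    """
    pages = last_page - first_page
    return 100 * edits / pages


runs = {
    "Category:1954 deaths": (172, 0, 1354),
    "Category:Webarchive template other archives": (43, 525, 799),
    "bare-URL list": (214, 581, 1119),
}

for name, (edits, first, last) in runs.items():
    print(f"{name}: {edit_rate(edits, first, last):.1f}%")
```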

Bot makes cosmetic edits (only changing cite newspaper to cite news and removing format=PDF)

Can we fix this fucking bot already? ATS (talk) 18:46, 14 June 2021 (UTC)Reply

is there actually a "cite newspaper" template that's supposed to be valid? lol this seems like whining when the cite newspaper template redirects to cite news, what's wrong with replacing?  — Chris Capoccia 💬 20:39, 14 June 2021 (UTC)Reply
This template was cosmetic. Probably removal of |format=PDF should be marked as such, which I guess would cause the other edit not to be made. Izno (talk) 21:07, 14 June 2021 (UTC)Reply
However, regardless of the technicalities of it, your reverts are not an improvement to the wikipage (because reverting a cosmetic edit is itself cosmetic). Please save your sanity and others' and stop. Izno (talk) 21:18, 14 June 2021 (UTC)Reply
The problem with that edit is that it was a cosmetic edit (no change to the rendered page), which this bot is not approved to do. – Jonesey95 (talk) 21:21, 14 June 2021 (UTC)Reply
There's also little point in changing cite newspaper to cite news. Headbomb {t · c · p · b} 00:13, 15 June 2021 (UTC)Reply
I'll skip going round the merry-go-round on that one. Izno (talk) 16:16, 15 June 2021 (UTC)Reply
There is exactly zero point for any template not up for deletion. ATS (talk) 19:06, 15 June 2021 (UTC)Reply

Jobs being dropped again

Any speculation as to why? Abductive (reasoning) 02:59, 3 July 2021 (UTC)Reply

  • Weirdly, it stopped on Wladimir Klitschko (hist) twice and made edits to the article four times in a row (including the two times it halted), then when I used the Citations button, then by the bot, then by my using the Citations button, then the bot, followed by two in a row by me using the Citations button. This is ten edits in a row by the bot without any human changes to the article. What could possibly be wrong? Abductive (reasoning) 08:03, 4 July 2021 (UTC)Reply
Jobs aren't generally dropped. They just take longer than your web browser thinks they should, and the web browser gives up. The multiple edits come from the Zotero instance (which is run by Wikipedia) finding new information each time it is run. AManWithNoPlan (talk) 11:38, 4 July 2021 (UTC)
So, the bot gets to, say, article # 123 out of 2300 in the run, and doesn't make any more edits to remaining articles in the category. And the reason the bot decided to stop was because it asked my browser, which I had closed right after I started the run, how it was feeling? Abductive (reasoning) 21:54, 4 July 2021 (UTC)Reply
That's a different issue. Wikipedia can time out and the bot eventually gives up. It is possible, but unlikely, that the bot might crash on a page. AManWithNoPlan (talk) 00:06, 5 July 2021 (UTC)
I see. It might be having trouble with Wladimir Klitschko because that article has 522 citations. Abductive (reasoning) 00:48, 5 July 2021 (UTC)Reply
Now it keeps aborting on American Jews, another article with lots of citations. Abductive (reasoning) 21:02, 8 July 2021 (UTC)Reply
!Curl error: Operation timed out after 20001 milliseconds with 0 bytes received !Wikipedia responce was not decoded. !Unhandled write error. Please copy this output... Abductive (reasoning) 21:32, 8 July 2021 (UTC)Reply

I have been experiencing issues similar to those reported by @Abductive.

For the last week, I have been feeding the bot with long lists of articles with bare URLs, which I prepare using WP:AWB in pre-parse mode. I paste the lists into User:BrownHairedGirl/Articles with bare links, and feed that to the bot.

On most of these lists, the bot drops the job, and I make a new request. When the bot restarts, it processes articles which it skipped on the first pass, and also makes further changes to articles which it had already processed.

This is all kinda weird, and a bit annoying ... because while the bot can do a great job, I can see no discernible pattern to whether with a given article it will do any job at all, or a partial job, or a great job. I have never come across such a fuzzy bot, and while I do understand that it is doing a very complex job and relies on multiple external lookups, this fuzziness makes it frustrating to use. The unpredictability also seems to me to be inefficient, because it leads to the same articles being processed multiple times, which exacerbates the bot's problem of capacity well short of demand. --BrownHairedGirl (talk) • (contribs) 00:40, 15 July 2021 (UTC)

  • I have had the same big job dropped twice today. The second time was shortly after 21:30 UTC. See this set of bot contribs; at 21:30, there is the last edit on the set of 2198 pages, No. 1195/2198 ... then a gap until page 40/2198 at 21:56. I had restarted the job at about 21:45, and I assume that if the previous job was not already dead, the bot would have given me its message that I already had a big job running.
    Is it relevant that I also made two individual page requests (using the toolbar) close together about 21:30? Does that kill an existing job? --BrownHairedGirl (talk) • (contribs) 22:19, 20 July 2021 (UTC)Reply
  • Yet another job dropped today. A run of 2191 articles, dropped after editing 1241/2191 at 14:07 UTC: see list of the bot's contribs.
    After 30 minutes of inactivity, I trimmed the list to remove articles already processed, and restarted the remaining 950 pages at 14:49 (bot contribs after restart).
    The bot is doing great work on this set of international relations articles, making cleanup edits to nearly all the pages it processes ... but it's very time-consuming to have to monitor it for job drops, which has happened to my jobs 6 times in the last 6 days (see the of my article list). It would be great if this could be fixed. --BrownHairedGirl (talk) • (contribs) 15:07, 29 July 2021 (UTC)Reply
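The manual restart described above (trimming the list to the 950 pages not yet processed after a drop at 1241/2191) amounts to slicing the page list at the last processed position. A minimal sketch, with a hypothetical helper name and placeholder page titles:

```python
def remaining_pages(pages, last_processed):
    """Return the pages after the 1-based position `last_processed`."""
    return pages[last_processed:]


# Placeholder titles standing in for the real 2191-article list.
all_pages = [f"Article {n}" for n in range(1, 2192)]
todo = remaining_pages(all_pages, 1241)

print(len(todo))   # number of pages left to resubmit
print(todo[0])     # first page of the restarted run
```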

Curly Quotes

Status
new bug
Reported by
Drahtlos (talk) 14:57, 14 July 2021 (UTC)Reply
What happens
Bot replaces non-English quotation marks in non-English titles.
What should happen
If there is a language tag for a language other than English then the bot should leave quotation marks in titles unchanged.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=List_of_Soviet_microprocessors&diff=0&oldid=1032165907
We can't proceed until
Feedback from maintainers


MOS:CONFORM says otherwise, and that they should be changed. Headbomb {t · c · p · b} 17:44, 14 July 2021 (UTC)Reply

MOS:CURLY clarifies that quotation marks "internal to quoted non-English text" should be preserved, so the bot should change them only if they surround the entire title. Drahtlos (talk) 18:13, 14 July 2021 (UTC)Reply
We do not edit quotes, only titles. AManWithNoPlan (talk) 13:30, 22 July 2021 (UTC)Reply
In the example, the bot replaced title = Микро-ЭВМ «Электроника С5» и их применение with title = Микро-ЭВМ "Электроника С5" и их применение, i.e. it replaced quotes internal to non-English text. Drahtlos (talk) 17:05, 22 July 2021 (UTC)

I do not believe that is a "quote" AManWithNoPlan (talk) 01:00, 26 July 2021 (UTC)Reply
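The behaviour requested in this report ("if there is a language tag for a language other than English then the bot should leave quotation marks in titles unchanged") could be sketched as follows. This is an illustration of the proposal, not the bot's actual logic: the function name, the set of characters mapped, and the way a language value is recognised as English are all assumptions.

```python
# Assumed mapping; the report only shows « » being turned into straight double quotes.
QUOTE_MAP = str.maketrans({
    "“": '"', "”": '"', "«": '"', "»": '"',
    "‘": "'", "’": "'",
})


def normalize_title_quotes(title, language=None):
    """Straighten quotation marks unless a non-English language is declared."""
    if language and language.strip().lower() not in {"en", "english"}:
        return title                      # leave non-English titles untouched
    return title.translate(QUOTE_MAP)


title = "Микро-ЭВМ «Электроника С5» и их применение"
print(normalize_title_quotes(title, language="ru"))  # unchanged under the proposal
print(normalize_title_quotes(title))                 # straightened, as in the reported diff
```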

bot creates cite web with |chapter=

Status
new bug
Reported by
Trappist the monk (talk) 14:28, 19 July 2021 (UTC)Reply
What happens
bot creates {{cite web}} with |chapter= which is not supported by that template
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


For Oxford Dictionary of National Biography (ODNB) the correct cs1 template would be {{cite encyclopedia}}. There is {{cite ODNB}} which only requires |doi= and the entry name in |title=

Trappist the monk (talk) 14:28, 19 July 2021 (UTC)Reply

The chapter was fixed quickly. Will get to ODNB soon. AManWithNoPlan (talk) 13:11, 22 July 2021 (UTC)Reply

cite dictionary → cite ODNB

Status
new bug
Reported by
Trappist the monk (talk) 11:27, 22 July 2021 (UTC)Reply
What happens
left behind unnecessary and/or conflicting parameters; for {{cite ODNB}} (a wrapper around {{cite encyclopedia}}) the only necessary parameters are |entry=, |doi= (preferred) or |url=, and some form of an author parameter; all other parameters are provided by the wrapper template
What should happen
diff
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers


{{cite news}} or not {{cite news}}?

I noticed that Citation bot inconsistently changes citation templates for links to Rock Paper Shotgun articles from {{cite web}} to {{cite news}}. For example, in a test run yesterday, the bot performed these changes in this and this edit but did not do so here or here. Is there a reason for this discrepancy? Either way, wouldn't {{cite web}} be the more appropriate template for this website? Regards, IceWelder [] 06:59, 27 July 2021 (UTC)Reply

Related: Wired is being classified as {{cite journal}},[2] although the "ARE_MAGAZINES" constant stipulates that it should be a {{cite magazine}}. IceWelder [] 14:20, 27 July 2021 (UTC)Reply
I fixed the wired thing - it was the wikilink that caused the issue. AManWithNoPlan (talk) 14:25, 27 July 2021 (UTC)Reply
The difference depends upon the metadata that the website presents. AManWithNoPlan (talk) 14:27, 27 July 2021 (UTC)
Which metadata field is that? For example, this ref was converted whereas this one was not. Apart from the article-specific stuff (title, author, date, keywords), the metadata appear almost identical. IceWelder [] 14:44, 27 July 2021 (UTC)Reply
The citoid server probably timed out on the second one and thus there was no data. AManWithNoPlan (talk) 15:58, 27 July 2021 (UTC)

minor cleanup

Status
new bug
Reported by
Keith D (talk) 11:21, 27 July 2021 (UTC)Reply
What happens
Changes {{citeweb}} to {{cite web}}
What should happen
Nothing if this is the only change, as it is a purely cosmetic action.
Relevant diffs/links
https://en.wikipedia.org/w/index.php?title=St_Helens_R.F.C.&curid=1095032&diff=1035699398&oldid=1034338489
We can't proceed until
Feedback from maintainers


bot adds |chapter= to cite document

Status
new bug
Reported by
Trappist the monk (talk) 11:30, 28 July 2021 (UTC)Reply
What happens
bot adds |chapter= to {{cite document}}, a redirect to {{cite journal}}; |chapter= and its aliases are not supported by {{cite journal}}.
What should happen
This:
{{Cite encyclopedia |last=Mari |first=Licia |date=2002 |entry=Amendola, Ugo |encyclopedia=Grove Music Online |publisher=Oxford University Press |doi=10.1093/gmo/9781561592630.article.44755}}
Mari, Licia (2002). "Amendola, Ugo". Grove Music Online. Oxford University Press. doi:10.1093/gmo/9781561592630.article.44755.
Relevant diffs/links
diff
We can't proceed until
Feedback from maintainers