This is an archive of past discussions with User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Presumably you mean parameter values. And Martin's special case would |author-sep=,, where the comma is the value, not some trailing cruft. ♦ J. Johnson (JJ) (talk) 20:32, 2 November 2018 (UTC)
cs1|2 does not have or support |author-sep=, |author-name-separator=, or |separator=.
Well, even cleaning up https://zenodo.org/record/1000677/files/article.pdf to https://zenodo.org/record/1000677 is blacklisted (/me mumbles something angrily) (t) Josve05a (c)22:34, 5 November 2018 (UTC)
Yes. A friend cannot upload her papers to ResearchGate but can upload them to Zenodo. I think that may be telling. Guy (Help!) 23:40, 5 November 2018 (UTC)
Blocked and {{fixed}}. Also, a second pull is in place to turn it back off, if that is possible. You are correct, it is one thing to violate your own papers' copyright; but it is another thing to violate everyone's papers copyrights. AManWithNoPlan (talk) 18:25, 6 November 2018 (UTC)
I'm not sure what 'changing whites' would be here, but it did something similar in the previous edit [3], where normally it removes publisher in cite journals. Headbomb {t · c · p · b}18:14, 13 October 2018 (UTC)
If it does, then that's another bug. It shouldn't remove the publisher parameter in cite journal templates unless the publisher value would be the same as the journal value. (And, actually, for optimal metadata it shouldn't even remove it then, as long as it is correct, so that both metadata entries, journal and publisher, can be populated. Instead, seemingly duplicate values should be detected in the cite template and one of the values suppressed in the output, but not in the metadata.)
I just saw it remove a publisher from a "journal" that is really a newsletter whose publisher should not have been removed: Special:Diff/866664956. For major well-established academic journals, removal of publisher may be a good thing, but blindly doing it to all journal citations is not. Citation bot absolutely should not be making this kind of decision, and should not even be suggesting it to human editors (as they too-often fail to exercise any judgement of their own). —David Eppstein (talk) 21:02, 31 October 2018 (UTC)
Yeah (hence it being in small). I can just imagine the bot edit warring with itself back and forth... However, it logically feels as if "all possible edits should be made" before saving the change. (t) Josve05a (c)23:09, 8 November 2018 (UTC)
A (short) time-out for "second round" could be added, or only do it for "small" articles (i.e. if running it manually on a short section), or only run twice if there is not high-use (if that could be "tracked"). Not advocating this be implemented here, though. The issue at hand can (hopefully) be patched this time. Just a thought.(t) Josve05a (c)23:15, 8 November 2018 (UTC)
|journal=Bjpsych International
|journal=Ieee Transactions on Computers
|journal=Papers from the Workship Within the Framework of the XIII International Congress of Celtic Studies
What should happen
|journal=BJPsych International
|journal=IEEE Transactions on Computers
|journal=Papers from the Workship within the Framework of the XIII International Congress of Celtic Studies
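The expected capitalisation above could be reached with an exception table consulted before generic title-casing. The following is only a minimal sketch in Python, not the bot's actual code (the bot is written in PHP); the function name and both tables are hypothetical and far shorter than a real list would need to be.

```python
# Hypothetical exception tables; a real list would be much longer.
ACRONYMS = {"ieee": "IEEE", "bjpsych": "BJPsych", "oecd": "OECD"}
LOWERCASE_WORDS = {"a", "an", "and", "the", "of", "on", "in",
                   "within", "from", "for", "to"}


def title_case_journal(name: str) -> str:
    """Title-case a journal name while respecting known exceptions."""
    result = []
    for index, word in enumerate(name.split()):
        key = word.lower()
        if key in ACRONYMS:
            # Known acronym or brand spelling: use the stored form.
            result.append(ACRONYMS[key])
        elif word.isupper() and len(word) > 1:
            # Already all caps (e.g. a Roman numeral such as XIII): keep it.
            result.append(word)
        elif index > 0 and key in LOWERCASE_WORDS:
            # Short function words stay lowercase unless they come first.
            result.append(key)
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)
```

Applied to the three reported values, this sketch yields the three expected values listed above. The same acronym table could in principle be reused for |last1= and |publisher= values such as OECD.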
OECD should always be capitalized. I've seen it both in |last1= and |publisher=. <ref>https://dx.doi.org/10.1787/9789264239012-en</ref> adds |last1=Oecd(t) Josve05a (c)10:47, 12 November 2018 (UTC)
Although frustrating, these very slow runs do often perform the requested edits even if they never return to display a result. Lithopsian (talk) 20:19, 13 November 2018 (UTC)
|doi=10.14288/1.0071732 and |url=https://doi.library.ubc.ca/10.14288/1.0071732 both link to the same place. The URL has a recognized DOI in its path, so it is redundant and should be removed. We should not add such links.
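One way to detect this case is to look for a DOI-shaped string in the URL path and compare it with the |doi= value. This is a rough Python sketch under stated assumptions: the function name is made up, and the regular expression is a simplified DOI pattern rather than a full implementation of the DOI syntax.

```python
import re
from urllib.parse import unquote, urlsplit

# Simplified DOI pattern (an assumption, not the full DOI grammar).
DOI_IN_PATH = re.compile(r"(10\.\d{4,9}/[^\s?#]+)")


def url_is_redundant_with_doi(url: str, doi: str) -> bool:
    """Return True when the URL path embeds the same DOI as |doi=."""
    path = unquote(urlsplit(url).path).rstrip("/")
    match = DOI_IN_PATH.search(path)
    if match is None:
        return False
    # DOIs are case-insensitive, so compare in lowercase.
    return match.group(1).lower() == doi.strip().lower()
```

A URL flagged this way would be a candidate for removal, and an unflagged one would be left alone.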
Removes |publisher=Google for citations to Google Maps
Relevant diffs/links
Don't
We can't proceed until
Feedback from maintainers
* {{Citation | publisher = Google | url = https://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=210554752554258740073.00045675b996d14eb6c3a&ll=6.839971,28.205177&spn=170.959424,24.609375&z=1 | type = map (non-exhaustive) | title = Participatory budgeting initiatives around the world}}.
Don't add (identical) |series=Handbook of Development Economics if |title=Handbook of Development Economics already exists (without somehow removing one of them)
The full message is "Alter: isbn. Removed accessdate with no specified URL". It covers both, but admittedly the amount of text changed appears to be inversely proportional to the length of the message text. AManWithNoPlan (talk) 22:22, 17 November 2018 (UTC)
What should be done is that the DOI should be fixed by a human to conform with the DOI specifications. DOI.org is under no obligation to support non-conforming DOI values, and they could remove their de facto support at any time. – Jonesey95 (talk) 05:42, 20 November 2018 (UTC)
Request: clean up google search so-called references
While I'm not sure why a citation to Google Search should ever appear in an article, they do quite a lot. It would be good if the bot would remove unnecessary parameters for such URLs as well, as it does with Google Books.
On VERY rare occasions they are valid (example: the term xyz is more popular/common than zyx on the Internet). Almost all the time, it would be more honest to just say <ref>Look it up yourself loser</ref> AManWithNoPlan (talk) 20:01, 7 October 2018 (UTC)
While I don't disagree with you (at all), I still feel we (read: the bot) should act as if they are all valid, and clean them, and hope that someone else comes along and finds (any) better references. (t) Josve05a (c)20:05, 7 October 2018 (UTC)
aqs=chrome..69i57j69i59.14823j0j7 Assisted Query Stats - used for logging purposes only
sourceid=chrome Where the search originated from - used for logging purposes only
ie=UTF-8 input encoding; default is UTF-8
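The three parameters listed above could be stripped roughly as follows. This is a hedged Python sketch, not the bot's implementation; the function name and the two tables are assumptions, and only the parameters named in this thread are covered.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters described above as used for logging purposes only.
LOGGING_ONLY = {"aqs", "sourceid"}
# Parameters that can be dropped when they carry their default value.
DEFAULTS = {"ie": "utf-8"}


def clean_google_search_url(url: str) -> str:
    """Drop logging-only and default-valued query parameters."""
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in LOGGING_ONLY:
            continue
        if DEFAULTS.get(key) == value.lower():
            continue
        kept.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Parameters that are not in either table, such as the search term itself, pass through unchanged.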
This is in many cases incorrect, despite Crossref stating this. E.g., for this edit it should be "Middle East Review of International Affairs, Vol. 20, No. 1, pp. 35-59". Perhaps |journal=SSRN Electronic Journal should be forbidden, since there seems to be a lot of misattribution to the real source.
{{cite paper}} is an alias of {{cite journal}} and should be supported in the same ways as {{cite journal}} is (only let the template name stay the same)
The style guides are very clear on not including publishers for Journals. 99% of the time the pdf links to publisher pdfs do not work, and even when they do, they often do not last for long. Anyway, it adds nothing that the doi already provides. AManWithNoPlan (talk) 16:30, 25 November 2018 (UTC)
I do not think this is fixable, since the only way is to maintain a list of 10,000 magazines. Also, the templates are actually exactly the same. AManWithNoPlan (talk) 15:40, 24 November 2018 (UTC)
They are not exactly the same. The rendering of |issue= and |number= differs, and you cannot set |title=none in cite magazine (there may be other differences). --Izno (talk) 18:41, 24 November 2018 (UTC)
For some odd reason, the bot keeps removing all information from citations about the publisher of the source and the location of the publisher. I have noticed it doing this for a while now and have had to keep cleaning up after it. I do not know if these removals are intentional or accidental, but I see no reason why the bot should be removing publishers from citations, considering that the publisher is a fairly essential piece of information about the source.
Relevant diffs/links
Recent examples of this include the bot's activity here and here. It has done it before, but I cannot find the other examples right away and would have to go looking for them.
We can't proceed until
Feedback from maintainers
All style guides reject including that information for journals. Also, it is often incorrect. The bot has been doing this for over a decade, so I am sure there are other examples. AManWithNoPlan (talk) 15:34, 27 November 2018 (UTC)
Many users prefer direct links to PDF files rather than records (although librarians and website owners prefer links to HTML pages so that they can track the users more easily). That said, this repository attempts to provide the handle in its HTML metadata, but is misconfigured: <meta name="DC.identifier" content="http://hdl.handle.net11245/1.345005"> (slash missing). I suggest to warn the repository administrators. Their records on BASE are also all broken, some OAI-PMH fixes are in order. Nemo14:47, 22 November 2018 (UTC)
the bot is uploading new data slowly. I can not get it to work at the moment.
We can't proceed until
Feedback from maintainers
Also not working for me. I asked it to check The Bill, so far 25 minutes and it's done nothing.-- 5 albert square (talk) 13:43, December 2018 (UTC)
{{wontfix}} shared server and sadly when it gets slow people often just start trying again and again thus making it worse (similar to shooting someone because they are bleeding and hoping it will help) AManWithNoPlan (talk) 15:32, 7 December 2018 (UTC)
Would displaying an error message of some kind be possible here? Something like "<server> is at capacity, try again in <amount of time depending on server load>"? Headbomb {t · c · p · b}23:38, 7 December 2018 (UTC)
This is not a regression. The URL is added before the PMC is present. Will have to think about this. Perhaps moving the addition of the Open URL to the end would be best. AManWithNoPlan (talk) 16:46, 23 November 2018 (UTC)
Bot added "journal" parameter when "magazine" parameter was already present, creating a duplicate parameter error (since both are aliases of "work"). This is similar to the error which renames parameters to create aliases of "work", but in this case new parameters are being created.
Any way to get the cite template to encode the URL better so Wiley can resolve it, or is this up to Crossref/Wiley to fix? (t) Josve05a (c)22:56, 15 November 2018 (UTC)
No, but still sad. A bit surprised though that it didn't add |doi-broken-date=, but I guess it tests if broken before parsing what to write. (t) Josve05a (c)23:47, 15 November 2018 (UTC)
When the actual website is Reuters.com, it should be the work (such as |newspaper=), but when Reuters is the author of an article on another website (such as theguardian/nytimes) it should be |agency=. In this case |agency=Reuters should be removed. |agency=Reuters and |newspaper=Reuters should not both be present. (t) Josve05a (c)14:59, 24 November 2018 (UTC)
Bot renames "publisher" parameter to "newspaper". However, "website" parameter is already present. This creates a duplicate parameter error since both "website" and "newspaper" are aliases of "work".
What should happen
Don't convert any parameter to any alias of "work" if any alias of "work" (e.g., journal, newspaper, magazine, periodical, website) is already present.
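The rule above amounts to a guard that runs before any rename. Below is a minimal Python sketch of that guard, assuming the citation's parameters are held in a dict; the function name is hypothetical and the alias set contains only the aliases named in this report plus "work" itself.

```python
# Aliases of "work" named in this report, plus "work" itself.
WORK_ALIASES = {"work", "journal", "newspaper", "magazine",
                "periodical", "website"}


def rename_parameter(params: dict, old: str, new: str) -> dict:
    """Rename `old` to `new` unless that would duplicate a work alias."""
    if new in WORK_ALIASES and any(
        name in WORK_ALIASES for name in params if name != old
    ):
        # Another alias of "work" is already present: leave it untouched.
        return dict(params)
    return {(new if name == old else name): value
            for name, value in params.items()}
```

The same check would apply when adding a new parameter, as in the earlier "journal" plus "magazine" report: the addition is skipped if any alias of "work" already exists.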
Go to its main page tools.wmflabs.org/citations/, Thorough mode = yes, Commit edits = yes, and insert "Nuuk" into the input box next to "Process page". Then hit "Process page" and the error will occur almost immediately.
Thank you for the report. This comes from arXiv data. We support about a dozen formats that they use. This helps us decode new ones (or in some cases detect and not decode). AManWithNoPlan (talk) 15:33, 21 November 2018 (UTC)
chapter= was added to Cite encyclopedia without removing title=, causing there to be one quoted version of the chapter name and one italicized version.
What should happen
Bot should not operate on a citation formatted in this way
|chapter= is not a documented parameter in {{cite encyclopedia}}. |title= is supposed to be used for the encyclopedia entry. The bot should probably not add chapter at all when title is present, and it definitely should not add chapter and leave title in place. – Jonesey95 (talk) 18:53, 8 December 2018 (UTC)
The bot's edit summary was also partially incorrect in this edit, in that it claimed to have "Removed parameters", but it did not do so. – Jonesey95 (talk) 18:54, 8 December 2018 (UTC)
user enters a worldcat page for the url parameter and Citationbot ignores it
What should happen
worldcat urls should be removed and replaced with the oclc parameter, the same as with pmids and dois that are in the equivalent urls entered and swapped by the bot. In the case below, it should replace
The doi link points to the exact same page and is not prone to breaking as publisher links are. Also, in this case the PDF file is actually free, which is very unusual for a publisher website. AManWithNoPlan (talk) 21:04, 15 December 2018 (UTC)
Non-functional DOI links of the form 10.2307/<JSTORID> can be removed if they are broken. Working JSTOR dois, or JSTOR dois of a different form should be left alone. I believe JSTOR used to have internal redirects, but no longer do, so that's why we've got a bunch of crap 10.2307/<JSTORID> DOIs laying around. Headbomb {t · c · p · b}21:49, 19 November 2018 (UTC)
Anecdotally, sometimes the works where the JSTOR ID doesn't correspond to a working DOI actually have another DOI from a publisher. I'm not sure if these DOIs were never issued or what. Nemo23:04, 20 November 2018 (UTC)
That is correct, some do not actually have the doi issued. Some have one from the publisher and one from jstor (and maybe one from researchgate and who knows who else). AManWithNoPlan (talk) 01:22, 21 November 2018 (UTC)
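The policy discussed in this thread (remove only DOIs of the form 10.2307/<JSTORID> that do not resolve, leave everything else alone) can be sketched as below. Two things are assumptions: that a JSTOR ID is purely numeric, and the `resolves` callable, which stands in for whatever check the bot uses to decide whether a DOI works.

```python
import re
from typing import Callable

# Assumption: the JSTOR ID part is purely numeric.
JSTOR_STYLE_DOI = re.compile(r"^10\.2307/\d+$")


def should_remove_doi(doi: str, resolves: Callable[[str], bool]) -> bool:
    """Return True only for a JSTOR-style DOI that fails to resolve."""
    candidate = doi.strip()
    if not JSTOR_STYLE_DOI.match(candidate):
        # DOIs of any other form are left alone, broken or not.
        return False
    return not resolves(candidate)
```

Passing the resolver in as an argument keeps the pattern check testable without network access.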
Add a 'silent' mode. This would simplify the output to simply
--------------------------------------------------------------------------
[12:13:02] Processing page '[[2018 FFA Cup preliminary rounds]]' – [[edit]] – [[history]]
# No changes required.
when there are no changes made and
--------------------------------------------------------------------------
[12:13:02] Processing page '[[2018 FFA Cup preliminary rounds]]' – [[edit]] – [[history]]
# Updating the page ([[diff]]).
when there is a change made. This could probably be made 'default' for categories, with &silent=0 to disable it. Or alternatively, &verbose=1 to enable verbose logs. Headbomb {t · c · p · b}12:25, 21 August 2018 (UTC)
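The proposed behaviour could look roughly like this. It is an illustrative Python sketch only: the function name, its arguments, and the separator width are assumptions, and the edit, history, and diff links from the sample output are left out for brevity.

```python
def format_run_log(page, timestamp, changed, details=(), verbose=False):
    """Build the per-page log; per-citation details appear only if verbose."""
    lines = [
        "-" * 74,  # separator width is an assumption
        f"[{timestamp}] Processing page '[[{page}]]'",
    ]
    if verbose:
        # In silent mode the per-citation chatter is dropped entirely.
        lines.extend(details)
    lines.append("# Updating the page." if changed
                 else "# No changes required.")
    return "\n".join(lines)
```

With `verbose=False` as the default, category runs would get the short form automatically, and a query parameter such as the suggested &verbose=1 would simply flip the flag.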
Removed/touched a parameter with a comment <!-- some readers have trouble with the link generated by the doi= field? -->, which should "block out" the bot from touching it. (t) Josve05a (c)09:30, 12 November 2018 (UTC)
Inappropriate capitalisation of foreign language titles - Spanish does not use title case, it uses first letter only capitalisation of titles.
What should happen
For Spanish titles, use first letter only capitalisation, i.e. where language=es (and potentially other languages), don't apply English-language capitalisation rules.
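A guard of this kind could be checked before any title-casing step. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, and the language set holds only Spanish, since that is the only language named in this request.

```python
# Languages whose titles use sentence case; only Spanish is named above.
SENTENCE_CASE_LANGUAGES = {"es", "spanish"}


def may_apply_english_title_case(params: dict) -> bool:
    """Return False when |language= marks a sentence-case language."""
    language = params.get("language", "").strip().lower()
    return language not in SENTENCE_CASE_LANGUAGES
```

Citations without a |language= parameter fall through to the existing behaviour, so the guard only changes anything when the language is stated explicitly.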
In a call like https://tools.wmflabs.org/citations/process_page.php?edit=toolbar&user=Headbomb&page=Steve_Bieda, does edit=toolbar do anything? Because I'd like to have some ways to tell the bot that it was triggered via {{Draft article}} or citation expander, or similar.
We might want to rename the parameter to allow something like
I wonder what the audience of this additional message would be? To most users, what is important is the content and motivation of an edit, rather than the circumstances in which an editor came to make it. If I have a clear understanding of the motivation for this change, I'll be able to consider the best way to implement it. Martin(Smith609 – Talk)08:56, 27 August 2018 (UTC)
The goal is mostly to have a way to see where Citation bot is used from. How many of those edits were triggered by the web interface? How many were from user scripts and from which userscript, or how many from templates and which templates (and do any need updating)? How many were done via the Citation Expander gadget? It's not necessarily to have 'official' stats (it would be nice though), but knowing where the bot is used from is nice, and could let us give help to newbies that run into issues with the bot. Headbomb {t · c · p · b}10:32, 27 August 2018 (UTC)
For example, [23] was most likely triggered from {{Draft article}}, present on Draft:Lil ginger ale (we sadly can't feed who used the Template from the template because we don't have a {{CURRENTUSERNAME}} magicword/variable), but knowing it was triggered from the template means it has a fairly high chance of being used by a newbie, and was probably triggered by one of these people. So that lets us (or at least me) customize feedback to people. If I see someone doing something weird/unusual with the bot from {{Draft article}} vs Web Interface vs Gadget vs User Scripts, well you more or less have a continuum of likely noob vs likely noob/intermediate vs likely intermediate vs likely advanced user dealing with the bot. And you'd have an idea of who could have triggered the bot in that scenario. Headbomb {t · c · p · b}10:45, 27 August 2018 (UTC)
If the bot tries to "reformat" a blacklisted link (e.g. https://zenodo.org/record/1223952/files/article.pdf to https://zenodo.org/record/1223952), the bot will not be able to save the edit. We should stop reformatting these URLs, in order to be able to edit such pages. Editing pages with existing links isn't stopped, but formatting them turns them into new links, which are blacklisted. (t) Josve05a (c)07:06, 28 December 2018 (UTC)
A better approach would be to find what causes this blacklisting, and see if edit filters can't be tweaked to let Citation Bot work around them. Headbomb {t · c · p · b}14:44, 28 December 2018 (UTC)
Yes please use BOTREQ for URL updates, but be careful using AWB it typically breaks archive URLs and/or doesn't undo previous archivals of the broken URL. -- GreenC16:08, 29 December 2018 (UTC)
Via= can't really be implemented in any useful manner, since edit= is currently unused and all usages use edit=toolbar and the draft article uses edit=toolbar and draft does not directly include it (it is two templates deeper, so we can't even tag it as from draft). We have done what we can for now. AManWithNoPlan (talk) 23:03, 23 December 2018 (UTC)
it does not exist. there’s no reliable way to do it. We can detect category vs toolbar, but nothing else. That is why edit= is not used. AManWithNoPlan (talk) 00:28, 24 December 2018 (UTC)
I know and we’ve done all we really can. Unless we have some way of actually getting reliable information (which we do not) there’s really no point to adding it. AManWithNoPlan (talk) 04:08, 24 December 2018 (UTC)
What do you mean 'reliable information'? what's wrong with just displaying the information that's passed in &via=! That'd be the whole point of via. Headbomb {t · c · p · b}04:54, 24 December 2018 (UTC)
I honestly doubt anyone would set it, since the toolbar and the citation toolset core that draft pulls information from both set toolbar. AManWithNoPlan (talk) 05:01, 24 December 2018 (UTC)
Why would we need a list of options / pre-approved strings? 99%+ of usages would be from templates and scripts. Headbomb {t · c · p · b}06:19, 24 December 2018 (UTC)
I think the pre-approved strings would serve as a kind of input sanitisation. Otherwise at some point you may need to check that you're not inserting junk or spam in edit summaries (where it's hard to remove). I don't know how important a concern this is, but it's not unreasonable to keep it in mind. Nemo10:08, 27 December 2018 (UTC)
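The allowlist idea can be sketched in a few lines of Python. Everything here is hypothetical: the approved values, the function names, and the edit summary format are illustrations, not anything the bot currently accepts.

```python
# Hypothetical allowlist; none of these values is confirmed by the bot.
APPROVED_VIA = {"toolbar", "gadget", "category", "draft-article",
                "userscript"}


def sanitize_via(raw):
    """Return the normalised via value if approved, otherwise None."""
    if raw is None:
        return None
    value = raw.strip().lower()
    return value if value in APPROVED_VIA else None


def edit_summary(base: str, raw_via) -> str:
    """Append the via tag only when it passed the allowlist."""
    via = sanitize_via(raw_via)
    return f"{base} (via {via})" if via else base
```

Anything outside the list is silently dropped, so arbitrary text passed in &via= never reaches an edit summary.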
that get dangerous could be junk like ii - iii, 5-7 or the evil look at pages 5 to seven and browse around pages in the early teens..... I will think about how many letters to allow. AManWithNoPlan (talk) 22:06, 23 December 2018 (UTC)
Many style guides actually specify capitalization of foreign journals independent of what the journal itself is called. It’s an odd thing. Specific journals can be submitted for capitalization as needed. AManWithNoPlan (talk) 19:16, 24 December 2018 (UTC)
Websites are not case-sensitive, but I can add a capitalization exception. The initial reference being a mix of a journal and a website confused the bot. AManWithNoPlan (talk) 19:16, 24 December 2018 (UTC)
That is, it is dumping all the page meta tags, then cite journal parameters, then a PubMed query. I'm not a PHP programmer, but this StackOverflow answer may be useful, if you're not already retrieving the meta tag data. I think PRISM may include the Dublin Core dc. tags as a subset, but the BMJ & maybe the Oxford journals also add useful citation_ tags.
dc.contributor Gordon C S Smith
dc.contributor Jill P Pell
dc.identifier 10.1136/bmj.327.7429.1459
citation_title Parachute use to prevent death and major trauma related to gravitational challenge: systematic review of randomised controlled trials
citation_public_url https://www.bmj.com/content/327/7429/1459
citation_mjid bmj;327/7429/1459
citation_lastpage 1461
citation_doi 10.1136/bmj.327.7429.1459
citation_section Hazardous journeys
citation_article_type Other
citation_pmid 14684649
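Collecting tags like those listed above is straightforward with a standard HTML parser. The sketch below uses Python's `html.parser` for illustration (the bot itself is PHP); the class and function names are made up, and repeated tags such as dc.contributor are gathered into lists.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta> tags whose name starts with dc. or citation_."""

    PREFIXES = ("dc.", "citation_")

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if content is not None and name.lower().startswith(self.PREFIXES):
            # A tag may repeat (one per author), so keep every value.
            self.tags.setdefault(name, []).append(content)


def extract_citation_meta(html: str) -> dict:
    parser = CitationMetaParser()
    parser.feed(html)
    return parser.tags
```

The resulting dict could then be mapped onto cite journal parameters, for example citation_doi to |doi= and citation_pmid to |pmid=.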
Whenever there is a |foobar=<SOMETHING>|barfoo=...|foobar=<NOTHING>, the bot changes that to |DUPLICATE_foobar=<SOMETHING>|barfoo=...|foobar=<NOTHING>
What should happen
Whenever there is a |foobar=<SOMETHING>|barfoo=...|foobar=<NOTHING>, get rid of the empty parameter and keep the full one. E.g. [33] Exception: Keep the handling of author/editor parameters the same (last/first, editor-last/editor-first, etc...) since people often mangle the order by accident.
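The requested behaviour, including the author/editor exception, might be sketched as follows. This Python version assumes the parameters arrive as an ordered list of (name, value) pairs; the function name and the pattern for author-like parameter names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pattern for author/editor parameters, which keep the
# existing DUPLICATE_ handling.
AUTHOR_LIKE = re.compile(r"^(editor-)?(last|first|author|editor)\d*$")


def drop_empty_duplicates(params):
    """Remove empty repeats of a parameter that also has a filled copy."""
    counts = Counter(name for name, _ in params)
    filled = {name for name, value in params if value.strip()}
    result = []
    for name, value in params:
        is_empty_duplicate = counts[name] > 1 and not value.strip()
        if (is_empty_duplicate and name in filled
                and not AUTHOR_LIKE.match(name)):
            continue  # the filled copy carries the information
        result.append((name, value))
    return result
```

If every copy of a duplicated parameter is empty, nothing is dropped here, which leaves that case to the bot's existing handling.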
Correct information? There is no journal named 'peprint' out there, and that doesn't seem to be anywhere on the RG page either. Is this GIGO? Headbomb {t · c · p · b}19:05, 5 January 2019 (UTC)
The bot replaced some valid references by a reference to a completely different article (a book review published 10 years before this journal was established...) Worse, it inserted this faulty reference multiple times but as different references (probably because they were named and got different names). The apparent reason for this is that the URLs of the references had changed although (with 1 exception) they still redirected to the correct page. I have corrected this manually (see article history). I do find it weird that "cite web" references were replaced by "cite journal" ones that were completely inappropriate. Although the bot indicated that it was "user activated", there was no indication about who this user was, who clearly failed to check the edits made by the bot.
This mostly happens with Wiley's "fake DOI" ISSN links (which are often rather spammy by the way, as in this article) and can be conclusively solved only by actually resolving DOI links. Nemo10:32, 6 January 2019 (UTC)
Look where the incorrect reference goes. Even though the bot put "journal=Genes, Brain and Behavior", it was to an article in a completely different journal that had "Genes, Brain and Behavior" as title. It didn't go to one of Wiley's URLs at all. Wiley doesn't use these fake DOI URLs any more, although these generally are still functional but redirect to the new (non-DOI) URLs. All that the bot should have done was replace the "fake DOI URL" with the new URL. --Randykitty (talk) 10:45, 6 January 2019 (UTC)
Thanks for maintaining this invaluable tool. BTW, I'm still curious why the bot took those fake DOI links and arrived at an old book review, mixing up the review title and the journal name... --Randykitty (talk) 15:35, 6 January 2019 (UTC)
The Bot took the journal title which was in the title parameter and did a PMC search and found an exact match and went with it. We do have rare false positives like this. AManWithNoPlan (talk) 15:55, 6 January 2019 (UTC)