This is an archive of past discussions with User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
error messages suggesting that an API is broken on article XXXY syndrome
> Using pubmed API to retrieve publication details:
! Error in PubMed search: No response from Entrez server
> Using Zotero translation server to retrieve details from URLs.
> Using Zotero translation server to retrieve details from identifiers.
> Expand individual templates by API calls
> Checking CrossRef database for doi.
> Searching PubMed... nothing found.
> Checking AdsAbs database
> AdsAbs search 7087/25000:
title:"XXXY syndrome"
> Searching PubMed...
! Unable to do PMID search
! Unable to do PMID search nothing found.
What should happen
Citation bot should be able to connect to PubMed search and PMID search
A bug report has also been filed. Apparently the Wikimedia tool server which hosts many of these citations tools has been blocked from accessing the PubMed name server. Wikimedia Cloud Services has contacted the NIH with a request to lift the block. Boghog (talk) 16:54, 22 June 2019 (UTC)
I have seen that too. With the citation filling tool that also downloads data from PubMed and runs on the tool server, it occasionally works, but the vast majority of time, it doesn't. Boghog (talk) 14:45, 24 June 2019 (UTC)
I have to say I'm consistently puzzled by this logic of not adding missing information based on an already-provided DOI because of a title mismatch. I get not adding a missing DOI based on a title mismatch, but once the DOI is provided, it should be used. Headbomb {t · c · p · b}04:09, 1 July 2019 (UTC)
Because all people are imperfect and careless, and a larger source of GIGO than we want to deal with. I will think about perhaps allowing it if the title is a subset 🤔. AManWithNoPlan (talk) 04:16, 1 July 2019 (UTC)
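A "title is a subset" comparison like the one floated above could be sketched as follows. This is a hypothetical illustration, not the bot's actual code; the function names and normalization rules are invented for the example:

```python
import re

def normalize(title):
    """Lowercase and strip everything except letters, digits and spaces,
    so trivial punctuation/case differences don't block a match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def title_is_subset(template_title, api_title):
    """True if the title already in the template is contained in the
    title returned by the API (or vice versa) after normalization."""
    a, b = normalize(template_title), normalize(api_title)
    return bool(a) and bool(b) and (a in b or b in a)
```

With a check like this, a hand-shortened title would still match the fuller title the DOI metadata returns, while a genuinely different title would keep blocking the expansion.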
See here where the bot adds a URL to a reference that has a doi and a pmid. I thought that in such cases a URL was not desired. --Randykitty (talk) 09:33, 6 July 2019 (UTC)
The URL points to the same place as the DOI, so it's redundant. Where it adds a url, it should be a free full version of the article. Headbomb {t · c · p · b}10:04, 6 July 2019 (UTC)
If you find a dead link, paywalled link, or DOI-equivalent link added, then please report it. We can feed it back to the free-DOI system and possibly blacklist it. AManWithNoPlan (talk) 17:22, 6 July 2019 (UTC)
It seems that this once was a website for the station but now it redirects to multiple spam websites. "Loading" nevertheless does not seem like a title we should accept in other cases as well. --Redalert2fan (talk) 10:31, 6 July 2019 (UTC)
Japanese titles removed while they appear to be correct
Status
{{fixed}} — also added some UTF-8 tests based upon this to make sure that multibyte characters don't get mangled in the future
I would add to this. The bot also adds a redundant, similarly named |newspaper=The Japan Times Online when |publisher=The Japan Times is present; see this edit. The correct action here is to rename |publisher= to |newspaper=, not to add a separate |newspaper=.
When |title= is primarily CJK script, in the best of all possible worlds, replace |title= with |script-title=<language code>:<title text>. Yeah, this is a best of all possible worlds thing because it isn't always easy or even possible to know what the language is. At the next release of Module:Citation/CS1, |script-title= will require a valid language code for non-Latin scripts (a limited list) so writing |script-title= without the language code will just result in a profusion of errors.
the utf-8 stuff is the problem, i will get the patch added ASAP. Then I will add a test to make sure this never occurs again. AManWithNoPlan (talk) 12:58, 6 July 2019 (UTC)
The bot is incorrectly capitalizing non-English journal names (as here). The correctly formatted Ekolist: revija o okolju and Acta geographica Slovenica were changed to the incorrect Ekolist: Revija O Okolju and Acta Geographica Slovenica. Doremo (talk) 05:34, 8 July 2019 (UTC)
I don't really know the rules for Slovenian, but at the very least the O in "Ekolist: Revija o Okolju" should be lowercase. Latin should be capitalized, however. Headbomb {t · c · p · b}06:00, 8 July 2019 (UTC)
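A capitalization routine that at least keeps short function words lowercase might look like the sketch below. The word list is illustrative only, and as the Slovenian examples show, the safer behaviour for non-English journal names is arguably to leave them untouched entirely:

```python
# Short function words to keep lowercase when not the first word.
# Illustrative only; a real list would be per-language.
SMALL_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "o", "de", "la"}

def title_case_journal(name):
    """Capitalize a journal name, keeping listed function words lowercase."""
    out = []
    for i, w in enumerate(name.split()):
        if i > 0 and w.lower() in SMALL_WORDS:
            out.append(w.lower())
        else:
            out.append(w[:1].upper() + w[1:])
    return " ".join(out)
```

Note this still cannot produce "Acta geographica Slovenica" correctly, which is the stronger argument for skipping recapitalization of non-English names altogether.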
When a url is not a free copy, it must be removed IF there is another identifier, according to Wikipedia style guides (we don't do this with Google Books, but we should). Also, if the url matches the doi, then it should be removed. AManWithNoPlan (talk) 21:46, 9 July 2019 (UTC)
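The "url matches the doi" case can be detected mechanically when the URL is simply the DOI resolver. A minimal sketch (hypothetical names, not the bot's implementation):

```python
from urllib.parse import urlparse, unquote

def url_duplicates_doi(url, doi):
    """True if |url= is just the DOI resolver (doi.org / dx.doi.org)
    pointing at the same DOI, and therefore redundant."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("doi.org", "dx.doi.org", "www.doi.org"):
        return False
    return unquote(parsed.path).lstrip("/").lower() == doi.lower()
```

Publisher-specific landing pages that the DOI merely redirects to are harder, since they require resolving the DOI and comparing destinations.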
API tweaks: Put diff | history in a fixed location
If you do a multiple bot run, you will have a list of stuff like
Written to Hoyt Vandenberg diff | history
...
Written to Hubert Winthrop Young diff | history
...
Written to Humanity and Paper Balloons diff | history
So to review the diffs, you search for "diff | history" in the page and press Ctrl+G (in Firefox) to jump around. However, because the title isn't of fixed length, you need to spend time aligning your mouse with the diff link. Now this isn't the worst thing in the world, but if you have a list of 100 diffs, that's making a task that could take 20 seconds take 5 minutes. So instead, I suggest either of
[22], which adds |journal=The Unsolved Mystery of Kaspar Hauser. Jeffrey Moussaieff Masson Translator and Introduction. New York: The Free Press, 1996. 254 Pp.. Psychoanal Q
Well, I was mostly thinking if you find something like https://(...)/10.1234/987654321{{deadlink}}, it could maybe be parsed as a DOI link were {{deadlink}} not there, even if the full url didn't resolve. I figured that if the link was dead, there would be nothing to be parsed and it wouldn't expand. Maybe I'm wrong there. Headbomb {t · c · p · b}18:14, 11 July 2019 (UTC)
Same for other identifiers if possible. It's not the most critical of things, so not toooo much thought needs to be put on this. But I figured that if a link could be parsed when the deadlink template wasn't there, it'd be nice to have the bot do something with the link when the template was there. Headbomb {t · c · p · b}20:57, 11 July 2019 (UTC)
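Pulling a DOI out of a URL string is feasible even when the URL no longer resolves, since DOIs have a recognizable `10.xxxx/suffix` shape. A sketch of the idea (the regex is an approximation; real DOI suffixes can contain characters this would miss):

```python
import re

# DOIs start with "10.", a 4-9 digit registrant code, then "/" and a suffix.
DOI_IN_URL = re.compile(r"/(10\.\d{4,9}/[^\s?#]+)")

def doi_from_dead_url(url):
    """Try to extract a DOI from a URL string without resolving it,
    e.g. for links already tagged with {{dead link}}."""
    match = DOI_IN_URL.search(url)
    return match.group(1).rstrip(".") if match else None
```

The extracted DOI can then be verified against CrossRef before anything is added to the citation.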
It's happening on other articles too. There's this sequence for example, [52] + [53]. It's possibly the ?via=ihub that throws things off. Headbomb {t · c · p · b}02:24, 10 July 2019 (UTC)
I had just noticed it in another article where the expansion then succeeded as I entered the DOI manually. Did you submit many articles with sciencedirect.com URLs at once? Maybe we got throttled? Nemo07:45, 10 July 2019 (UTC)
I suspect the expansion would have worked the next time because ?via=ihub was stripped from the URL in the previous bot's run. Headbomb {t · c · p · b}07:53, 10 July 2019 (UTC)
As in it's going through too many requests? I could hold on for a bit, it's at the end of ~100 article run or so. Headbomb {t · c · p · b}09:03, 10 July 2019 (UTC)
What happens is when a page is no longer available on japantimes.co.jp you get redirected to https://www.japantimes.co.jp/article-expired/ which states: "The article you have been looking for has expired and is no longer available on our system. This is due to newswire licensing terms." and has the title "Article expired". This is clearly not the title we are looking for. --Redalert2fan (talk) 23:53, 11 July 2019 (UTC)
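Titles like "Article expired" and the "Loading" case reported above suggest a simple blacklist of placeholder titles that should never be accepted from a scrape. A sketch (the list here is just the examples from this page):

```python
# Titles that indicate an error/placeholder page, not the cited article.
BAD_TITLES = {
    "loading",
    "article expired",
    "page not found",
    "404 error",
}

def is_placeholder_title(title):
    """Reject titles that are really error pages or loading screens."""
    return title.strip().rstrip(".!").lower() in BAD_TITLES
```

A real deployment would grow this list from bug reports, much as this talk page does.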
Journal template should be filled out with "last4=Fürst"
Replication instructions
Run the citation bot on a page containing the following text: {{cite journal |doi=10.1007/BF00562648 }}
We can't proceed until
Feedback from maintainers
GIGO. Literally nothing we can do. We have complained to crossref and the publisher and they promised to fix the data someday. AManWithNoPlan (talk) 20:45, 12 July 2019 (UTC)
Both are apostrophes, curly vs straight is a stylistic typographic change, not a semantic one. On Wikipedia, we mandate straight quotes and apostrophes. Even in quoted material. Even in citations. Headbomb {t · c · p · b}04:29, 14 July 2019 (UTC)
@AManWithNoPlan and Headbomb: Thanks for your reply. You may be right, but if English Wikipedia outright bans all of the curly apostrophes, the readers will never get a chance to find out on their own whether or not there is a stylistic, semantic or other difference between the two types of apostrophes. Don't be so quick to assume things that you don't know for a fact. Just because they are similar doesn't mean they are the same; in fact, calling them 'similar' implies that they are 'different', otherwise we would call them 'identical'. No, I'm sorry, you can't change the name of cited sources randomly. I strongly believe that you are dead, dead wrong on this one; you don't know what you are talking about, in fact. Why have the two code points if there's no difference? I have to strongly rebuke you here, otherwise you might not realize the error you are perpetrating on English Wikipedia. Thanks for your time. Geographyinitiative (talk) 04:51, 14 July 2019 (UTC)
@AManWithNoPlan and Headbomb: No semantic difference, eh? Alright buddy, you look at this edit and tell me there's no semantic or stylistic difference: [65]. The authors are using curly apostrophes that curl inward from both directions. That's the author's way of writing in English. The author doesn't need your fascist hand to come down on them when someone uses this citation bot. So everything has to be simplified now- what is this, 1984? Just let the apostrophes alone. Geographyinitiative (talk) 05:03, 14 July 2019 (UTC)
Right, there is no semantic or stylistic difference there. The difference in orthography does not change the meaning or style of the word. Nothing "fascist" about these edits, and it is entirely inappropriate to refer to other editors in that way - please do not do that. And follow the consensus even if you do not agree with it, until and unless you have been able to change the existing consensus (which at the moment seems unlikely). Regards, --bonadeacontributionstalk08:54, 14 July 2019 (UTC)
Wait. The curly apostrophe isn't even there in the source linked in the article (this diff from the original report above) - it is a translation of the actual Chinese title. The "author" thus appears to be yourself, since you were the one to add the link with the translated title. --bonadeacontributionstalk10:01, 14 July 2019 (UTC)
There is a similar-looking character, ʻOkina (U+02BB), commonly used in Polynesian languages. That character should not be converted to typewriter apostrophe.
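A conversion routine that respects the ʻokina distinction might be sketched like this (illustrative only; the actual bot code may handle more code points):

```python
def straighten_apostrophes(text):
    """Replace curly quotes/apostrophes (U+2018/U+2019) with the
    typewriter apostrophe, but leave the ʻokina (U+02BB) alone, since
    it is a letter in Polynesian orthographies, not punctuation."""
    return text.replace("\u2018", "'").replace("\u2019", "'")
```

The key point is that U+02BB must not appear in the replacement map at all.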
partial removal of "subscription" and "via" parameters
Looking at this diff, the bot appears to be removing the via= and subscription= parameters from citations. I personally find those useful, but I'm not too fussed about it. However, the bot has only carried out a partial removal; other citations that include the first of these have not been touched. Is this intentional? Vanamonde (Talk)23:05, 13 July 2019 (UTC)
they get removed when the associated url is removed and they no longer serve a valid purpose. URLs that duplicate the doi are removed in accordance with style guides. AManWithNoPlan (talk) 23:23, 13 July 2019 (UTC)
Ovid is a pain, since they have two websites that you get to choose from, so nothing is ever redundant. I will have to write specific code. IF(doi && pmid=OvidUrl)THEN drop url. AManWithNoPlan (talk) 14:12, 12 July 2019 (UTC)
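The IF(doi && pmid=OvidUrl) rule could be expressed roughly as below. This is a sketch of the stated condition, not the code that was actually written; the URL test (looking for the citation's PMID inside an Ovid URL) is an assumption about how Ovid links embed the identifier:

```python
import re

def should_drop_ovid_url(url, doi, pmid):
    """Drop an Ovid URL only when a DOI is present AND the URL is just
    an Ovid lookup keyed on the same PMID, so nothing is lost."""
    if not doi or not pmid or "ovid" not in url.lower():
        return False
    ids = re.findall(r"\d{6,9}", url)  # candidate PMIDs in the URL
    return pmid in ids
```

Requiring both identifiers keeps the bot from dropping the only route to the full text.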
title = 【お知らせ】Url(アドレス)が変わりました。|小田急電鉄 was added. This translates to: 【Notice】 Url (address) has changed. Odakyu Electric Railway. The page linked to is a dead link and redirects to https://www.odakyu.jp/404.html which gives: 404エラー お探しのページは見つかりませんでした。which translates to: 404 error The page you were looking for was not found.
What should happen
【お知らせ】Url(アドレス)が変わりました。|小田急電鉄 should not be added as a title
IEEE is very annoying: some of their URLs redirect and some others don't. Their rate limits are also horrendous and their staff even boasts about how mean they are towards their users. Nemo10:27, 16 July 2019 (UTC)
It is obvious that this 'Marianne Zimmerman' account is a bot, since it is working around the clock, 24/7. The account is not labeled as such, and has not been authorized by the Bot Approvals Group. In itself not a big deal, because the account has been making only positive edits and has not caused disruption. Still, it is technically violating policy, and I'm wondering why a bot would use another bot to make bot edits. That seems rather silly. I hope the author of the 'Marianne bot' can come forward so that we can work things out. Cheers, Manifestation (talk)12:04, 14 July 2019 (UTC)
The one thing I wonder about is whether the user checks their edits for possible bugs or mistakes, but seeing not a single revert of Citation bot by this user makes me believe they absolutely do not. This means that it is quite possible that actual bad edits are made... In the end it might technically not be a problem (policy aside) if you can run 24/7 and still check your edits, but I can tell you that I spend quite some time checking every edit made by the bot under my request and then posting bug reports here, even on 1000-page category runs.
A further point: I wonder how they run Citation bot in an automatic way. They either must use a very large input of pages via the web interface (pages separated with |) or have made some sort of script that interacts with the web interface; it is clear from the edit summaries that category mode is not used. In any case, the unauthorized-bot issue aside, it is possible that bad edits have been and will be made, unless Marianne can kindly prove that they check the bot's edits as is requested. Redalert2fan (talk) 12:15, 14 July 2019 (UTC)
@Redalert2fan: Yeah, I think you're right. This 'Marianne Zimmerman' account must be blocked, at least for now, even though the owner seems to be acting in good faith. There is a reason why Wikipedia bots have a trial period. But more importantly, this 'Zimmerman' bot seems a bit redundant. It appears to just roam around, randomly cleaning up articles it encounters. Can't the Citation bot itself do that? Cheers, Manifestation (talk)12:24, 14 July 2019 (UTC)
@Manifestation: Well, currently Citation bot does not operate by itself and is only user-activated, so without activation nothing will happen. "Editors who activate this bot should carefully check the results to make sure that they are as expected. While the bot does the best it can, it cannot anticipate the existing misuse of template parameters or anticipate bad/incomplete metadata from citation databases." is clearly stated on the bot's userpage. Since you activate it yourself, you should check the edits made. As for why Citation bot does not operate in automatic mode, I suspect that is exactly the reason; maybe the maintainers can further explain this, since I'm not totally clear on it. Of course Citation bot has long passed its trial period, but you can see many pages of bug reports in the archives just because things change on the internet and the maintainers/operator cannot predict every variation in templates, characters, languages, etc. Which is why it is so important to check these edits. Anyone can run the bot for any reason, including random runs or just pages/categories of interest, so that's not a problem as long as edits are checked, in my opinion. Thanks, Redalert2fan (talk) 12:37, 14 July 2019 (UTC)
@Redalert2fan: Oh, I didn't see that. I believed the Citation bot made relatively simple changes, so I thought it wasn't a big deal if someone made mass edits with it. But you're right, this may not be a good idea after all. I've reported 'Marianne Zimmerman' to WP:ANI. Thanks, Manifestation (talk)12:58, 14 July 2019 (UTC)
@Smith609: would it be possible to have an onwiki page of blacklisted users for citation bot? Now that activation requires authentication, having an admin-editable page of "blacklisted" users would help in situations like this (while not requiring a full block of the activator). — xaosfluxTalk13:16, 14 July 2019 (UTC)
That would be a good idea, but it would still require us to check the edits/activations manually. Perhaps you could build in some kind of limit per user, with an internal warning being triggered if that user surpasses it. I've been scrolling through the thousands and thousands of edits of the Citation bot commissioned by the Marianne bot, and the earliest activation by the Marianne bot I could find was at 20:24, 24 June 2019. Go here, press Ctrl/Cmd + F, and search for "Marianne". Save for a few pauses, the bot had been running non-stop for 20 days straight, with no one noticing until now. - Manifestation (talk)16:35, 14 July 2019 (UTC)
The edits I checked looked harmless enough. And if (as in this case) a sock is really just running the bot on randomly selected articles, I don't see the problem. But we do want to block activations from blocked users, to head off more problematic behavior like stalking other users (using the bot to send the message that the user is being stalked, and by whom) or to mask bad edits (by running the bot afterwards to hide the edits from watchlists and make them harder to roll back). —David Eppstein (talk) 19:02, 14 July 2019 (UTC)
I noticed a lot came from the Marianne account and didn't really consider them harmful, although the volume is more than you'd expect from manual activations. Blacklisting is a good feature to have; whether it needs to be deployed here, I have no real opinion on. Headbomb {t · c · p · b}19:49, 14 July 2019 (UTC)
I, too, had sampled a number of those diffs (a few thousand, I think; an addictive game which consumed many hours of my time). I reported the issues I found, which were very few. It would be nice to have server-side runs on larger sets of "safe" articles (such as bare refs) so that the bot would become a no-op on those. Nemo22:39, 14 July 2019 (UTC)
[http://www.intechopen.com/books/aerospace-technologies-advancements/a-real-options-approach-to-valuing-the-risk-transfer-in-a-multi-year-procurement-contract A Real Options Approach to valuing the Risk Transfer in a Multi-Year Procurement Contract]. Arnold, Scot, and Marius Vassiliou (2010). Ch. 25 in Thawar T. Arif (ed), Aerospace Technologies Advancements. Zagreb, Croatia: INTECH. {{ISBN|978-953-7619-96-1}}
{{cite journal|url=http://www.intechopen.com/books/aerospace-technologies-advancements/a-real-options-approach-to-valuing-the-risk-transfer-in-a-multi-year-procurement-contract|title= A Real Options Approach to valuing the Risk Transfer in a Multi-Year Procurement Contract]. Arnold, Scot, and Marius Vassiliou (2010). Ch. 25 in Thawar T. Arif (ed), Aerospace Technologies Advancements. Zagreb, Croatia: INTECH. {{ISBN|978-953-7619-96-1}}|journal= Aerospace Technologies Advancements|doi= 10.5772/7170|date= January 2010|last1= Vassiliou|first1= Marius S.|last2= Arnold|first2= Scot A.}}
when the bot removed the url parameter, the edit summary was "Add: issue. Removed accessdate with no specified URL. Removed parameters. | You can use this bot yourself. Report bugs here. | User-activated." which appears to say that, before the changes, the citation did not have a url (leading to the earlier "not a bug")
What should happen
better edit summary (did not mention removal of URL)
The edit summary the bot produced was:
"Add: issue. Removed accessdate with no specified URL. Removed parameters. | You can use this bot yourself. Report bugs here. | User-activated."
Perhaps before "Removed accessdate ..." could be added something like "Removed URL that matched DOI." or "Removed nonfree URL." (or at least "Removed URL."). Rayhartung (talk) 12:29, 18 July 2019 (UTC)
Thanks for the suggestion. That edit is 5 months old, this was already fixed in March. Now the edit summary states "Removed URL that duplicated unique identifier" (example). Nemo12:41, 18 July 2019 (UTC)
Redundant chapter-url and other url types should also be removed
Please see this diff: [76] where the edit summary is: "Add: date. Removed parameters." The actual change made was: publication-date=August 2018 was changed to date=August 2018. While a part of the parameter name was removed, no full parameters were removed and no new date was added, making the summary a bit inaccurate. If possible, could the summary for edits like these be changed to something that describes the specific action more closely? --Redalert2fan (talk) 20:20, 18 July 2019 (UTC)
we walk a thin line between logging everything in horrible detail and not describing everything. We might want a “parameter name changed” at some point. AManWithNoPlan (talk) 21:10, 18 July 2019 (UTC)
Adds incorrect and incorrectly capitalized journal= parameter to conference proceedings (book) citation in LIPIcs book series
What should happen
Special:Diff/906995033 (ignore the unrelated sciencedirect url in a different citation). More generally Citation bot should never add a journal= parameter to a citation with incompatible parameters (in this case booktitle).
The "problem" here is that in {{cite journal}} the "journal" field is really used to mean serial. I doubt we even have templates to precisely replicate all the FRBR and host/components hierarchies. Nemo19:36, 19 July 2019 (UTC)
Well, this is a {{cite conference}} with a |series= already present. That should be enough to figure out that adding a |journal= to that likely doesn't make much sense. Headbomb {t · c · p · b}20:36, 19 July 2019 (UTC)
Re 'ref' and 'mode' parameters
Could the bot drivers be requested to not add line breaks where |ref= or |mode= are on the same line with (that is, immediately following) "{{cite xxx" or "{{citation"? These parameters change the behavior of those templates in very significant ways, effectively changing the template. Having these parameters deeper into the argument makes them less visible, and creates confusion. Where an editor sees fit to put them on the same line, that should be respected. ♦ J. Johnson (JJ) (talk) 22:10, 18 July 2019 (UTC)
The instance I have at hand was actually InternetArchiveBot's doing, whereas the instance I thought(?) was Citation_bot's doing is not readily at hand. Okay, maybe not a problem here. ♦ J. Johnson (JJ) (talk) 23:20, 19 July 2019 (UTC)
At [82] cite web was changed to cite thesis. Also type = Thesis was added. but at [83] Cite web was changed to Cite document. As far as I can see the only difference before was c vs C in cite. Further do we need type = Thesis if we have cite thesis? --Redalert2fan (talk) 19:47, 19 July 2019 (UTC)
I have encountered and manually corrected a few of these too. Another common pattern is Wiley DOIs losing a central <> element like "<839::AID-NME423>" and DOIs truncated after a dot or missing dots between digits. Nemo08:34, 20 July 2019 (UTC)
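The mangling patterns described here (trailing dot, lost `<...>` element in Wiley DOIs, missing dots between digits) are mostly detectable with a cheap sanity check before the DOI is trusted. A hypothetical sketch:

```python
import re

def looks_truncated(doi):
    """Flag DOIs showing common mangling patterns: a trailing dot,
    unbalanced angle brackets (Wiley-style DOIs), or a shape that is
    not '10.<registrant>/<suffix>' at all."""
    if doi.endswith("."):
        return True
    if doi.count("<") != doi.count(">"):
        return True
    return not re.match(r"^10\.\d{4,9}/\S+$", doi)
```

A flagged DOI can then be confirmed by actually resolving it rather than being written into the article.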
The evil period ending doi required some magic to avoid the error (it might have been encoding OR the URL was used, but both the url and doi fields had bot stopping comments added). I do not remember what I did. AManWithNoPlan (talk) 20:05, 20 July 2019 (UTC)
Hello, I just ran the bot on this revision of TRAPPIST-1 giving these results. In the API output this section caught my eye;
> Remedial work to prepare citations
> Trying to convert ID parameter to parameterized identifiers.
> Trying to convert ID parameter to parameterized identifiers.
~ Renamed "date" -> "CITATION_BOT_PLACEHOLDER_date"
~ Renamed "CITATION_BOT_PLACEHOLDER_date" -> "date"
> Trying to convert ID parameter to parameterized identifiers.
~ Renamed "year" -> "CITATION_BOT_PLACEHOLDER_year"
~ Renamed "CITATION_BOT_PLACEHOLDER_year" -> "year"
~ Renamed "date" -> "CITATION_BOT_PLACEHOLDER_date"
~ Renamed "CITATION_BOT_PLACEHOLDER_date" -> "date"
In the end no dates were changed or added. Is this intended behavior or is there some accidental double work going on? --Redalert2fan (talk) 11:43, 20 July 2019 (UTC)
You have just seen inside the machine where the sausage is being made. We have to temporarily move some things out of the way and then put them back during some API calls. AManWithNoPlan (talk) 14:11, 20 July 2019 (UTC)
Batch completed, 145 page(s) processed, 2 page(s) skipped, 24 edit(s) made. Report issues/suggestions.
[diff | history] Hoyt Vandenberg – Add: title. Converted bare reference to cite template.
[diff | history] Title2 – Edit summary
[diff | history] Title3 – Edit summary
[diff | history] Title4 – Skipped, page is fully protected!
[diff | history] Title5 – Edit summary
...
[diff | history] Title25 – Skipped, {{bots|deny=Citation bot}} found!
[diff | history] Title26 – Edit summary
To get the best results, see our helpful user guide!
Suppressing the | You can use this bot yourself. Report bugs here. | Activated by User:Username part of the edit summary. Headbomb {t · c · p · b}19:37, 10 July 2019 (UTC)
I suppose you mean suppressing it in the API output only and not the actual posted edit summary by the bot? This would massively help with checking the edits, so support for this. But if it's quick to implement, my original suggestion at least helps a bit already, in my opinion. Redalert2fan (talk) 19:52, 10 July 2019 (UTC)
Yes, in the API only. Whoever activated the bot knows they activated the bot and that it's possible for them to do so. Headbomb {t · c · p · b}20:05, 10 July 2019 (UTC)
[86] I'm getting these all the time now and I think they arguably make the citation sections worse. There's no way that [edit: general readers] know to click on the linked "doi" when the citation's title itself is unlinked. I'll note that the {{cite journal}} documentation examples keep the url parameter even when a doi is provided.
My question was where this consensus has been established, or if this is just a practice localized to editors using this bot/tool. czar15:17, 20 July 2019 (UTC)
I don’t have time to look it up, but the links are in the talk archives somewhere—hopefully someone not in an auto parts store can respond better. AManWithNoPlan (talk) 15:31, 20 July 2019 (UTC)
The general idea is that these links are redundant with the DOI/other identifiers, which are clear about where they take you (doi: version of record, jstor: jstor repository, etc.; if you don't know what those are, we have the wikilinks). |url= is then freed up to be used for freely-available full-text versions-of-record of the paper hosted on an author's website, or similar. If the DOI version is free, you can use |doi-access=free to mark it as free, etc. Headbomb {t · c · p · b}17:29, 20 July 2019 (UTC)
The BOT adds date ranges in ISO format - which is not a valid date format. |date=2011-10-01 - 2011-12-17
What should happen
Should add the date in a valid format, in accordance with the date-format templates in use in the article, if present. Should have added |date=1 October – 17 December 2011 or |date=October 1 – December 17, 2011
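Converting the invalid ISO range into either accepted style is mechanical. A sketch of the fix (hypothetical helper, handling only same-year ranges as in the example):

```python
import datetime
import re

ISO_RANGE = re.compile(r"^(\d{4}-\d{2}-\d{2}) - (\d{4}-\d{2}-\d{2})$")

def fix_date_range(value, mdy=False):
    """Rewrite '2011-10-01 - 2011-12-17' as a valid CS1 date range:
    dmy '1 October – 17 December 2011' or mdy 'October 1 – December 17, 2011'."""
    m = ISO_RANGE.match(value)
    if not m:
        return value
    start = datetime.date.fromisoformat(m.group(1))
    end = datetime.date.fromisoformat(m.group(2))
    if start.year != end.year:
        return value  # cross-year ranges need a different pattern
    if mdy:
        return (f"{start.strftime('%B')} {start.day} – "
                f"{end.strftime('%B')} {end.day}, {end.year}")
    return (f"{start.day} {start.strftime('%B')} – "
            f"{end.day} {end.strftime('%B')} {end.year}")
```

Which branch to take would follow the article's {{use dmy dates}} / {{use mdy dates}} template, when one is present.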
I found a couple of places where we called "tidy" on blank parameters. Will update soon. The only blank things we will remove then are some postscript parameters (when meaningless), the empty via parameter (since its presence is rare and leads to misuse), and parameters that duplicate set parameters (remove blank year if date is set). AManWithNoPlan (talk) 13:10, 26 July 2019 (UTC)
Converted them to {{citation}} in that article. No reason to use such a feature-poor template on a non-linguistics related article. Headbomb {t · c · p · b}19:46, 26 July 2019 (UTC)
Some linkinghub.elsevier.com URLs redundant with DOIs do not get removed, either because Elsevier acts up or because they contain extra parentheses and stuff in their IDs.
What should happen
special:diff/908519421: just remove them all when there is a DOI; this domain is never the final destination for the user, so all these links were added by some automatic tool when the users provided the DOI or other input.
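A removal rule along the lines described here is simple precisely because the domain is never a final destination. A sketch (illustrative, not the bot's code):

```python
from urllib.parse import urlparse

def drop_redundant_url(url, doi):
    """linkinghub.elsevier.com is only ever a redirect step, never the
    final destination, so such a URL is removable once a DOI is present,
    regardless of any parentheses or extra junk in its path."""
    return bool(doi) and urlparse(url).netloc.lower() == "linkinghub.elsevier.com"
```

Matching on the hostname sidesteps the parsing problems that the extra parentheses cause for ID-based comparison.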
This is mostly due to the (Clifton, NJ) thing in one but not the other. Probably should be a hardcoded exception/equivalence. Headbomb {t · c · p · b}04:16, 27 July 2019 (UTC)
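An equivalence check that ignores a trailing parenthesised place qualifier could look like this sketch (the journal-name example is an assumption based on the "(Clifton, NJ)" pattern mentioned above):

```python
import re

def _strip_place(name):
    """Drop a trailing parenthesised place qualifier, e.g. '(Clifton, N.J.)',
    and normalize case/whitespace."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name).strip().lower()

def same_journal(a, b):
    """Treat journal names as equal when they differ only by case or by
    a trailing place qualifier."""
    return _strip_place(a) == _strip_place(b)
```

This avoids hardcoding one title at a time, at the small risk of conflating journals that genuinely differ only in their qualifier.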
Isn't that really a series of books so the citation should be something more like:
{{Cite book |last=Laustsen |first=Anders |last2=Bak |first2=Rasmus O. |editor=Yonglun Luo |date=2019 |title=CRISPR Gene Editing: Methods and Protocols |chapter=Electroporation-Based CRISPR/Cas9 Gene Editing Using Cas9 Protein and Chemically Modified sgRNAs |location=New York |publisher=Springer |pages=127–134 |doi=10.1007/978-1-4939-9170-9_9 |pmid=30912044 |isbn=978-1-4939-9169-3}}
Laustsen, Anders; Bak, Rasmus O. (2019). "Electroporation-Based CRISPR/Cas9 Gene Editing Using Cas9 Protein and Chemically Modified sgRNAs". In Yonglun Luo (ed.). CRISPR Gene Editing: Methods and Protocols. New York: Springer. pp. 127–134. doi:10.1007/978-1-4939-9170-9_9. ISBN978-1-4939-9169-3. PMID30912044.
Also, if the 'fix' is to keep the template as {{cite journal}}, the next release of Module:Citation/CS1 suite will require {{cite journal}} to have |journal=.
Not sure I'd call it crashing, but it does quit before it finishes the page. Is the page too big at 389kb? When I tried, it got about half way through to checking AdsAbs for the citation of DOI 10.1111/j.1438-8677.1971.tb00715.x — Chris Capoccia💬01:39, 1 August 2019 (UTC)
See this edit. I would like the bot to do this automatically, without having to summon the bot.
This is not a bot bug. Is it possible to program the bot to automatically restore the required proper attribution in accordance with WP:MEDCOPY? If this bot can't be programmed to do this, then which bot on Wikipedia can be? QuackGuru (talk) 17:34, 2 August 2019 (UTC)
The link is not required by the license: what matters is that you name the authors and the license. So, personally I prefer to leave the URL in the citation. Nemo17:42, 2 August 2019 (UTC)
Spanning multiple templates is beyond the scope of this bot. Not really sure what the distinction is between referencing something and adding a separate "we stole a bunch of text from this freely copyable source" template, either. The extra template relisting the exact same information is quite ugly. AManWithNoPlan (talk) 17:50, 2 August 2019 (UTC)
Can the bot be slightly more comprehensive in catching references with unstructured citations like this? (Where I had to manually remove everything and replace with cite journal + doi.) Nemo16:23, 12 July 2019 (UTC)
the bot does this kind of thing, when it sees that citation templates dominate over non-citation templates. We avoid running a bulldozer over citevar AManWithNoPlan (talk) 16:46, 12 July 2019 (UTC)
Yes, I'm asking if it would be fine to catch a case like this one I linked. If so, I could submit a patch. Nemo17:05, 12 July 2019 (UTC)
<ref>[https://doi.org/10.5752/P.2175-5841.2011v9n22p396 Abumanssur, Edin Sued. 2011. “A conversão ao pentecostalismo em comunidades tradicionais.” Horizonte 9 (22): 396–415. DOI: doi.org/10.5752/P.2175-5841.2011v9n22p396 <span></span>].</ref>
<ref>M.C. Curthoys, M. C., and H. S. Jones, "Oxford athleticism, 1850–1914: a reappraisal." ''History of Education'' 24.4 (1995): 305–317. [http://www.tandfonline.com/doi/abs/10.1080/0046760950240403?journalCode=thed20 online]</ref>
<ref>Aday, S. (2010), "[http://www3.interscience.wiley.com/journal/123303811/abstract Chasing the bad news: An analysis of 2005 Iraq and Afghanistan war coverage on NBC and Fox News channel]", ''Journal of Communication'' 60 (1), pp. 144–164</ref>
* Brockliss, Laurence W B, ''The University of Oxford: A History'', [[Oxford University Press]] (Oxford, 2016); 11th century to present; {{doi|10.1093/acprof:oso/9780199243563.001.0001}} online
We can also send an entire line to Citoid and it will use the CrossRef service to get suggestions on what that might be. Sometimes the result is far off, but we can try and make sure it's similar enough. Nemo10:09, 21 July 2019 (UTC)
A quite weird instance: it does seem that the reference carries two titles, because the press release discusses multiple things, so the actual given title is "Bombardier Announces Financial Results for the Third Quarter Ended September 30, 2015 <br><br>Government of Québec Partners with Bombardier for $1 billion in C Series as Certification Nears". However, I think this is clearly unwanted because it adds unnecessary blank lines in the reflist. --Redalert2fan (talk) 12:29, 31 July 2019 (UTC)
More minor changes that should not be done as a single edit
Status
{{fixed}} by just removing the work parameter, which is useless in cite web
These edits are done to prevent future errors. The better parameter for this citation is website, not work, so we fix it now. AManWithNoPlan (talk) 14:25, 31 July 2019 (UTC)
Wouldn't it be better then to remove it completely in cases like these, when it is empty? That would remove possibilities for future errors. --Redalert2fan (talk) 14:36, 31 July 2019 (UTC)
In my experience a large fraction of {{cite web}} templates should really be {{cite journal}}, {{cite magazine}}, {{cite news}}, etc., and their |work= parameters should really be the title of the journal, magazine, or newspaper. Calling it a website makes a stupid use of the wrong template even stupider, and will no doubt encourage users to fill it in with the url or hostname instead of the actual title of the collective work. I think switching the name of the parameter in this way is a bad idea. —David Eppstein (talk) 01:51, 1 August 2019 (UTC)
Variants of the URL to which the DOI resolves, which just differ by a prefix like /abs or a suffix like /epdf, seem not to be matched as redundant at the moment.
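One way to catch these variants would be to normalize both URLs before comparing, stripping the path segments publishers insert between `/doi/` and the DOI itself. A sketch under the assumption that `/abs`, `/full`, `/epdf`, `/epub` and `/pdf` are the segments of interest (the real bot may track a different list):

```python
import re

# Assumed list of path segments publishers insert before a DOI.
VARIANT_SEGMENTS = ("abs", "full", "epdf", "epub", "pdf")

def normalize_doi_url(url):
    """Strip known variant path segments so that /doi/abs/10.1080/x
    and /doi/epdf/10.1080/x compare equal to /doi/10.1080/x."""
    url = url.split("?", 1)[0].rstrip("/")          # drop query string
    pattern = r"/(?:%s)(?=/10\.)" % "|".join(VARIANT_SEGMENTS)
    return re.sub(pattern, "", url)

def is_redundant(url, doi):
    # The URL is redundant when, after normalization, it ends in the DOI.
    return normalize_doi_url(url).endswith("/" + doi)
```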
URL change requests can also be made at WP:URLREQ, which can handle many URL-specific issues like URLs located outside of CS1|2 templates, {{webarchive}} templates, archive-url additions and deletions, fixing bad encoding, updating the IABot database, URLs on Commons, etc. -- GreenC14:47, 6 August 2019 (UTC)
Thank you. However, last I understood, it doesn't handle outright removal of URLs which are pure garbage. Adding Wayback Machine links to those garbage URLs only multiplies the garbage for no benefit. Nemo16:48, 6 August 2019 (UTC)
I find the bot an extremely good idea; many thanks to its developers! Here is one feature request: the bot should also apply to snippets such as
{{Citation|mr=MR0258885}}
by referring to MathSciNet (in this case to [111]) and retrieving the information from there (or possibly retrieving the doi from there and then proceeding as usual). Thanks for considering this extension! Jakob.scholbach (talk) 09:56, 23 July 2019 (UTC)
OK, I am entirely ignorant about how the bot works, but if I am able to access the mathscinet content in my browser, is there no way of giving the same access to the bot?
I just saw on the GitHub page that the bot is able to pull data from JSTOR. JSTOR is also behind a paywall, so what precisely is the difference here? Jakob.scholbach (talk) 11:44, 23 July 2019 (UTC)
JSTOR is not behind a paywall; you can visit it all day and night. Secondly, at my request the metadata is not protected by any CAPTCHA on JSTOR. AManWithNoPlan (talk) 11:57, 23 July 2019 (UTC)
I feel you will likely get hit by the MR banhammer for bots, but it's at least worth investigating using an existing MR to complete the rest of the information. I think the confusion happened because Zbl is CAPTCHA-protected. You won't be able to search MR, though; that's a subscription-only thing. Headbomb {t · c · p · b}13:17, 25 July 2019 (UTC)
When you ask the bot to run on, say, Category:Foobar, the entire category will enter the job queue and get processed. So that means if you have something like
Extended content
10:00:00 am Category:Foobar A is requested to be processed by User:A
10:00:01 am Foobar B is requested to be processed by User:B
10:00:02 am Category:Foobar C is requested to be processed by User:C
10:00:03 am Foobar D is requested to be processed by User:D
You could very well have
10:00 Foobar A1 is being processed
10:01 Foobar A2 is being processed
10:02 Foobar A3 is being processed
10:03 Foobar A4 is being processed
10:04 Foobar A5 is being processed
10:05 Foobar A6 is being processed
10:06 Foobar A7 is being processed
10:07 Foobar A8 is being processed
10:08 Foobar A9 is being processed
10:09 Foobar A10 is being processed
10:10 Foobar A11 is being processed
10:11 Foobar A12 is being processed
10:12 Foobar A13 is being processed
10:13 Foobar A14 is being processed
10:14 Foobar A15 is being processed
10:15 Foobar A16 is being processed
10:16 Foobar B is being processed
10:17 Foobar C1 is being processed
...
14:11 Foobar C235 is being processed
14:12 Foobar D is being processed
Leading to massive delays for User B and User D. A fairer queuing process would be to put each request into a bin
Bin A [16 articles]
Bin B [1 article]
Bin C [235 articles]
Bin D [1 article]
And cycle between active 'bins' until each get empty. So you'd have a queue that looks like
10:00 Foobar A1 is being processed
10:01 Foobar B1 is being processed
10:02 Foobar C1 is being processed
10:03 Foobar D1 is being processed
10:04 Foobar A2 is being processed
10:05 Foobar C2 is being processed
10:06 Foobar A3 is being processed
10:07 Foobar C3 is being processed
...
10:32 Foobar A16 is being processed
10:33 Foobar C16 is being processed
10:34 Foobar C17 is being processed
...
14:12 Foobar C235 is being processed
(bins B and D empty after the first cycle, so A and C alternate until only C remains)
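The cycling scheme described here is essentially round-robin over per-user queues. A minimal sketch in Python (illustrative only; the bot itself is written in PHP), assuming each request arrives as a list of page titles:

```python
from collections import deque

def round_robin(requests):
    """Interleave per-user job lists fairly: take one page from each
    non-empty bin per cycle, until all bins are drained.

    `requests` is a list of (user, [pages]) tuples in arrival order."""
    bins = deque((user, deque(pages)) for user, pages in requests)
    order = []
    while bins:
        user, pages = bins.popleft()
        order.append(pages.popleft())
        if pages:                     # bin still has work: requeue it
            bins.append((user, pages))
    return order

# With bins A=3, B=1, C=2 the schedule interleaves instead of letting
# A's whole batch block B and C:
# round_robin([("A", ["A1", "A2", "A3"]), ("B", ["B1"]), ("C", ["C1", "C2"])])
# → ["A1", "B1", "C1", "A2", "C2", "A3"]
```

Small requests finish almost immediately under this scheme, while the total completion time for the largest batch stays the same.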
I would have to think about that. Your description of the current mode of operation is a bit off, but there could be improvements made. AManWithNoPlan (talk) 22:58, 29 June 2019 (UTC)
Whatever the current logic is, the taxonbar run right now is blocking anyone else from requesting edits. Similar things happen whenever I requested category runs. Headbomb {t · c · p · b}23:30, 29 June 2019 (UTC)
There is no logic. Tasks are processed—for the most part—as received. Category and multiple page runs are treated as multiple tasks. AManWithNoPlan (talk) 02:02, 30 June 2019 (UTC)
The bot just does the entire category in one PHP request, which has no knowledge of other people waiting. I see two possibilities: 1) ask Toolforge sysadmins how to get it to handle more requests concurrently; 2) split up the category job into multiple requests, e.g. by making the category page redirect to the process-page URL for one title, which will process just one page and redirect to the next, and so on.
Arguably, the fundamental problem with category runs is that they mostly encounter pages which don't need to be treated. This run presumably went through over 1000 pages, checking all their URLs and identifiers and everything, but was only needed for less than 200. On the other hand, the whole point of the bot is that it saves time to a human who would otherwise have to do the hard work, such as selecting the pages which need some edits. Nemo06:10, 30 June 2019 (UTC)
Refill2 uses Celery to manage workers. If we go that type of route, then the category API would be changed to a list generator that then calls the page API with a list. Single point of entry. AManWithNoPlan (talk) 15:12, 30 June 2019 (UTC)
Yes, with a tiny bit of additional complexity (preferably handled by some external library) the multi-page editing could be handled much better. Nemo15:47, 30 June 2019 (UTC)
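The splitting idea could be sketched roughly as follows; `process_page` here stands in for a hypothetical single-page bot endpoint (the real API may differ), and the point is simply that many small requests give the server natural points at which to interleave other users' jobs:

```python
def run_category(page_titles, process_page, batch_size=50):
    """Split one big category run into many small requests so the
    queue can interleave other users' jobs between our batches."""
    results = []
    for start in range(0, len(page_titles), batch_size):
        batch = page_titles[start:start + batch_size]
        # One request per batch; `process_page` is the assumed
        # single-page entry point of the bot.
        results.extend(process_page(title) for title in batch)
    return results
```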
On top of binning, there could be some parallel processing of some kind, like having multiple instances of Citation bot running on the tool server, and when one of them was ready to make an edit, it would get queued. This way if you run on an article that takes ~10 minutes to process, other articles could still get dealt with. Headbomb {t · c · p · b}23:19, 2 July 2019 (UTC)
Job Arrays work on Toolforge, it will run 16 slots at a time filling in empty slots until the submitted job queue is done (unlimited size). Requires something like ZOT to do file locking on disk writes, or application-level file locking. -- GreenC01:17, 3 July 2019 (UTC)
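The application-level file locking mentioned here could be an advisory `flock` around each disk write. A sketch (POSIX-only, illustrative; the log path in the usage comment is hypothetical):

```python
import fcntl
from contextlib import contextmanager

@contextmanager
def file_lock(path):
    """Advisory exclusive lock so that parallel job-array slots
    do not interleave their writes to a shared file."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield handle
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)

# Usage: each of the 16 slots appends its results safely, e.g.
# with file_lock("/data/project/citations/results.log") as log:
#     log.write("page processed\n")
```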
I have to say that with the way the bot is being used right now, this really, really would help. I made a ~50 article request last night that took something like 3-4 hours to process. Would have been nice to be able to use the bot on select articles while the large run was going on. Headbomb {t · c · p · b}15:51, 3 July 2019 (UTC)
Since usage has gone up quite a bit recently, this would definitely help (obviously). It would enable more people to use the bot at the same time, or let a single person split a request to finish it faster. We do have to watch out for people who might (ab)use this by just submitting 10x the jobs and taking everything for themselves, either by accident or out of impatience, but that might be mitigated by keeping some "slots" free for "Expand citations" via the toolbar, AFCH templates, and single-page requests via the API/interface, if that is possible. --Redalert2fan (talk) 12:50, 14 July 2019 (UTC)
Usage should go down substantially now the Marianne account is blocked. I and a few others will still make big requests, but at least it won't be constant. Headbomb {t · c · p · b}20:03, 14 July 2019 (UTC)
Definitely better since the block, yes. However, when two users run it (like you and I at the moment of posting) there is already a noticeable delay. While not constant right now, it could become a problem if even more people use the bot. In my opinion it would be better to "future-proof" the bot; I understand this takes a lot of work, but again, in my opinion, if it can be done it would be helpful. --Redalert2fan (talk) 17:21, 19 July 2019 (UTC)
Yes, even small-ish ~100 article runs are nightmares to do sometimes. I found that asking for more than that often leads to large delays and timeout errors. Which is a shame, because you can find articles in need of highly probable cleanup/tidying that the bot could do (like running on all pages that contain url=https://www.tandfonline.com/doi/...), but those often number in the thousands. Headbomb {t · c · p · b}17:30, 19 July 2019 (UTC)
I just waited an hour for a batch of about 10 to start and got a 504 timeout; not very encouraging. I have no problem with waiting and running again, but others might not and could lose interest. Sadly, productivity is being lost. I'm not particularly looking for the bot to be quicker; when one person runs it, the time it takes is fine. What I'm looking for is for multiple users to be able to use it at the same time. Would it be possible to run more instances at the same time? Yes, the bot might have to throttle its edits, but that's better than having user jobs not start within a reasonable time. Redalert2fan (talk) 18:42, 19 July 2019 (UTC)
Not sure what more would be lost, so I'd rather run the full gamut of fixes. Also ADSABS is very desirable. Headbomb {t · c · p · b}18:18, 19 July 2019 (UTC)
The easiest solution is to further reduce the timeout on individual requests to Zotero and others: it helps avoid traffic jams when there are too many requests and/or a single URL inside a page is especially slow.
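Reducing the per-request timeout is a plain HTTP-client setting. A sketch using Python's standard library (illustrative only; the bot is PHP and talks to Zotero through its own client, and the timeout value here is an arbitrary example):

```python
import socket
import urllib.request

def fetch_metadata(url, timeout_seconds=5):
    """Fetch a URL but give up quickly, so one slow publisher site
    cannot stall the whole queue. Returns None on timeout/error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as resp:
            return resp.read()
    except (socket.timeout, OSError):
        return None
```

A short timeout trades a few missed slow responses for much better queue throughput when many URLs are being checked.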
Yes, and many articles are now ready for an OAbot run: there are about 30k articles in the queue as of now, with 35k link suggestions. Nemo08:53, 20 July 2019 (UTC)
It used to have 16k! When either you or Chris are using the bot for batch runs, I just go do something else. :) Nemo06:54, 30 July 2019 (UTC)
Yeah, well, not much choice. I limit mine to bunches of 100 usually; this way any other request made will not be delayed for too long and won't time out. But it would be nice to just be able to say "Alright, deal with those X thousand pages with this stuff that's completely fixable". Headbomb {t · c · p · b}12:05, 30 July 2019 (UTC)
This is getting really, really annoying: requests constantly time out for hours because large categories are being requested. Please prioritize this. Headbomb {t · c · p · b}09:09, 5 August 2019 (UTC)
I have no idea how the tool servers handle multiple requests. It seems as if they all run in parallel and the tool server only gives so much CPU to the tools as a whole. AManWithNoPlan (talk) 12:06, 5 August 2019 (UTC)
Today it feels better for me: I managed to use the gadget with very good response times even as Headbomb was doing some batches. Nemo12:57, 8 August 2019 (UTC)
The tool does feel better/faster. However, I've yet to see different batches run alternately. Nemo's success is possibly due to breaks in my requests (I ask for ~100 articles at a time, which gives the bot a chance to catch up on other requests without timeouts). I do recall being able to use the citation helper script while the bot was doing a batch run, though. Headbomb {t · c · p · b}13:56, 8 August 2019 (UTC)
Nope, I mean I get speedy responses from the gadget in the midst of one of your runs, in the same minutes when I see the bot perform several edits. The requests for single pages are much more efficient than batch requests, yes. Nemo14:08, 8 August 2019 (UTC)
I confirm speedy responses via the Citations link of the gadget. It doesn't work through the toolbar link/API, however. Headbomb {t · c · p · b}16:23, 8 August 2019 (UTC)
Many small requests work better than few huge ones, if you want I can write you a small script to do it efficiently. Email me to have it in your inbox. Nemo17:21, 8 August 2019 (UTC)
What's the point of all those search.proquest.com links? When I click one from an otherwise complete citation template, I'm not even presented with a title for the resource, so I can't be sure whether the link points to something else entirely. I see they're sometimes pasted as part of some ready-made textual citation with a "Retrieved from" link, so I doubt the editors were actually interested in keeping such links. Are they fine to remove? Nemo18:12, 12 July 2019 (UTC)
If you are at the library (or have a library card), you can log in and get them. Also, the link sometimes leads to a preview; when logged in with my library card, I can often see one. AManWithNoPlan (talk) 20:44, 12 July 2019 (UTC)
some other bot needs to change all the proquest.umi.com links into the equivalent search.proquest.com urls too (the document numbers are not the same 🙄) AManWithNoPlan (talk) 03:08, 14 July 2019 (UTC)
The bot now does extensive ProQuest URL cleanup. The umi.com ones are now fixed, and most proxies and session-specific information should be removed. AManWithNoPlan (talk) 12:49, 20 July 2019 (UTC)
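In outline, that kind of cleanup might look like the following; the proxy pattern and parameter names here are illustrative guesses, not the bot's actual rules:

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed examples of noise: library proxy infixes appended to the host,
# and session/account query parameters.
PROXY_HOST = re.compile(r"\.(?:ezproxy|proxy)\.[^/]+$")
DROP_PARAMS = {"accountid", "sessionid", "parentSessionId"}

def clean_proquest_url(url):
    """Strip proxy host suffixes and session-specific query parameters
    from a ProQuest URL, keeping only the stable document link."""
    scheme, host, path, query, _ = urlsplit(url)
    host = PROXY_HOST.sub("", host)
    kept = [(k, v) for k, v in parse_qsl(query) if k not in DROP_PARAMS]
    return urlunsplit((scheme, host, path, urlencode(kept), ""))
```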
10.5555 is a test DOI prefix and will never resolve. On Wikipedia, the vast majority of them are for JSTOR Global Plants. In fact, nearly all 10.5555/... DOIs can probably be removed and converted to |url=https://plants.jstor.org/stable/10.5555/.... The bot should check whether that URL resolves, however, since there are some 10.5555 DOIs that are tests for other things. Headbomb {t · c · p · b}09:31, 8 August 2019 (UTC)
Kinda duplicate with one above, but this should be generalized behaviour, not just specific to 10.5555 broken DOIs. Headbomb {t · c · p · b}16:12, 9 August 2019 (UTC)
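The suggested conversion could be sketched like this; the resolution check is left as a pluggable callable, since whether a given URL resolves can only be decided at run time (e.g. by an HTTP HEAD request), and the mapping to plants.jstor.org is taken from the suggestion above:

```python
TEST_PREFIX = "10.5555/"
PLANTS_BASE = "https://plants.jstor.org/stable/"

def convert_test_doi(doi, url_resolves):
    """Map a non-resolving 10.5555/... test DOI to its likely JSTOR
    Global Plants URL, but only if that URL actually resolves.
    `url_resolves` is a callable URL -> bool."""
    if not doi.startswith(TEST_PREFIX):
        return None                  # not a test-prefix DOI: leave it alone
    candidate = PLANTS_BASE + doi
    return candidate if url_resolves(candidate) else None
```

In the bot, a returned URL would replace the |doi= with a |url=, while a None for a 10.5555 DOI would flag it for manual review rather than silent removal.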