The project page associated with this talk page is an official policy on Wikipedia. Policies have wide acceptance among editors and are considered a standard for all users to follow. Please review policy editing recommendations before making any substantive change to this page. Always remember to keep cool when editing, and don't panic.
The following discussion is an archived record of a request for comment. Please do not modify it. No further edits should be made to this discussion. A summary of the conclusions reached follows.
Back in 2009, the community first enacted a restriction on mass creation of articles. The resulting policy was placed in WP:Bot policy since the impetus was mass creation using automated tooling. Even then concern was raised over whether WP:BRFA was the right forum for this, but at the time "good enough" carried the day.
Personally I'm tired of seeing WP:BOTPOL and WP:MEATBOT being bent out of shape in the arguments over what sorts of not-entirely-bot mass creations are or can be "covered" by WP:MASSCREATE. Thus I propose the question:
Should the answer be yes, the following changes to the text of the policy will be made. The intention is to keep the current meaning as far as possible while removing the bits specific to WP:BOTPOL:
Any large-scale automated or semi-automated content page creation task must be approved by the community. This requirement initially applied to articles, but has since been expanded to include all "content pages", broadly meaning pages designed to be viewed by readers through the mainspace. These include articles, most visible categories, files hosted on Wikipedia, mainspace editnotices, and portals. While no specific definition of "large-scale" was decided, a suggestion of "anything more than 25 or 50" was not opposed. Community input may be solicited at WP:Village pump (proposals) and the talk pages of any relevant WikiProjects. Creators must ensure that all creations are strictly within the terms of their approval.
Per a 2022 RfC, all mass-created articles (except those not required to meet WP:GNG) must cite at least one source which would plausibly contribute to GNG, that is, which constitutes significant coverage in an independent reliable secondary source.
Alternatives to simply creating mass quantities of content pages include creating the pages in small batches or creating the content pages as subpages of a relevant WikiProject to be individually moved to public facing space after each has been reviewed by human editors. While use of these alternatives does not remove the need for a BRFA approval, it may garner more support from the community at large.
Mass creation by automated means may additionally require approval as specified by Wikipedia:Bot policy. Approval of a bot for mass creation does not override the need for community consensus for the creation itself, nor does community consensus for a creation override the need for approval of the bot itself.
Note that while the WP:MEATBOT-like creation of non-content pages (such as redirects from systematic names, or maintenance categories) is not covered by this mass creation policy, other policies, such as Wikipedia:Bot policy, still apply.
Should the answer be yes, I don't much care if the destination is a new standalone policy page, WP:Editing policy, or some other existing policy. In the interest of this not failing due to lack of consensus for where to put it, if there's not consensus for a specific destination then we'll default to "a new standalone policy page" at Wikipedia:Mass page creation and people can start a separate merge discussion later if they want.
The bot policy will retain a stub referring to the new policy. The existing redirects such as WP:MASSCREATE will be retargeted.
Mass page creation may require approval by the community, in addition to a BRFA if the method of that creation falls under this Bot policy. BAG may require that community approval for any mass content creation exists before considering bot approval.
Approval of a bot for mass creation does not override the need for community consensus for the creation itself, nor does community consensus for a creation override the need for approval of the bot itself. Bot operators must ensure that all creations are strictly within the terms of their approvals.
I don't suggest doing this at the same time. (Also, I think it would have to be something like the first and third sentences from the first paragraph, which is a level of complexity that should probably be discussed separately.) WhatamIdoing (talk) 04:44, 15 July 2024 (UTC)
Support per the three reasons given by Anomie below. The current situation always seemed like a strange compromise to me. Pinguinn🐧 06:59, 16 July 2024 (UTC)
Support per Anomie, though I'm not sure I agree with the wording of the stub; however, that can be wordsmithed later. Sohom (talk) 20:08, 21 July 2024 (UTC)
Support per nom. Makes more sense. The status quo doesn't always have to stay just because it technically works. CFA💬 22:18, 28 July 2024 (UTC)
Discussion (sever MASSCREATE from BOTPOL)
Please don't start trying to discuss any more sweeping changes here. Save those for a separate RFC you can hold after this passes. I ask uninvolved editors to hat any such discussions if people try to start them here, and closers to disregard any !votes calling for such changes. Anomie⚔ 23:15, 9 July 2024 (UTC)
Reading this again, my concern is that the wording you use, and the removal from BOTPOL, will mean WP:MEATBOT no longer applies, and thus there will be no restrictions on the mass creation of articles by methods such as boilerplates, which some editors argue aren’t covered by semi-automated.
(edit conflict) I disagree that this proposal changes the meaning in that way. The first sentence already maintains the existing wording about "Any large-scale automated or semi-automated content page creation task".
Also, WP:MEATBOT is really just a duck test: it's supposed to stop people from claiming that a policy about automated edits doesn't apply because their edits are manual boilerplate-filling or whatever, by saying that if it looks automated then we can treat it as such regardless. It doesn't actually do anything to make boilerplate-driven manual edits fall under WP:MASSCREATE where they aren't already against consensus or are otherwise disruptive.
Also, IMO you'd probably do better to support this, because if this goes through then "The bot policy can't regulate human behavior" and "it makes no sense for human edits to be approved through BRFA" will no longer be valid objections to a proposal to strike "automated or semi-automated" from the first sentence (because it will no longer be part of the bot policy), and if you can get that through then you won't have to abuse WP:MEATBOT at all. Anomie⚔ 23:43, 9 July 2024 (UTC)
At the moment, we have a policy that applies to manual bot-like mass creation, while your proposed change removes that aspect.
Considering the intention is to keep the current meaning as far as possible while removing the bits specific to WP:BOTPOL, it would make more sense to remove "automated or semi-automated". These are bits specific to BOTPOL, and by removing them you ensure that the section you are removing from BOTPOL actually has applicability outside BOTPOL. BilledMammal (talk) 00:32, 10 July 2024 (UTC)
I think you're trying to sneak in a wording change that tries to make your existing arguments easier to support. As I asked above, let's do this simple RFC first, then you can try to convince the community at large to accept your changes. Anomie⚔ 01:18, 10 July 2024 (UTC)
At the moment, you're trying to sneak in a wording change that makes the argument harder to support. Because of that, this isn't the simple RfC that I thought it was. BilledMammal (talk) 01:19, 10 July 2024 (UTC)
I don't think you're actually trying to sneak anything in, but I was a little annoyed by you suggesting I was.
What I do think is that this is a change from the status quo - I think the language I proposed to Thryduulf would maintain the status quo, while the language I proposed to you would change it in the opposite direction. BilledMammal (talk) 01:32, 10 July 2024 (UTC)
You've convinced yourself that WP:MEATBOT means that if you can squint hard enough to convince yourself that something is "bot-like" then you can expand the scope of WP:BOTPOL to cover clearly human actions, so you want to add "bot-like" to try to bolster that. That's no more correct than WhatamIdoing insisting above that WP:MEATBOT is about preventing high-speed editing and nothing else; she might hypothetically say that removing WP:MASSCREATE from WP:BOTPOL removes an implication that it only applies to high-speed editing (since only rapid editing, in her view, is "bot-like") and so want it to say "Any high-speed large-scale automated or semi-automated content page creation task" to "preserve" that interpretation. Anomie⚔ 02:00, 10 July 2024 (UTC)
@BilledMammal, I don't think you need to worry about this. MEATBOT applies to all edits that are "high-speed or large-scale edits that a) are contrary to consensus or b) cause errors an attentive human would not make". Even if MASSCREATE ends up on another page, or even if MASSCREATE didn't exist at all, MEATBOT would still apply to the same edits that it does now.
Ending the fiction that BAG approves the mass creations. Most of the time we already say "go get consensus at WP:VPR first", and then rubber-stamp it if the bot itself passes trials.
Getting arguments about how WP:MASSCREATE should apply to non-bots off of this page, which is supposed to be about the bot policy.
Stopping BilledMammal from having to abuse WP:MEATBOT to argue that WP:MASSCREATE should cover non-bot mass creations, by letting them argue for changing the policy to say that directly instead.
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Open source bots
I would like to do a temperature check (explicitly not an RfC) as a follow-up to the discussion at this BRFA, of how people feel about changing the source code requirements. Currently the language in the bot policy is:
"Authors of bot processes are encouraged, but not required, to publish the source code of their bot." And for adminbots: "It is recommended that the source code for adminbots be open, but should the operator elect to keep all or part of the code not publicly visible, they must present such code for review upon request from any BAG member or administrator."
I would like to replace it with something like: "Authors of bot processes are expected to publish the source code of their bot in a public manner under an open source license to facilitate collaboration and forking. Should an author wish to keep the source code private, they must request an exemption from BAG during the bot approval process. BAG members may decline requests solely on the basis of source code not being open source." And some kind of grandfathering clause for current closed source bots.
The rationale being 1) allowing people to suggest improvements for bots or point out possible bugs as a technical review step, and 2) when bots/maintainers inevitably disappear, mandate that there is a path for someone else to take over the bot without starting from scratch. I think the Wikimedia movement has moved in this direction, with requiring open source licenses for bots run on Toolforge (the vast majority of our bots) and there's an abundance of places to post your code. Or in other words, open source should be the norm, private should be an exception.
Some carve outs: I'm sure people can come up with edge cases where publishing code isn't desired; if those turn into actual BRFAs, I'm happy to defer the decision to BAG on whether the exception is justified or not. As a practical matter, I think it would be fine if people make changes, test their code, and then publish it shortly after. I don't think we need a hard requirement that code must be open source before you run it against Wikipedia.
So temperature check: how do people feel about this? Is this a reasonable proposal? Or if you would not support something like this being formalized, why not? Legoktm (talk) 03:54, 5 September 2024 (UTC)
Looks ok to me. I expect open source code for bots as well. It helped with the takeover of the AdminStats task (although we ended up hunting for a copy of the code in Toolforge itself). However, can the grandfather clause be extended to new task requests of current bots, as the new tasks may still utilise the closed-source code? – robertsky (talk) 04:03, 5 September 2024 (UTC)
I don't see why we should bake 'they must be open, unless you ask for an exception, which we may deny' into policy, rather than the current 'we encourage, but don't mandate, open source bots'. Headbomb {t · c · p · b} 10:00, 5 September 2024 (UTC)
@Headbomb: Could you expand on why you don't think it should be done? (I personally don't think we need an exception, I just expected people would oppose it without one) Legoktm (talk) 15:35, 5 September 2024 (UTC)
I'm also curious about this, but I do see that allowing for exceptions is a reasonable thing to do. Imagine if somebody came to us and said, "The company I work for has an abuse-detection system that's 10x better than what you're using now. We're willing to let you use it at no cost, but unfortunately I cannot make the code available". Having that exception carve-out gives us the ability to accept or refuse that offer as we see fit at the time. Not having that carve-out forces our hand. I think that's reasonable.
We already have agreements like that with some IP proxy detection vendors (I'm being cagey here about the vendor names only because I'm not sure of the status of these relationships). Because they are only (to the best of my knowledge) used as back-ends to some interactive tools, they don't fall under BAG's remit. But I could certainly imagine somebody wanting to build a BAGgy tool which uses one of those services as a back end. As much as I believe in FOSS everywhere, I also wouldn't want us to shoot ourselves in the foot to stand on principle. RoySmith (talk) 16:04, 5 September 2024 (UTC)
You might not want to put your code up because it's crude/inelegant. You could also be doing things that are "OK" with private code but aren't OK with public code, like having "if password = sw0rdf!sh continue, else fail" instead of whatever you should be doing with passwords and logins. Or you might be using code from someone else that you got permission to use, but didn't get permission to distribute. Or you may be using closed source code that you purchased, but don't have rights to distribute.
Like, I'll agree it's unequivocally better to have things open sourced. Hence why it should be encouraged. But volunteer coders are an extremely limited resource, so the fewer barriers to entry/participation we have, the better, IMO. Headbomb {t · c · p · b} 19:12, 5 September 2024 (UTC)
Strong support. Requiring bots to be open source seems like a good idea to me for reasons ranging from cultural (promoting the ideals of the Wikimedia movement) to security (code review) to disaster recovery (being able to continue operation of critical services should the original developer disappear). RoySmith (talk) 12:15, 5 September 2024 (UTC)
I mainly pushed back against this in the above BRFA because I felt it violated current norms. But I am not opposed to it in general if we make a change to BOTPOL. There are major maintainability advantages to having bot code open sourced. Volunteers that write critical code lose interest or go inactive all the time. Honestly, a proposal to require Toolforge for non-AWB bots might also be worth considering. The combination of open sourced code plus Toolforge would be the ideal situation for rescuing abandoned bots. Finally, we also had a situation recently where an operator passed away and their bot was immediately blocked and globally locked. Avoiding immediately blocking working bots, giving time for us to properly fork and replace them, might be worth adding to BOTPOL as well. –Novem Linguae (talk) 16:18, 5 September 2024 (UTC)
This is a good point; as anybody who has ever tried to port anything knows, just having the source code is only half the battle. Moving things to a new operating environment can be a pain too; requiring that everything runs in Toolforge (or Cloud VPS) would be a good thing IMHO. I'm not sure where you draw the line, however. Some people would insist that everything run in a Docker container. That would drive me nuts. Some people would insist that we only use phab, gitlab, and so on. That would also drive me nuts. RoySmith (talk) 17:01, 5 September 2024 (UTC)
I think you're underestimating how successive additional requirements hinder attracting new volunteers. It's no big deal to experienced developers, but there is already a lot to navigate as a new developer. Of course, if the goal is to reduce the number of abandoned bots, discouraging new bots will definitely help. I think it would be better to encourage practices like source code availability, succession plans, etc. with a dashboard, recognition, and other approaches. Daniel Quinlan (talk) 20:14, 5 September 2024 (UTC)
I'm strongly opposed to the proposed change. The current policy encourages open source without being overly restrictive or discouraging of people submitting requests. As an open source developer, I think that's a good thing. But requiring all bots to be open source could discourage some potential projects, especially if they use proprietary code or need to use non-free components. For some projects, there are also security-related reasons to not open source code for the same reasons we have private edit filters. Finally, if there are specific bots that are truly critical and not open source, we should identify those bots and solicit for replacement bots that would be open source, or ask the WMF to write, maintain, and operate replacement bots for those functions. The current policy is well-written. Daniel Quinlan (talk) 21:04, 5 September 2024 (UTC)
Part of my concern is that if it's necessary to grandfather existing bots, it strongly implies that there would be a chilling effect on future proposals, for both existing and new bots. I’d prefer to start with a review of existing bots to assess their criticality and succession plans, and then consider improvements based on the assessment. Policy changes might be one approach, but I believe that providing encouragements that won't discourage future projects, and can be applied to all bots, would be more effective. Daniel Quinlan (talk) 22:25, 5 September 2024 (UTC)
I appreciate the feedback that people have given and will reply a bit later after digesting them, but please, can we avoid the bold votes? I would like to focus on the discussion and rationales not ... voting. Legoktm (talk) 21:39, 5 September 2024 (UTC)
Sorry if the bold came across as harsh, I was following the format of an earlier comment. I appreciate you following up on the discussion to discuss it out in the open which will reduce the odds this is rediscussed in random future BRFAs. Daniel Quinlan (talk) 22:10, 5 September 2024 (UTC)
I think the circumstances have to be taken into account. If the bot is going to be a one-time simple task, it's probably less critical to have a succession plan in place. If it's a bot that's going to underpin key workflow processes, then having a plan for ensuring that the task can be handed off is more important. I agree that ideally all code would be open source (keeping any necessary private configuration closed) and with a relatively uniform development and runtime environment, but practically speaking, I don't think English Wikipedia can afford to limit its potential pool of developers to that degree. isaacl (talk) 23:48, 5 September 2024 (UTC)
I would unequivocally support something stronger for bots expected to be a continuing run as "we require open source" and would definitely encourage something like the OP even for shorter runs. I'm sorry, but we cannot continue to depend on closed source bots and private source. (I have said as much multiple times now.) And sorry, if your code is shitty, that's how open source works. Either you can get over your fear of publishing something that is hacked together (as if no one else has hacked together code into production, know how modern MediaWiki started?) or you can do something else with your time (which I'm sure will be productive for wiki goals as well). Izno (talk) 18:00, 29 September 2024 (UTC)
While having code available is a good start, if we're going to introduce some requirements for succession planning, I don't think it should stop there. Too many people think if the source is available, all is good. But if it's Haskell code, for instance, the pool of potential contributors is significantly smaller. So if we start on this route, I think we need to also include things like: the bot has to run on the Toolforge servers, the language is highly recommended to be from a list of supported languages, and there is at least one other maintainer actively involved. isaacl (talk) 21:34, 29 September 2024 (UTC)
Before we consider a stricter policy for some cases, we need to be specific about which bots are actually critical. Requiring people to code in specific languages, or to get up to speed on Toolforge on top of learning MediaWiki APIs, for a bot that quietly does its own thing and isn't mission critical is going to be counterproductive. We need more people, especially newcomers to Wikipedia coding, writing more helpful bots, and we should endeavor to keep the cost of entry as low as reasonably possible. The ideal case is always going to be easily portable and well maintained code, but no requirements are going to let us avoid some inevitable realities of the software lifecycle, and the perfect is the enemy of the good. Daniel Quinlan (talk) 00:13, 30 September 2024 (UTC)
Let's tackle one thing at a time. I happen to agree that it makes sense to state an increased expectation for these other qualities, but I think we have to start from "someone could even theoretically pick this up and run with it". Izno (talk) 00:52, 30 September 2024 (UTC)
I think a little bit more is needed to lay the base to enable someone else to theoretically operate a given bot. It could be one of the following: the tool runs on Toolforge, there are multiple active maintainers, or there is sufficient up-to-date written documentation that describes the software stack and execution environment. And, unfortunately for aficionados of more obscure languages, I think there should be a list of highly recommended languages. isaacl (talk) 01:28, 30 September 2024 (UTC)
On the one hand, I endorse all of these as obvious good things. And to that I'd add that the source has to be publicly available in a standard source control system (which these days basically means git). It's one thing to say the code is under an FOSS license, but if the distribution mechanism is to download a ROT-13'd shar file from a gopher server, it might as well not exist. And it should have a comprehensive test suite. And an issue tracking system. And code reviews. Well, you see where this is going. All of these things are essential good software engineering practices, but each one is also a barrier to entry for a lot of people, and at some point we need to make an intelligent decision about where we want to draw the line. If we chased away every potential code contributor with onerous requirements, we'd certainly solve the problem of tool migration because we wouldn't have any tools. RoySmith (talk) 01:58, 30 September 2024 (UTC)
Yes, I already stated I don't think English Wikipedia can afford to limit its potential developer pool for all its tools. I think when a tool or bot is planned for deployment, we need to decide how important it is to have some form of succession plan in place. In many cases, we may just live with the risk. For some key processes, we may want to plan for future transition to different maintainers. isaacl (talk) 02:33, 30 September 2024 (UTC)
I have nothing against publishing my code. I acknowledge that the code quality is far from ideal, since I don't get a lot of time to work on it, and my decision to use C# may not have been the best. The bots have worked their way into some of our processes, so it might be good if someone could at least see what they do in case something happens to me. I would have to go through the files and add the appropriate copyright notices. I am unsure what counts as an open licence; my intent was always to use GPLv3. Once a licence is applied though, it will be hard to change, so a grandfather clause would be necessary. And yes, moving things to a new operating environment is a real pain; Toolforge now insists that everything run in a Docker container. Getting that to work has been taxing, and would have been impossible without a lot of help from the Toolforge admins. Hawkeye7 (discuss) 20:25, 29 September 2024 (UTC)
I think the Docker container part is just picking an image on Toolforge. You don't need to install Docker on your local computer, nor do you need to know much about Docker except for the one CLI command to run and which image you are going to pick from the list of images. My bots are written in PHP and I use XAMPP locally to run them when I am doing coding and manual tests. –Novem Linguae (talk) 15:59, 30 September 2024 (UTC)
Not an image, but a cloud native build pack. The build pack creates the container image. I had to get them to install a dotnet build pack for me. Running docker locally was no problem; getting the application to work properly in that operating environment took more effort. Hawkeye7 (discuss) 21:30, 30 September 2024 (UTC)
I previously looked at getting a .NET app to run on Toolforge but they didn't have any "official" support so I couldn't be bothered to figure it out and I concluded I would need to get in touch with them to add this support. This is in addition to all the hoops one has to jump through to figure out how to do things there. None of it is user-friendly, that's for sure, speaking of barrier to entry and all that. Anyway, this [1][2] did not exist at the time. I might ping you at some point to ask you how exactly you did it if I can't figure out stuff from the documentation. I don't suppose you recorded the steps you took to get it working? — HELLKNOWZ ∣ TALK 22:27, 30 September 2024 (UTC)
Based on the above, there appears to be a reasonable consensus that most (if not all) bots that do "core functions" (my phrasing) should have their code posted so that if the operator disappears (intentionally or not) the functionality can be quickly/easily/efficiently ported onto a new bot/operator who can take over the task. I have started a (currently empty) list above, and would invite editors to add and discuss the list so that we can start asking operators to provide the code if deemed necessary. Personally speaking I think this list should focus on open-approval tasks (i.e. not one time runs) to start, but if someone wants the code for OTRs feel free to ask. Primefac (talk) 19:30, 29 September 2024 (UTC)
I think there's a tendency for the general editing population to think if the code is available, it should be easy for anyone to step into the void and quickly get a bot running, but that's a fallacy. I think if we're going to introduce requirements for succession planning, they should cover a bit more (as I discussed in an earlier comment). isaacl (talk) 21:37, 29 September 2024 (UTC)
I think the consensus is for a stronger policy than saying it's "preferable" (which we already do). We could make some exceptions for long-standing bots, such as AAlertBot, which is I think what you are primarily concerned about. – SD0001 (talk) 12:04, 30 September 2024 (UTC)
A mandatory policy is a recipe for "consensus" to shut bots down or, worse, remove bot privs for being a rogue operator. What else could "mandatory" mean? Which is like that Vietnam War saying, "We had to destroy a village to save it" (variations of this quote). Isaacl is exactly right that dumping a bunch of source to GitHub is meaningless for anyone trying to install and operate it. And for some bots, operation requires a lot of training that is not easy to document. -- GreenC 23:47, 30 September 2024 (UTC)
Where in my post did I say mandatory? I did not, so to disagree with something I didn't say is a little odd. This sub-thread is about taking the first steps - right now we don't even know which bots have open-source or freely-available code bases, or where they're hosted, etc. Primefac (talk) 12:38, 5 October 2024 (UTC)
I get where everyone is coming from with the desire to make sure bots keep running smoothly, but it's not clear to me that there's consensus for making open source mandatory. I'm concerned that:
The focus is open source and adding extra requirements instead of having succession plans.
There aren't clear definitions for terms like "critical" or "core" and we don't have a list of the bots that would be impacted.
Grandfathering some bots might mean we end up with all of the downsides that will discourage future development without significantly improving continuity.
I've only written one bot so far, one that's trivial to set up and also open source, but it likely wouldn't exist if I had to produce open-source code before getting project approval or had been required to use Toolforge for my first project. The problem that Protection Helper Bot solves has been a Phabricator ticket since 2012 and proposed multiple times before and since then (such as this discussion in 2017). We should be encouraging new developers to help solve long-standing problems rather than throwing up roadblocks, even if they seem like low bars to most experienced Wikipedia developers. Daniel Quinlan (talk) 01:03, 1 October 2024 (UTC)
You're right, "mandatory" was used by another commenter. However, I do actually believe setting the expectation that core functions should make their code available would likely turn that expectation into a requirement in practice. The policy already recommends it and that seems to be interpreted aggressively at times in BRFA discussions. I would also like to understand the current situation before changing the policy. Daniel Quinlan (talk) 01:21, 6 October 2024 (UTC)
Your bot was the exception as it's an adminbot that touches protection of articles. Most BRFAs have no requirement or request to release their source, and don't in practice. ProcrastinatingReader (talk) 21:13, 14 October 2024 (UTC)
I agree with the bot policy that source code for adminbots should be open or the developer must present such code for review upon request from any BAG member or administrator. My previous comments should not be interpreted as contradicting that. I designed my bot to be easy for anyone to run by releasing the code as open source and ensuring it's easy to set up. However, I believe it's fair to say that some of the additional requirements that have been discussed would have likely deterred me from submitting a BRFA. Daniel Quinlan (talk) 23:11, 14 October 2024 (UTC)
Mandatory or whatever aside, I think there is merit to us having a list of what we think are "essential" bots, along with some idea of what the succession strategy for these bots is (any of: is source available? are they hosted on Toolforge with multiple maintainers?). ProcrastinatingReader (talk) 21:12, 14 October 2024 (UTC)
I added a couple of bots to the list started above. I also included how esoteric their tech stack would be considered these days (i.e. how easily someone could take over maintaining it with updates/fixes). Bots using pywikibot or mwbot-rs, for example, I think are quite accessible. Custom C++ code, or even Perl code, I'd say is not particularly easy to take over. Realistically there's nothing we can do about these, but it's worth remaining aware of our bus factor. I'm loosely defining "core" as a bot whose disappearance would cause noticeable disruption to the encyclopaedia or to some significant process, or would otherwise meaningfully affect the quality of articles. ProcrastinatingReader (talk) 21:46, 14 October 2024 (UTC)
I suggest creating a parent list of key English Wikipedia processes/ongoing work items, and under them listing the essential automated tasks for those processes/work items. (I understand that some bots may be grouped under multiple processes/work items.) At the very least, it would be helpful to those not familiar with all the bots if the list could include a brief summary of their essential tasks. isaacl (talk) 21:54, 14 October 2024 (UTC)
Yes. Basically a breakdown by workflow: here's an important process (which might be doing a set of ongoing work items), and here are the key elements that are automated in order to make this sustainable. I was thinking that depending on the size of the lists, or the number of bots that support multiple workflows, it might be worthwhile to keep the bot list with its details separate, and just have the workflow list point to the bots in the bot list. I feel this makes it easier to think about what workflows are absolutely necessary to keep running (and think of ones that are missing from the list), and to know what they rely on. isaacl (talk) 22:23, 14 October 2024 (UTC)
Thanks isaacl, I think this is a good idea. I'd like to suggest a single list for now, unless it transpires that it's common for single bot accounts to do multiple core tasks? I think it's easier than correlating entries across two lists, if we can avoid it.
I am thinking there are a few things we should understand about each bot, rather than just asking "is the source available?". I've tried to summarise these in the lead of Wikipedia:Core bots. I give the example of ClueBot NG there: I think the fact that it's an ANN model using C++ and an uncommon C++ framework means someone outside the core development team is unlikely to be able to pick up that bot as-is and realistically maintain it, as opposed to just running it.
With that in mind, I'm wondering if it might be good to develop a simple set of criteria to assess a bot against, to serve as a decent summary statistic compared to raw-text comments: e.g. categories like "source available and executable?" / "multiple maintainers?" / "maintainable tech stack?", on each of which a bot gets a binary score (good/bad). These categories are mainly just to illustrate the idea; I'm not fixed on what kind of framework we should assess bots against for a realistic 'operational resiliency' strategy. ProcrastinatingReader (talk) 11:06, 16 October 2024 (UTC)
Could also do a scale of 1-5. Being hosted on Toolforge could add a point, source code published could add a point, active maintainers could add a point, etc. –Novem Linguae (talk) 13:25, 16 October 2024 (UTC)
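To make the 1-5 idea concrete, here is a minimal sketch of how such a score might be computed. The field names, the baseline of one point, and the "two or more active maintainers" threshold are all assumptions for illustration; nothing like this was agreed in the thread.

```python
from dataclasses import dataclass


@dataclass
class BotAssessment:
    """Hypothetical record of the resiliency criteria floated in this thread."""
    name: str
    hosted_on_toolforge: bool
    source_published: bool
    active_maintainers: int
    maintainable_stack: bool

    def score(self) -> int:
        """Return a 1-5 score: a baseline of 1, plus 1 per criterion met."""
        points = 1  # assumed baseline so the scale runs 1-5 rather than 0-4
        points += int(self.hosted_on_toolforge)
        points += int(self.source_published)
        points += int(self.active_maintainers >= 2)  # assumed threshold for "multiple maintainers"
        points += int(self.maintainable_stack)
        return points


example = BotAssessment("ExampleBot", hosted_on_toolforge=True, source_published=True,
                        active_maintainers=1, maintainable_stack=False)
print(example.name, example.score())  # ExampleBot 3
```

Each criterion is a simple yes/no, so the score is only a coarse summary; the individual fields carry the actual information.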
Point systems are pointless (pun unintended). Let's not have a metric that serves no actionable purpose for the sake of having one.
Legobot (talk·contribs) once an hour (i) detects {{rfc}} transclusions that lack a |rfcid= parameter, and adds one; (ii) ensures that the next valid timestamp after every existing {{rfc}} tag is less than thirty days in the past, and if not, removes the {{rfc}} tag and also removes the RfC statement from all of the listings (such as WP:RFC/BIO); (iii) checks the RfC category parameters for each {{rfc}} transclusion, such as |bio, and ensures that the RfC is listed on corresponding pages such as WP:RFC/BIO
Yapperbot (talk·contribs) (also once an hour, but half an hour after Legobot) sends messages to user talk pages concerning RfCs where Legobot has recently added a |rfcid= parameter, see WP:FRS
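A rough sketch of steps (i) and (ii) of the hourly Legobot run described above, working on a page's wikitext as a plain string. This is not Legobot's actual code: the regexes, function names, placeholder ID generator, and the "first signature timestamp after the tag" rule are assumptions, and step (iii) and removal from listings such as WP:RFC/BIO are left out.

```python
import re
from datetime import datetime, timedelta, timezone

# An {{rfc ...}} transclusion; assumes no nested templates inside the tag.
RFC_TAG = re.compile(r"\{\{\s*rfc\s*(\|[^{}]*)?\}\}", re.IGNORECASE)
# A standard signature timestamp such as "01:03, 1 October 2024 (UTC)".
TIMESTAMP = re.compile(r"(\d{2}:\d{2}), (\d{1,2}) ([A-Z][a-z]+) (\d{4}) \(UTC\)")


def parse_timestamp(match) -> datetime:
    hhmm, day, month, year = match.groups()
    parsed = datetime.strptime(f"{hhmm} {day} {month} {year}", "%H:%M %d %B %Y")
    return parsed.replace(tzinfo=timezone.utc)


def add_missing_rfcids(wikitext: str, make_id) -> str:
    """Step (i): append |rfcid= to every {{rfc}} tag that lacks one."""
    def fix(m) -> str:
        if re.search(r"\|\s*rfcid\s*=", m.group(1) or ""):
            return m.group(0)  # already has an ID; leave untouched
        return m.group(0)[:-2] + "|rfcid=" + make_id() + "}}"
    return RFC_TAG.sub(fix, wikitext)


def expire_stale_rfcs(wikitext: str, now: datetime,
                      max_age: timedelta = timedelta(days=30)) -> str:
    """Step (ii): drop any {{rfc}} tag whose next timestamp is max_age old or older."""
    pieces, pos = [], 0
    for m in RFC_TAG.finditer(wikitext):
        pieces.append(wikitext[pos:m.start()])
        ts = TIMESTAMP.search(wikitext, m.end())
        # Keep the tag if no timestamp follows it or that timestamp is recent enough.
        if ts is None or now - parse_timestamp(ts) < max_age:
            pieces.append(m.group(0))
        pos = m.end()
    pieces.append(wikitext[pos:])
    return "".join(pieces)


page = "{{rfc|bio}}\nShould X be merged? Example (talk) 01:03, 1 October 2024 (UTC)\n"
page = add_missing_rfcids(page, lambda: "ABC1234")  # placeholder ID generator
print(page.splitlines()[0])  # {{rfc|bio|rfcid=ABC1234}}
print("{{rfc" in expire_stale_rfcs(page, datetime(2024, 11, 15, tzinfo=timezone.utc)))  # False
```

Yapperbot's follow-up pass would then only need to look for tags that gained an `rfcid` since its previous run; that part is not sketched here.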
I don't see a lot of utility in a single summary score. I think the number of core bots should remain below the threshold where a group of people could go through them and determine relative priorities for attention. Plus, in a volunteer environment, who works on helping with what bot is going to be highly influenced by personal interest in the associated workflow, in any case. For an individual characteristic like "maintainable tech stack", there could be some usefulness in having a score, to help those not familiar with the details of the related technology to make relative comparisons. I would consider it to be more descriptive than analytic, though, to avoid getting bogged down in its precision. isaacl (talk) 15:55, 16 October 2024 (UTC)
Reuse for bots and tools
Somewhat orthogonal to the above thread: in an ideal world, I'd love to see greater standardization across bots and tools. I've written a few of my own tools, but I spent a lot of time reinventing wheels to make them work. I know there's been some progress in this area (pywikibot is certainly a step in the right direction), but there is still a lot of effort being expended by people running in different directions, which in turn makes it harder for people to pick up other people's projects. RoySmith (talk) 16:35, 16 October 2024 (UTC)
@RoySmith: as I imagine you've already heard, frameworks are wonderful: everyone should have one :-). It's a big challenge to create one that is usable by others, with sufficient documentation. There's Help:Creating a bot § Programming languages and libraries, but it's mostly just a big list, without much guidance to help someone decide on what to use. I feel there should be a location for programmers to share experiences, but I'm not sure where that is. Wikipedia_talk:Bots redirects to Wikipedia:Bots/Noticeboard, whose header makes it sound more like a co-ordination spot than somewhere to collaborate on development. isaacl (talk) 21:31, 16 October 2024 (UTC)
Some statistics would go a long way. Help:Creating a bot lists a lot of options that nobody would recommend nowadays. While I don’t believe we should mandate any language or toolkit, it would help to inform new developers which languages and toolkits are used in bots in active use, especially more recent bots. At least half of the current BRFAs are Python with Pywikibot. Daniel Quinlan (talk) 22:18, 16 October 2024 (UTC)
I agree that knowing some basic usage info would be helpful. Whenever I look at a third-party library/framework/tool, I want to know how popular it is and how actively it is maintained, in order to get a sense of how likely it is to continue to be maintained in future, how easy it will be to find answers to questions, and how useful others have found it. But circling back to the problem of overhead scaring away developers, this also applies to those creating code and tools for reuse. Tracking this info and keeping it up to date is extra work, and it might be less interesting for a one-person team than working on their project. isaacl (talk) 22:55, 16 October 2024 (UTC)
Update the global bots section
Hi, the global bots policy (Wikipedia:Global_rights_policy#Global_bots) currently says that global bots can only run on this wiki for the purpose of fixing double redirects. I believe that is outdated: it links to a discussion from 2008, and Meta's policy on global bots was changed in 2021 to allow them to run any approved task. I think this requires a change on this wiki as well, as otherwise it can be rather confusing. I propose allowing global bots to run here for any approved task, especially since en.wikipedia is notified whenever anyone submits a global bot request. After all, this wiki can still instruct any bot not to run here, if required. If not, I recommend adding this wiki to the opt-out set so that global bots are completely disabled (rather than enabled for just one purpose). Leaderboard (talk) 06:49, 12 September 2024 (UTC)
Local bot approval should still be required for bots running here; our community and project are huge, and we expect our bot operators to be engaged here. As for whether we should kick out all global bots that are doing the task they are already approved for, that doesn't seem to be necessary. — xaosflux Talk 09:32, 12 September 2024 (UTC)
Any change to the policy would need consensus for that change, here on the English Wikipedia. That discussion could be held here, but it would need to be an actual RFC or another widely advertised discussion with wide participation. Personally, I'd want to see more reason to change the policy than has been given so far. Anomie⚔ 12:07, 12 September 2024 (UTC)
@Daniel Quinlan This is more for consistency, because I would normally not request local bot rights on a wiki that's not on the global bots opt-out set unless I know that it does not accept all kinds of global bots. I do not have any specific global bots in mind for this reason. Other wikis in this group include the Russian Wikipedia, where global bots are allowed but the local rules appear to reference old policies. Leaderboard (talk) 14:02, 2 October 2024 (UTC)
The change on meta changed the requirements for meta. It did not change the requirements here, so when you say "I think this requires a change on this wiki as well", it reads as if you think we must be consistent with global policy. "Confusing" it is not: we simply have different requirements. Izno (talk) 17:21, 2 October 2024 (UTC)
@Izno Actually, that is what I was hinting at: "fixing double redirects" is just a single task that I am not convinced is needed as a specific exemption. And I was also saying that en-wiki is not the only wiki in this group where it appears that the rules were created when the global bot policy was more restrictive. Also, the policy change on Meta did change the rules for every wiki allowing global bots that did not explicitly have a restriction. Leaderboard (talk) 06:03, 3 October 2024 (UTC)
If I'm understanding correctly, meta has a policy allowing global bots, but that policy doesn't mandate that the bots can be run on the individual wikis without their consent. Each wiki can make its own decisions on what bots it wants to accept. isaacl (talk) 06:56, 3 October 2024 (UTC)
@Isaacl Yes and no. The global policies are global and individual wikis cannot "opt-out" of it. However, individual wikis can set preferences in terms of how they want global bots to use their bot flag, but I've seen a few wikis (e.g. this one) that reference old Meta policies in doing so (which is what I want to correct). I've not seen a single wiki that explicitly sets such restrictions while referencing the updated 2021 rules, nor have I yet seen any wiki with restrictions other than allowing the fixing of double redirects/interwiki language links.
And regarding "policy doesn't mandate that the bots can be run on the individual wikis without their consent": yes, it isn't a "mandate", but the whole point of global bots is to avoid having operators request bot flags on every wiki. Hence, if I were a global bot operator, I wouldn't go around asking for permission on global bot-approved wikis unless I already knew that the wiki in question does not allow global bots for any approved purpose. And I cannot do this for all of the 800+ wikis that allow global bots either.
TL;DR: yes, a "wiki can make its own decisions on what bots it wants to accept", but doing so would kind of defeat the purpose of global bots. Leaderboard (talk) 07:31, 3 October 2024 (UTC)
You're making a false distinction in "I wouldn't go around asking for permission on global bot-approved wikis, unless I already know that the said wiki does not allow global bots for any approved purpose", which IMO makes your argument unconvincing. If you (as a global-bot operator) know enwiki only allows interwiki-fixing global bots without separate approval, why would you not ask for permission if you want your double-redirect-fixing bot to run here? If your only complaint is that meta:Bot policy/Implementation#Where it is policy isn't clear enough, an easier solution might be to improve that page. "The global policies are global and individual wikis cannot 'opt-out' of it." Except this one they can. Even if the global bot policy didn't explicitly say so, I think you'd find that we don't necessarily accept here any random "policy" that someone on Meta declares is "global". "I've not seen a single wiki that explicitly sets such restrictions while referencing the updated 2021 rules." You have, this one. Anomie⚔ 12:12, 3 October 2024 (UTC)
@Anomie The bot page links to a discussion from 2008, not 2021. Put it this way: I know I have to apply for local bot rights. Would someone else not familiar with en.wiki? Leaderboard (talk) 21:15, 3 October 2024 (UTC)
The global policy says "The operator should make sure to adhere to the wiki's preference as related to the use of the bot flag." It explicitly allows each wiki to make its own decisions on what bots it wants to accept. isaacl (talk) 15:58, 3 October 2024 (UTC)
@Isaacl I don't dispute that, and I also don't dispute that en.wiki is not doing anything wrong per se. However, I do believe that en.wiki created this exemption in 2008, when Meta's rules were different, and that it needs at least a relook in 2024. How and what, I'm not too bothered about; the other contributors have a lot more experience than I do. Put it this way: I would rather have en.wiki put itself in the global bot opt-out set, so that it's clear to everyone that you must apply for a local bot flag, than have this weird one-task exception, which isn't obvious unless you actually go to the bot policy (and it's not as if it's any more difficult for operators of double-redirect-fixing bots to file a local bot flag request than for a global bot operator doing any other task). Leaderboard (talk) 21:21, 3 October 2024 (UTC)
As I mentioned back at the beginning of this, if you want a reexamination then creating an RFC is the way to go. If you want to start drafting one (I recommend a draft to reduce the chance of confusing wording issues), feel free. Anomie⚔ 11:27, 4 October 2024 (UTC)
I'm not motivated enough to do it; you all have more experience than I do. I just posted here as a suggestion for improvement, and it appears to me that the community does not feel this to be worth it. Leaderboard (talk) 19:37, 4 October 2024 (UTC)