This page is within the scope of WikiProject AI Cleanup, a collaborative effort to clean up artificial intelligence-generated content on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
To help centralize discussions and keep related topics together, all non-archive subpages of this talk page redirect here.
I wanted to share a helpful tip for spotting AI-generated articles on Wikipedia
If you look up several buzzwords associated with ChatGPT and limit the results to Wikipedia, it will bring up articles with AI-generated text. For example, I looked up "vibrant" "unique" "tapestry" "dynamic" site:en.wikipedia.org and found some (mostly) low-effort articles. I'm actually surprised that most of these are articles about cultures (see Culture of Indonesia or Culture of Qatar). 95.18.76.205 (talk) 01:54, 2 September 2024 (UTC)[reply]
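The search-engine trick above works on live search results. For text you already have in hand (a diff or a draft), a rough local equivalent is to count the same buzzwords. A minimal Python sketch, assuming the four words from the tip as the word list and hits per 100 words as the score; both choices are illustrative, and a high count is at most a reason to look closer, not evidence of LLM use.

```python
import re
from collections import Counter

# The four buzzwords from the search tip; the list is illustrative, extend as needed.
BUZZWORDS = ("vibrant", "unique", "tapestry", "dynamic")

def buzzword_counts(text, words=BUZZWORDS):
    """Count case-insensitive whole-word hits for each buzzword."""
    counts = Counter()
    for word in words:
        pattern = rf"\b{re.escape(word)}\b"
        counts[word] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

def buzzword_density(text, words=BUZZWORDS):
    """Buzzword hits per 100 words: a rough signal, not proof of LLM use."""
    total_words = len(re.findall(r"\b\w+\b", text))
    if total_words == 0:
        return 0.0
    return 100 * sum(buzzword_counts(text, words).values()) / total_words

sample = "A vibrant and dynamic culture, woven into a unique tapestry."
print(buzzword_counts(sample))
print(buzzword_density(sample))
```

Whole-word matching means related forms such as "vibrancy" are not counted; widen the patterns if you want them.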
I have never used an automatic AI detector, but I would be interested to know why the advice is "Automatic AI detectors like GPTZero are unreliable and should not be used."
Obviously, we shouldn't just tag/delete any article that GPTZero flags, but I would have thought it could be useful to highlight to us articles that might need our attention. I can even imagine a system like WP:STiki that has a backlog of edits sorted by likelihood to be LLM-generated and then feeds those edits to trusted editors for review.
It could indeed be useful to flag potential articles, assuming we keep in mind the risk that editors might over-rely on the flagging as a definitive indicator, given the risk of both false positives and false negatives. I would definitely support brainstorming such a backlog system, but with the usual caveats – notably, that a relatively small false positive rate can easily be enough to drown true positives when genuinely LLM-generated edits are only a small share of all edits. This means it should be emphasized that editorial judgement shouldn't be primarily based on GPTZero's assessment. Regarding the advice as currently written, the issue is that AI detectors will lag behind the latest LLMs themselves, and will often only be accurate on the older models on which they have been trained. Indeed, their inaccuracy has been repeatedly pointed out. Chaotic Enby (talk · contribs) 14:54, 11 October 2024 (UTC)[reply]
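To make the false-positive point concrete, here is the standard positive-predictive-value calculation in Python. The prevalence, sensitivity and false-positive figures are hypothetical placeholders, not measured rates for GPTZero or any other detector.

```python
def flag_precision(prevalence, sensitivity, false_positive_rate):
    """Fraction of flagged edits that really are LLM-generated (positive predictive value)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    total_flagged = true_pos + false_pos
    return true_pos / total_flagged if total_flagged else 0.0

# Hypothetical figures, for illustration only: 2% of edits are LLM-generated,
# the detector catches 90% of those, and it wrongly flags 5% of human edits.
ppv = flag_precision(prevalence=0.02, sensitivity=0.90, false_positive_rate=0.05)
print(f"{ppv:.0%} of flagged edits would actually be LLM-generated")
```

With these assumed numbers, only about 27% of flagged edits would actually be LLM-generated, even though the detector wrongly flags just 5% of human edits; the rarer LLM edits are, the worse this ratio gets.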
How would you feel about changing the text to something like "Automatic AI detectors like GPTZero are unreliable and should only ever be used with caution. Given the high rate of false positives, automatically deleting or tagging content flagged by an automatic AI detector is not acceptable." Yaris678 (talk) 19:27, 15 October 2024 (UTC)[reply]
That would be fine with me! As the "automatically" might be a bit too restricted in scope, we could word it as "Given the high rate of false positives, deleting or tagging content only because it was flagged by an automatic AI detector is not acceptable." instead. Chaotic Enby (talk · contribs) 19:46, 15 October 2024 (UTC)[reply]
I support that wording. I use GPTZero frequently, after I already suspect that something is AI-generated. It's helped me avoid some false positives (human-generated text that I thought was AI), so it's pretty useful. But I'd never trust it or rely on it. jlwoodwa (talk) 16:15, 16 October 2024 (UTC)[reply]
I don't really see the need for detectors at this point, as it's usually pretty clear when an editor is generating text. As you say, the worry is false positives, not false negatives; these are pretty quickly rectified upon further questioning of the editor. Remsense ‥ 论00:08, 6 November 2024 (UTC)[reply]
What to do with OK-ish LLM-generated content added by new users in good faith?
After opening the article Butene, I noticed the heading formatting was broken. Then I read the text, and it sounded very GPT-y but contained no apparent mistakes. I assume it has been proofread by the human editor, Datagenius Mahaveer, who registered in June and added the text in July.
I could just fix the formatting and remove the unnecessary conclusion, but decided to get advice from more experienced users here. By the way, I would appreciate it if you put a brief guide for such cases (which, I assume, are common) somewhere! Thanks in advance 5.178.188.143 (talk) 13:57, 19 October 2024 (UTC)[reply]
Hi! In that case, it is probably best to deal with it the same way you would deal with any other content, although you shouldn't necessarily assume that it has been proofread and/or verified. In this case, it was completely unsourced, so an editor ended up removing it. Even if it had been kept, GPT has a tendency to write very vague descriptions, such as "polybutene finds its niche in more specialized applications where its unique properties justify the additional expense", without specifying anything. These should always be reworded and clarified, or, if there are no sources supporting them, removed. Chaotic Enby (talk · contribs) 15:24, 19 October 2024 (UTC)[reply]
Great, now I've been guilt-tripped and can't refuse! I'll go at it, should be fun – and thanks for setting up the skeleton! (I was thinking of also having a kind of flow diagram like the NPP one) Chaotic Enby (talk · contribs) 19:10, 19 October 2024 (UTC)[reply]
Hi folks!! I'm looking to catch up on the current state. I reviewed an article during the last NPP sprint because an IP editor had flagged it with an LLM tag. I couldn't say for sure whether it was generated or not, so I'm behind. I sought advice and was pointed here. It was in fact generated. So I'm looking for any flagged articles that you happen to come across, so I can take a look, learn the trade, chat about them and so on, so to speak. I've joined the group as well. Thanks. scope_creepTalk14:01, 24 October 2024 (UTC)[reply]
The pre-ChatGPT era
We may want to be more explicit that text from before ChatGPT was publicly released is almost certainly not the product of an LLM. For example, an IP editor had tagged Hockey Rules Board as being potentially AI-generated when nearly all the same text was there in 2007. (The content was crap, but it was good ol' human-written crap!) Maybe add a bullet in the "Editing advice" section along the lines of "Text that was present in an article before December 2022 is very unlikely to be AI-generated." Apocheir (talk) 00:57, 25 October 2024 (UTC)[reply]
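Checking whether a passage predates ChatGPT (released to the public on 30 November 2022) amounts to finding the oldest revision that already contains it. A minimal Python sketch, assuming you have already fetched the page history as (timestamp, wikitext) pairs; fetching is left out, and exact substring matching is a simplification, since passages are often lightly edited over the years.

```python
from datetime import datetime, timezone

# ChatGPT was made publicly available on 30 November 2022.
CHATGPT_RELEASE = datetime(2022, 11, 30, tzinfo=timezone.utc)

def parse_ts(stamp):
    """Parse a MediaWiki-style UTC timestamp such as '2007-03-01T10:00:00Z'."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def first_seen(revisions, passage):
    """Timestamp of the oldest revision whose text contains `passage`, or None."""
    hits = [ts for ts, text in revisions if passage in text]
    return min(hits) if hits else None

def predates_chatgpt(revisions, passage):
    """True if the passage already existed before ChatGPT's public release."""
    ts = first_seen(revisions, passage)
    return ts is not None and ts < CHATGPT_RELEASE
```

A passage that first appears before the cutoff is very unlikely to be LLM output, although, as the replies note, it can still be poorly written.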
So far, I haven’t seen anything that I thought could be GPT-2 or older. But I did run into a few articles that seem to make many of the same mistakes as ChatGPT, except a decade earlier.
If old pages like that could be mistaken for AI because they make the mistakes that we look for in AI text, that still means it's a problematic find; maybe we should recommend other cleanup tags for these cases. 3df (talk) 22:53, 25 October 2024 (UTC)[reply]
I think that's very likely an instance of "bad writing". Human brains have very often produced analogous surface-level results! Remsense ‥ 论23:05, 25 October 2024 (UTC)[reply]
Yes, I have to say, ChatGPT's output is a lot like how a lot of first- or second-year undergraduate students write when they're not really sure if they have any ideas. Arrange some words into a nice order and hope. Stick an "in conclusion" on the end that doesn't say much. A lot of early content on Wikipedia was generated by exactly this kind of person. (Those people grew out of it; LLMs won't.) -- asilvering (talk) 00:31, 26 October 2024 (UTC)[reply]
I ran this text from the 2017 version through GPTZero. It said there was a 1% chance of AI.
FIH was founded on 7 January 1924 in Paris by Paul Léautey, who became the first president, in response to field hockey's omission from the programme of the 1924 Summer Olympics. First members complete to join the seven founding members were Austria, Belgium, Czechoslovakia, France, Hungary, Spain and Switzerland. In 1982, the FIH merged with the International Federation of Women's Hockey Associations (IFWHA), which had been founded in 1927 by Australia, Denmark, England, Ireland, Scotland, South Africa, the United States and Wales. The organisation is based in Lausanne, Switzerland since 2005, having moved from Brussels, Belgium. Map of the World with the five confederations. In total, there are 138 member associations within the five confederations recognised by FIH. This includes Great Britain which is recognised as an adherent member of FIH, the team was represented at the Olympics and the Champions Trophy. England, Scotland and Wales are also represented by separate teams in FIH sanctioned tournaments. Graywalls (talk) 00:03, 6 November 2024 (UTC)[reply]
There's probably more bad than good writing on the Internet, and all LLMs have been extensively trained on all this bad writing, which is why they are prone to writing like it. 5.178.188.143 (talk) 14:23, 17 January 2025 (UTC)[reply]
AI account
Special:Contributions/Polynesia2024. Their contribution pattern is suspicious: no matching edit summaries, and content dumps of thousands of bytes made minutes apart across many articles. Some of the content they inserted tests as high as 99% AI, such as what they added to Ford. What is the current policy on AI-generated content without disclosure? Perhaps it could be treated as account sharing (because the person who has the account isn't the one who wrote it) or as adding content you did not create. Graywalls (talk) 23:53, 25 October 2024 (UTC)[reply]
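The pattern described here can be screened for mechanically. A rough Python sketch, reading it as "large additions, minutes apart, with no edit summary"; it assumes contribution records shaped like the `timestamp`, `sizediff`, `comment` and `title` fields that the MediaWiki `list=usercontribs` API can return, already parsed into Python types and sorted oldest first. The byte and time thresholds are arbitrary, and a hit is only a prompt for manual review.

```python
from datetime import datetime, timedelta

def suspicious_bursts(contribs, min_bytes=2000, max_gap=timedelta(minutes=10)):
    """Return titles of large, summary-less additions made soon after the previous edit.

    `contribs` is a list of dicts with 'timestamp' (datetime), 'sizediff' (int),
    'comment' (str) and 'title' (str), oldest first. The first edit is never
    flagged because there is no previous edit to measure the gap against.
    """
    flagged = []
    for prev, cur in zip(contribs, contribs[1:]):
        quick = cur["timestamp"] - prev["timestamp"] <= max_gap
        large = cur["sizediff"] >= min_bytes
        no_summary = not cur["comment"].strip()
        if quick and large and no_summary:
            flagged.append(cur["title"])
    return flagged
```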
There isn't technically any policy on not disclosing AI content yet, even in obvious cases like this one. However, the user who publishes the content is still responsible for it, whether it is manually written or AI-generated, so this would be treated the same as rapid-fire disruptive editing, especially given their unresponsiveness. Chaotic Enby (talk · contribs) 00:25, 26 October 2024 (UTC)[reply]
Ski Aggu is potentially stuffed with fake sources that do not work, or with sources that may not directly support the content. The CSD request was denied. I'm not going to spend the time to manually check everything, but I'm putting it out there for other volunteers to look at. Unfortunately, AI spam bots can apparently churn out tainted content and publish it into articles, but there are more procedural barriers to its removal than to its creation. Graywalls (talk) 16:19, 26 October 2024 (UTC)[reply]
The whole first block is two passing mentions, a couple of YouTube videos and many Discogs-style album listing sites. There is nothing for a BLP. Several of them don't mention him. They are fake. scope_creepTalk16:49, 26 October 2024 (UTC)[reply]
Editor with 1000+ edit count blocked for AI misuse
Wow, I think that would be a quagmire if we were specifically looking for LLM text, as detection would be slow and ultimately questionable in many instances. We could go through and verify that the info added in those edits is verifiable, but I wouldn’t go beyond that, nor do I think there is a need to go beyond that. — rsjaffe🗣️14:28, 26 October 2024 (UTC)[reply]
Unfortunately, this user's pattern of LLM use goes a lot further back. I've already started cleaning up Specific kinetic energy and Specific potential energy; I've also tagged the two sections he added to Molecular biology (which appear to be LLM-generated summaries of the linked main articles; they'll probably turn out to be OK as long as someone with subject matter knowledge can review and source them).
While this isn't how I found these pages (I was following up on this user's non-AI-assisted bad edits), it's notable that Molecular_biology#Meselson–Stahl_experiment (added on 17 April) was a 100% AI match on GPTZero. I don't think that automated detection is reliable enough to justify straight-up banning people, but it's probably reliable enough to justify flagging repeat offenders for manual review. Preimage (talk) 12:39, 6 December 2024 (UTC)[reply]
Is there a (Swiss) contributor who could talk about the arrival of AI-generated texts and photos on Wikipedia and about your experience? He has to submit his piece tomorrow, Tuesday 29 October, at 12:00.
Please get in touch with me. I am the outreach & communication manager.
Besides the GPTZero results, there's also the fact that this was added in a section it had nothing to do with, and doesn't actually make sense in context. In WikiEdu classes, I wouldn't be surprised if a certain number of students were using AI even if not told to do it, whether looking for a shortcut or simply not knowing it's a bad idea. It could be good to ask the instructor and staff to explain to students more clearly that they shouldn't use AI to generate paragraphs. Chaotic Enby (talk · contribs) 20:46, 5 November 2024 (UTC)[reply]
@FULBERT:, we just went over this regarding one of your students. @Ian (Wiki Ed):, I've cleaned up after a different student of yours in Job satisfaction and now another editor MrOllie is having to clean up after them again. When multiple editors have to clean up after WikiEDU student edits multiple times, it is disruptive. Graywalls (talk) 20:52, 7 November 2024 (UTC)[reply]
Regarding this edit at Test anxiety: an AI detection site says "Possible AI paraphrasing detected. We are highly confident this text has been rewritten by AI, an AI paraphraser or AI bypasser." with a 100% score. The edit talks about tests having been done and things being examined, but I see nothing in the source about test anxiety, academic exams, or anything else relevant to test anxiety. The source is open access, with the full PDF available. How does this edit look to you all? @MrOllie, Chaotic Enby, Ian (Wiki Ed), and FULBERT: Graywalls (talk) 02:54, 9 November 2024 (UTC)[reply]
Gaining insight into this interaction can aid in creating more potent support plans for academic anxiety management. - This definitely feels LLM generated. Sohom (talk) 02:58, 9 November 2024 (UTC)[reply]
@Graywalls @Chaotic Enby @Ian (Wiki Ed) @Sohom @MrOllie This is all very appreciated. I have spoken with my students (in general) in our last class session and will do so again when we meet today to reiterate this. Additionally, I have met individually with several students whose edits have been identified above. I have also revised some of my own (saved) verbiage to use when reviewing edits to account for the concerns you have all helpfully raised above. While I am glad for this support for my students, I am also concerned on a more general level with this behavior happening across all our projects (and workplaces), and think these discussions, along with the verbiage we use to address the issues as instructional opportunities, present us with challenges that go beyond the concerns many of us initially faced with plagiarism alone. Please let me know if you detect this with any of my students again this term, and I thank you all again for your help and guidance here. FULBERT (talk) 17:42, 14 November 2024 (UTC)[reply]
Thanks a lot for your help in this, and for speaking with your students about this issue. There are definitely challenges to be addressed, and this is the reason behind this project. Chaotic Enby (talk · contribs) 19:04, 14 November 2024 (UTC)[reply]
Hello, our German project KI (AI in German) and Wikipedia starts now, and your project was covered in a major publication here. We will study your experiences carefully. I have a short introductory question. You write: "To identify text written by AI, and verify that they follow Wikipedia's policies. Any unsourced, likely inaccurate claims need to be removed." You write later that a source can be completely hallucinated or may not contain the cited content, which is also our experience. As AI texts get better and identifying them as "AI-generated" by formal (stylistic) features becomes more difficult, is there an alternative to checking everything? We have "Sichten" (reviewing all edits from new users), with a backlog of 18,520 edits waiting up to 55 days, and it checks only for vandalism. If a deeper check becomes necessary, I see problems regarding resources. Could AI itself be used for it? Thanks for any hints regarding future developments. --Wortulo (talk) 17:36, 9 November 2024 (UTC)[reply]
@Wortulo Hello! Congratulations on setting up the project, it looks to be very well-organized, and I would be happy if our teams could work together to take inspiration from each other's experiences. What you mention is indeed an issue we've been keeping in mind. Source checking is indeed something that might become necessary in the future to a larger extent, although for now it's something that we have only been doing in cases where we already have suspicions. Regarding the new edit queue, while we don't have a direct equivalent here, we do have the Wikipedia:New pages patrol, where source checks of new pages are indeed performed. We do not yet use AI tools to assist us in this, as they are quite unreliable, although we are experimenting with potentially using some in the future. Chaotic Enby (talk · contribs) 17:51, 9 November 2024 (UTC)[reply]
Thank you very much for your quick and constructive reply. I have described your project in more detail with the relevant links. We have our first online meeting on 27 November, and I will also suggest something like this as an option. So far, things here have been organised on the basis of individual initiatives. I agree with you that tools are still too unreliable at recognising "AI-generated" text at the moment (e.g. false positives when AI has been used to improve style). They will also continue to evolve (perhaps we can help) and will become necessary if things develop as I suspect. Hallucinations will perhaps also be prevented by AI itself one day, but the danger that they will be less easily recognised will probably come sooner. An exchange of information would be greatly appreciated. Thank you in advance; I would be happy about this. Wortulo (talk) 08:47, 10 November 2024 (UTC)[reply]
@Wortulo Thanks a lot! Don't hesitate to ask me if you have any more questions. I would also be interested in how the meeting goes, if any points that are brought up could also apply here! I've been reorganizing the English Wikipedia's project these last few days – if you have any comments or advice, do let me know! Chaotic Enby (talk · contribs) 23:42, 12 November 2024 (UTC)[reply]
I will do so. I have seen your new page "resources" and have linked the website. Allow me to ask regarding 2 typical examples from November:
Background: when an article has been identified on de:WP, a long discussion often follows, ending in a deletion or not (it is necessary to convince an admin to delete). When we start, we want a simple and uniform solution. Do you have any hints? The second question: I am sure you know DeepL for translation and DeepL Write for improving texts (or similar tools). In my experience, detection tools then often give a false positive, identifying such text as AI-generated. The differences in wording will probably decrease, but the hallucination rate will remain (due to the technology itself). I also have no solution, but what do you think about this? A disclosure requirement? Wortulo (talk) 07:47, 13 November 2024 (UTC)[reply]
Hi! Like many maintenance categories, it is a hidden category, and isn't visible from the article page. Regarding the {{llm}} template, it is an alias for {{ai-generated}} – there is no separate LLM-exclusive project, and the essay Wikipedia:Large language models is in the scope of the current project. Except articles that are clearly AI-generated hoaxes (which we can speedily delete), articles often have to be rewritten (as Wikipedia:Articles for deletion is rarely appropriate for cleanup issues), although Wikipedia:Draftification is usually an option for new creations. Regarding tools like DeepL Write, I wouldn't necessarily call it a "false positive" – rewriting using AI can still bring issues of non-neutral tone and implied synthesis. In the latest discussion (nearly a year ago), there wasn't a consensus to require declaring the use of such tools, although consensus can change. Chaotic Enby (talk · contribs) 12:17, 13 November 2024 (UTC)[reply]
Thanks. I am a "DAU" (German for "dumbest assumable user", a common term meaning that all explanations must be as comprehensible as possible) ;-) I tried to understand what I would need to do if I contributed. The alias {{llm}} is not in your new "Resources" list, and with it a message is visible to the reader (with {{ai-generated}} it is not). So it's unclear to me when to use which of the two. Could you explain this? As a DAU, I was also unable to find the hidden category at all when editing (is it set on the article or on the talk page?). Drużyna coat of arms is another example from your November list where I can't find the hidden category. I have some experience with our project on paid editing. There we have templates with clear text on the talk page, also connected to a hidden category. The template explains in general what happens, and the specific issues can then be explained and discussed. This is the direction I see for our adaptation of your good idea (!!!). KISS (Keep it smart and simple) is a goal. Wortulo (talk) 06:53, 14 November 2024 (UTC)[reply]
Greetings from dewiki! A while ago (and with some LLM help), I wrote a Python script that looks at checksums in ISBN references. I was able to identify 5 articles with hallucinated references. Please note that there can be many other causes for checksum failures in ISBN references (clueless publishers, for example) aside from honest mistakes and switched numbers. The file can be found on GitHub. There is also a list of Wikipedia articles from the English-language Wikipedia with failed checksums in ISBN references. The list is online on GitHub, too. For example, I have some concerns about the articles Battle of Khosta and Spanish military conspiracy of 1936. I would like to ask someone more familiar with the English language and the subject matter to look into it. Thank you in advance. -- Mathias Schindler (talk) 09:35, 14 November 2024 (UTC)[reply]
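The checksum test described above follows the published ISBN rules: for ISBN-10, the digits weighted 10 down to 1 (with X standing for 10 in the last position) must sum to a multiple of 11; for ISBN-13, the digits weighted alternately 1 and 3 must sum to a multiple of 10. A minimal Python sketch of that check, written as an independent illustration rather than a copy of the script linked above:

```python
DIGITS = set("0123456789")

def isbn_checksum_ok(isbn):
    """Validate the check digit of an ISBN-10 or ISBN-13 (hyphens and spaces ignored)."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and set(s[:9]) <= DIGITS and (s[9] in DIGITS or s[9] == "X"):
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        # Weights 10, 9, ..., 1; the weighted sum must be divisible by 11.
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(s) == 13 and set(s) <= DIGITS:
        # Alternating weights 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
        return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False
```

For example, 0714640369, one of the ISBNs discussed in the reply below, fails the ISBN-10 check, while 9780714634319 passes. A passing checksum only shows that the number is well-formed; it can still belong to a different book.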
Greetings! Thanks for the link, it would be great to incorporate this script in the New Pages Patrol workflow if possible. I'm wondering if there would also be a way to check if existing ISBNs match the given book title? Looking at the first example, one of the books cited (The Circassians: A Handbook) does exist, but with a completely different ISBN. The Circassian Genocide appears to exist, but with at least two different ISBNs, both similar to the given one but ending in ...87 or ...94 instead of ...82. The other ISBNs aren't present on the page anymore, with 0714640369 being (wrongly) used to refer to Muslim Resistance to the Tsar: Shamil and the Conquest of Chechnia and Daghestan, which has since been fixed to 9780714634319. The article definitely reads like it has been AI-written, with the "Aftermath" section trying to emphasize how the event demonstrates such-and-such like a typical ChatGPT conclusion: demonstrating their capability to repel, underscored the difficulty Russia faced, highlighted the effectiveness of Circassian guerrilla tactics... Chaotic Enby (talk · contribs) 11:59, 14 November 2024 (UTC)[reply]
Hello! Regarding your first question: I am looking into expanding my script to compare the ISBN with bibliographic databases as well as with the title information given in the article, to see if they match (at least to a certain degree; I assume some fuzziness will be required here). Thank you for having a look at the examples I gave. As I am not a very active user on enwiki, I will leave it up to the community here to draw further conclusions about what to do with such an article. I will let you know about future versions of my script (and I will definitely look more closely into other identifiers, such as DOIs, URNs etc.). Have a nice day! Mathias Schindler (talk) 20:43, 16 November 2024 (UTC)[reply]
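For the planned comparison between cited titles and catalogue titles, one simple way to get the "fuzziness" is Python's standard `difflib.SequenceMatcher`, applied after normalizing case and punctuation. This is a sketch under assumptions: it supposes you have already looked up the catalogued title for an ISBN (no database lookup is shown), and the 0.85 threshold is a guess to be tuned on real data.

```python
import re
from difflib import SequenceMatcher

def normalize_title(title):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())

def title_similarity(cited, catalogued):
    """Similarity ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize_title(cited), normalize_title(catalogued)).ratio()

def titles_match(cited, catalogued, threshold=0.85):
    # The 0.85 threshold is an arbitrary starting point, not a tested value.
    return title_similarity(cited, catalogued) >= threshold
```

Titles cited without their subtitle will score lower, so a real tool would probably also compare just the part before a colon.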
By all means, spread the word! You can freely add and link all the publicly available info anywhere. I am open to suggestions and I would love to hear more from people who have looked into the EN-wiki-ISBN-list that I published, like you did. Mathias Schindler (talk) 08:05, 19 November 2024 (UTC)[reply]
Sykes's nightjar relies heavily on https://animalinformation.com, which appears to be entirely AI-generated (look at the privacy policy, the articles, etc.; see my post on the talk page). I tagged this with the unreliable sources tag, but I couldn't find a tag for AI-generated sources, so I'm posting a notification here instead. Mrfoogles (talk) 23:37, 9 November 2024 (UTC)[reply]
Don't think further discussion is needed particularly, thanks for the suggestion though. Do you know if there is a tag for incorrect information? I think in combination with the unreliable sources tag that basically does what is necessary. Mrfoogles (talk) 00:12, 10 November 2024 (UTC)[reply]
i would just remove all the information cited to AI-generated sources, and see if you can find anything reliable to expand it back out with. ... sawyer * he/they * talk00:30, 10 November 2024 (UTC)[reply]
Is there an easy way to generate a list of which images in all of those subcategories are currently in use in mainspace here on Wikipedia? (I'm currently using the VisualFileChange script which can highlight global usage of images in a category, but that's across all Wikipedia projects rather than just the English one, and also includes non-mainspace user page and Signpost usage.) Belbury (talk) 14:30, 13 November 2024 (UTC)[reply]
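One way to answer this without VisualFileChange is to query the MediaWiki Action API directly: walk the Commons category tree with `list=categorymembers`, then ask the English Wikipedia API for `prop=fileusage` restricted to namespace 0. A Python sketch under assumptions: it relies on English Wikipedia's file-usage data covering files hosted on Commons (worth confirming on a known example first), the category name and User-Agent contact are placeholders, and there is no rate limiting or error handling.

```python
import json
import urllib.parse
import urllib.request

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
ENWIKI_API = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks API clients for a descriptive User-Agent; the contact is a placeholder.
USER_AGENT = "ai-cleanup-image-audit/0.1 (contact: YourUsername)"

def build_url(endpoint, params):
    """Add the common query parameters and return a GET URL."""
    full = {"action": "query", "format": "json", "formatversion": "2", **params}
    return endpoint + "?" + urllib.parse.urlencode(full)

def query_all(endpoint, params):
    """Yield each 'query' block, following the API's continuation tokens."""
    cont = {}
    while True:
        req = urllib.request.Request(build_url(endpoint, {**params, **cont}),
                                     headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        yield data.get("query", {})
        if "continue" not in data:
            break
        cont = data["continue"]

def files_in_category(category, seen=None):
    """Recursively collect File: titles under a Commons category and its subcategories."""
    seen = set() if seen is None else seen
    if category in seen:
        return set()
    seen.add(category)
    files = set()
    params = {"list": "categorymembers", "cmtitle": category,
              "cmtype": "file|subcat", "cmlimit": "max"}
    for block in query_all(COMMONS_API, params):
        for member in block.get("categorymembers", []):
            if member["ns"] == 6:        # File namespace
                files.add(member["title"])
            elif member["ns"] == 14:     # Category namespace: recurse
                files |= files_in_category(member["title"], seen)
    return files

def enwiki_mainspace_uses(file_title):
    """Return English Wikipedia article titles (namespace 0) that use the file."""
    params = {"prop": "fileusage", "titles": file_title,
              "funamespace": "0", "fulimit": "max"}
    uses = []
    for block in query_all(ENWIKI_API, params):
        for page in block.get("pages", []):
            uses.extend(u["title"] for u in page.get("fileusage", []))
    return uses

# Example usage (makes network requests; the category name is hypothetical):
# for f in sorted(files_in_category("Category:Example AI-generated images")):
#     articles = enwiki_mainspace_uses(f)
#     if articles:
#         print(f, "->", ", ".join(articles))
```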
A translation produced by ChatGPT of Tzetzes's commentary on Lycophron's Alexandra has been linked on 175 pages related to Greek mythology. [3] The translation itself is, suffice it to say, highly problematic, and shouldn't be linked on Wikipedia. Is there an effective automated method for removing these links en masse? Thanks, Michael Aurel (talk) 23:02, 15 November 2024 (UTC)[reply]
While something like AWB could "naively" remove the links themselves, it could be better to look at the articles individually to see whether the material already has good sourcing and the link can be safely removed, or if a substitute translation should be found and added instead. You could also drop a note at WP:RSN so editors can look at the wider website (https://topostext.org) to see if other similar translations are present. That way, the extent of the problem could be more accurately assessed, and future editors will be able to find it in the archives. Chaotic Enby (talk · contribs) 17:37, 18 November 2024 (UTC)[reply]
@Chaotic Enby: Thanks for your reply. Unfortunately, the work hasn't been translated into English by a scholar yet (or out of the original ancient Greek at all, I don't believe), so the only replacement link which we could really provide would be an old edition of the work in ancient Greek (eg. [4] or [5]), and I imagine adding such links wouldn't be possible with automated tools. A discussion at WP:RSN might be useful, and could help to establish a consensus around how such translations ought to be handled, although I do note that a google search for "chatgpt site:topostext.org" only brings up this translation, which would seem to indicate that this is the only AI-generated translation hosted at that website. (Also, these links were all added by one editor I believe, in good faith but unwittingly, who I contacted before starting this discussion, so hopefully this translation, once removed, won't be linked again.) So, given this, would you say an automated method of removal, while possible, is likely not preferable to a manual approach? Or perhaps someone familiar with AWB could remove the links, and I could go through each page afterwards and manually link a Greek edition, or find a secondary source? – Michael Aurel (talk) 22:44, 18 November 2024 (UTC)[reply]
I would say it is still way preferable to look individually at each use of the source. By the way, especially when dealing with medieval or ancient texts, more recent secondary sources are very much preferred. Tzetzes's commentary might be "secondary" with respect to Lycophron's Alexandra, but given the age of the source, it is indeed best to treat it as a primary document from a historiographical perspective, and to cite secondary sources that discuss it in context. Chaotic Enby (talk · contribs) 23:04, 18 November 2024 (UTC)[reply]
Alright, fair enough. And yes, secondary sources are of course always preferred when dealing with ancient texts. Tzetzes' work, while in some sense "secondary" to Lycophron's I suppose, is functionally a primary source, at least as far as Wikipedia is concerned; my suggestion to replace these with links to a Greek edition was only because in most instances there is almost certainly no secondary source which contains the cited information, due to the obscurity of Tzetzes' text, and its relative insignificance to Greek mythological study. – Michael Aurel (talk) 23:23, 18 November 2024 (UTC)[reply]
175 articles is quite a lot to check. I think we need to find out if the foundation is valid first. A chat at RSN could kick that off. We also need to find out if the translations are accurate, which is the core of it. If this doesn't answer, then they need to be removed. scope_creepTalk08:13, 19 November 2024 (UTC)[reply]
Thanks. I suppose I came here under the assumption that this sort of source wasn't considered acceptable, but perhaps the use of AI-generated translations isn't something which has actually been discussed before, so a precedent-setting discussion could certainly be helpful. – Michael Aurel (talk) 09:25, 19 November 2024 (UTC)[reply]
@Chaotic Enby: What category does it go in? I couldn't locate it. I found a couple of others, including Category:Articles containing suspected AI-generated texts from November 2024. There are already 24 articles for November. scope_creepTalk14:21, 19 November 2024 (UTC)[reply]
Interesting, this could certainly be a useful way of flagging the pages containing this source (and other such sources). Perhaps a new cat for pages containing this tag could be something along the lines of "Articles containing suspected AI-generated sources", as a specific tracking category for this seems as though it could be of use to this WikiProject, seeing as AI-generated sources are presumably only going to crop up more and more frequently. – Michael Aurel (talk) 16:47, 19 November 2024 (UTC)[reply]
To clarify here (as the RSN discussion has now been archived), is the idea to, in an automated manner, add these tags across all of the pages with this source? I've removed around fifty of the links so far (a decent start I suppose), but tagging these would allow this to be designated as an outstanding task, visible and open to others. – Michael Aurel (talk) 09:19, 26 November 2024 (UTC)[reply]
Yep, while removing references in a (semi-)automated way shouldn't be done, tagging them automatically so editors can look more closely at individual instances is definitely helpful. Chaotic Enby (talk · contribs) 12:32, 26 November 2024 (UTC)[reply]
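For the tagging step, a worklist of affected articles can be pulled from the MediaWiki API's `list=exturlusage` and then filtered down to the specific translation, so each page gets individual attention rather than blind link removal. A Python sketch: the domain comes from the discussion above, but the path fragment used for filtering is a hypothetical placeholder, since the exact URL of the translation is not reproduced here. The request is only built, not sent, and continuation handling is omitted.

```python
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"

def exturlusage_url(domain, namespace=0):
    """Build (but do not send) an API request listing articles that link to `domain`."""
    params = {
        "action": "query", "format": "json", "formatversion": "2",
        "list": "exturlusage", "euquery": domain, "euprotocol": "https",
        "eunamespace": str(namespace), "eulimit": "max",
    }
    return API + "?" + urllib.parse.urlencode(params)

def worklist(results, must_contain):
    """Group matching external links by article title, sorted for review.

    `results` are entries from the response's query.exturlusage list,
    each with at least 'title' and 'url'.
    """
    pages = {}
    for item in results:
        if must_contain in item["url"]:
            pages.setdefault(item["title"], []).append(item["url"])
    return dict(sorted(pages.items()))
```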
When I was reviewing articles in the "AI-generated texts" category, I sent several of them to draft, in effect an NPP review. I think about 6 of them went. One was really bad. scope_creepTalk12:51, 26 November 2024 (UTC)[reply]
Just noting that these are two different cats, "AI generated text" (when the articles themselves are AI-written) and "AI generated sources" (when they cite sources that are AI-written), the tag mentioned earlier puts articles in the latter category. Chaotic Enby (talk · contribs) 13:01, 26 November 2024 (UTC)[reply]
Ah, that's good to know. Though, hmm, would it potentially be easier for you to do it, as you're no doubt experienced with AWB, and I'm assuming it wouldn't take all that long (maybe?) to add tags to this many pages? Though if I'm wrong on either count (or you think it would be better I do it), I'm willing to give it a go. – Michael Aurel (talk) 23:14, 26 November 2024 (UTC)[reply]
Yep, it definitely reads like ChatGPT's attempts at "quirky" humor. There's {{ai-generated}} as a tag you can add if you want. If you have more time, you can look at the history, revert the addition and message the user (either yourself, or Wikipedia:Twinkle has ready-made warnings for that matter). Chaotic Enby (talk · contribs) 21:38, 5 December 2024 (UTC)[reply]
It seems like the most effective way to clean up articles, going through the category of articles tagged as possibly AI-generated, is to just wholesale delete any uncited content, then spot-check sources to see if they support the content. If they don't, they can be removed, and if enough don't, the article can be stubbed, as they probably all don't (this is useful when it is impossible to access all of the sources). If they do, the best available option seems to be to just delete the AI tag and presume it's good if the history isn't too suspicious.
This might be helpful to add to the guide. The main problem in fixing possibly AI-generated articles seems to be source access, where AI (possibly) can cite a source you can't access and it's impossible to check. Mrfoogles (talk) 00:58, 6 December 2024 (UTC)[reply]
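The triage procedure described above can be written down as a small decision rule. A Python sketch under stated assumptions: `check` stands for a human spot-check of one citation (True if it supports the text, False if not, None if the source is inaccessible), inaccessible sources count as failures in line with the treat-as-unsourced advice in the reply below, and the sample size and stubbing threshold are arbitrary choices, not project guidance.

```python
import random

def triage(citations, check, sample_size=5, stub_threshold=0.5, seed=None):
    """Spot-check a random sample of citations and suggest a cleanup action.

    `citations` lists the citations left after uncited content is removed;
    `check(c)` returns True (supports the text), False (does not) or None
    (inaccessible). Anything other than True counts as a failure.
    """
    if not citations:
        return "remove uncited content"
    rng = random.Random(seed)
    sample = rng.sample(citations, min(sample_size, len(citations)))
    failures = sum(1 for c in sample if check(c) is not True)
    if failures == 0:
        return "remove tag if history looks fine"
    if failures / len(sample) >= stub_threshold:
        return "stub the article"
    return "remove failing citations and their claims"
```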
Feel free to add it to the guide! Important emphasis on the fact that if AI-generated text cites inaccessible sources, it's pretty much guaranteed that the model didn't have access to these sources either, so it can be safely treated as unsourced. Chaotic Enby (talk · contribs) 11:34, 6 December 2024 (UTC)[reply]
I don't think this is AI-generated. I can't see any details that are strange, the focus seems relatively consistent, and it looks a lot like her, which is rare for someone who isn't that famous. Sam Walton (talk) 23:18, 17 December 2024 (UTC)[reply]
Hi all! As a website owner who has been using ChatGPT for years, I believe I can spot signs of AI-generated content pretty quickly. I have a full-time job but would love to assist (to ensure the truth remains true, and for my own personal development).
Hello! A good start would be to install Wikipedia:Twinkle, which allows you to tag articles (including, in this case, with the {{AI-generated}} tag). You can tag pages that you encounter, or look for new additions in Special:RecentChanges! If you see users adding AI-generated content with clear issues (which for now is the vast majority of visible AI-generated content), you can warn them with {{uw-ai1}}. Chaotic Enby (talk · contribs) 21:23, 2 January 2025 (UTC)[reply]
@Aisavestheworld: Also have a go at servicing the Category:Articles containing suspected AI-generated texts category, where they end up, to clean the stuff up and remove the article content entries. Be bold and remove the stuff if you see it. This is the greatest literary/encyclopaedic project since the Library of Alexandria, so it's worth the time. If you're in the NPP/AFC group, post it back on the NPP queue, and anything else if you find it troublesome, for example if there is an autopatrolled editor who is using it. If it's a draft under the 90-day limit, then redraft it and give a clear reason why it's been drafted. Speak to the editor and tell them why it is not acceptable to post AI slop. Explain it clearly so they realise it's not what's wanted, and tell them there is stormy weather ahead if they continue. Be soft, considerate, kind, responsive and helpful. But if you warn them and they don't comply after the four warnings, e.g. for disruptive editing, send them to WP:ANI, or here, where we can have a group chat, e.g. COIN. If that doesn't work out, then it's ANI. It is far too early to use AI effectively, that seems to be the wide consensus, although I think it's probably going to be good for diagrams, for example medical diagrams, and physical illustrations, but not BLP portraits or any BLP. Hope that helps. scope_creepTalk16:48, 6 January 2025 (UTC)[reply]
Thank you @Scope creep! Can you help me get started here? I think I just need to know where to go and I can get started: "Category:Articles containing suspected AI-generated texts". Aisavestheworld (talk) 18:29, 6 January 2025 (UTC)[reply]