Obviously, I am hoping to draw together a project to facilitate the cleanup of copyright issues by interested Wikipedians. In addition to giving guidance on how best to address individual copyright concerns, my hope is to create a gathering point where efforts can be coordinated to clean up massive infringement. For a few examples, see here, here and here. It is also my hope that interested contributors will be encouraged to help rewrite articles listed at WP:CP so that we lose less valuable content. --Moonriddengirl(talk)16:30, 12 March 2009 (UTC)
Wow, I can see you put a lot of work into this. :-) My main concern at the moment is that this page does a great job of summarizing and describing relevant resources (some of which I'd never heard of), but less organization of activities. I suggest having a section or subpage for members, a userbox, a "to do" list, a list of important pages to watch, and so on. You might also consider linking information about how to handle copyvio files on Commons (Commons:Template:Copyvio to tag obvious copyvios for speedy deletion and Commons:Deletion requests for less obvious ones). I might also see if I can set up a public webserver for my ContributionSurveyor tool. Dcoetzee19:43, 13 March 2009 (UTC)
Thanks so much. :) Today is a bear for copyright problems (DumbBot came back online, and today is the day the mass of those are up for evaluation), but I'll try to build this on. I guess at that point, I'll either propose it or just see if there are enough interested parties to go live. The Council likes to see 5 - 10. I don't think there's any kind of canvassing issues by notifying users I think might be interested. --Moonriddengirl(talk)19:48, 13 March 2009 (UTC)
I think this is a fantastic idea. For me at least there's a lot of uncertainty where copyright problems are concerned; I think even just the existence of a forum where people who deal with these sorts of issues could come and ask questions would be ideal. The presence of a "watchlist" of sorts for certain topics or articles which are prone to copyright issues would be extremely useful, and even just the list of members would provide a valuable point of reference for those looking for answers in specific fields. I think guidelines for how to remove copyrighted material would help too - i.e. looking at sections of the article which don't match CSB's listed URL and comparing them to different pages on the same site. Some guidance on how to deal with CC vs GFDL issues would be nice too; having discussed them with you, I see just how much we need a uniform way to deal with the problems. But yes, a welcome project, in my eyes. – Toon(talk)01:08, 14 March 2009 (UTC)
It occurs to me that a talkpage infobox for such articles might be useful, something like: "Please note that television episode summaries must be written in original language, not copied from other sites or press releases, to comply with Wikipedia's copyright policy. They must also remain succinct enough not to infringe as an abridgment." --Moonriddengirl(talk)13:20, 14 March 2009 (UTC)
I have enough approval per Council guidelines to go live, and this does not include those people who have told me here and elsewhere that they would be interested. Since this needed to be in project space for further development, I've moved it. --Moonriddengirl(talk)13:24, 16 March 2009 (UTC)
Bravo! I think it would be good to have a place to discuss general copyright issues away from the hustle and bustle of the questions pages, and without pretending that everything has to come from a change in policy. Count me in, even if I don't use userboxes ;) Physchim62(talk)14:01, 16 March 2009 (UTC)
The work process is a bit vague still, and needs to be a little more explicit. What is not clear to me is what actions are to be taken when a project member has reviewed a suspected copyright violation, and the next step is deletion. This requires an administrator, which I am not. We can just then move on and an administrator will get to it, but how do other wikiproject members identify that the article has already been reviewed? How does an administrator know that it has been reviewed? How does the administrator know that the review is trustworthy? -- Whpq (talk) 15:37, 16 March 2009 (UTC)
Well, if it's specifically WP:SCV, they're removed from the list after review...sort of. :) The way the process has been established there, they're removed if they're tagged CP or if they're cleaned up. If they're tagged for speedy, they're only removed when they redlink...I guess so that contributors can be sure they're followed up on. I'm not sure how to more effectively organize that. It looks like Prankster Turtle makes notes: [1]. I wonder if some icons could simplify things, the way they do it at WP:RPP.
As far as the category, Category:Possible copyright violations, I don't know how to identify that it's already been reviewed. There may not be a way, unless we change the way categories are currently set up--such as removing the ones that are tagged WP:CP from the possible copyright infringement category. (Perhaps we could make it a subcat? That might be a good idea.) Administrators check the listings at CP and have to follow through before taking the responsibility of deleting or what have you, but notes made there can still be very useful. For instance, when I'm the admin at CP, I always check to see when infringement was introduced so I know if I should revert to last clean and where I need to look to see if followup action is necessary.
Focus. Given that the project is just getting under way with a limited number of members, perhaps focusing on one specific category, and developing out the process for that would be a good idea. We can evolve the process based on the experience, and then mould it to fit other categories as the project matures. Perhaps we start with Wikipedia:Suspected copyright violations as that doesn't look too daunting. -- Whpq (talk) 15:57, 16 March 2009 (UTC)
On a WikiPolitical point, it's generally considered a Bad Idea to plan admin action through a WikiProject – best leave that to the other Community pages. On the other hand, what we can do is look to see where the current copyright review processes could be improved. I mean, what are we going to do with the 1000 New Zealand molluscs? Physchim62(talk)16:09, 16 March 2009 (UTC)
Superb! In its first few hours, this WikiProject is already serving a useful purpose, namely telling interested editors that the matter is "under control" ;) Physchim62(talk)16:32, 16 March 2009 (UTC)
Oops! No, no. Not meant to discourage assistance. With 1,000 articles, burnout over there is quite likely. :D (eta: Actually, we could probably help out quite a bit. Those stubs don't require any great familiarity with molluscs. Want to take a letter?) --Moonriddengirl(talk)16:33, 16 March 2009 (UTC)
Ongoing large-scale infringement cleanup
Now that Physchim62 has reminded me that we are past the stage of "talking about" the project and at the point of "doing", I wanted to note that there are currently two ongoing mass infringement cleanup projects of which I'm aware:
Handling of this one is straightforward. As indicated there, the project has decided to stub the suspect articles for later expansion. Even without familiarity, contributors should be able to help out there. They are coordinating clean-up but would undoubtedly welcome further assistance. Details are on that subpage.
This one has been underway for over a month in my sandbox. The original contributor is under voluntary probation as a condition of lifting his indef-ban, whereby all his major contributions at this point are reviewed before going into article space. The purpose of this clean-up is to check his earlier efforts for infringement and revise as necessary. Frequently, this contributor cited his sources, but still infringed by replicating whole sentences or phrases. We also have to look out for superficial revision, where a few words are changed, but not enough. Most of the articles that he has created have been addressed; we're primarily on the articles to which he has added. I find it useful to look at the diffs of his contribution and compare with sources he has cited or run a narrow Google search for striking phrases. If you address an article, strike it out. It's probably no longer really important to check the ones where infringement has been found, as the case for probation has already been made, but it's kind of a habit at this point. :)
In the project section, "What is a copyright concern?", it would be helpful to link to an essay or howto (or some other article) at the top, that walks editors through the process one step at a time, hand in hand, from identifying copyright concerns, to tagging and notifying. Thanks! Viriditas (talk) 02:03, 17 March 2009 (UTC)
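The comparison described above (diff text against a cited source, watching for superficial revision) can be roughed out in code. What follows is only a minimal sketch, not a tool anyone in this discussion uses: the function names and the sample sentences are made up for illustration, and shared word sequences are a prompt for human review, not proof of infringement.

```python
import re


def shingles(text, n=5):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_phrases(article, source, n=5):
    """Word n-grams that occur in both texts, sorted for stable output."""
    return sorted(shingles(article, n) & shingles(source, n))


def overlap_ratio(article, source, n=5):
    """Fraction of the article's n-grams that also occur in the source."""
    article_grams = shingles(article, n)
    if not article_grams:
        return 0.0
    return len(article_grams & shingles(source, n)) / len(article_grams)


if __name__ == "__main__":
    # Hypothetical sample text: one word swapped, the rest copied.
    source = "The harbour was rebuilt after the great storm of the last century."
    article = "The harbour was restored after the great storm of the last century."
    for phrase in shared_phrases(article, source, n=4):
        print(phrase)
    print(round(overlap_ratio(article, source, n=4), 2))
```

A high ratio on a passage where only a word or two differs is the "superficial revision" pattern; a shared phrase can also be pasted into a search engine as the "striking phrase" for a narrow search. Short common phrases will match by chance, so the n-gram length is a judgment call.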
That could be a good idea, but might be a little complicated, since procedures vary, especially if we take into account files. Maybe for now a pointer to Wikipedia:CV#Instructions? This is a bit basic, but could serve. I'll put it in as a "see also" while we talk. :) --Moonriddengirl(talk)20:40, 17 March 2009 (UTC)
Hi. :) Welcome. There are tons. It depends on if you feel more comfortable working with text or files. If text, do you like writing? There are a number of articles over here that need evaluating for close paraphrasing and may need rewriting. If you like evaluating, there seem to be hundreds of articles at Category:Copied and pasted articles and sections that need comparison to sources to see if they are copyright infringements or not. There's also still a massive cleanup project going on at Wikipedia talk:WikiProject Gastropods/Subpage for organizing CopyVio Cleanup. We've managed to make very good headway there, but we're going to have to get going on the next phase...once we figure out how. :/ --Moonriddengirl(talk)23:24, 19 March 2009 (UTC)
I'm in the middle of merging articles related to the Gastropod cleanup and have stumbled upon someone whose contributions need checking. He has had articles deleted for copyright infringement before. I've tagged two: Fareed Nabiel Fareed and Hilary Rantisi. The latter represents his effort to create a new version to replace a copyright infringement. As you can see by comparing it to the tagged source, it doesn't go nearly far enough. (In one sentence, the word "frequent" has been changed to the word "recognized." Enough to escape Corenbot, but hardly enough to avoid infringement.) I've explained the problem at the user's page and listed both at CP.
If somebody could take a look at other articles in his or her contrib history, that would be greatly helpful. If not, I'll try to get to it after I finish this batch of merges. :) --Moonriddengirl(talk)14:49, 20 March 2009 (UTC)
I will take a look and report progress. I will work from most recent to earliest. I'll note where I stopped so that somebody else can pick it up. -- Whpq (talk) 18:06, 20 March 2009 (UTC)
I have finished Mujid S. Kazimi which puts me at Dec 14, 2008 chronologically in the contribution list. Question: I've been tagging with the copyvio template, including verbatim copying. Should the verbatim copying be tagged for speedy deletion? -- Whpq (talk) 18:22, 20 March 2009 (UTC)
Probably (a lot of admins would just delete), but under the circumstances I'd tag it for CP. There's no harm in delaying closure for a few days, so long as the material is blanked, in case the contributor can verify permission. If you don't object, if there are a bunch, I may go through to see where the problems are and dispose of a few of them, since the text is readily available at the external site. Thanks for picking up on this. I've got more work stuff going on this afternoon than I was expecting, which is putting a real damper on my true calling. :/ --Moonriddengirl(talk)18:33, 20 March 2009 (UTC)
Between us, we seem to have covered it. I didn't find a single article in his contribution list that was not an infringement at one point, although some have been cleaned by other contributors since. I've strongly cautioned. --Moonriddengirl(talk)21:31, 20 March 2009 (UTC)
Plot description box
Does anybody have any input on this? I'm not very experienced in creating templates for talk pages (well, aside from text ones, anyway). I don't know if establishing such a thing is controversial or if there's a place I should propose it or if it's a matter for bold. The idea is that it could be put on talk pages of television episode lists, books, movies, etc.
Plot descriptions cannot be copied from other sources, including official sources, unless these can be verified to be public domain or licensed compatibly with GFDL. They must be written in original language to comply with Wikipedia's copyright policy. In addition, they should not summarize in sufficient detail to constitute a derivative work. See Wikipedia's Copyright FAQ.
It seems like a good start, but the wording on it seems a bit confusing, at least it is to me... I have reworded it as follows to hopefully make it less confusing:
Plot descriptions cannot be copied from other sources, unless these sources have an acceptable license (such as public domain or a GFDL-compatible one) or the owner has given permission to use them. In addition they must:
Only be used for ideas rather than sentences. (See copyright policy for more information).
Not summarize in sufficient detail to constitute a derivative work. See this page for more detail.
I actually found the original version less confusing. Here's my shot at it:
Plot descriptions cannot be copied from other sources, including official sources, unless these can be verified to be public domain or licensed compatibly with GFDL. They must be written in original language to comply with Wikipedia's copyright policy. In addition, they should only briefly summarize the plot; detailed plot descriptions may constitute a derivative work. See Wikipedia's Copyright FAQ.
I'm not certain if this issue falls within this WikiProject, but I can't think of anywhere else to post. Basically, I have just been in contact with the uploader of this logo. As a work of a US Government agency, it should theoretically be public domain, but it is made clear that such logos are not to be used without permission - see this and this page. I chatted to the uploader about this issue (see their half of the conversation), and they tell me that the majority of USGov agencies have similar protocols in place, citing NASA as an example. We already have templates that explain that the NASA logo use is restricted for reasons other than copyright, but I'm concerned that this (generally) may be an area that needs to be reviewed - we are potentially using a large number of logos (especially from minor US Government groups) as PD, when they are not. I may well be looking at this completely the wrong way, but I am far too tired to look into the issue now - I just want to get it written down while I'm here. My experience with copyright extends to the border of Wikipedia; any further than that, I have no idea. Does anyone have any thoughts on the issue? J Milburn (talk) 22:54, 25 March 2009 (UTC)
The problem extends far beyond "minor US Government groups"! All of the following images have restricted commercial use:
registered trademarks
the flags and official symbols of many countries and international organizations (Art. 6ter, Paris Convention)
the Olympic symbol (which has its own international treaty devoted to it!)
images of identifiable people, either living or with litigious heirs (eg Marilyn Monroe)
Commons thinks very hard about this sort of thing. They have Commons:Non-copyright restrictions, and a whole category for restriction tags (Commons:Category:Restriction tags) including examples such as Commons:Template:Trademarked, Commons:Template:Personality_rights, and Commons:Template:Counterfeiting, just to list a few. Another important restriction is that merely cropping an image may violate copyright, because it may cause an element that was de minimis to become the primary focus of the image. The rule of thumb is, we only strive to ensure that copyright is not an issue, and that the image is usable, commercially or not, in the original context of one or more encyclopedia articles. Anyone who seeks to reuse it in other contexts should be careful to observe other, non-copyright-related limitations, and these sorts of tags serve to warn them. Dcoetzee17:25, 26 March 2009 (UTC)
The above contributor has introduced copyright infringing material to Wikipedia at least twice: here, with text from this essay ([licensing], although the contributor may not have realized that); here, with text from this site. This contributor has been advised now of copyright policies explicitly. Does anyone have time or interest in evaluating other contribs, which are extensive but limited to a few articles? (It may be enough to run them through a plagiarism finder.) If not, I'll try to pick it up later, but I am currently far behind due to a combination of a busy work week and the highly unexpected family medical emergency late last week that I'm still dealing with. --Moonriddengirl(talk)18:11, 30 March 2009 (UTC)
I've examined the IP contributions up to September 20, 2008. The start of the economy section was a copyvio, but subsequent edits have obliterated it. Otherwise, it's clean. I've been going through them manually. -- Whpq (talk) 21:17, 2 April 2009 (UTC)
Mollusc, fish contributor problem detected years ago
I just wanted to note that this edit made me curious: this contributor was first notified of our copyright policy in February 2006. It really is too bad that we don't have a better system for tracking copyright infringers, as this massive mess could have been averted. :/ --Moonriddengirl(talk)00:59, 1 April 2009 (UTC)
Copyright violations at over 50 articles
Wikitiki666 (talk·contribs) has been creating copyvio articles since the beginning of March. I noticed one copyvio, which led me to find many more. The user has been contrite and has rewritten several of the articles that I pointed out, but I think more eyes are needed here to determine what should be done to rectify these massive copyright violations. Cunard (talk) 03:38, 5 April 2009 (UTC)
Thanks for investigating further. I'll take a look at the situation and see what feedback I can offer. (I'll make it second on my morning list. :)) --Moonriddengirl(talk)11:31, 5 April 2009 (UTC)
Whilst checking A Thread of Scarlet as it had been tagged with the {{copypaste}} template (since removed by the article creator) I found that it would appear to have been copied from the book itself: [2][3][4][5] (sorry for all the links but google books only has snippet view). I've added the {{copyvio}} template to that page, notified User:Corsair1944 and listed at WP:CP as I'm pretty sure that it is a copyvio.
I also checked the article creator's other contributions and there could be other problems:
What is the best course of action in cases such as these? Should I just tag everything I think is a problem and list at WP:CP? I'd rather learn and fix it myself if possible? Apologies in advance if I'm getting the wrong end of the stick with respect to all this. regards ascidian |talk-to-me14:52, 5 April 2009 (UTC)
The thing about WP:CP is that it's a holding bin for seven + days. That means that you can tag and list them and learn & fix them, too. :) Any administrator who closes a CP is supposed to check the temporary page to see if the article has been recreated in clean version there. Just to share my own personal approach, I'll note that when I tag a great many articles by one person, I do not give them the "nothanks" over and over, but simply create a subsection to the first nothanks saying "I have found additional problems in X, Y, & Z." Also, I'm sure you know this but it always bears repeating, we have to be careful in rewriting articles not to infringe on the Wikipedia contributor, which means that we have to treat any original text by them as we would original text from any other copyrighted source. The only difference is that we can re-use their text if we provide credit for it. I often do this in edit summary and incrementally: I would add original text by User:X in one go, with an edit summary that says something like, "Text by User:X on 1 January, 2001." (And, by the way, I've added a link to a pdf that certainly validates your concerns with the first mentioned article.) --Moonriddengirl(talk)15:08, 5 April 2009 (UTC)
Status Report- Images need checking
Resolved
I have been going through a bunch of pages (by "category checking", for example checking all articles in Category:Game Boy Advance games), checking whether their images have acceptable FURs or not, and have identified some pages where I wasn't sure. Could someone help me analyze the pages listed here and add rationales/tag for deletion/etc as appropriate? NanohaA'sYuriTalk, My master22:47, 5 April 2009 (UTC)
All images either given a valid rationale, or nominated for deletion in some way. Please do let me know if you update the list again, I would be happy to go through it. J Milburn (talk) 11:19, 7 April 2009 (UTC)
Might I be allowed to start it? Oops, too late, there really are talk pages for Signpost articles, that's how omphaloskeptic we've become! Don't tell the people at WP:FAC, or they'll be wanting them as well!
I'm shocked that this should be considered important enough for a Signpost article, and all the more so that Awadewit should dare to promote the idea here. Plagiarism may often be a sign of copyright violation, something we must avoid as far as we are able. However, we are (should be) combating copyright violation, not promoting some Ivory Tower hypocrisy of intellectual purity: the latter, frankly, is what the article is proposing. Physchim62(talk)00:25, 14 April 2009 (UTC)
I think most or all of Sky89's image contributions are copyvios. His upload log is here. I just deleted one that was taken from the BBC website (found this on tineye), but I don't really have time to go through all of his uploads. Given his area of interest (Myanmar), he may not speak English well. (I'm not sure.) Calliopejen1 (talk) 14:37, 14 April 2009 (UTC)
Among the images I deleted were some taken from websites and one that was obviously a postcard. But I don't have the image experience to evaluate many of these. Our (unadopted) guideline on Wikipedia:Plagiarism says, "A user's original photographs can also be expected to have similar metadata, since most people own a small number of cameras; varied metadata is suspicious." In the image batch uploaded on the 14th, I see an iPhone, a Sony Ericsson, a Nikon and a Samsung Techwin (one example of each; there are many others. There's also one that has come through Photoshop, here, but I can't find it on the web.) From a few months ago, we also have a Panasonic and a Nokia. Unless we're dealing with a very avid photographer, there may be reason for suspicion. The Ericsson occurs so often that I would not be surprised if this contributor were not the original photographer of those. I'm not sure how to proceed with investigating this further. Maybe a contributor who is more experienced with images can help? (Also, I have not blocked the contributor even though I confirmed several instances of infringement among yesterday's upload and the contributor has been blocked in the past. I thought it might be better to get the full scale of the problem before we take any kind of protective action.) --Moonriddengirl(talk)14:04, 15 April 2009 (UTC)
Eep, that's quite a mess. I've left the uploader a warning message about uploads.
In circumstances like this I tend to list most or all of the user's contributions on WP:PUF. I'm going to list some of them there in a while. Stifle (talk) 14:17, 15 April 2009 (UTC)
The Sony Ericsson model quoted is a fairly common model of cellphone. The images taken by that may well be his own. I've tagged the others. Stifle (talk) 14:23, 15 April 2009 (UTC)
My guess is that the uploads are a mix of photos that he really has taken himself and ones that have been taken from web sites. I've noticed that there is a mixture of license claims used. Some are claimed under a CC license or a CC and GFDL license together. I've not gone through these exhaustively but they appear to be all taken with the Sony Ericsson or the Apple iPhone. The Sony Ericsson photos bear dates in 2007 and the iPhone photos bear dates in 2009. This is consistent with somebody who has upgraded their cell phone. I suspect that these represent actual photos taken by the uploader. Many others are stated as him being the copyright holder and releasing into the public domain. I've not reviewed every one of them, but they all appear to be images that are consistent with being swiped from somewhere else. Either the image is taken with a camera other than the Sony Ericsson and iPhone, or there's no information and the image sizing and photo quality is not consistent with other images. I wonder if the uploader recognizes there is a difference between the images he has taken himself, and ones he has found on the web, but just doesn't understand that he can't use the latter. -- Whpq (talk) 21:14, 15 April 2009 (UTC)
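The metadata reasoning in this thread (which camera models recur, which dates they carry, which files have no metadata at all) can be tallied mechanically once the camera model and year have been read off each file page. The following is a rough sketch under that assumption: the tuple layout, function names, threshold and sample rows are all hypothetical, and a rare or missing camera model is only a reason to look closer, since people do borrow cameras and strip metadata.

```python
from collections import Counter, defaultdict


def summarize_cameras(uploads):
    """Tally camera models and the years each was used.

    uploads is a list of (filename, camera_model_or_None, year_or_None)
    tuples, transcribed by hand from the file pages (an assumed format).
    """
    counts = Counter(model or "no metadata" for _, model, _ in uploads)
    years = defaultdict(set)
    for _, model, year in uploads:
        if model and year:
            years[model].add(year)
    return counts, {model: sorted(seen) for model, seen in years.items()}


def flag_outliers(uploads, min_count=3):
    """Files with no camera metadata or with a camera seen fewer than min_count times."""
    counts, _ = summarize_cameras(uploads)
    return [name for name, model, _ in uploads
            if not model or counts[model] < min_count]


if __name__ == "__main__":
    # Hypothetical rows, not taken from any real upload log.
    uploads = [
        ("temple1.jpg", "Sony Ericsson", 2007),
        ("temple2.jpg", "Sony Ericsson", 2007),
        ("market.jpg", "Sony Ericsson", 2007),
        ("parade.jpg", "Nikon", 2008),
        ("portrait.jpg", None, None),
    ]
    counts, years = summarize_cameras(uploads)
    print(counts.most_common())
    print(years)
    print(flag_outliers(uploads))
```

The flagged files are the ones worth checking first against a reverse image search; the frequently used camera is the likelier candidate for the uploader's own work.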
It looks to me like he doesn't give a damn about licensing, and is just slapping whatever tag on an image that doesn't result in it getting deleted. Stifle (talk) 10:23, 16 April 2009 (UTC)
Contributor notified of copyright policies in November 2008. I have confirmed other recent infringements at Lifestyle lift, PSoC and Mobile phone. Seems primarily to be pasting from PR releases, which, of course, are not public domain unless they say they are. I have explained the copyright policies again at his talk page, since his note at Wikipedia:Copyright problems/2009 April 15 leads me to believe that he has misunderstood rather than deliberately continued in defiance of policy, but more of his contribs need review. If nobody else has time, I'll try to pick it up later today. :) --Moonriddengirl(talk)11:54, 20 April 2009 (UTC)
Image logs need checking. One web-resolution photo reported at WP:FFD by copyright holder. Another I found said self-made in ms paint but was watermarked (c) Maporama. Images seem to be from a variety of states (though mostly in the south) and a variety of resolutions. Charts look fine. Calliopejen1 (talk) 20:17, 21 April 2009 (UTC)
He has confirmed that he is not the photographer of at least some of these and has requested deletion of a few. I'm going to present him a list and ask him to specify which he asserts are his photographs and which not. --Moonriddengirl(talk)11:54, 23 April 2009 (UTC)
Another problem user
This one's bigger than I have time to tackle right now, so I'll see if anyone else wants to help out. Supersentai(talk·contribs·deleted contribs·logs·filter log·block user·block log) has uploaded a ton of images, first as sourced from a blog or other web site, but then when those were challenged, he/she has uploaded them under different names with PD tags, and seems to have also uploaded a bunch to Commons. Most of them seem to be easily replaceable by free photos of the artifacts or clothing in question. A second or third pair of eyes on these contributions would be appreciated. (ESkog)(Talk)12:56, 24 April 2009 (UTC)
I did a quick check and spotted one image that wasn't tagged for deletion that should have been. I did not go through them exhaustively, so there may be more but they will be evident after the current contributions are deleted. The rationale for them is very poor. Somebody is clearly buying these modern hanfu, so it must follow that photographs could be taken of these articles of clothing, even worn by the owner. They are out there and just need somebody to take the picture. -- Whpq (talk) 13:45, 24 April 2009 (UTC)
The question of copyright violation has come up at Wikipedia:Articles for deletion/Chronology of Star Wars. I thought I had a handle on it, but it turns out WP:NFCC does not apply to text, but only, by definition, to "all copyrighted images, audio and video clips, and other media files". The nub of the question really is whether a chronology of a fictional universe which has been published in other books can be summarised in Wikipedia free of copyright. Appreciate any thoughts. HidingT17:26, 18 April 2009 (UTC)
Well, WP:NFCC does not apply to text, but WP:NFC does. :) That said, I find your question a little unclear: "The nub of the question really is whether a chronology of a fictional universe which has been published in other books can be summarised in Wikipedia free of copyright." Are you asking, "Can we compile details from various fictional works to create a chronology of events that occur in those fictional works?" or are you asking, "Can we summarize a chronology of a fictional universe that has been published in other non-fiction books about that universe?" In other words, are you asking if we can craft a brand new chronology from original details or if we can summarize one that's already been crafted? The answers to those questions could be very different. :)
(A) If the former: factors to consider would include (a) how much original language is incorporated from the fictional works? (whether or not quoted); (b) how much detail is given from individual fictional works? (enough to form an abridgment?) and (c) has the material been fully transformed from a work of fiction into a reference guide? Worthy of comparison here: [6]. If the material has been transformed into a chronology drawing on many books published at many times by many different authors (not to mention, perhaps, movies & cartoons?), this would be to our benefit. Rowling won her lawsuit (linked a few sentences back) because of the substantial copying of language and plot elements from her books; she did not win on her claim that the Harry Potter Lexicon was a derivative work of the Harry Potter series. The court said that the purpose of the Lexicon "is to give the reader a ready understanding of individual elements in the elaborate world of Harry Potter that appear in voluminous and diverse sources. As a result, the Lexicon no longer “represents [the] original work[s] of authorship.” 17 U.S.C. § 101. Under these circumstances, and because the Lexicon does not fall under any example of derivative works listed in the statute, Plaintiffs have failed to show that the Lexicon is a derivative work." IOW, if we could persuasively say, "the purpose of the article is to give our readers a ready understanding of individual elements in the elaborate world of Star Wars that appear in voluminous and diverse sources" then we're probably in pretty good shape, unless we've gone heavily into the land of duplicating language or extensive detail. That is where the Harry Potter Lexicon ran into trouble: "The Lexicon’s use lacks transformative character where the Lexicon entries fail to “minimize[] the expressive value” of the original expression.... A finding of verbatim copying in excess of what is reasonably necessary diminishes a finding of a transformative use."
(B) If the latter (that is, we are summarizing previously existing chronologies and not forging new ground ourselves), we face bigger hurdles. Still quoting that same lawsuit, "The Lexicon’s use of Rowling’s companion books, however, is transformative to a much lesser extent. Although there is no supporting testimony, the companion books can be used for a reference purpose." We would not be transforming a non-fiction reference that includes a chronology. We'd be duplicating its purpose. We'd have to consider "the effect of the use upon the potential market for or value of the copyrighted work" (again, presuming that the copyrighted work here is something like a fictional Chronology of the Star Wars Universe by John Smith). We're free to create reference works on Star Wars (as under A), but we're not free to condense and exploit existing reference works on Star Wars. We can make our own, but we must not draw too much from any existing reference books, either in language or in choice of detail.
I hope I've been at least somewhat clear in my opinion. :) If not, please let me know. IANAL, but I do have some experience with copyright. As I'm sure you know, it'll always come down to the finding of an individual court; these are not easily answered questions. --Moonriddengirl(talk)18:33, 18 April 2009 (UTC)
You've been clear enough, yes. I agree that it'll always come down to the finding of an individual court, and I'm thinking that the only approach to take is to avoid the issue; copyright concerns are either misunderstood around here or disregarded, so until they become a priority, I now think it's not necessary to worry about them. My own understanding and my own morals don't really amount to a hill of beans... HidingT10:42, 22 April 2009 (UTC)
If you think copyright concerns are disregarded around here, then I believe you may have misunderstood at least the attitude of the Foundation, if not the typical editor. I have also seen consistent concern for them among most of the administrators with whom I've interacted and a good many editors as well (though particularly not the ones who are deliberately defying copyright :)). While it's true that they can be misunderstood and misapplied, copyright problems are a priority...and I don't say that only because I spend a good 40 hours a week addressing them, on Wiki and through Foundation e-mail at OTRS. The moral element is only one dimension; anything that can cause legal problems for the project has to be a priority. Copyright is up there right along with WP:BLP. --Moonriddengirl(talk)10:54, 22 April 2009 (UTC)
I think we may be talking at cross purposes. Slam dunk textual copyright violations are taken seriously, but I'm surprised that WP:NFCC does not apply to text, and I think that the more complex forms of copyright infringement aren't actively considered a problem. I base that on statements from Mike Godwin and the general community consensus. There's very little guidance on copyright as it applies to text and to fiction. HidingT12:06, 22 April 2009 (UTC)
Good to know. I'd hate to think that people believe copyright problems aren't a serious concern. :) Again, WP:NFCC does not apply to text, but WP:NFC does. The basic rule of thumb there: if you duplicate it from a copyrighted source, do it in specific, limited circumstances and mark it clearly. Don't do it too much. More complex forms of copyright infringement may have slightly more complete coverage at Wikipedia:FAQ/Copyright than Wikipedia:Copyright itself. That FAQ addresses derivative works, for instance, including the difference between summary & abridgment, and it talks about the degree of revision necessary to avoid infringement. I think, though, that a lot of people are probably unaware of its existence. And it isn't extremely detailed; like most FAQs, it touches on the major points of a problem. I agree with you that there should be more guidance on copyright as it applies to fiction, particularly in what level of detail is acceptable for summary. My biggest complaint, though, is that so many contributors simply paste summaries (as for tv shows) from other sources. It's a constant battle, keeping that under control. :/ --Moonriddengirl(talk)12:22, 22 April 2009 (UTC)
Yes, WP:NFC does apply to text, but barely mentions it, and WP:NFCC is the substantive policy portion of WP:NFC, so omitting text from being covered there creates confusion. I was aware of the FAQ, but found it next to useless to be honest. I agree about copyright violation. I've wasted many a time expanding an article only to have the work removed because of tainting by copyright. HidingT12:53, 22 April 2009 (UTC)
If the omission of text from NFCC may cause confusion, then that should probably be addressed. I've opened the question at Wikipedia talk:Non-free content#Text and WP:NFCC. Likewise, with the FAQ, if it needs improvement, we should discuss ways to fix it. If you've found it next to useless, then other contributors probably have as well. Since its purpose is to be useful, we should see what we can do. :) Even if the expansion goes beyond the scope of the FAQ, it's always possible to supplement with essays and guidelines and link to those, so long as it is clear that these are recommendations and not mandated. --Moonriddengirl(talk)13:02, 22 April 2009 (UTC)
I think the FAQ on derivative works is so vague as to be meaningless, because I don't grasp its meaning. Unless all it is saying is that derivative works are bad. It doesn't really discuss ways in which a Wikipedia article or a group of Wikipedia articles could become derivative works, so in essence there's no "avoid this". At the comics wikiproject we've had to take a hard line on publications like OHOTMU, and disallow them as sources for statistics and images, because our articles can tend towards competing with such publications, and then we can end up infringing upon "the potential market for or value of the copyrighted work". That experience has informed my thinking on the issue and probably leaves me with something upon which to base my opinions, an experience perhaps a wider number of contributors may lack. HidingT13:50, 22 April 2009 (UTC)
Derivative works, room for improvement?
←In relevant portion, it says:
A derivative work is something that is "based on and a close copy of" another work. For example, the Harry Potter movies are derivative works of the Harry Potter books. Therefore, Warner Bros. required J.K. Rowling's permission to make and distribute the films.
You may not distribute a derivative work without the original author's permission unless you're using one of the rights they weren't granted (like fair use or fair dealing). Generally, a summary (or analysis) of something is not a derivative work, unless it reproduces the original in great detail, at which point it becomes an abridgment and not a summary.
Essentially, that means "You can't write something that is based on and a close copy of another work without permission. You can analyze or summarize something, if you don't include so much detail that it becomes an abridgment." How do you feel this can be moved towards your goal of helping Wikipedians understand how to avoid derivative works?
As an aside for onlookers, factual compilations are addressed further up in the FAQ:
Facts cannot be copyrighted. It is legal to read an encyclopedia article or other work, reformulate the concepts in your own words, and submit it to Wikipedia, although the structure, presentation, and phrasing of the information should be your own original creation. The United States Court of Appeals noted in Feist Publications v. Rural Telephone Service that factual compilations of information may be protected with respect to "selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity," as "[t]he compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers."[1] You can use the facts, but unless they are presented without creativity (such as an alphabetical phone directory), you may need to reorganize as well as restate them to avoid substantial similarity infringement. It can be helpful in this respect to utilize multiple sources, which can provide a greater selection of facts from which to draw. (With respect to paraphrasing works of fiction, see derivative works section below.)
But here's where I get patchy. Is it a "fact" that Harry Potter was born in 1981? It's a fact that J.K. Rowling has written that, but is it a real fact, or a fiction? And then, how does that limit our ability to compile? At what point, in compiling an article made up of these "fictional facts", do we cross the line between summary and abridgement? I guess the law is just very very fuzzy on this. I'd be inclined to say that chronologically listing all the pertinent details of a plot would likely constitute abridgement rather than summary, but that's just me. If I had to move people towards understanding what a derivative work is, I'd use your text under the box rather than the text in the box, because it doesn't get muddied in phrasing like "unless you're using one of the rights they weren't granted". But I would like to see some discussion of what constitutes abridgement and what constitutes summary. I mean, a derivative work, I'm guessing, just means the information is derived from a given source. Now at what point do you cross the line between abridgement and summary when you source from a work like Star Wars: The New Essential Chronology? What have the author and licensor got to protect here? Is the chronology itself a "fact" that we can compile, or a "creative expression" we can't? There are only so many ways you can say the sky is blue, but that's cool, because no-one owns the sky. There are only so many ways you can say "Luke Skywalker, with the assistance of Obi-Wan Kenobi's spirit and Han Solo, destroys the Death Star", but somebody owns Luke Skywalker. My point is this: is this something we can work out some kind of answer for, or is it so gray that unless we spot cut and paste, there's not a lot we can do, because there's not a lot we can know? Is it okay to use a primary source, a sole work of fiction as the only source for an article as long as no cut and pasting occurs? HidingT08:44, 23 April 2009 (UTC)
Also, I've always thought the Harry Potter movie example is a really bad one, because we don't make movies on Wikipedia. At least, not yet AFAIK. HidingT08:45, 23 April 2009 (UTC)
The answers to many of those questions are in my first reply above. Putting them into the FAQ could be complex. :) A reference guide to a fictional work is transformative, but it can't contain too many of the creative elements of the original work, either in literal duplication or comprehensive non-literal similarity. For that reason, if you did a chronological listing of pertinent details of the plot, you run a higher risk than if you do an alphabetical listing of major characters...though either could cross the line if it takes more than necessary of the "flavor" of the original. It is not a fact that Harry Potter was born in 1981; that's fiction. But compiling the basics of such character information in a reference book is transformative, as long as you don't take too much of the creativity of the author in doing so. (Compiling the basics of such character information from other reference books can be more complicated, but in all cases, the amount of detail is a dividing line. We could say, "Prunella Potter, Harry's fifth daughter, was tall and thin and was as unhappy about that fact as she was her name." We could not say, "Prunella Potter, Harry's fifth daughter, was a willow of a girl, born with a certain lankiness that suggested to her besotted parents that she could achieve great heights—both literally and metaphorically. This was a promise she fulfilled as she grew, and by the time she was six years old her equally fond Uncle Dudley had taken to affectionately terming her 'beanpole', a well-meant sobriquet that Prunella greeted with all the horror and disdain it deserved. After all, she often thought, as if it wasn't bad enough being named Prunella and having four older sisters and being tall and thin as a stick, but Uncle Dudley had to point it out every time he wanted her to pass the salt. The only advantage to the hated nickname was that it was marginally better than the hated name." 
(Well, actually, we could say that, since Rowling never did, but I trust you take my meaning. :) And we could, if we use limited quotations where necessary, so long as we don't quote excessively.) The author and licensor would undoubtedly like to protect all possible uses of their franchise, but the law doesn't quite give them that. :) From a copyright standpoint, it is quite all right to use a sole work of fiction as the only source for an article, but not up to the point of saying "as long as no cut and pasting occurs." Substantial similarity goes further than that.
I can see your objection to the Harry Potter movie example. I would imagine it would be better to use Star Wars: A New Hope, which existed as a movie before it became a book. Given that Wikipedia is a reference work, it would probably be good to give a little detail on transformation. These are always going to be gray areas, but that doesn't mean that we shouldn't try to give good guidance to our contributors. --Moonriddengirl(talk)11:28, 23 April 2009 (UTC)
I'm half tempted to try and work your answers into some form of guidance. Looking at your answers, I think I'd like to see the FAQ improved by having entries on "Transformative", "Creative expression", "Substantial similarity" and "Abridgment". I get that compiling the basics of such character information in a reference book is transformative, but on what level is Wikipedia a reference book? Is an individual article afforded better protection because it is within the Wikipedia? By which I mean, would it be an acceptable defence to argue that the sole article is part of a larger work, and its place within that larger work is transformative? That the very fact that it is a Wikipedia article makes it transformative? Or must there be transformation within the article too, or within at least the general field, for example there must be transformative material on the Harry Potter phenomenon to allow a transformative defence for any given article about Harry Potter? The FAQ states that "In general, the educational and transformative nature of Wikipedia articles provides an excellent fair use case for anyone reproducing an article." Which seems to state that for the purposes of re-use each article should be in some way transformative, but I may be reading that wrong.
The potential for infringement exists in each distinct article and also articles by group. Let's move outside of Wikipedia into the world of the Snigglewort Wiki, which exists to document the fascinating world of Sniggleworts. Imagine an article standing as a reference work on an individual television episode of the Sniggleworts. This is a typical Wikipedia article, with factual information about date of first airing and length and guest-stars and everything else you see in a television episode article. It also has a plot summary. If a court were to examine that plot summary, it would ask if the article was transformative—that is, if it legitimately stands as a reference guide to the article, or if the purpose of the plot summary was to supplant the original work, to entertain. If reading the plot summary eliminated the need (or desire) to view the episode, then it is insufficiently transformative. An individual article could be sufficient (and certainly would if, for example, it were a transcript; the degree of danger increases the closer you get to that point). Even if it weren't that close, though, if the Snigglewort Wiki has a lot of highly detailed episode summaries, it could get itself into real trouble.
The case that comes to mind here is Castle Rock Entertainment, Inc. v. Carol Publishing Group, where the makers of a Seinfeld trivia test were found to have infringed on the makers of Seinfeld (see [7]). While it's possible that none of the individual episodes from which "trivia" was drawn would have exceeded the de minimis standard (that is, individually, they might not have been significant enough to warrant a suit), in aggregate they were found to represent substantial similarity. Even though the publishers of the trivia test argued that they had transformed the material, the court held that the transformation was insufficient and that the purpose of the book was entertainment...just like the show.
Articles on fictional subjects are meant to be written from a real-world perspective. This is not only good practice for a general reference work like ours, but also potentially helpful from a "transformative work" perspective. But the Seinfeld trivia book example demonstrates that when dealing with fictional subjects, transformation may not be as simple as it seems. As the court noted, "Although derivative works that are subject to the author's copyright transform an original work into a new mode of presentation, such works--unlike works of fair use--take expression for purposes that are not 'transformative.'"[8] IOW, just dressing up fiction as a reference work may not suffice if the primary purpose remains entertainment. Fictional works have closer copyright protection since they are regarded as innately more creative, and (again as the court noted in this case), "the fictional nature of the copyrighted work remains significant...where the secondary use is at best minimally transformative."
If the author of the Snigglewort series took issue with the Snigglewort coverage, he would likely bring the articles before the court together, presuming that the Snigglewort Wiki ignored the take-down notice. (Wikipedia surely never would; this is why I've moved us to another Wiki. :)) If the articles in aggregate represented "substantial taking", then the Snigglewort author would probably win his suit, which would presumably address the individual Wiki authors and the Snigglewort Wiki itself, for contributory infringement. (What the Snigglewort author would hope to recover is a different matter. If nothing else, the articles would disappear from the Snigglewort Wiki.)
How to express something like this in a FAQ is a challenge. Particularly for me, I think, because I write long. (Had you noticed? :D)
Okay, just a couple more queries. "If reading the plot summary eliminated the need (or desire) to view the episode, then it is insufficiently transformative." That seems a bit of an issue, because at what level can you tell whether you've eliminated the desire? I've seen people describe a useful Wikipedia plot summary as allowing the reader to catch up on what they have missed, be that an episode of a show or book in a series. That would seem to cross the line, especially as episodes themselves are now sold on iTunes. I'm also a little unclear on how you determine whether an article exists to entertain or to inform.
You write beautifully. Concisely, clearly and eloquently.
Cool. I think, with your permission, I'm going to try and work this conversation into some sort of guidance on Fiction and Copyright. HidingT13:53, 23 April 2009 (UTC)
Thanks. :) I always worry I'm nattering on, and it's good to know I haven't read that way to you. I'll look forward to seeing what you can come up with. Meanwhile, you may have noticed at WT:NFC that there is some talk about how to incorporate text into the policy as well as the guideline. Hopefully, that will resolve soon. --Moonriddengirl(talk)18:59, 23 April 2009 (UTC)
It was I who said that an adequate summary would permit the reader to catch up--not on everything he had missed--but on the basic threads of the plot, enough to follow the action of the next episode. It's not an adequate substitute if one wants to find out the details, or the way they are presented, or observe the faces or the setting, all of which are part of the episode. It is a guide to the main action. That's sufficiently specialized to be transformative. Of course, again we have the question of what is sufficient for one viewer might not be for another. In the extreme, of someone who wants the details, there is no substitute at all, and such a person wouldn't even bother reading one. Another way of saying it, and applicable to all encyclopedia articles, is that it should provide enough information to talk about the episode with someone who had seen it. That's in fact the basic purpose of an encyclopedia - to provide a basic knowledge; not a fully detailed knowledge. Not enough to discuss it in detail with a fan, just with an ordinarily interested viewer.
No possible plot summary can eliminate or satisfy the desire to see the film or whatever. One does this to find out more than the plot. If all that needed to be communicated was the plot, works of art would be much simpler to produce. DGG (talk) 01:37, 26 April 2009 (UTC)
I'm not sure that the courts would agree with this: "No possible plot summary can eliminate or satisfy the desire to see the film or whatever." I'm not entirely sure I do, either. I have to admit that I've read plot summaries of movies and been satisfied without actually having to see them. But, then, I'm the type who usually prefers the book. :) If a trivia guide to Seinfeld was found to be insufficiently transformative from the original episodes, I'm not sure we can assume that a "guide to the main action" is in itself sufficiently transformative since, obviously, how much detail goes into describing the main action is going to be subjective, and it can go from a paragraph summary of a movie to a novelization. At some point, well before you reach reproducing a detailed script, you do cross the line. For a recent example, I believe this did, before I abbreviated it. (It's still longer than I'm comfortable with, but I don't think it's a copyright problem anymore.) I don't disagree with you that a plot summary should cover the main action, but some kind of more specific guidelines to eliminate things like that would be helpful. --Moonriddengirl(talk)01:54, 26 April 2009 (UTC)
Page redesign?
Or at least incorporation of new elements?
I'm not very design oriented, and so in a blatant rip-off lovely homage (with credit) to the brand new Wikipedia:WikiProject Images and Media, I've been monkeying with a redesign for our daunting (my fault!) project page. Even if the redesign proves to be far too whimsical for a serious project like ours, I think that some of the elements are worth adopting. I am very taken with the noticeboard. Please see my design-challenged rough at User:Moonriddengirl/Copyright cleanup and let me know if you like the idea, if you can make it way better, if you prefer what we have, and if, in any case, you like the idea of the noticeboard, too. I'd be most appreciative. :) --Moonriddengirl(talk)20:08, 26 April 2009 (UTC)
Did you get a copyright clearance with WP:WPIM for the use of their design? Seriously, I think it looks good, and the organization of the main page is great. The noticeboard looks particularly useful. The goal should be that it be easy and intuitive for people to quickly find what they're looking for, and I think the new design accomplishes this much better than the current. – Quadell(talk)12:13, 27 April 2009 (UTC)
LOL. This is a pay it forward homage, it seems. According to hidden notes at WPIM (which I've kept intact), the design comes from WikiProject Vital Articles, which came from the Dungeons & Dragons WP, which came from the Simpsons WP, which came from the Olympics WP, which came from the Molecular and Cellular Biology WP page. In other words, it will soon take over all of Wikipedia. --Moonriddengirl(talk)12:20, 27 April 2009 (UTC)
ROFL! I think this is called "viral marketing", or something like that. To give another example, the article assessment scheme (Stub–Start–B–A plus the rest) was developed at WP:CHEMS… except it wasn't, we nicked it from somewhere else and now nobody can remember where :( good ideas reproduce, bad ideas get forgotten, you can think of it as Darwinian selection. Physchim62(talk)13:59, 27 April 2009 (UTC)
We'd need a template to keep track. :) Maybe I should add to the hidden note that future thefts should take from the first source, for simplicity. By the way, I'm working on collecting templates in a table for a resource subpage (see developing Wikipedia:WikiProject Copyright Cleanup/Resources). Anybody know how to collapse the whole table? It'll make the overall page easier to work with, I think. If nobody does, and I can't figure it out, I'll find somebody to pester about it, but I thought I'd save some time and start here. :) --Moonriddengirl(talk)14:59, 27 April 2009 (UTC)
We desperately need more manpower. :/ This individual has been around a while and is evidently copying plot summaries and other material from other sources. A contribution check is needed. I've spent a good bit of time this morning doing a contrib check for another user at AN; I still want to help with the one immediately above this, and Wikipedia talk:WikiProject Gastropods/Subpage for organizing CopyVio Cleanup looms like the iceberg that sank the Titanic just off the bow. (User:Whpq, I am extremely impressed with the headway you make there with your icepick. :)) I've requested help assembling an advertisement which I hope will draw more people. --Moonriddengirl(talk)15:59, 6 May 2009 (UTC)
The user uploaded yet another copyrighted file after being explicitly warned about his actions. He is now blocked. -- Whpq (talk) 14:48, 13 May 2009 (UTC)
There are evidently persisting copyright infringements within this article. I haven't finished today's copyright problems board and have enough "real" work (actually, I consider this work more real) and will be challenged to do that today. I thought I would list it here in case anybody wanted to take a crack at cleanup on it. It could be fun, for those who like a challenge. --Moonriddengirl(talk)18:02, 12 May 2009 (UTC)
The contributor who noted the problem has listed a long series of potentially problematic sections at the article's talk. I've advised him to blank the article, and I've notified the various projects. I believe this one has the potential to become controversial, and assistance from interested parties here might be beneficial. (One particular task that needs doing is dating the suspected sources to make sure that they did not come first; it's unlikely that they did, with all of them. Infringement has been detected and removed from that article in the past.) Since an uninvolved admin will need to close the matter after a week, I believe I'll stay out of it. Unless I can't. :) --Moonriddengirl(talk)14:42, 13 May 2009 (UTC)
I've requested assistance on this at WP:AN, at WP:AN#Copyright concerns, assistance requested. See that ticket for background. I suspect no ill intent, but this DYK & GA contributor doesn't seem to have understood our copyright policy. As far back as January 2008 and as recently as May 3, 2009 he has placed evidently copyrighted text on Wikipedia in violation of WP:C and WP:NFC. I've got several hours still of the History of the Jews in Poland review ahead and am still not finished with WP:CP (Today was clean-up on another multi-article infringement). --Moonriddengirl(talk)15:43, 14 May 2009 (UTC)
I'll take a look now (starting from the earliest contribs, for anyone else who fancies helping) since the SCV backlog is gone (thankfully). – Toon(talk)15:47, 14 May 2009 (UTC)
If anybody comes across it, it looks like these people have actually nicked the content from Haitian Revolution (without attribution, I may add), by the dates and the article's evolution, so this particular addition isn't an issue. – Toon(talk)16:01, 14 May 2009 (UTC)
OK, I did some plagiarism checks and googling, looking at the earliest edits; aside from the aforementioned Water supply and sanitation in Ecuador, I didn't find any remaining obvious plagiarism, although I have concerns about the introductions of other Water supply and sanitation articles: the structure of the intros has been based upon the Ecuador article, which was copied from [11], so it'll probably need rewriting (although it's only short). The article itself came up clean for anything I could find on the web; either the vios have been wiped out, or they were never there.
Keep an eye on this: I just removed a bunch of sections that were nothing but quotes, or little more. People may revert this, since it did gut the list, but when the whole mini-article is a quote, what can you do? Shoemaker's Holiday (talk) 17:17, 15 May 2009 (UTC)
First of all, we need more people. :) Copyright work on Wikipedia is enormous. I have this morning encountered another prolific contributor who may have introduced multiple copyright infringements. (I believe he may have learned from his first notice, but alas that was relatively recent.) How can we get more people? I've just finished today's CP crop, and I'll be embarking on Day 3 of cleanup on the History of the Jews in Poland (one massive article, that). We especially need keen-eyed people who can take on contributor checks. This can be pretty time-consuming. Anybody know where we might find them?
Second, anyone have any idea how we can organize this more efficiently? Take the section above. I'm not sure how much you were able to evaluate, Toon, and that could be an issue: others might not know either. I think we need some sort of standard process.
We might also need a specific board for contributor checks where there is reason to fear multi-article infringement. OTOH, I think that contributor checks need to be handled with as little drama as possible. Sometimes they don't disclose serious problems; sometimes they do. Sometimes contributors are open to learning what they've done wrong; sometimes they get very belligerent. I worry about escalation of drama and the potential for incivility. Jimbo wrote a pretty big check for us to cash with his (paraphrased at source) "Any time plagiarism is brought to the site's attention... Wikipedia administrators review all postings made by that author." I always take a peek at the contrib history of individuals that land at CP. Even if there's no practical way to review all postings, when we see problems in more than one article, a contrib check seems prudent.
I recognise the problem; I wasn't very helpful myself there. I think we need a way of coordinating, perhaps a specific sub-page for each user whose contribs need checking, where we can list the articles checked, whether they are clean or whether issues have been found — people can assign themselves certain tasks, i.e. what type of articles they will check etc. and where we can discuss the issue generally. Of course this would work much better if we had more people willing to dedicate their time to the thankless task! I'm a little busy today, so I'm not able to dedicate much time today, but I'll fill in what I've checked in the section above. – Toon(talk)16:30, 15 May 2009 (UTC)
Fabulous! What area are you interested in? Are you interested in evaluating text for copyright problems? Or looking for matches for images that may be infringement? Right now, we have one massive clean-up project already ongoing at Wikipedia talk:WikiProject Gastropods/Subpage for organizing CopyVio Cleanup (directions for contributing are on that page). We also have a contrib check above that remains undone, for Daydreamer198 (talk·contribs). In this case, it's a matter of looking at older article contributions to be sure that the contributor hasn't pasted material into articles. He's done it for a couple, so there's reason to worry that there may be more. Articles found that are a problem I'd tag {{copyvio}} (unless it can easily be removed) and list at WP:CP, but instead of giving him the "nothanks" template which that tag generates, I'd just leave him a personal note telling him that they're listed for copyright evaluation under the existing note. Any of that sound up your alley? Or would you like other suggestions? :) --Moonriddengirl(talk)01:39, 16 May 2009 (UTC)
I'm most interested in evaluating text for copyright problems -- I'm not that good with image IDing, to be honest -- and I'll try to take a look at the Daydreamer198 (talk·contribs) stuff to see where I can pitch in. Thanks! -- ArglebargleIV (talk) 13:53, 18 May 2009 (UTC)
I do this for images, and I'm always willing to pore over a problematic user's uploads. But I don't like doing text checks. – Quadell(talk)20:21, 15 May 2009 (UTC)
Question about copyright status: historical marker text
There's a question about the copyright status of text on historical markers. Please weigh in if you have interest at that thread. (This is more than just a courtesy notice; such conversations seldom get as much participation as they need. :)) --Moonriddengirl(talk)13:04, 18 May 2009 (UTC)
I've removed it from the article, but I haven't been able to figure out how to tell Commons to delete it, and I'm out of time. Can someone please deal with this? WhatamIdoing (talk) 04:25, 20 May 2009 (UTC)
Thank you! I've made a note for myself (at Commons) about the templates.
The editor has left a note there saying that she was denied permission, so presumably the file will be deleted soon. WhatamIdoing (talk) 01:04, 22 May 2009 (UTC)
I spent some time clearing the backlog on WP:SCV, I'd appreciate if someone could in turn spare a few moments taking a second look on the different article cleanups I did as a consequence to make sure I didn't make too many mistakes myself. Thanks. --MLauba (talk) 16:26, 26 May 2009 (UTC)
Hi. :) Welcome! Although I work WP:CP more than WP:SCV, I'll be happy to offer some tips. Good handling on this one, but one thing to consider: when you discuss the matter (as you do on the talk page of that article), you might want to start from the assumption that maybe the contributor is right. Not that you're wrong. The page is obviously copyrighted. But there's a chance that the contributor is the original copyright holder and is authorized to release the material here. I always offer a pointer to Wikipedia:Donating copyrighted materials or Wikipedia:Requesting copyright permission just in case. It's also a good idea, if you can, to take a peek at the contributor's history. In this case, he posted another article that is duplicated from external sources that CorenSearchBot didn't catch. I've blanked it and listed it at WP:CP. I've done the same with Barretstown castle. That blanking keeps us from publishing the text in case it turns out that the contributor can't or doesn't license the material. Wikipedia is quickly mirrored, so this helps cut down on infringement spreading from us. (It's a good idea, too, because on research I find this gentleman's permission letter is not explicit as to which article he is talking about. If the article were listed at CP, then that would guarantee that somebody would follow up on it.)
Philatelic expertisation is a tricky one. It looks like a false positive, but it actually isn't. When Corensearchbot picks up these, it's often the case (as here) that somebody has transferred material from one Wikipedia article to another. The mirror predates the article. :) We need to check, then, to be sure that they have given credit in the edit summary. They didn't. I'll give them a heads up about the processes of splitting articles and correct that. (A lot of people wouldn't know to look for that; certainly, I wouldn't have when I first went to CP. I've seen it often enough that I have my own form letter about it, but not often enough that I've created a template. Corrected with edit summary and splitfrom. And I see you know all about GFDL histories from here and here, so I know you just hadn't encountered this situation before.)
Thanks for the review, there's quite a few things to take in, isn't it :). Amusingly enough, tagging Barretstown castle with copyvio bugged me on my drive home earlier. Completely missed the issue with the splitting, though, I'll pay more attention to that in the future. And to all other things pointed out.--MLauba (talk) 17:37, 26 May 2009 (UTC)
I'm looking at it ('bout time I did some work around here!). It's not a huge problem on enwiki, many of the images will fit NFCC, but I'd like to see how big the problem is elsewhere as well. Physchim62(talk)12:06, 27 May 2009 (UTC)
The problem at Commons is even smaller than the one here. There appears to be no problem at all on Spanish or Tagalog Wikipedias. At a quick glance, the majority of the images seem to be official seals: it isn't a problem to keep the images (they can always be fair-use logos), but we do need to stop and think what their copyright status actually is (especially for the ones on Commons). So I'll keep up with the research before I actually start changing any tags. All comments welcome, this is not a cut-and-dry simple case. Physchim62(talk)12:32, 27 May 2009 (UTC)
Thanks. I think one of the things we'll need to do is address the template itself, in case it misleads other contributors. Maybe it will need to be reworded? I'm not sure. --Moonriddengirl(talk)12:52, 27 May 2009 (UTC)
Contribution Surveyor tool for Windows
I will find a way to put this into our resources section, but I want to be clear that I've got it all right. I'm not particularly technologically clueful. :)
User:Dcoetzee has a "contribution surveyor" program for Windows. I have no clue how it works; that it works, I know. :) You install it on your own (Windows) computer, and it quietly creates a text document in the background while you do other things. It prioritizes the contributions of an editor by most substantially edited articles. When it's finished, you have a list of articles, from most substantially edited to least. Moreover, you have a string of "diffs" next to the article titles which indicate how large the particular edit was. Click on the diff, and it opens a new window showing you the alteration.
I'm finding it very useful in my contrib check on User:Contributor777, as I have already used it to identify four additional problem articles. (I have a long way to go.)
Example of this in action. It gives me this in the list by the contributor I'm checking:
I grab a string of text from the diff and run it through google search, and I hit matches. (If I didn't, I would try other strings or briefer strings. I know there are problems with the contrib history here, after all.) It was actually the second sentence that helped me narrow down to my probable point of origin [12].
I check the current article to see what remains. This particular infringement has been almost completely obscured, so I can simply clean this one, rather than listing it at CP.
I got it directly from the source, and my instructions were e-mailed to me. :) Now, we need somebody to make a version for Mac. :D --Moonriddengirl(talk)12:15, 27 May 2009 (UTC)
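The search step described above (take a string from the diff, look for it in a suspected source) can also be approximated locally once a candidate source page is in hand. What follows is only a minimal sketch under stated assumptions: the function names and the `min_words` threshold are hypothetical, and this is not part of the Contribution Surveyor program. It uses Python's standard `difflib` to find the longest passage two texts share.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def longest_shared_passage(article_text: str, source_text: str) -> str:
    """Return the longest run of characters that appears in both texts."""
    a, b = normalize(article_text), normalize(source_text)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def looks_copied(article_text: str, source_text: str, min_words: int = 8) -> bool:
    """Flag for human review when the shared passage is at least min_words long.

    The threshold is an arbitrary assumption; a hit is a prompt to look closer,
    not a finding of infringement.
    """
    return len(longest_shared_passage(article_text, source_text).split()) >= min_words
```

A positive result still needs a human to check which text came first, since mirrors of Wikipedia will match just as well as an original source.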
I've checked every new article ever created by this user and found infringements in most of them (see Wikipedia:Copyright problems/2009 May 21). A good many of these infringements may result from not understanding that official synopses cannot be used on Wikipedia, but there are others, including copying biographies from IMDb and Romani people in Brazil, from this. I have not looked at other contributions. I wonder if we can get User:Dcoetzee's program accessible? In the meantime, I'll ask him if he can do a listing here. --Moonriddengirl(talk)19:54, 21 May 2009 (UTC)
Discovered through investigation of a report on WP:SCV, contrib history shows several other contributions containing single-sentence to multi-paragraph copyvios. I've currently reviewed all new pages between Magnolia Ballpark and Cecil Ballow Baseball Complex, as well as Bobcat Field (baseball) in his contrib history and cleaned what I found. I also left a personalized notice on his talk page. Going to bed now, though, and I suspect there might be more. MLauba (talk) 23:09, 26 May 2009 (UTC)
Sigh. I'm still working on the contributor check two sections above yours. On the plus side, User:Dcoetzee has made his program for prioritizing and clumping contributions available online, and I will provide directions for accessing and using it here as soon as I have opportunity. Things are a bit hectic for me offline at the moment. And welcome, indeed. :D
Those who aren't familiar with the output of Derrick's program might want to check out my mega-long sandbox. It lists all the articles to which a contributor has contributed (the one two sections above, in this case), with the ones to which he has contributed the largest blocks of text on top. Each diff link identifies the size of the particular edit. It's really an ingenious method of quickly identifying potentially serious problems. I'll try to get the details for that program on this talk page this morning. I need to access my e-mail account. :) --Moonriddengirl(talk)11:35, 27 May 2009 (UTC)
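For readers who want a feel for the ordering described above, here is a rough sketch of the idea. It is not Dcoetzee's actual program: the `Edit` record, its field names, and the choice to rank articles by their single largest addition are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Edit(NamedTuple):
    article: str
    rev_id: int       # hypothetical revision identifier, e.g. for building a diff link
    size_change: int  # bytes added (positive) or removed (negative)


def prioritize(edits: list[Edit]) -> list[tuple[str, list[Edit]]]:
    """Group additions by article, largest blocks of added text first."""
    by_article: dict[str, list[Edit]] = defaultdict(list)
    for edit in edits:
        if edit.size_change > 0:  # removals cannot introduce copied text
            by_article[edit.article].append(edit)
    for article_edits in by_article.values():
        article_edits.sort(key=lambda e: e.size_change, reverse=True)
    # Rank articles by their single largest addition (an assumed criterion).
    return sorted(by_article.items(), key=lambda item: item[1][0].size_change, reverse=True)


def format_listing(ranked: list[tuple[str, list[Edit]]]) -> str:
    """Render one line per article, followed by the size of each addition."""
    lines = []
    for article, article_edits in ranked:
        diffs = " ".join(f"(+{e.size_change})" for e in article_edits)
        lines.append(f"{article}: {diffs}")
    return "\n".join(lines)
```

Ranking by total bytes added instead of the largest single edit would be an equally plausible reading of "most substantially edited"; the sketch picks one only to keep the example short.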
Cleaned up some more, I'm now going backwards, still focusing exclusively on new pages. Cleared until May 16th. There is unfortunately a very clear pattern as almost every contribution I checked has at least one sentence copied verbatim from one of the sources. MLauba (talk) 13:01, 27 May 2009 (UTC)
Extended investigation report: good idea or no-no?
If you will, have a look at Talk:Adult attention-deficit disorder/WD claim investigation. Since I made a mess of this on the first pass, I thought it worthwhile to detail the research that led me to revert my initial conclusions and to explain why I'm now convinced that Wikipedia is in the clear.
So, what do you think? (or is it something which already exists but I'm not aware of?)MLauba (talk) 14:09, 28 May 2009 (UTC)
I'm lazy, since I suspect I might have to do that again from time to time, I just wrote a template :). I also added a summary infobox to the talk page linking to the other one just to be sure nobody brings that up again. MLauba (talk) 17:38, 28 May 2009 (UTC)
Formal reports can be useful; here's one of mine from 2005! They're especially useful in a contentious situation or where there are many pages to be dealt with (both of those were the case for the "German images" investigation). I wouldn't want to make them mandatory, as we should be encouraging people to assume good faith and simply to participate, but I see no harm in individual editors issuing them if they feel it's helpful. Physchim62(talk)14:15, 29 May 2009 (UTC)
Noticeboard reminder
Please update the noticeboard if you become aware of major issues concerning copyright that might interest other contributors here. And if you are interested in major issues concerning copyright, please keep an eye on it. :) It's readable from the main project page, but can be watchlisted at Wikipedia:WikiProject Copyright Cleanup/Noticeboard. --Moonriddengirl(talk)11:37, 29 May 2009 (UTC)
I've started on this one, having already come up with that lovely listing of contributions that User:Dcoetzee's program generates (see a few sections up to get it for yourself, Windows users.) I am out of time. I have so far found copyright infringement or GFDL infringement (even both) in most of the major contributions I've looked at. Assistance would be much appreciated.
The list is to be found at User:Moonriddengirl/Contribution check Martim33. If you have time to help out, that would be great. I've sectioned them for easier use. If you do help out, please just delete an article listing when you've checked it, whether it's clean or not. (I've been listing some at CP and cleaning some on the spot.) --Moonriddengirl(talk)01:56, 31 May 2009 (UTC)
Hi. I'm trying to come by a workable solution for dealing with massive, cross-article infringement by single contributors. I've opened two sections on the subject at Wikipedia Talk:Copyright violations: one on how to clean them up and another on how to work with the contributors who place them. This is a big issue on Wikipedia that I deal with routinely. The processes we have in place simply are not intended for this kind of situation, and I would be extremely grateful for assistance in working out processes that are. Please contribute there. --Moonriddengirl(talk)12:57, 1 June 2009 (UTC)
Previously warned in March 2009; blocked today 72 hours for infringement yesterday (he also removed the copyvio template without comment that another contributor had placed on the article). Other contributions bear checking. --Moonriddengirl(talk)23:43, 1 June 2009 (UTC)
I'm finding more material of concern. Several articles blanked, more probably to follow. If you have time to chip in, the Contribution Surveyor results are linked just below the username. --Moonriddengirl(talk)01:13, 2 June 2009 (UTC)
Before I transclude this to the tabs, what do you think? I'd like to make it as low-drama as possible, but there simply is no way to depersonalize it, given that it's inherently personal. --Moonriddengirl(talk)12:53, 2 June 2009 (UTC)
By the time it's reached the stage of requiring a contributor check, the need for low-drama has long past. -- Whpq (talk) 16:55, 2 June 2009 (UTC)
Okay. We now have an investigations tab. Please pitch in on any ongoing if you get a chance. (That's a "you" plural, Whpq. I know you're already pitching in. :) I feel bad about ditching Graham, but I keep poking at all these other situations.) --Moonriddengirl(talk)21:10, 2 June 2009 (UTC)
Well, there's only so much you can do unless you are a substantial contributor to the article. Since Wikipedia doesn't own the copyright, editors have to address these things themselves. One basic procedure is at Wikipedia:Mirror#Non-compliance_process. Anybody can send the GFDL non-compliance notice, but only a substantial contributor can send a DMCA take-down notice. You might also list them at the mirrors & forks page so other contributors will be aware. --Moonriddengirl(talk)21:00, 2 June 2009 (UTC)
Whoa. Infringing copyright is a serious business, as it can expose Wikipedia to actual legal action. This is why we have processes which deal with problems. It's not something to take personally; it's perfectly understandable when people don't know what's necessary or applicable under copyright law - it's a hugely complex area. The issue still needs to be dealt with and the time it takes to do this is massive. Again, please don't take it personally, but it's something serious that we can't just ignore. – Toon(talk)22:38, 4 June 2009 (UTC)
To begin, I think the scope of this project is excellent and very much a high priority for the encyclopedia. However, I have a serious problem with the creation of "investigation" pages for contributors to ascertain if they have violated copyright, and specifically with the use of headings and subpages incorporating the username of the contributor. Whether you refer to it as "investigations" or the euphemism of "contributor surveys", I think the creation of such a page (and its subpages) ends up vilifying people who are more-than-likely editing in good faith. I understand that this page was made to coordinate a multi-person effort to comb through the contributions of people who have multiple copyvios, but is it necessary to embarrass them with a page with their username as a header and a subpage with a list of their putatively specious diffs? The purpose of this project is to clean copyright violations quickly, not to deal with offenders. Can the diffs in question be categorized in some other way (alphabetical by article, for example) even if it means that it is harder to track the progress of surveys of specific contributors? -- Samir05:38, 7 June 2009 (UTC)
I understand your concern, but I shall quote myself from above: "By the time it's reached the stage of requiring a contributor check, the need for low-drama has long past." -- Whpq (talk) 09:01, 7 June 2009 (UTC)
I agree with Whpq, though I heartily sympathize with your concerns. I think that the important thing here is to check this material efficiently. Our purpose is not to embarrass individuals, but to address a serious problem that can cause legal difficulties to the project. Certainly I agree with you that many people are operating in good faith. As we note at Wikipedia:WikiProject Copyright Cleanup/How to clean copyright infringements, "It is important in this project as in all of Wikipedia to begin by assuming good faith. Contributors to Wikipedia come from many backgrounds and do not always understand the US copyright laws that Wikipedia complies with or the policies and guidelines we have developed to ensure we remain in compliance." Wikipedia:WikiProject Copyright Cleanup/Contributor surveys reminds contributors to cleanup that "it is not our purpose to harass individuals listed here for evaluation, though some may require administrator intervention to protect the project. Individuals should be courteously advised of copyright policies. If multiple articles are tagged for infringement, it is better to leave an individual note than fill the contributor's page with multiple, redundant template advisories." It also reminds that "This section is for listing contributors who require extensive evaluation because it is confirmed that they have placed copyrighted text on Wikipedia. If you simply have concerns about an editor, but no confirmation, please discuss your concerns either directly with the editor or bring them up at the project talk page."
If you have additional suggestions for how to make sure that this isn't used as a bludgeon, I'd certainly be pleased to hear them. We can't hide the name of the individual we're checking. I'm not even sure that would be a good idea. In the few we've processed so far, I've archived and noted a close date. This is because for the most part, these individuals have not been blocked. Hopefully there won't ever be questions again, but if more checks are needed down the road, we need something to stop us from duplicating effort. It takes many hours to check through somebody's contributor history.
As above, I am also concerned that this remain low-drama. While I don't believe that we can deny a problem, I don't want to maximize it either. We had prior to this latest situation remained low key with these checks, but this latest event had been "not low key" before this was listed. --Moonriddengirl(talk)11:15, 7 June 2009 (UTC)
I appreciate that going through contributions of people with multiple copyright violations is a high-yield strategy to identify more copyvios. I agree for the sake of openness the names should be discreetly identified in a list (with names removed ASAP -- I suggest as soon as the data dump from the program is put on-wiki) (clarified...see below). You can make a separate list for files and remove the names as soon as all the files have been reviewed.
However, there is no need to attach username attribution to subpages. This would be my suggestion for an alternative categorization for subpages (without names) that makes it look less like a witch-hunt. It is admittedly more cumbersome, but not by much:
(1) create alphabetical subpages (say /A-F, /G-L, /M-R, /S-Z) for articles with suspected copyright violation;
(2) take the data dump of the program you are using to get the article name and diffs, and sort it alphabetically in the word processor of your choice. Then place the article name and diffs into each alphabetical subpage. Everyone who is being evaluated will have the diffs in question placed into the same subpages. If you add to a page, re-alphabetize in a word-processor of your choice;
(3) if you are reviewing diffs, you are no longer reviewing them per contributor, but alphabetically. For example someone could review /A-F and there could be diffs from multiple different contributors in there;
(4) no subpages are created with usernames. If someone is identified as having multiple copyright violations, then it gets brought up for discussion at a relevant administrative noticeboard.
Here is an example (not practically useful because I think you have started investigating these already): for 2 of the contributors listed on the survey page, I used the program and created the following subpages sorted alphabetically by article: /A-F, /G-L, /M-R, /S-Z. If someone wanted to tackle the diffs, they could do so alphabetically without direct reference to the person. The system of using + and - in edit summaries and removing diffs that are not copyvios works just as well.
Even if you don't want to sort things alphabetically by article, then just identify your subpages as /1, /2, /3; or by date of the dump of data from the program. The name of the individual being surveyed does not need to be attached whatsoever.
Just my suggestion for the future. As above, I think it's exceptional work that you are doing and my intention is not to make things difficult -- Samir22:33, 7 June 2009 (UTC)
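The alphabetical split proposed in steps (1) to (4) above could be done with a few lines of script rather than a word processor. This is only an illustrative sketch: the function names are hypothetical, and the "/Other" page for titles that begin with a digit or symbol is an added assumption, since the proposal does not say where those would go.

```python
# Letter ranges taken from the proposal: /A-F, /G-L, /M-R, /S-Z.
BUCKETS = [("A", "F"), ("G", "L"), ("M", "R"), ("S", "Z")]


def subpage_for(title: str) -> str:
    """Return the subpage suffix for an article title, based on its first letter."""
    first = title.strip()[:1].upper()
    for start, end in BUCKETS:
        if start <= first <= end:
            return f"/{start}-{end}"
    return "/Other"  # assumption: digits, symbols, and non-ASCII initials land here


def bucket_titles(titles: list[str]) -> dict[str, list[str]]:
    """Sort titles case-insensitively and group them by subpage."""
    pages: dict[str, list[str]] = {}
    for title in sorted(titles, key=str.casefold):
        pages.setdefault(subpage_for(title), []).append(title)
    return pages
```

Because every contributor's articles are merged into the same pages, the per-article diff sizes would have to be carried along separately if reviewers still want to start with the largest additions.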
Currently they are prioritized by size of contribution, which is a very useful strategy, since it helps us prioritize articles that are more likely to be an issue. Listing the articles alphabetically does not permit us to prioritize more problematic articles. The point here, obviously, is to clean Wikipedia. There are no subpages created without verification of problem first. If you think a different naming hierarchy than the username involved would work, we could try assigning them case numbers or something like that. To me, the name of the contributor in the subpage is not the point. The first one I opened, before I became aware of how often this would be used, was simply called Wikipedia:WikiProject Copyright Cleanup/Major project. (We could emulate OTRS and incorporate a date string? Wikipedia:WikiProject Copyright Cleanup/Check20090607?)
I'd be happy to open a new section on WP:AN notifying anytime somebody is brought here. I'm all for anything that brings more contributors and attention to clean-up. In fact, most of these have been publicized at AN or ANI, just because the system is fairly new. However, I had not thought to make it a practice primarily because I have seen contributors that I believe could have been rehabilitated swiftly blocked there and I felt that high publicity forum would likely embarrass them further. My preference would be only to take situations to ANI or AN when a contributor is not open to discussing the matter and additional admin input would be beneficial.
With respect to names, why do you suggest that we remove names ASAP? Knowing that a contributor has been a problem for the sake of copyright infringement is actually beneficial to the project. Take User:GrahamBould, who has verifiably infringed copyright in thousands of articles. He was notified of copyright policies three years before the investigation was launched. Many of the multi-article infringers I have encountered in nearly a year of monitoring WP:CP were notified in the past and have persisted, some through misunderstanding, some through willfulness. Transparency is a benefit to the project in such cases. (Again, these people are all identified as having multiple copyright violations. Single article issues, obviously, are already handled through WP:CSD or WP:CP.) --Moonriddengirl(talk)22:46, 7 June 2009 (UTC)
IMO the username of the contributor is very much the point. Personally I do not think it is benign to list usernames on a Wikiproject page as being investigated for copyvio, and I suspect many others feel the same way. It embarrasses people acting in good faith and causes unnecessary drama.
If the purpose of this Wikiproject is to clear copyright issues, then my suggestion would be to keep the names out of it and to deal with the issue of the content. Ideally, you don't even need to construct a list of the investigated: to maintain transparency, just notify the contributor that their diffs have been dumped for assessment. You can sort articles and diffs by whatever strategy you think best (size of contribution, case numbers, number of diffs per article, alphabetical, etc.) but keep the usernames out of titles and headers. For the issue of repeat copyvios, either have an admin bring it up for resolution at the user's talk page or else RfC, RfAr, ANI or immediate block as appropriate. There is no need to list their names in a Wikiproject page for the sake of transparency -- there are appropriate avenues for dealing with repeated violators of copyright, but embarrassing them by listing them on a Wikiproject page is not one of them. For example if GrahamBould infringed copyright numerous times then (1) make a list of the content that needs to be fixed (for collaborative use) and (2) have an admin block, RfC, RfAr, etc. as appropriate. Putting his name on a list on a Wikiproject page accomplishes nothing in my opinion. -- Samir06:59, 8 June 2009 (UTC)
Perhaps you misunderstood me when I said that the name of the subpage was not the point. I was not arguing that it was needed, but agreeing that it wasn't...which is why I demonstrated with the first example. The case of repeat infringement is always discussed with the contributor, although I don't believe that it is necessary to have an administrator launch that discussion unless the use of admin tools are necessary. Administrators have access to additional tools, but otherwise have no special authority when it comes to discussing policy lapses. After all, non-administrators are empowered to the same degree as admins to caution vandals, right up to final warnings, even if they can't administer blocks. The transparency of which I speak does not relate to the immediate so much as it does to the long-term. There needs to be some way to access these logs in the future should a contributor again come under question. One of the users on our list was formerly indef-blocked for copyright infringement, but was unblocked when he agreed to one-on-one mentoring. He is at this point free to contribute. If it is found at some point in the future that he is again inserting copyrighted text onto Wikipedia, we need access to the record of what has been evaluated so that we know not to duplicate the efforts. There is no appropriate forum for that. The point is not to embarrass anybody, but WP:C and WP:BLP are special concerns in that they exist to keep the project from legal trouble. No court of law is ever going to call on Wikipedia to say, "I see that User:I'm12 violated your WP:NPOV policy repeatedly. What did you do about that?" Far more feasible is the question, "I see that User:Copypaste violated copyright law repeatedly. What did you do about that?" When a vandal is blocked for inserting the word "fuck" into articles, we don't have to go back and check every contribution to be sure that every disruptive edit has been removed. Copyright infringement is a different matter. 
If we counsel and clear an editor now, what method can we use to make sure that editors discovering ongoing problems a year from now aren't forced to put tens of manhours into checking the 20,000 contributions we've already evaluated? If the placement in a WikiProject is a problem, I'd be perfectly happy to propose adding it as a subpage to WP:CP. I don't care where it is; what I care about is that the work is done efficiently and in such a way to protect Wikipedia. --Moonriddengirl(talk)10:57, 8 June 2009 (UTC)
Coming in here from MRG's link on WP:AN. As far as digging for copyvios, I think Samir put one thing very well: "I appreciate that going through contributions of people with multiple copyright violations is a high-yield strategy to identify more copyvios." I don't think anyone here is disputing that -- note I'm even quoting the person pushing for change, here. That said, I do think Samir has a point, and I do think that alienating or embarrassing people could be a legitimate concern; hopefully it's the sort of problem we can find a working fix for. Some ideas:
Consider tagging relevant pages with {{NOINDEX}}, which will remove them from many search engine results; we already do this at several "investigation"- or "discussion"- type process pages, specifically including articles for deletion or sockpuppet investigations.
Would it work to categorize by article name, rather than username? It would still be just as easy, I should think, to include multiple listings in one section, or to cross-link between them as needed.
Would it be possible to avoid creating a username-based section or subpage until the presence of multiple or ongoing copyvios is established?
Mostly I'm just brainstorming, here. I see two conflicting goals, and ultimately I think cleaning up after copyvios is more important, but I don't see any harm in trying to improve our process in the meantime, if we can. – Luna Santin (talk)21:01, 8 June 2009 (UTC)
I think {{NOINDEX}} is an excellent idea. I don't think we can reasonably avoid creating username-based pages, though, due to the nature of the work. One user, for instance, routinely translates copyrighted Polish sources, and should be checked by someone who either knows Polish or is good with Google Translate. Another user routinely pulls text from a certain offline encyclopedia about crustaceans, and should be checked only by people with access to that book. These are extreme cases, but even in lesser cases it's much more efficient for me to "get a feel" for how a given user tends to insert copyvios and use that info to go through multiple articles by that user. I appreciate not wanting to embarrass people, but that has to be a lesser concern than protecting the encyclopedia. (Also, so far as I can tell, we never create username-based subpages until the presence of multiple unambiguous copyvios is firmly established.) – Quadell(talk)21:10, 8 June 2009 (UTC)
Gotcha. I had a hunch NOINDEX would be the most likely to fly; the others were more brainstorming. I do agree our first priority has to be the encyclopedia. – Luna Santin (talk)21:14, 8 June 2009 (UTC)
Removing copyrighted content ASAP is a high priority task. But removing copyrighted content is not the same as making archives with usernames. Archiving names of violators on-wiki is not a priority. To answer the question: "I see that User:Copypaste violated copyright law repeatedly. What did you do about that?" -- the answer should be "I notified Mike Godwin" not "I made a list of all of the copyvio I reverted and put their name on it".
Right now you are dumping lists of diffs from Dcoetzee's program. Dump the list into any of: numbered archives, dated archives, alphabetical archives, or archives sorted by case number. Notify the people being investigated on their talk page. If you need a Polish speaker or someone with a crustacean book, then make a page indicating that. You will achieve the same goal of removing copyright. You'll also attract more people (like me, who originally came here to help) who are very troubled by attachment of usernames to archives and lists referencing copyright violation. If there is a serious concern about a specific user they should be blocked, and if you think it needs to be documented for legal purposes, send an e-mail to Mike Godwin -- Samir07:13, 9 June 2009 (UTC)
<- Let me play devil's advocate for a while. We are talking about repeat offenders, contributors who have been found to violate copyright not just once or twice but in multiple instances, sufficiently often in fact to suspend WP:AGF and verify every single contribution they have made. We're not talking about people inserting a POV in an article, we're talking about people who, intentionally or not, have broken copyright laws. And we're agonizing about whether their usernames should appear on a list (wherever it is being located) established with the intent of checking for more copyright infringement.
We don't fuss about usernames listed on WP:SPI, yet these people actually broke no laws, only the wikipedia terms of use. We don't fuss about usernames listed on WP:AN, WP:RFC or WP:WQA, yet again those people actually broke no laws, only our terms of use. We list (probably mostly underage) users at WP:AIV because they replaced an article's content with swear words - yet again, these people never broke any laws, only our terms of use. And we're now troubled because of a to-do list to verify contributions of people who did actually break laws?
The reason why it's not practical to make lists without indicating the user investigated is that when you have a 3-years old article with over 1000 edits, finding the copyrighted text without knowing what to look for takes a lot longer and leaves infringing content live for longer as well.
I agree we have to take precautions against search engine indexing of the to-do lists, and archive these once completed like we do for WP:SPI. But beyond that, I'm afraid I don't really empathize with the above concerns. Not for repeat offenders, sorry. MLauba (talk) 10:28, 9 June 2009 (UTC)
That's a good point, MLauba. I find that pretty persuasive, particularly given that those users may not have even violated policies, but simply been brought up for investigation. Samir, there is no policy that calls for notifying our attorney when we find copyright infringers, even multiple article infringers, and that's probably a good idea (eta: not having such a policy, I mean.). I think if we involve Mike in his official capacity, we will almost certainly have no choice but to block these users from future participation. His job is to protect the project. Whether he thinks a user is redeemable or not, as Wikipedia's legal representation, he's going to have to consider the court's perspective: "You knew this user was breaking the law, and you let him continue?" "Yes, we did." Generally, I only consult with Mike on those situations that can't be resolved on Wikipedia for that very reason. If we force him to go "on the record", he's going to err on the side of caution. He has to. But maybe that's your point? You say, "If there is a serious concern about a specific user they should be blocked..." There are serious concerns about every one of these specific users, each of whom can be proven to have pasted text onto Wikipedia in multiple articles. Are you proposing that we should simply block them all? Copyright policy sustains that, but it's a much more iron-fisted approach than I personally like to take. I do believe that most infringement is caused by lack of familiarity with US copyright laws, not by intentional wrongdoing. Of course, if they're all blocked, then we can simply archive the investigation at their user pages, in keeping with MLauba's precedent at SPI. It would require a new template, but templates are cheap.
Luna Santin, thank you very much for heading over from AN. :) {{Noindex}} is a fabulous idea; I had forgotten that such a thing even existed! I will posthaste add it to instructions and incorporate it in existing pages. --Moonriddengirl(talk)11:33, 9 June 2009 (UTC)
"Of course, if they're all blocked, then we can simply archive the investigation at their user pages, in keeping with MLauba's precedent at SPI. It would require a new template, but templates are cheap."
Frankly, I'd be more lenient on this specific aspect. Users who act out of ignorance or misunderstanding of the law should have their contributions investigated and cleaned, and then be talked through and educated on the issues. In my view, the only thing required is a courtesy notification on their talk page saying that we found enough problematic content to list them on the investigation page, and inviting them to discuss and help out if they want to and can. It doesn't require an indefinite mark on their user pages like sockpuppet templates, and once their contribution check is over, it has to be archived so that they are no longer branded by their mistakes.
People who simply ignore all warnings and do not cooperate, or try to use Wikipedia as a soapbox to advertise other views on copyright, don't require special templates either, because again, we have all the tools we need in our hands today to deal with them - a preventative block to avoid further harm to the project or in the worst case an outright ban.
TLDR - I'm advocating workable investigation lists which are protected from search engine indexing and will be archived once cleanup is done. I have absolutely zero qualms about having these people on lists while the investigation is ongoing. However, once the investigation is over, we will have contributors who have realized their mistakes and helped clean up their contributions, and in that case their slate should be as clean as possible (I'd be absolutely livid for instance if past copyvios redeemed by their authors were to be held against them 6 months later in an RFA - they should score extra support because they helped fix the issues once made aware of them). The bad-faith infringers will be blocked from harming the project, and again no extra steps are required. If they return later on and agree to abide by the copyright laws, their past actions should also not be held against them. At least that's how I see it. --MLauba (talk) 13:37, 9 June 2009 (UTC)
For the record, I don't promote that either. I myself have never blocked a multiple article infringer unless they have been previously advised of copyright policy, and usually only for a short time (except where there has been consensus at AN to do otherwise or where there is no good reason to believe that the contributor will stop). Sorry if I gave the wrong impression there; I do not advocate blanket blocking of multiple article infringers. That note was responding to Samir's suggestion that "If there is a serious concern about a specific user they should be blocked" by pointing out that if the user is blocked, noting the cleanup at their userpage would be appropriate (as User:Blueboy96 did to the userpage of User:GrahamBould). --Moonriddengirl(talk)13:58, 9 June 2009 (UTC)