This is an archive of past discussions on Wikipedia:Plagiarism. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Plagiarism is a broader topic than copyright infringement. Suggest we change Wikipedia:Plagiarism to a soft redirect with a proposal template and work out a functional definition of what plagiarism is and how to avoid it. DurovaCharge! 22:00, 19 June 2008 (UTC)
I don't know what a "soft redirect" is. Let's just make this not a redirect, and work through policy. Here are some issues I raised with Carol Spears before:
Unique descriptions and phrases copied exactly from books must be put in quotation marks as I did with "in the rock crevices and water-receiving depressions". It is not enough to correctly attribute the source, if the same exact phrase is used it must be in quotation marks. --Blechnic (talk) 00:22, 4 May 2008 (UTC)
In this case I would add, in addition to unique descriptions and phrases, entire sentences or longer portions of text. It's a simple guideline. --Blechnic (talk) 15:59, 19 June 2008 (UTC)
Al, plagiarism is plagiarism, but putting stuff in quotation marks does not give us the right to use as much as we want, and this runs into copyright violations, where all or most or large, inappropriate, non fair-use portions of a text are incorporated whole into articles on Wikipedia, for example, this well cited article:
I think we should do one or the other: ignore the copyright issue, it exists but at a certain level is wholly separate from the issue of copy-pasting anything, public domain, free license, works-for-hire - not a legal topic but rather an ethical topic; or, let's pretend that plagiarism of a PD work is exactly equivalent to a copyvio. We should pick one or the other approach, else this discussion will always be disrupted by the interested editor interjecting "that's a copyvio!" - but that's not what we're about here, right? Franamax (talk) 01:47, 20 June 2008 (UTC)
“If an editor has copied text or figures into Wikipedia without proper attribution, politely refer him to Wikipedia:Verifiability, Wikipedia:Citing sources, and/or Help:Citations quick reference. Editors who have difficulties or questions about this guidance can be referred to the Help Desk. Editors engaged in ongoing plagiarism who do not respond to polite requests may be blocked from editing.”
I like Andries plagia-def, but I'm not satisfied with it. The "dishonestly" bit seems to me to exclude good-faith copying, which was evident in the recent incident which brought us here. Here we need the WP definition of what is and is not plagiarism. I'm also not comfortable with the ORI definition, since we are all about totally incorporating other people's ideas, any ideas other than our own in fact. Thus, I'm copying over my proposed initial text, to be dismembered at will:
"Plagiarism is the copying of material produced by others, either verbatim or with only minimal changes, without attributing that material to the original author. Material can be plagiarized from books and other printed media, websites, and GFDL-licensed works, such as the work of other Wikipedia editors. The copyright status of the work is irrelevant; directly copying a public-domain work is still plagiarism unless the original work is noted. Material in infoboxes (corporate data, species taxonomy, etc.) is not considered as plagiarized." Franamax (talk) 03:17, 20 June 2008 (UTC)
I'm not sure how the "other Wikipedia editors" bit works. Some editors release their contributions into the public domain (but there is no obvious indication of that), but more relevantly, copying of other editors' work takes place all the time. Attribution is sometimes only through the page history or an edit summary, rather than in the text. When you rewrite any article, you are adding yourself to a general list of authors, and kind of taking 'general credit' for the resulting article, so talking about plagiarising doesn't make sense here. What is plagiarising is if you use a Wikipedia article (or part of it) outside Wikipedia without crediting Wikipedia (ideally you provide a link so people can look up the authors, but at minimum people need to make clear "this is from Wikipedia - I did not write this"). The infoboxes thing needs tweaking as well - most data (but not all), but not "distinctive phrases of text". If you say in the infobox that someone is "best known for being 'The Man of Steel'", you should quote it and make clear it is a promotional, probably trademarked, tagline, not a description that you wrote (eg. "best known for climbing Everest"). Not the best example, I know, and the distinctiveness of the phrase means most people would realise what you meant, but still. One more point: "directly copying a public-domain work is still plagiarism unless the original work is noted" - people will think that it is enough to put a reference tag and give the source. That is not sufficient. You need to make clear by the layout of the text that the wording is not ours. In other words, the quoted text needs to be offset from the surrounding text, or the right template put at the bottom to indicate that the article is substantially (or wholly) from this PD source. Quite when the line is reached when incremental rewrites mean no "distinctive trace" of the original is left, I don't know. 
I don't think anyone is going to go back through all the 1911 stuff and check for that. Carcharoth (talk) 06:48, 20 June 2008 (UTC)
Plagiarising other wiki-editors can happen when you lift text from one place and drop it in another without giving attribution. Here's an example where another editor liked my work well enough to copy it elsewhere, with attribution. Another example would be copying text from a deleted article, Mr. wikibiz got all tied up in a knot about that a little while ago. Moving text within an article is fine of course, since it's tracked by the history.
The wording on infoboxes does need clarification, I was thinking of factual items that are impossible to restate.
And yes, this guideline needs to lay out exactly when and how you indicate that you've directly copied a source, preferably with some examples. Franamax (talk) 17:36, 20 June 2008 (UTC)
Copying from elsewhere on Wikipedia
(new sub thread)
As per Carcharoth, I normally only attribute Wikipedia copies with an edit summary. With that summary, it is similar in principle to cutting a paragraph from one section of an article and moving it to another. This is also normal practice for people who translate articles among the different language Wikipedias. Are you suggesting that this is somehow problematic, though we say on every edit page If you don't want your writing to be edited mercilessly or redistributed for profit by others, do not submit it. ?
Is there something unclear about "without attributing that material to the original author"? If you attribute in the edit summary, as in the example I gave above, you're doing it right. If you copy someone else's work and pretend it's your own original material, it's wrong. Translations are just the same, if you say "translated from de:wiki", you're giving the traceback to the original authors. If you translate it then say "I wrote this article", you're doing things the wrong way. This is why we need a guideline, to lay out what's acceptable and what's not. Franamax (talk) 20:00, 20 June 2008 (UTC)
I understand your drift now. Let's add a sentence to one of the existing guidelines or new editor guides to say that. --Hroðulf (or Hrothulf) (Talk) 20:26, 20 June 2008 (UTC)
Why?
Why do we need a policy for plagiarism of public domain works? It is not illegal and lack of citation is already covered by established policy. Lack of a policy does not indicate need of a policy. Jeepday (talk) 13:28, 20 June 2008 (UTC)
The difference is between: (a) copying text and sticking a reference tag on something; and (b) explicitly saying (with quote marks or naming the source in the text) that you are using someone else's words to express their idea or concept. Wording like The event was described as "horrific" (Baker, 2001) and "horrendous" (Smith, 2007) as opposed to This was an horrific event <ref>Baker (2001)</ref>. Other forms (both acceptable) are This was an "horrific" event <ref>Baker (2001)</ref> and Baker, in his book published in 2001, described the event as "horrific". Again, not a great example, but the exact approach to take always depends on the exact context. As regards public domain material, Durova gave the example of Felbrigge Psalter. Look at the blockquoted section from Davenport (1903). If you fail to use blockquotes (or something similar), or fail to say "Davenport describes the back cover in the following manner", then you risk misleading the reader as to who is saying what. Consider it as being the difference between the editorial/authorial voice of the multitude of Wikipedia editors, and the voice of the sources. You can rewrite the latter to get to the former, but insufficient rewriting is plagiarism, and quoting without attribution is also plagiarism. Is that clearer? Have I got any of that wrong? Carcharoth (talk) 13:47, 20 June 2008 (UTC)
Sure it is wrong, but we are not going to block anyone for copying a public domain work. It happened en masse about 2002 when articles were imported for EB 1911 (a lot of those editors are probably highly respected admins now.) Another editor will come along and add the quote marks. That is the wiki way. Why do we need a policy for what editors do anyway? --Hroðulf (or Hrothulf) (Talk) 19:39, 20 June 2008 (UTC)
I'm a bit twitchy about the citation on the 1911 EB import, but at least the effort was made to credit the source—as far as I know, every one of those articles was started with a Template:1911 attached. As well, much of the 1911 material is dated in style or content, and I expect that it will tend to gradually erode out of Wikipedia as we get around to updating those articles.
I absolutely think we should block anyone who – after a warning – insists upon copying public domain material without attribution. Indeed, I will block any such individual who comes to my attention. (I would give credit to anyone who makes a good-faith attempt to cite their sources, of course—many people have never been taught how to properly footnote, and wikimarkup can be daunting even to academics. Just pasting a URL after a block of quoted text is enough of a pointer for a wikignome to use, and demonstrates the intention to give proper credit.)
Of course, I'm not sure that it's always wise to look back to the way things were done in 2002 to govern how we ought to manage things now. Just as one might say "another editor will come along and add the quote marks" now, back then someone might say "another editor will come and fact-check John Seigenthaler's biography". Out of that mess, we got a dreadful amount of bad press, and harsh policy imposed rapidly from on high: WP:BLP.
Do I expect this to become another Seigenthaler incident? Well, probably not. The press don't tend to have the patience or the appreciation for nuance to present this type of issue—but I could be mistaken. Still, this does have the potential to have a slow, steady, erosive effect on Wikipedia's reputation, and is likely to be most damaging among the experts whom we most want to recruit to our project. At some point – and I think it should be now – we have to stop saying "someone will fix it eventually" and start saying "let's start cleaning up, and let's not let it get any worse". TenOfAllTrades(talk) 23:01, 20 June 2008 (UTC)
Agreed. I also just found Category:Attribution templates (some useful ones there - there are a lot more than just the 1911 ones), another ANI debate here and I remembered my example of an on-wiki collection of PD-material - see here. Your comment (in the edit summary) that we need to step up is quite true. Similar sentiments were expressed here:
"When Wikipedia was young, the threshold for copying was similar to a blog or diary. Now that Wikipedia is established, firm and harsh rules must apply. Wikipedia must follow the same rules as print encyclopedias. No copying, no plagiarism, no moving a few words around. Those who do must be notified and asked to stop. We have to start acting like a trustworthy group, not a band of kids writing half-copied term papers. We also need to have good customer service and courtesy, not gossip, IRC, etc." Model710 (talk) 18:16, 19 June 2008 (UTC)
(edit conflict) To TenOfAllTrades: After 1 warning? Boy, you are strict. We don't do that for vandals.
By the way, I agree that the EB 1911 import is horrible, and the pesky tag should be deprecated in some way. As far as I can tell we didn't have history back then (I am a 2006 newbie) so it is not always clear even to an editor where EB ends and Wikipedians begin. This is how I like to extensively copy from public domain sources, if I really have to, which is more or less how the Manual of Style tells me to.
We already have several policies and guidelines that cover public domain plagiarism and attribution in some detail. (I added some links and quotes above.) Maybe we need to add a brief sentence at or near Wikipedia:Copyright problems#Plagiarism that does not infringe copyright reminding people that changing a few words in a sentence or stream of thought is still copying? Instead of a brand new guideline page, perhaps what might be useful is a tutorial to help people teach themselves the difference between original writing and original research, as that is something distinctive to encyclopedia writing that you won't get in school.
After all that, all I am saying is: this is much easier to do than negotiating a new guideline page, getting consensus for its contents, and getting it read by new editors.
For what it's worth, when I said 'after a warning', what I really meant was 'after whatever warning framework we decide is appropriate has been employed'. I also didn't state that we would start with an indefinite block. Still, from the standpoint of protecting the integrity of the project, there are actually sound reasons why we might treat (some) instances of plagiarism more harshly than simple childish vandalism.
Detecting plagiarism in the first place is not necessarily straightforward. Often, material is plagiarised because it is good: well-written, detailed, professionally copyedited. As editors, we're sensitive to the insertion of content that is sloppy—stuff that introduces errors of spelling, grammar, style, or fact. We're usually quite grateful to see a few paragraphs of clean prose. Moreover, unless the contributing editor has left obvious web page formatting features behind (manual carriage returns at the end of each line, etc.) it can be very awkward to approach an editor to ask "Gee, your edit to foo looked awfully good. Did you really write that?"
As well, it is rare for a plagiarist to be caught after the first instance. An editor may copy material into Wikipedia for months or even years without the problem coming to light. Plagiarism is not as conspicuous as page-blanking vandalism. I have seen instances where a 'helpful' editor goes through a copy-pasted block of text to correct formatting and style and to add wikilinks. (WP:AGF is usually a Good Thing, but honestly—once a block of text is wikified, it is unlikely to receive further scrutiny.)
Enforcement is more challenging than for vandalism. A plagiarist may only be editing a few articles per week. Unlike the vandal who runs through three warnings in a couple of hours, followup is more difficult. I know of few admins whose memories are sufficiently reliable that they could consistently say, "Okay, I warned User:Plagiarist two weeks ago to start using citations; I guess I had better go back and check all his contributions since then." It is much less likely that other editors will spontaneously pick up on plagiarism compared to simple vandalism. Warnings may be less effective—in blind naivete, a good-faith editor may say "But I've been writing this way for years—why am I being hassled now? The admin's probably wrong." (The bad-faith editor might also observe that "Only one guy's noticed in two years...I can ignore the warning.")
We just don't have an infrastructure that is well-suited to catching plagiarised text. With images on Wikipedia, we deal with a discrete chunk of data; it has source info attached, and can be checked up on. (Perhaps instead of mentioning WP:BLP above, I should have brought up WP:NFCC as an example of where we were doing something that wasn't right by the lights of the project, and we finally got around to doing something about it.) With bare facts in the article body, we can demand citations to support claims, we can add {fact} tags, we can edit out unsourced statements. With plain prose, we have no straightforward mechanism for weeding out plagiarised stuff. It sits there, getting more and more deeply embedded—Wikipedia's dirty little secret. TenOfAllTrades(talk) 14:43, 21 June 2008 (UTC)
It may have been decided which was better without a debate. Someone wrote the block quote way into WP:V and WP:MOS and somehow it gained consensus. It is decent, honest and reasonable. Other people saw the 1911 example and continued to emulate it. (That was how Piers Sellers looked before I added the block quotes.) We should think of a kind and gentle way to stop them.
The 1911 example makes me uncomfortable for a number of reasons, but plagiarism is not the main one (after all the tag puts our collective hand up to that). The main issues for me are verifiability and accuracy (as none of those imports that I have seen yet cited EB 1911's sources!), and the stuff that was added after the import that never got attributed. This, to me, seems worse than the newcomer who comes along and writes a new article out of his head because at least we know that it did come from his head, and we should check it and fix it.
I've gone ahead and made a hack-job of a start for Wikipedia:Plagiarism. I do think it's an important enough topic to rate its own guideline, so it can be easily linked. OTOH I won't shed a single tear if that mess I made gets reverted back. Oh yeah, can anyone spot the bit I plagiarised? :) Franamax (talk) 00:40, 21 June 2008 (UTC)
Why does that obvious remark about linking and sourcing require a separate page on "Plagiarism"? I do not even see any evidence of plagiarism for that image. The uploader stated he was the photographer for the government agency that commissioned the photograph; that might be a lie but we don't know and it would be a bizarre and pointless lie. —Centrx→talk • 05:39, 30 June 2008 (UTC)
I wondered about that too and let it drop. Looking again at the essential "some context" link that MER-C provided, it seems there is more to the story than what is presented just on the face of the image page (though it's still a little confusing). I'm hoping that we can eventually somewhat separate image plagiarism from article plagiarism - they are somewhat the same but subtly different as well, it's difficult to separate the threads and proposed texts for these inter-related areas. Franamax (talk) 08:28, 30 June 2008 (UTC)
Wording of the proposed policy or guideline (1)
Thanks to Franamax for being bold and making a start on this. Some of the previous discussions and links in the "resources" section above will be useful, so we should mine stuff from there (um, with attribution within reason - attributing stuff from other Wikipedia pages is honoured more in the breach than in reality - altering stuff within a page is dealt with in the page history - acknowledging movement of text between pages is, and always has been, more problematic - many people don't see the difference). Anyway, my changes are offered up below for review if needed:
Example of how to quote and attribute material from other Wikipedia pages. [1] The astronaut example should also be given, I think, as that is more relevant here.
This bit here is a critical point. Is it right? Could it be made clearer?
I'm uncertain how to phrase the common facts and data bit. My attempt is here. Improvements and corrections welcomed.
I tried to address plagiarism of copyrighted works with this bit here. Again, improvements and corrections welcomed.
What else is needed?
The overall structure needs an overhaul - some stuff in the lead section needs summarising and being moved to its own section below.
The resources section needs writing, with suitable external links - what links would be best?
The set of templates used needs to be overhauled and tidied up. Probably some new ones are needed as well, though that can swiftly become very bureaucratic.
Cripes Carch, you go on at length, I expand at the slightest provocation. We might as well bring in FT2 and ask NYB to do a cameo :) Which one of the Norns was it who held the scissors and cut the yarn when it was time? Or put another way, nice work, but we do have to watch the length to make this remotely accessible when it's done, so I hope we will all wield a judicious sword beside the pen. I'm happy to see this underway though :) Franamax (talk) 08:55, 21 June 2008 (UTC)
If we go all Greek, it was Atropos (yes, I had to look that up). I did write directly into the lead section, partly because the bits below were, well, not ready to be used yet! :-) Once the structure settles down, the lead section should be much shorter and most people will only read that. Some people will only read the nutshell. Do you agree with the concept of nutshells? Want to write the first attempt at a nutshell? :-) Carcharoth (talk) 09:14, 21 June 2008 (UTC)
I'm more from the Nordic tradition, there was one crone with a big foot from spinning the wheel, one with a big thumb from arranging the multi-coloured threads of the lives of men (women too I guess, they didn't language-ize in those days), and the third one with the big scissors, when the thread of a man's life was done, she cut the thread. Most mythological, also helpful at funerals. OTOH, Atropos gave us a useful drug. I'm iffy on nutshells, especially because they have to fit inside something small, but I gave it a try. The wonderful thing about Wikipedia is that you can be sure someone will come along sooner or later and fix up your lame efforts. And if they don't, maybe it wasn't so lame after all. :) Franamax (talk) 09:38, 21 June 2008 (UTC)
I dumped in a few examples on how to properly attribute PD material and sketched out some other stuff. What I'd like is
How to spot plagiarism, especially of the uncited type.
Cookie-cutter user warning templates
We need to say that Wikipedia is a scholarly work, etc in the intro. After all, this is why we're discussing this.
A good, free (software) plagiarism detector. Unfortunately, I don't think I'm 1337 enough for this task and my (offline) Wikipedia to-do list is embarrassingly long as it is.
An article issue template/associated category, something like (once again, a crude sketch):
Scissors, glue but no (C)
This article may contain plagiarized material. You can help Wikipedia by editing this article and attributing it.
What do we do with articles which show up on WP:SCV and consist entirely of one copied sentence?
Incorporating other free content into Wikipedia is not plagiarism
The use of attribution templates to mark when free content has been incorporated into Wikipedia is a longstanding practice. As a free content project, we permit others to use our content if they attribute it to us - this doesn't mean they have to quote us, it means they can literally take our text and use it without quotation, provided they follow the terms of the GFDL. Similarly, provided that we give attribution in line with the license of the free material we use, it's perfectly acceptable for us to use other people's free content as part of our articles. We have always done so, both for images and for text.
This isn't plagiarism, it's simply the normal free-content process. If I copy some source code from one GPL program into another, and follow the terms of the GPL in doing so, I have not "plagiarized" the original program. Similarly, if I copy text from one GFDL project to Wikipedia and give attribution per the GFDL, I am not "plagiarizing" the original, I'm doing exactly what the author of the other content encouraged me to do when he or she made that content GFDL. This free-content model is very different than the academic model in which plagiarism is a concern. — Carl (CBM · talk) 02:38, 22 June 2008 (UTC)
I think this is a very important point, and I agree totally, as long as it is clear to everyone that "free content" in this context means content that is explicitly made free by its creator, not just content that has lost copyright protection. The 1911 Britannica is not "free content"; it may be freely copied, but to copy it verbatim without attribution and quotation is plagiarism, since the authors never intended that it be "free content" beyond the fact that its copyright would eventually expire. And certainly copying it verbatim without quotation in Wikipedia doesn't convert it into free content.
Arguably, material licensed by the Creative Commons share-alike license (derivatives allowed, no attribution required) could be used verbatim with no attribution. Is that plagiarism? How does Wikipedia simultaneously support academic integrity and free content? I don't see a conflict between the two. Because we're talking about plagiarism here, and not copyright, I think it's appropriate to clarify that this is a case where free content can be freer than public domain.--Curtis Clark (talk) 03:52, 22 June 2008 (UTC)
I think we're trying to be very clear here that copying free content is OK as long as it is attributed properly. It's actually exactly the same as the academic model - don't pass off work as your own, if you copied it, no problem, just say you copied it. Academic journal articles are packed full of references to other people's work, it's when you leave the references off that trouble arises. Curtis Clark raises an interesting point though - cc-sa, if no attribution is required, how do we handle that? That will need to be addressed in the guideline too I suppose, no-one said this was going to be easy! Franamax (talk) 04:14, 22 June 2008 (UTC)
That is partly a misconception about what we're trying to accomplish here, and partly a very valid point. No-one is trying to say that copying free content is wrong, we're trying to establish the parameters of how you copy things. No you can not cite Wikipedia as a source per se, but if you copy a chunk of text around en:wiki, it goes in the new spot under your own name/nym - now it's important to attribute the original authorship of that text, so people aren't misled into thinking those are your own sparkling pearls of prose. Keep in mind the distinction between using the wiki as a source for itself, which we just plain can't do; and copying around chunks of text within the wiki itself, which we do all the time, but need to be careful about.
And that said, I doubt that I've ever written a totally original FUR - I very commonly find a good one somewhere else, copy it and change the details. That's actually to be encouraged since it helps with learning and efficiency. The question we need to answer is - what are the boundaries and how do we properly attribute our copying? Franamax (talk) 05:09, 22 June 2008 (UTC)
Carl, you talk about GFDL, which is a good point, but what about public domain stuff? Material written by employees of the US government and released into the public domain on websites, for instance, should that not be attributed? Does releasing into the public domain remove the ethical responsibility of us, as Wikipedia editors, making clear who wrote what? I think the basic problem is that the way Wikipedia is written, it is exceedingly difficult to tell who wrote what - which bits were Wikipedia editors, and which bits are external bits imported in. And I think that distinction is important to maintain for editorial integrity, if nothing else. Carcharoth (talk) 09:25, 22 June 2008 (UTC)
Replying to several comments: Public domain text is one type of free content, which we can use; GFDL and CC-BY-SA are also acceptable to us. Regardless of how a piece of free content is free, we should attribute it. I think that if these practices are followed, there are few concerns:
Anything directly copied from an external source should be attributed, either as a quotation if brief or with an attribution template otherwise. Of course only free content can be directly copied outside of brief quotations. The dividing line is: if the external content is being used to reference our own writing, then it should be treated like any other reference. If the external content is simply becoming part of a WP article, then it should get an attribution template.
Extensive paraphrasing from a single document should be attributed similar to a direct quote.
When material is copied from one WP article to another, a comment should be left in the edit summary to mark this.
So effectively you're saying that even CC-SA should be attributed? That would certainly make the rules easier to follow, and I support as long as everyone understands that that's what we're doing.--Curtis Clark (talk) 19:54, 22 June 2008 (UTC)
Yes, I think that the same sort of attribution should be used as for all other free content that is incorporated into our articles. — Carl (CBM · talk) 20:30, 22 June 2008 (UTC)
If you'll require attribution templates, you'll need an attribution template suitable for the 1902 "Blue skinks of western Guatemala" when someone starts articles by copying from it. It's not a government document nor is it 1911EB, so which mandatory template is proper? -- SEWilco (talk) 13:21, 22 June 2008 (UTC)
Clearly the onus is on the person adding the material. There are a large number of templates available at Category:Attribution templates. If none of those will suffice, the person can create a new one, or just make a note in plain text. — Carl (CBM · talk) 13:25, 22 June 2008 (UTC)
A "note in plain text" violates the proposed practice of using an attribution template. Maybe you should be referring to attribution rather than attribution templates. -- SEWilco (talk) 18:46, 22 June 2008 (UTC)
Attribution templates and when to remove them
I'm a little unclear on some of Carcharoth's (I think) wording, specifically in the lead: "subsequent rewritings should not lose the sense of the original, or lose track of where a concept, idea, or phrase originated from, unless the text has been so substantially rewritten as to be a new piece of work. A clear distinction should also be drawn between work submitted by Wikipedia editors as their own work (which can be "edited mercilessly") and Wikipedia editors submitting work written by other people (in which case, more care is needed)"
When the work is covered by an attribution template, I'd be leaning more toward yes it can be edited mercilessly, the sense of the original indeed can be changed, and eventually the origin of the concept, ideas and phrases will be smeared through the edit history, just like every other sentence we type in. Obviously this doesn't apply to blockquotes, which should only be wikilinked I guess - but even then, if you blockquote text containing "the planet Earth is flat in shape", I'm gonna have to do something about it!
Here's a simple case study, Anadyr River. The text originally copied from EB1911 is still there in ghostly form, but it has substantially changed in sense. "Barren and desolate" has changed with modern sensibilities to "tundra, with a rich variety of plant life" and "Reindeer...in considerable numbers" now includes "population...collapsed dramatically". Presumably, the EB1911 version has died a death of a thousand cuts due to the process of merciless editing.
Which leads to my other question, when should the 1911 attribution template come off, or should it stay there forever? If "reindeer" is still in the article, should that template remain? What's remaining in there? I can see "For nine months of the year the ground is covered with snow" - but I could write that with my eyes closed, it's Siberia people - so does that continued piece of text necessitate the template? There is another more compelling bit "the Ivashki or Ivachno", an obvious holdover from EB1911 (and quite likely not even right!). If I replace that, can I take the attribution template off? Please respond in the spirit intended, don't find another example to refute me :) Franamax (talk) 04:56, 22 June 2008 (UTC)
I'm happy with: "A clear distinction should also be drawn between work submitted by Wikipedia editors as their own work (which can be "edited mercilessly") and Wikipedia editors submitting work written by other people (in which case, more care is needed)" That is very similar to what Curtus Clark said above: "The 1911 Britannica is not "free content"; it may be freely copied, but to copy it verbatim without attribution and quotation is plagiarism, since the authors never intended that it be "free content" beyond the fact that its copyright would eventually expire. And certainly copying it verbatim without quotation in Wikipedia doesn't convert it into free content." He also said "this is a case where free content can be freer than public domain" and I'm going to try and work that into the article, while noting that explicitly releasing into the public domain is different ethically (maybe not legally, but then I'm not a lawyer) from something falling into the public domain (ie. copyright expired). Having said that, there does come a point when the passage of time and change obscures things to such an extent that original attribution is no longer possible and no longer makes sense. T-shirts of the pyramids as compared to t-shirts of the Eiffel Tower, maybe? To get back to text examples, people using the plot of the Odyssey in their stories, as opposed to using the plot of the Da Vinci Code (let's not look at where the plot for that book came from). An author might genuinely not realise that he had rehashed Homer's Odyssey until someone points it out to him. Ditto for people writing an encyclopedia article on Wikipedia. They may not realise that the article they expanded and rewrote and took to FAC contains scattered sentences that are remnants from Britannica 1911 text. If the attribution template has been removed, and the sentences in question have not been put in quote marks and cited to the 1911 Britannica, I think that would be misleading. 
Easily fixed, but still unfortunate.
I'm less happy with "subsequent rewritings should not lose the sense of the original, or lose track of where a concept, idea, or phrase originated from, unless the text has been so substantially rewritten as to be a new piece of work". Maybe something along the lines of: "Rewriting old public domain text is difficult, and care is needed. Distinctive phrases and original concepts and ideas should still be attributed after the rewriting and updating, as well as attributing the newer ideas and concepts introduced in the rewriting."? Carcharoth (talk) 09:20, 22 June 2008 (UTC)
Responding to your case study, I like your phrase "Presumably, the EB1911 version has died a death of a thousand cuts due to the process of merciless editing."! My view is that incremental change is possible, but that as newer sources are added, the attribution of the remaining bits of the older source (here 1911 Britannica) should change from a general attribution template at the bottom, to inline citations for the remaining possibly incorrect bits, or merely distinctive bits. The former should be checked and changed (and marking these bits as such helps later editors), and the latter should be placed in quote marks. Thus you make clear during the process which bits are from where, aiding both verification and attribution. Carcharoth (talk) 09:36, 22 June 2008 (UTC)
And comparing this with this (global diff here). Let's quote the initial 1911 entry in full:
"ANADYR, (1) a gulf, and (2) a river, in the extreme N.E. of Siberia, in the Maritime Province. The gulf extends from Cape Chukchi on the north to Cape Navarin on the south, forming part of the Bering Sea. The river, taking its rise in the Stanovoi mountains as the Ivashki or Ivachno, about 67 deg. N. lat. and 173 deg. E. long., flows through the Chukchi country, at first south-west and then east, and enters the Gulf of Anadyr after a course of about 500 miles. The country through which it passes is thinly populated, barren and desolate. For nine months of the year the ground is covered with snow. Reindeer, upon which the inhabitants subsist, are found in considerable numbers."
The bits from the current article that remain appear to be: "The river rises in the Stanovoi Mountains as the Ivashki or Ivachno, about 67°N latitude and 173°E longitude, flows through Chukotka Autonomous Okrug, at first southwest and then east, and enters the Gulf of Anadyr of the Bering Sea after a course of about 800 kilometres (500 mi)." plus the other bits as well. It would help the reader if the bits from the 1911 text were explicitly marked as such. I may try and do that now. The other point is that the Gulf of Anadyr text ended up at Gulf of Anadyr, not in the original text, but with the merge at this point, leading to: "It is in the northwestern part of the Bering Sea and extends from Cape Chukchi on the north to Cape Navarin in the south, forming part of the Bering Sea." (compare to the text above), and now reads "It is in the northwestern part of the Bering Sea and extends from Cape Chukchi on the north to Cape Navarin in the south." That insertion should be cited as well. Now, I have no hope that this will be done throughout Wikipedia for all the 1911 or other imported PD-text (some of my articles suffer from vague attribution templates instead of specific inline cites), and it is long practice that vague cites at the end of an article are OK compared to specific inline cites, but the point still needs to be made that there is a difference between paraphrasing (with a general cite at the end) and copying verbatim and ending the article with a general wave of the hand to say "large bits of this text are from here". The point being that once other people start rewriting the article, the general citation becomes useless and you have to dig through the history to find out which bits come from where (again, this is also a general problem with Wikipedia as a whole - and was mentioned by one of the candidates in the WMF board election - in relation to BLPs I think). Carcharoth (talk) 09:59, 22 June 2008 (UTC)
While you can add these if you like, they are in no way required. The attribution template clearly and directly says that some text was taken from the EB 1911 - nobody is being deceived. This is why I edited the proposal yesterday to point out that there is no immediate need to change articles that use attribution templates. The practice you describe as "copying verbatim and ending the article with a general wave of the hand to say "large bits of this text are from here"." is perfectly fine provided that the material added is not a copyright violation; see my comments above about free content.
Especially in the case of GFDL attribution templates, the attribution template should never be removed, as the entire article is a derived work of the other document as well, regardless whether the text has been edited out over time. — Carl (CBM · talk) 11:38, 22 June 2008 (UTC)
What about the intermediate stages where some rewriting has occurred and it is no longer clear which bits of text are from where? The licensing concerns are covered by the attribution template and the page history, but the specific attribution of fragmentary remnants of text is not covered this way. Imagine that the article has been printed out. Carcharoth (talk) 12:00, 22 June 2008 (UTC)
I don't see that as a problem. If we have incorporated other free content into WP, that content literally becomes part of WP, so there is no reason that it needs to be possible to tell which parts of the article are descended from the old source and which are not. The attribution note at the bottom (which should appear in a printout as well) is a clear notice that some of our content is shared with the other source. I think it's important to keep in mind that in these situations we are not using the other material as a reference, we are relying on the same references that the other material did, and just using the text from the other material. So there is no reason to think that that text should be marked in the same way as a reference. — Carl (CBM · talk) 12:13, 22 June 2008 (UTC)
It depends on the context. A skilled reader can detect bits of 1911 or Catholic Encyclopedia text sticking out of an article like a sore thumb. Others won't, and that will deceive some readers. Take Pope Agapetus I. There are two attribution templates at the bottom. Can you tell which bits are from which source? If so, how? My point is that best practice will take a more conservative approach than just using attribution templates. At any point where the reader is left wondering "who is saying that?", some more specific attribution of the text is needed. And as far as "we are relying on the same references that the other material did, and just using the text from the other material" - that is OK as long as we can tell which bits in the article are from where. Carcharoth (talk) 12:35, 22 June 2008 (UTC)
(unindent) The simple solution would have been to require the attribution template to link to the diff showing the actual text being added and/or to a wikisource version of the text. That would allow an easier comparison. It still doesn't get over the problem of "authorial or editorial voice", and the need to accurately make clear which sources are "talking" in a particular paragraph or sentence (with the default being that the editorial voice is Wikipedia's unless stated otherwise). Carcharoth (talk) 12:37, 22 June 2008 (UTC)
A further point: "If we have incorporated other free content into WP, that content literally becomes part of WP, so there is no reason that it needs to be possible to tell which parts of the article are descended from the old source and which are not." - that is fine for Wikipedia editors who have clicked "save" and agreed to their text being mercilessly edited. It is not fine for text written by people who never agreed to have their text mercilessly edited. Carcharoth (talk) 12:43, 22 June 2008 (UTC)
Re the last several comments: I don't think it matters whether we can tell which parts are from which source. The reader doesn't have to wonder who is saying what: Wikipedia is saying all of it. This is what I mean about the difference between referencing text and incorporating it. If we were using the text as a reference, it would be necessary to be precise about what we are referencing. But if we are using the text in our own voice as part of our article, we don't need to mark it in any special way. The attribution template gives credit that the work of others has been used to help build Wikipedia.
Compare programming: there's no reason that I would go out of my way to mark which lines of code I took from another GPL program when writing my own program. I would, however, make a note that I have used some code from the other program. In the end, the code I copied and the code I wrote myself form a unified program, and someone reading that code doesn't usually worry about which parts came from where. The underlying premise of free content is to share (both give and take) with other free projects. — Carl (CBM · talk) 12:47, 22 June 2008 (UTC)
"The underlying premise of free content is to share (both give and take) with other free projects." - this runs straight into one of the major differences between free content and copyright-expired public domain material where the authors had never even heard of the free content movement. Let me give you an example. If some famous speeches fall into the public domain, how would you suggest the reuse of such material is handled? People attribute in those cases because they know they are using the power of someone else's words to give power to their own ("as Martin Luther King said, "I have a dream that one day this nation will rise up...", and that is how I feel about <insert issue here>"). Similarly, if we are using the power or credibility of someone else's text, that should be acknowledged at both the article level and the individual "distinctive phrase" and "individual research" level. Don't get me wrong, if the text used to present an idea or concept is rewritten enough in your own words, then just an inline citation is all that is needed. But it should be clear, if you don't cite your sources, you are either expressing your own opinion, or you are plagiarising someone, even if you are not fully aware that this is what you are doing. And even if you cite the source, you may still be plagiarising if the material is badly paraphrased or rewritten at any point after the text is added to the article (an oversimplified example of this is removing quote marks from a quote, but a more insidious change is introducing a modern idea next to an old one, and not making clear what has been changed and which bits come from where).
To expand on that last point - if a distinctive opinion or phrase is used from an old public domain text, should we not be more specific about whose voice is talking? Used in isolation, without the rest of the text, you would rightly insist such an "opinion" paragraph is quoted and attributed. Used as part of a huge chunk of text, you say that a general attribution template is sufficient. I'm saying that if that large chunk of text is rewritten until only the distinctive sentence remains, then the attribution template is insufficient and we need to directly quote and attribute the fragment as if it had been placed in there on its own. Two different routes to the same result, but inconsistent templating and citing of the end result, depending on the route taken. Is that any clearer? Carcharoth (talk) 13:28, 22 June 2008 (UTC)
If only one sentence remains of the original, and you want to add more specific attribution for it, that is of course fine. But that doesn't mean that we should change the general practice of incorporating free content (which includes both public-domain content and content that was explicitly placed under a free license) into our articles, marked by general attribution templates. These do not rely on the power or credibility of the original text; they simply rely on the same credibility as the rest of the wikipedia article, the credibility that the wikipedia editors have done a good job with the article.
The attribution template is not meant to give extra authority to the article, it simply gives credit to the authors of some text that has been used in the article. For very old texts, I expect that a lot of editing will need to be done to bring the content up to date, so I would give more credibility to the contemporary wikipedia version than the original. I think that very little of the content marked by attribution templates consists of distinctive, memorable phrases. Most of it is just ordinary prose about the topic at hand. — Carl (CBM · talk) 16:25, 22 June 2008 (UTC)
That's the point, though. A general attribution template does not give specific credit, because it doesn't say what it is giving credit for, but only gives general credit for some unspecified amount of text that might not even be there any more. But leaving that to one side for the moment, what do you think about attribution templates having a "link to the diff showing the actual text being added and/or to a wikisource version of the text", as I suggested above? wikisource:1911 Encyclopædia Britannica exists (though it is incomplete). Indeed, at wikisource:1911 Encyclopædia Britannica/Vol 1:16, there is a link for "Anadyr". Browsing through there, I see that wikisource:1911 Encyclopædia Britannica/Amber (resin) is there. I looked around a bit and found Template:1911EB, and added that to our amber article. Now imagine that this could be done for all public domain materials added using the attribution templates. People could compare the two articles much more easily. Surely this is a better method than using attribution templates and then losing the original text in a series of changes and rewrites that leave it unclear which is which? Carcharoth (talk) 17:05, 22 June 2008 (UTC)
In the case of Anadyr River, I just hit "earliest" as it seemed the logical thing to do. I had been thinking of that issue though, how do you know in general what is the "template attributed" portion of the article? I kept quiet about it because of the immense volume of previously imported material. However, I do like the idea of specifying in some way exactly what the PD content was, before it went through the normal wiki slice-and-dice. Franamax (talk) 18:03, 22 June 2008 (UTC)
It should all be identified and stuck on Wikisource and then proofed again. Failing that, a new source should be found and there should be an effort to link to wikisource versions of the original text. While checking some of the Anadyr River stuff, I've discovered that the link to Stanovoi Mountains (the supposed source of the river) takes you about 3000 kilometres to the south-west, down near Lake Baikal. It seems the names used in the EB 1911 are rather out-of-date as well. I've been trying to sort out exactly what mountains are which, but it is a confusing mess. And I'm still no closer to finding out what the current name is of the mountains that the Anadyr rises in. Hopefully the RefDesk people will be able to help. Carcharoth (talk) 18:11, 22 June 2008 (UTC)
There's another excellent reason to carefully attribute the text - so we can say it was someone else's mistake! Looks like the source is in the Khrebet Kolymskiy to me. I'd been thinking about this a little more, seems to me there is some distinction between the grizillion articles we have stuffed with unattributed factoids and the few where virtually every sentence has a reliable source. Eventually we will get all of them to the high-quality state and at that point, we will be referencing every sentence that came from EB1911. Until then, we're in a kind of indeterminate state, although that of course is why we're trying to create this page. Franamax (talk) 18:31, 22 June 2008 (UTC)
A requirement for the original to be stuck on Wikisource seems at conflict with our acceptance of sources which are not online. Do we also have to stop accepting printed sources because their text is not available online? I've reused PD text where even I didn't have the original in a version that could be put on Wikisource, because I did not type in the text exactly as it was on paper (such as due to rephrasing or replacing "f" with "s"). -- SEWilco (talk) 18:55, 22 June 2008 (UTC)
I am only talking about complete documents for a wikisource link. If you've typed out a complete PD-document, it should go on wikisource anyway (if they will take it). If you've only typed out a small bit, then you just quote and cite a source as normal. Rule of thumb: if it is not your work, don't hit save in such a way that it implies that you wrote it. It is also good practice to provide the original, and only then make changes, however minor. Carcharoth (talk) 19:29, 22 June 2008 (UTC)
And certainly, if you've accessed the source online, it's incumbent on you to provide the link, even if it's a TIFF scan of a printed document. The key themes here are to make sure people are aware of your source (for instance, you may be rewriting the PD source in modern English, but the structure and content of the writing are not yours); and to provide the original raw reference online if it can possibly be done. Franamax (talk) 19:37, 22 June 2008 (UTC)
So now we'll be able to be blocked if someone finds an online source and thinks we violated the requirement to provide the link. -- SEWilco (talk) 20:46, 22 June 2008 (UTC)
Not at all. This is still a proposal, and if you provide a citation without a link, that is fine. It is just typing in the text without saying where it came from that is a problem. Public domain doesn't mean we don't need to say where things come from. If someone else finds an online link, they can add it to what you wrote and point it out to you, and you might then thank them for finding that online source. Carcharoth (talk) 21:34, 22 June 2008 (UTC)
Re Carcharoth:
"A general attribution template does not give specific credit, because it doesn't say what it is giving credit for, but only gives general credit for some unspecified amount of text that might not even be there any more."
This is all that we promise our own contributors, as well - that other people will give them some sort of general attribution when the content is reused. The GFDL doesn't require that we actually provide diffs to the content that each editor on WP has contributed (the history page is for our convenience, but isn't required for GFDL compliance). I don't see why we would need to do differently for other free content.
There's nothing dishonest or sneaky about explicitly saying, "this article uses text from X", any more than it would be dishonest for a program to say "this program uses some source code from program Y". I don't see that there is a need to track down which parts of which articles came from which sources. If I don't like the text in an article, I can always improve it, regardless where it came from. Knowing the original source of the text isn't particularly important, provided it isn't a copyright violation. — Carl (CBM · talk) 18:28, 22 June 2008 (UTC)
That's an interesting statement. It seems like a lot of attention is paid to maintaining correct article histories, often with the specific motivation of "required by the GFDL". Franamax (talk) 18:35, 22 June 2008 (UTC)
I think the GFDL is invoked because the article history is often the only record of the editor, and thus is the record of the holder of the copyright for that edit. When public domain text is reused, there is no copyright tracking requirement of the original text and the editor only has obvious copyright on whatever changes he made (the unchanged PD text technically has no copyright protection, but in practice it is hard to identify whether changes have been made). The original text has no legal protection from being changed, and as Wikipedia points out once it's in Wikipedia it may be edited mercilessly. Requiring original text be quoted and obsessively identified leads to walling off text such as this; if you look at the current article you can recognize some of the original text exists, although much has been rearranged (such as putting descriptions of one part of the building together). -- SEWilco (talk) 19:15, 22 June 2008 (UTC)
But this is not about legal protection and copyright. It is about plagiarism and academic honesty and making clear who said what. When we use the text of a long-dead author who never gave permission for that work to be "mercilessly edited" (the bit on the edit screen says "If you don't want your writing to be edited mercilessly" (my emphasis)), then we have a moral responsibility to take more care, and this also holds for living authors of public domain text. Carcharoth (talk) 19:22, 22 June 2008 (UTC)
The original authors can be protected from our damage by not mentioning them. There is conflict between giving the authors the credit which they deserve and protecting them from being blamed for the result of our work. We have to be able to edit things to improve them, but that risks damaging someone else's work. -- SEWilco (talk) 20:54, 22 June 2008 (UTC)
Our attribution does not claim we are making the same arguments as the original author, or that the original author would agree with our article. It only claims that we have taken some text written by the original author, and possibly edited it significantly since then. Its purpose is to provide partial credit for writing text, not to associate us with the author's opinions or associate our opinions with the author. — Carl (CBM · talk) 20:58, 22 June 2008 (UTC)
Keep in mind that an attribution template tends to behave like having a similar citation as a Reference, where the citation merely states that a certain book was used as a source but does not identify what parts of the article have information from the book. Such a citation only states that someplace in the article are concepts or text from the cited source. Both concepts and text might vanish during later editing; at what point does one decide that concepts or text have changed "too much" for a cited source to be obsolete in the article? How does one measure text changes? -- SEWilco (talk) 19:15, 22 June 2008 (UTC)
That's a weakness of the wiki-system, not a strength, and is why the page history is so vital to piece together what happened. Carcharoth (talk) 19:18, 22 June 2008 (UTC)
(to CBM) What if there is a possible mistake? Knowing where the original text came from is useful then. That is an argument for having a link to a wikisource version of the text, which would address some of my concerns. It still doesn't address the concern that the model of taking a chunk of PD-text and rewriting it on-wiki can cause confusion if done poorly. Again, having a link to a wikisource version (or a permanent record to the moment the PD-text was added) would allow people to compare the changes. You will say that this is no different to being able to do this for any text anyone adds, but the difference here is between a Wikipedia editor adding their own text, and adding work done by others. When you add work done by others, it is important to link to a clean version of the original text, and to ensure that more care is taken to integrate the text as compared to that written by other Wikipedia editors. It is the difference between a Wikipedia editor writing something and saying "what I've written is based on this source", and a Wikipedia editor copying and pasting something and saying "I've copied this from this source, now I'm going to leave it here for anyone to edit as they see fit". It's not the same process because the former is someone creating a paraphrase of the sources and then leaving it for others to edit, while the latter is taking a chunk of original text direct from a source, and leaving it for others to edit. Carcharoth (talk) 19:17, 22 June 2008 (UTC)
I don't see that there is a large difference. Whether free content is originally written by a WP editor, or written elsewhere and then incorporated into WP, I think we should give attribution and then simply treat the text as part of the article to be improved by others. This is the very purpose of free content - that it is not necessary to redo everything over and over again. The work of others can and should be reused to create new works, and once incorporated into a new work it can and should just be edited as part of that new work. Those who create content know perfectly well that, under our system of copyright, their work becomes public domain under certain circumstances, and they implicitly agree to this when they publish their work.
As for providing links to the original source, I believe we already often do so via our attribution templates. But I don't see that there is a significant benefit in giving explicit diffs and quotes (if there was, it would have been evident when the EB material was added in 2002, no?). Because we aren't using the incorporated text as a reference for our own claims, but are simply repeating the same claims that the incorporated text made, checking our claims will require sources other than the text that was incorporated. If we change our article to no longer agree with the original incorporated text, this is no problem, since we are only claiming that our text was originally taken from the source, not that our text agrees with the source or continues to rely on the source in any way. — Carl (CBM · talk) 20:45, 22 June 2008 (UTC)
But Wikipedia is not based on the US Constitution, nor the US Bill of Rights. Many, many of those protections are not present here. To take an example, Wikipedia fair-use is not the same as US constitutional fair-use at all, and many challenges to blocking based on free-speech rights have been batted down out-of-hand. We have the opportunity and the right here to develop a global resource that does not rely solely on the wording of the constitution of one country. Franamax (talk) 21:49, 22 June 2008 (UTC)
First off, "Those who create content know perfectly well that, under our system of copyright, their work becomes public domain" does not apply to the content creators of EB1911 in any way. Secondly, I think what Carcharoth is saying is that it is important to denote which exact text was originally copied from an external source, in the case under discussion, a source which has become public domain only through the expiration of copyright, not through explicit granting of rights by the creators. Take a look at Anadyr River and its talk page, we've actually discovered some inaccuracies brought on by time. As with all of Wikipedia, we don't claim it to be a definitive resource, we only claim it to be a resource to find the original sources. In the example of that article, it becomes important to let the user discover why it is claimed to rise in the Stanovoi Mountains. We constantly harangue the younger ones to be aware of article history, talk page discussions, the necessity of checking sources. Here we have a case of a blanket attribution to EB1911, not available online, and now seemingly demonstrated to be incorrect because terminology has changed. Easily available references to the original source are indeed beneficial here - not everyone is so accustomed to the wiki-hunt. Franamax (talk) 22:12, 22 June 2008 (UTC)
Carcharoth has asked me to join in, so here are a few quick comments before I come up to speed. For the EB1911, Wikisource has an entire wikiproject dedicated to the transcription project, and a complete set of pagescans with an index to help people find the article they want. See s:User:Tim_Starling. We also have tips on how to obtain the text without transcribing it, as there are many online copies; sadly, however, they are all either very poor quality, or not faithful to the original (i.e. they have done unspecified value adding, and are usually vague about what they have added; in short they have injected enough copyright material in order for them to enjoy the benefits of copyfraud). We have a few people who regularly add a few pages per month, and the style guideline has recently been updated to gel with current best practises. In short, a Wikipedian should be able to put their first EB1911 article on Wikisource with only about 5 hours worth of stumbling along and asking questions as they go.
As it happens, we believe that every EB1911 article has an equivalent in WP, as of early 2006. Given this claim, there is no space left for someone to put their "first EB1911 article" up. I'm being a little cagey here, because I believe the 1911 index was first cleansed by a script that determined "WP already has an article on this topic" based on subject alone, and obviously the topic could be an entirely different person/thing with the same name. I did find at least one example.
However, the initial phase of dumping 1911 content into WP (and, yes, I see the discussion below, but it does seem to be a loud case of stable-door-slamming) did result in some poor quality copies, especially by some over-enthusiastic authors who uncritically copied text from other online transcriptions, complete with OCR errors (Greek text, yuck) and other intentional corruptions. That's why there is a project to go slowly through the archive, comparing WP with copies of the 1911 ur-text, either Tim's or Wikisource's. That is naturally a slow project that turns up few positives, but I've been refining the technique by working through the B's on and off and have the end in sight. More at Wikipedia:WikiProject_Missing_encyclopedic_articles/1911_verification: also here and here. David Brooks (talk) 02:47, 23 June 2008 (UTC)
Thanks! Some interesting links. I obviously didn't dig enough through the old EB1911 discussions. Would you have any comments on how the lessons and ongoing approaches from that could be applied to other cases, either other projects to systematically import PD-text, or one-off cases of a single editor importing a chunk of PD-text? One more question. When you say "a project to go slowly through the archive, comparing WP with copies of the 1911 ur-text", what would that have done (or did do?) for Anadyr River, where some of the original text remains (but seems to have had errors, see Talk:Anadyr River), but was split between two locations (some got merged without attribution to Gulf of Anadyr). It didn't help, I suppose, that the original 1911 EB entry was their equivalent of a disambiguation page. Oh, and sorry about the loud stable-door slamming! :-) Carcharoth (talk) 03:11, 23 June 2008 (UTC)
The need for specific attribution in relation to plagiarism
This is another attempt to explain what I see as the fundamental problem in relation to plagiarism and dumps of PD-text into Wikipedia articles. If an academic, whose career depends on their academic integrity and intellectual honesty, wrote any sort of resource or document or encyclopedia article, and based it largely on a copy of text from PD sources like the 1911 Encyclopedia Britannica, made only a few changes and expansions, and then stuck a footnote at the bottom of the new article saying "This article incorporates text from the Encyclopædia Britannica Eleventh Edition, a publication now in the public domain", they would quite simply get laughed out of the building no matter how much they insisted "but I was co-writing the article with the authors of the 1911 Encyclopedia Britannica". It is the difference between doing the proper research needed to write a proper article and write and paraphrase from your sources (rather than copying them), and taking a lazy shortcut and reusing the work of others.
The critical thing is deciding at what point the article becomes our own work again, independent from the work of the 1911 Encyclopedia Britannica authors. That can only be answered by having the original text and comparing it side-by-side with the Wikipedia article. Then you look for bits of unaltered text, and additions of new text, and corrections of old text. At some point, the old text will have been rewritten, rephrased, reordered and corrected enough to count as "our" work and not "their" work. This is the normal process of writing from sources that occurs on articles every day on Wikipedia, the rewriting taking place in Wikipedia editors' heads between their books or other sources, and the Wikipedia servers. With the 1911 and other PD text, the rewriting from the sources takes place not in people's heads, but live, on the wiki.
At the end of this rewriting process, some fragments of the original text may remain, and those can be placed in quote marks and specifically attributed. At that point, the article should have reached the point where we can honestly claim it is our work. Until that point, though, the article will always be open to charges of being a work plagiarised from the Encyclopædia Britannica Eleventh Edition. A generalised attribution template at the bottom of the article is sufficient to cover licensing concerns, but not plagiarism concerns. Ditto for any other articles based largely or solely on PD-text. The exception being when the entire article is quoted verbatim, but that is the domain of wikisource, not Wikipedia.
Short version: a spectrum exists from a pure, original PD-text (eg. on wikisource) to a fully rewritten and corrected article (here on wikipedia). The process in-between, if the changes are not specifically attributed to the new sources and it is made clear at every stage that the rest of the text is still from the original source, is where we are open to accusations of plagiarism, of having articles based on the work of others rather than our work, or of mixing the two and not making clear what the differences are. Legally, there is no problem, but there can be problems in terms of intellectual integrity and honesty. Carcharoth (talk) 23:01, 22 June 2008 (UTC)
Actually, I should probably make clear that I'm talking here about a "muddled sources" type of plagiarism, rather than a full-blown "claiming the works of others as your own" plagiarism. They are both still plagiarism, but the latter (intending to deceive) is more serious than the former (being incompetent). Carcharoth (talk) 23:08, 22 June 2008 (UTC)
I'll pretty much second Carcharoth here. There seems to be some fundamental confusion or conflict here between those who think that copying free sources is fine no matter how it's done, and those who wish to adhere to a more hard line taking in the moral and ethical aspects of proper attribution (general statement not directed at any participants in this debate). The latter is perhaps a new initiative which doesn't necessarily align with the attitudes of 2001/2, when the imperative was to expand the encyclopedia (an era which I cannot describe first-hand). We are indeed arrived at a new age, with new responsibilities. Franamax (talk) 23:18, 22 June 2008 (UTC)
I'll third Carcharoth :-) I've tried to explain this to users after tagging pages with {{copy to Wikisource}}, but this insidious practice is ingrained into the Wikipedia culture, and many think it can be fixed by more clearly tagging where the content came from. Wikipedians didn't write it, it is not created under the GFDL, and it smacks of copyfraud, sorry to say it. John Vandenberg(chat)23:29, 22 June 2008 (UTC)
Hello, I am the insidious practitioner that Jayvdb mentions ;^). I just wanted to mention that in the case he links to there, not only is there a "This text was taken from..." notice, there are also 12 <ref/> citations attributing the copied text to its original source, one on each paragraph. I'm also working on a rewritten version of that particular article, incidentally. --❨Ṩtruthious ℬandersnatch❩13:35, 23 June 2008 (UTC)
Carcharoth, it's simply false that a WP article that directly says "this article took text from source X" is subject to any serious criticism for plagiarizing source X. Attribution and plagiarism are, by definition, incompatible. Reusing the free content of others is not a "lazy shortcut," it's the very essence of what wikipedia is about. There is no "our work" and "their work", there is simply free content, which we are always free to use (and have publicly declared our intention to use).
Of course the free content methodology we employ would not be acceptable for academic writing, but that isn't related to plagiarism in any way. You're misapplying a concept from academic writing to wikipedia, despite the stark differences between them. — Carl (CBM · talk) 00:03, 23 June 2008 (UTC)
I realize this is strongly worded, but I think it's necessary to directly point out that the argument you are making seems predicated on the idea that someone might criticize us for putting free content in an article without giving the author credit, when the article in question directly gives them credit, and the article is part of a project that explicitly promotes the reuse of others' free content. I don't see any traction in such criticism. — Carl (CBM · talk) 00:14, 23 June 2008 (UTC)
There is still the issue of what text was taken "from source X". How much was taken, how much was an original contribution? Where is the indication for the casual reader who wishes to enquire? And again, re-using the free content of others is not a problem - re-using it without indicating exactly what has been re-used is. Franamax (talk) 00:13, 23 June 2008 (UTC)
There is no free content license that I am aware of that would require that. The GNU people spent a long time writing the GFDL, and the Creative Commons people spent a long time writing their licenses. If there were serious concerns about this among the people who spend copious time thinking about free content, wouldn't one of those groups have raised them or written them into the license? The standard, accepted practice in free content is simply to give credit that content has been taken from another source, which we have always done. There is no substantial difference between "original contributions" and other free content that we incorporate. Any standard we apply to other free content should also apply to original contributions of wikipedians, and vice versa. — Carl (CBM · talk) 00:19, 23 June 2008 (UTC)
Was the Creative Commons set up with the aim of creating new content, or reusing old content? Why do you think it is called "creative"? Because people wanted to be creative with the way they reused old works, or to create a common area for people to freely share what they created? Carcharoth (talk) 00:49, 23 June 2008 (UTC)
Free content and public domain are subtly different things. The issues being discussed here are one of those areas where free content is freer than public domain. When you say "others' free content", you still fail to see the difference between explicitly released work, and copyright-expired work. Regarding definitions of plagiarism, I admit I am using an "academic" model of plagiarism, but under that definition, attributions and plagiarism are not incompatible. See here and here, and you will see that mixing up several sources without clearly delineating what bits came from where is still considered plagiarism. But let's put that to one side and just call it "muddled sources" or "unclear sourcing" or something less emotive. Can you accept that: (1) the process of starting on Wikipedia with a block of PD-text, and rewriting that text, should eventually end up with an article the same (or very similar) to (2) an article that is written the normal way, and which only paraphrases and cites whatever it needs from the older, PD-text (linking to something like wikisource)? Or do you take the position that published copies of Wikipedia, or "finished" articles (eg. featured articles), can be articles that consist solely of PD-text from another source? To take an example, if some eminent writer wrote a comprehensive, high-quality article on a topic, and the text was released into the public domain, would you be happy with that article being used by Wikipedia and promoted as a featured article? As Jayvdb says: "Wikipedians didn't write it, it is not created under the GFDL, and it smacks of copyfraud". Or to put it another way, do you consider the articles created by importing PD-text to be "finished" articles? If not, at what point do they become acceptable? Carcharoth (talk) 00:49, 23 June 2008 (UTC)
If someone wrote an article that happened to meet all our FA standards, and released it under a free license, then of course we could copy it here and put it up for FA immediately. I doubt that it would sit unedited for long, since people have different tastes, but in theory it could become a featured article without anyone else editing it if the person did all the work somewhere else. Indeed, that's the type of cross-project collaboration that we want to encourage.
Stubs based on EB1911 are not going to be featured articles, however. I'm all for giving attribution to EB1911 or any other source when we use content. What I object to is the argument that doing so somehow constitutes plagiarism. Wikipedia articles are not term papers, and have different expectations of originality. — Carl (CBM · talk) 01:39, 23 June 2008 (UTC)
How many of the Wikipedia pages which started from an EB1911 article clearly indicate who the EB1911 author was? Do you realise that is illegal in many countries?
There is a big difference between "free license" and "public domain". The former has been donated to the commons with strict instructions on copying/use/reuse; the latter has not, and is in the commons as enough time has passed that it is considered more useful for the common good that it is able to be readily copied, studied, and re-used. Good scholarly practise is to not alter literary works of others; or, if it is altered, the changes are clearly documented and justified.
And this is not just a nice cute little practise only done by scholars: where moral rights apply it is legally actionable if someone destroys the integrity of a public domain work; that is called abusing the public domain, and is akin to copyfraud if ownership over the new work is asserted, as is done by Wikipedia. --John Vandenberg(chat)03:23, 23 June 2008 (UTC)
Some quotes:
"Moral rights include the right of attribution, the right to have a work published anonymously or pseudonymously, and the right to the integrity of the work. The preserving of the integrity of the work bars the work from alteration, distortion or mutilation. Anything else that may detract from the artist's relationship with the work even after it leaves the artist's possession or ownership may bring these moral rights into play. Moral rights are distinct from any economic rights tied to copyrights. Even if an artist has assigned his or her rights to a work to a third party, he or she still maintains the moral rights to the work." - Wikipedia article on moral rights
"Independent of the author's economic rights, and even after the transfer of the said rights, the author shall have the right to claim authorship of the work and to object to any distortion, mutilation or other modification of, or other derogatory action in relation to the said work, which would be prejudicial to the author's honor or reputation." - Article 6bis of the Berne Convention for the Protection of Literary and Artistic Works
What I'm not sure of is how that applies to authors who are dead (John's comment about "abusing the public domain" probably applies), and whether this is mentioned in the various free licences people talk about. I suppose you could argue that Wikipedians, by editing the work of others, lose all moral rights over the work they submit. I must admit that I had heard of moral rights, but hadn't realised they were so explicitly mentioned in the Berne Convention. I should have remembered the blurb at the front of a lot of books where the author asserts their moral right to be identified as the author of the work. Carcharoth (talk) 04:34, 23 June 2008 (UTC)
Whether a reused PD text must be edited depends upon the text. There is no need to assume that reused text will have to be edited before it can be considered as truly being a Wikipedia article. And we also encourage editors to contribute how they can, even if it's just fixing a typographical error or filling in a gap with information which they happen to have (I recently was able to describe what someone had done during a several-year period which had not been mentioned in their biographical article, but that's all I had about the person). If all that someone can contribute is a relevant text, we recognize that it is at least a contribution to the effort and if someone can improve it then so much the better. But there is no need to assume that reused text must be altered before it acquires enough Wikipedianess. -- SEWilco (talk) 04:02, 23 June 2008 (UTC)
If this was a copyright text, how would you correct it? Something becoming public domain doesn't mean you can silently correct it. To take an example, the 1911 EB text of Anadyr River stated
"a river, in the extreme N.E. of Siberia, in the Maritime Province. The gulf extends from Cape Chukchi on the north to Cape Navarin on the south, forming part of the Bering Sea. The river, taking its rise in the Stanovoi mountains as the Ivashki or Ivachno, about 67 deg. N. lat. and 173 deg. E. long., flows through the Chukchi country, at first south-west and then east, and enters the Gulf of Anadyr after a course of about 500 miles."
The problems with this (not all corrected in the article yet) are: (1) Maritime Province is nowadays another name for Primorsky Krai, which is thousands of kilometres to the south of the Anadyr River. (2) The Stanovoi Mountains are similarly thousands of kilometres away, near Lake Baikal - it seems the river actually rises in the Kolyma Mountains. (3) "Ivashki or Ivachno" is a complete puzzle. (4) Modern sources give the length of the river as around 715 miles - I haven't checked the location co-ordinates. Given all this, what is the correct approach here? When I found the article, "Maritime Province" had been dropped altogether (though it does, in fact, give some clue as to why there is a reference to the Stanovoi Mountains), the bit about the gulf had been moved to Gulf of Anadyr (without attribution - I've since added the attribution), the "Chukchi country" bit had been changed to Chukotka Autonomous Okrug, which is a reasonable change, and "Stanovoi mountains" had been changed to Stanovoi Mountains (which was itself independently made into a redirect to Stanovoi Range), and the length of the river remained unchanged. What wasn't clear was which bits were the EB 1911 remnants. If those had been clearly marked, then that would have made things a lot easier. The silent correction meant that nothing was clear, and that we (Wikipedia) had partially corrected the entry and partially left it uncorrected, but that there was no indication that this had been done, just a mish-mash in a state of flux (yes, I know that is most Wikipedia articles, but we should be careful what we add to the mish-mash, both for our reputation and that of others). To quote John again: "Good scholarly practise is to not alter literary works of others; or, if it is altered, the changes are clearly documented and justified."Carcharoth (talk) 04:34, 23 June 2008 (UTC)
Forgot to say that if the text is not altered, it should just be on wikisource (for complete documents) or quoted (for short bits of text), not placed in the article to look like something we wrote. Carcharoth (talk) 04:52, 23 June 2008 (UTC)
Public domain is not free content. Yes, the law states that you can make any use of it that you want, including passing it off as your own work (plagiarism is not illegal at least in the US). But that is not the same as the creator of content explicitly making it free.
If I were to copy a portion of a book, large enough that it could not reasonably be considered Fair Use, with a copyright expiring in 2009, and place it in Wikipedia without attribution, that would be a copyvio. And it seems that everyone who has posted here would agree that it is also plagiarism. Let's say instead that I did the same thing two years from now. It's no longer a copyvio. Is it no longer plagiarism because the work has passed into the public domain?
I agree that specifically free content can be reused without it being plagiarism (although Wikipedia could choose to eschew that), but IMO that's pretty much the limit of it.--Curtis Clark (talk) 03:52, 23 June 2008 (UTC)
Whether including material is plagiarism depends solely on whether we attribute the material, not whether it is copyrighted. Our article free content explains the ordinary definition of the term, which encompasses public domain works. — Carl (CBM · talk) 05:03, 23 June 2008 (UTC)
Not just whether, but how we attribute. Insufficient or incomplete attribution is still plagiarism by some definitions. The free content article says: "Most free content licenses contain provisions specifying that derivative works must attribute or give credit to the authors of the original, a requirement which promotes intellectual honesty and discourages plagiarism without imposing so great a burden as to weaken the claim of such licenses to being truly free." I'd be interested to know what free content licenses say about moral rights - nothing, I presume. Our article on public domain doesn't mention free content or moral rights, though the latter is at List of intellectual property related topics. Carcharoth (talk) 05:36, 23 June 2008 (UTC)
The free content article is also specific that public domain content is free content. — Carl (CBM · talk) 14:16, 23 June 2008 (UTC)
To interpret that to mean that it can be refactored without it being plagiarism is IMO intellectual dishonesty. I have nothing more to say in this venue, and will be taking it off my watchlist.--Curtis Clark (talk) 03:49, 26 June 2008 (UTC)
Here's a proposal for a kind of partial solution, which I have just (quickly and poorly) implemented in {{Parity}}. Take a look at History of Goffstown, New Hampshire, which is a copy-and-paste-with-attribution article I created a little while ago. If you try to edit the page without an understanding of how the {{Parity}} template works, you get an ugly error message asking you to do a complete rewrite rather than an incremental rewrite. Someone who really, really wants to make their change can rip out the template or fool it but that's an issue with almost any solution I would expect. --❨Ṩtruthious ℬandersnatch❩13:35, 23 June 2008 (UTC)
To make this clear - I'm not proposing that the particular error message I threw into that example is what should be the policy, I'm just demonstrating a way to encourage whatever the recommended action should be when altering copied text. This discussion, while edifying and appropriate to have, is in many cases wandering far afield from producing a practical Wikipedia policy or course of action for Wikipedia users. I agree with Carl that in both principle and practice: "Wikipedia articles are not term papers, and have different expectations of originality." --❨Ṩtruthious ℬandersnatch❩13:52, 23 June 2008 (UTC)
Oh my, what a novel way to avoid putting the text onto Wikisource! :-)
A more interesting approach would be to transclude from Wikisource to Wikipedia, like images are visible on Wikipedia from Commons. I'm not sure where the software is currently at with that, but there is an open bug in bugzilla for it... John Vandenberg(chat)15:42, 23 June 2008 (UTC)
That particular example really should be on wikisource instead. It makes a lot of claims that ought to have some reference, but we can't use the original text as a reference if we are using it as our text as well. So it would be better to use the original as a source, and write an article from scratch that merely refers to it. — Carl (CBM · talk) 18:35, 23 June 2008 (UTC)
OK, so that example should be on wikisource. I think a number of people agree on that. Wikilinking old documents can be interesting - I think that can be done at wikisource - but as Carl says, it is full of claims where we need to provide a citation for our readers. I think something should be in this guideline saying when something should be in wikisource, and used as a reference, rather than as a starting point. So can this document provide any guidance on that, and if not, where should the guidance be? Can anyone give an example of a large PD-text that it would be OK to dump into Wikipedia and start rewriting? Where is the dividing line? Carcharoth (talk) 19:26, 23 June 2008 (UTC)
Obviously the optimal solution in every case would be, instead of any half-way measures, to write an original, thoroughly-researched and thoroughly cited full-blown article. I spend quite a lot of time doing that, but I thought this entire discussion was about what to do when you don't have the time to do that. In the course of writing one article I frequently come across ten or twenty chunks of text like this, on related topics that there is no existing article for, and since it frequently takes me months to get a single article finished I certainly don't have the time to write the ten or twenty companion articles. I do have the time to create well-crafted stubs though.
I believe in Wikisource but I'm not going to create an entry for a 400-page book there and only put three paragraphs in it when I have no intention of ever expanding that. I'm going to put my effort into creating a quality, categorized, cross-linked, maintenance-templated and stub-templated Wikipedia stub and whoever eventually decides to create a Wikisource entry for the related book is just going to have to move forward and cut and paste my three paragraphs from Google Books themselves. I'm sorry John, I know that's part of your vision there, but I think the high number of Wikisource entries which are like that make the project look pretty crappy. I've actually run into them several times in the course of research - a small fragment of a text placed on Wikisource a year or two ago, next to a whole pile of redlinks pointing to the absent rest of the text - and it really isn't anything of any value, it's more frustrating and annoying than anything if what you need is the rest of the text.
So as I said, I simply am not going to be creating Wikisource entries in lieu of Wikipedia stubs, certainly not when I'm doing this in batches of five or ten or twenty articles at a time and my entire objective is to get WP articles in place that I can link to from the main article I'm working on. I was trying to propose some method of addressing the various concerns being discussed in this conversation. If the only answer is "you shouldn't create the WP article, you should only create a Wikisource text" then I think we're working at cross purposes and hence there's nothing I can contribute.
Furthermore - John, why did you create that "original" subpage here on Wikipedia rather than creating the Wikisource entry if that's what you want everyone else to do? And how the heck is what you created there any different from what I did? Aren't you the one coming up with a "novel way to avoid putting text onto Wikisource"? Or is it okay for you to save yourself some work, but not for me? This seems just a little bit hypocritical.--❨Ṩtruthious ℬandersnatch❩02:42, 24 June 2008 (UTC)
I was trying to see how your Parity idea could be improved. i.e. I was improvising. I don't like the idea, but it is novel, and I may even warm to it. John Vandenberg(chat)03:55, 24 June 2008 (UTC)
Oh, I gotcha. Sorry, that was stupid and boorish of me. A good improvement, and I'm sure there are others that can be made. How about this aspect of it: are there other things that could be added, or separate processes and templates, which could facilitate the creation of corresponding Wikisource entries, or otherwise make it advantageous to create the Wikisource entry at the same time you're creating a stub?
Another thing that has occurred to me frequently is, it seems as though, given how sophisticated MediaWiki is at doing diffs of two passages of wikitext, it ought to be possible to create some feature or arrangement that can automatically indicate exactly which portions of a passage are from the original and which have been altered by Wikipedia editors...
Oh, wow - you can do a diff across history entries of completely different pages, I didn't know that. (The preceding link is a diff between John's /original subpage and the current revision of the main article.) That's nifty, although I guess an earlier history version of the article itself would have been just as good for a diff.
So there's another option besides {{Parity}}... perhaps a policy mandate to say that the attribution notices in these cases must include a diff link that demonstrates precisely which portions of the text are original and which have been modified? --❨Ṩtruthious ℬandersnatch❩05:35, 24 June 2008 (UTC)
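The diff idea in the comment above can be tried outside MediaWiki as well. What follows is only a minimal sketch, assuming a word-by-word comparison with Python's standard difflib module is an acceptable stand-in for MediaWiki's own diff engine, which it does not reproduce. The function names mark_retained and retained_fraction are hypothetical, the "original" string is a shortened fragment of the 1911 Anadyr text quoted earlier, and the "current" string is an invented rewrite used purely for illustration.

```python
import difflib


def mark_retained(original: str, current: str):
    """Split `current` into (is_original, text) runs using a word-level diff.

    is_original is True for runs of words that also appear, in the same
    order, in `original`; False for words added or changed by later editors.
    """
    orig_words = original.split()
    curr_words = current.split()
    matcher = difflib.SequenceMatcher(a=orig_words, b=curr_words, autojunk=False)
    runs = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if j1 == j2:
            continue  # pure deletion: nothing from this opcode is in `current`
        runs.append((tag == "equal", " ".join(curr_words[j1:j2])))
    return runs


def retained_fraction(original: str, current: str) -> float:
    """Fraction of words in `current` that survive unchanged from `original`."""
    runs = mark_retained(original, current)
    total = sum(len(text.split()) for _, text in runs)
    kept = sum(len(text.split()) for flag, text in runs if flag)
    return kept / total if total else 0.0


# Illustrative inputs only; the rewrite is hypothetical, not the live article.
original = "a river in the extreme N.E. of Siberia, in the Maritime Province"
current = "a river in the extreme north-east of Siberia, in Chukotka"

for is_original, text in mark_retained(original, current):
    print(("[1911] " if is_original else "[edit] ") + text)
```

A marker like this only shows which wording survives. It says nothing about whether reworded passages still draw their ideas from the original, so it would complement an attribution notice rather than replace one.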
One of the advantages of having the "original" as a separate page is that a Wikisource admin can import it without importing the edit history of the Wikipedia changes; see s:Talk:History of Goffstown, New Hampshire/original. If it was to be that we allowed Wikipedians to create pages on Wikipedia containing the original text, Wikisource admins could periodically come along and import them all, and tidy them up. Not ideal, but if it would help, I am sure Wikisource admins would see it as a net gain for the project.
My views on this have been requested so here they are.
1. The one misconception that runs as a thread throughout this page is that plagiarism can be fixed by paraphrasing or otherwise rewriting what has been said in the source. Plagiarism is not just about the way that something is expressed. Unlike copyright, it is also about the underlying ideas. If an article is primarily drawn from the 1911 Encyclopædia Britannica it still needs to be attributed even after the wording has been completely changed.
2. Attribution is just as good if it's done in plain text. There is no need to have "attribution templates". If the template doesn't fit what you are trying to say, change it to plain text.
3. Plagiarism is a violation of the moral right of attribution, and to that extent it is illegal. Infringement of moral rights is included in United States law, but no penalty is prescribed therefor. In some countries it is considered more seriously.
4. In some countries, for quotations to qualify as fair dealing, attribution is a precondition.
5. When plagiarism is found the best thing you can do is fix it yourself. If you know where it's from you already have all the information you need to fix it yourself; there is no need to cause a lot of drama by harassing the person who put up the material.
6. The threat of blocking mentioned in the policy should be removed completely. This is exactly the sort of thing that creates an atmosphere that we are more intent on punishing the least wrongdoing than producing a good encyclopedia. Let's draft policies that are meant to be remedial rather than punitive.
Thanks for this. I agree with what you say here, with a few minor quibbles. Attribution templates help to track the articles that contain this content. I agree that plain text attribution is often more specific and more helpful, but the template is still needed, if anything, as a general attribution for the whole article per your first point. So I'd say plain text attribution plus templates (actually, I'd say "write the article first and then cite any extra bits from the older text as needed", but that is a completely different approach to the "edit a PD article as a starting point" approach). Regarding point 5, in most cases, yes, fixing is the best option. Sometimes a polite note, and pointing towards some guideline, is needed if you see or find that someone is doing this a lot. Number 6, I agree, and I'll remove it from the proposal now. If anyone objects, they can propose it on the talk page. Please do hack away at the proposal, as it is very incoherent at the moment! Carcharoth (talk) 04:49, 23 June 2008 (UTC)
One real benefit of templates is that they can be used to insert hidden or overt categories, and as "what links here" resources to examine all instances of a given source. I suppose we will need a generic "PD attribution" template to cover the many miscellaneous free sources that may arise.
Blocking does not have to be mentioned prominently, but will there not be an eventual sanction for someone who persists in passing off work as their own? Of course education is the best approach and we certainly don't want to create another easy way to cry "a witch, where's the ban-stick!" - I suppose if this becomes a guideline or policy, it will become part of normal consideration for assessing misbehaviour. Franamax (talk) 06:28, 23 June 2008 (UTC)
The ban-stick will be effective at making people contribute to things other than Wikipedia. Actually it's unlikely for proper articles to not mention text sources; the attribution required by WP:V will tend to cause original sources to be disclosed. One might copy-paste PD text and add different sources to support all the facts, but it's more likely that someone will also include the PD document as a source to satisfy WP:V. -- SEWilco (talk) 16:57, 24 June 2008 (UTC)
In response:
If someone has provided plain-text attribution he should not be faulted. If someone else believes strongly that a template should be used, the {sofixit} principle should apply.
"Ban-sticks" are rarely effective, and their mention tends to encourage adversaries with little life experience to use them as a first resort rather than a last resort. If anything, an initial do-this-or-else approach tends to make the contributor more argumentative. While serious disciplinary measures are regularly required they are best dealt with in more general policies that deal specifically with disciplinary issues. Education is indeed the best policy, and friendly discussions and negotiations can bear tastier fruit than dicta. Eclecticology (talk) 17:11, 26 June 2008 (UTC)
and another
This is something I have long cared about, and I'm glad to see other people now finally treating it seriously. Unfortunately, as tends to be the case here, the reaction is a little excessive and considerably over-formalistic. As I see it:
All responsible writing quotes sources properly and attributes ideas and makes clear when something is a paraphrase. This is part of fundamental intellectual honesty. The violation of this is known as plagiarism.
There are different degrees and types of plagiarism, and they are not all equally serious.
There are different levels of citation and they are appropriate for different purposes. A scholarly edition of documents has one standard--a popular book another. Excessive footnoting is detrimental to understanding in non-academic writing--and Wikipedia is not academic writing and makes no pretensions to it--it is just a general encyclopedia.
It is customary in general encyclopedias of even the highest repute to give fairly general sourcing. If the ideas for a paragraph come from one or two places, it is usual to put one or two footnotes at the end of a paragraph--not try to attribute each word to where it comes from. Enough references have to be given so all the facts can be eventually checked, but it should not be assumed the primary purpose of readers will be doing so. (This can be very different in an academic monograph, where most readers may well be interested in trying to catch the author out in any possible misinterpretation.)
This has nothing whatsoever to do with copyright. The need to properly cite public domain sources is every bit as great as citing anything else. The only difference is that it is possible to use the PD sources in very extended quotations or even full text if they are appropriate and cited correctly. (Nor does the GFDL license have anything to do with it--the traceability requirements of GFDL are best discussed separately.)
Some PD sources are highly appropriate for use as is in Wikipedia. This is particularly true of scientific material from US government agencies.
Some PD sources very rarely are--such as the 1911 EB or the old Catholic Encyclopedia--totally out of date material known for incompleteness, erratic sourcing in the case of the EB, and bias. They can be used as is for restricted purposes, but usually would be best quoted to indicate the attitudes of the previous century. I would be reluctant to use sources of that date for any factual material, much less statements of consensus, unless verified in modern sources.
The problem is particularly acute in articles taken from such sources and supplemented slightly with modern material. The value of the article depends on knowing which part comes from where. This seems to be the highest priority at present in this general area, for it directly affects WP:V and WP:RS.
There are already perfectly good academic conventions for indicating sources. They just have to be used (and taught). The guides available for undergraduate academic purposes will do quite nicely for this; we do not have to reinvent them. I note though that guides in some subjects intended for high-level scholarly work often use extremely abbreviated conventions appropriate only for those scholars who know them, and which will be incomprehensible for general readers.
There is considerable variation in the details of these guides--I don't think they make any difference. Our practice should be as with footnotes--any method that gives the information and that is consistent in an article is acceptable.
I see some practical point in using templates--they can serve to guide the inexperienced contributors here. But they tend to be over-rigid and over-prescriptive. I think we would be very wrong indeed to do anything resembling the structure of image tagging. The assumption should be that material not cited is composed by the editor out of the general references given. We have enough barriers to beginning contributors already. DGG (talk) 06:30, 24 June 2008 (UTC)
Thanks for this, DGG. I'm at a bit of a loss as to which direction this discussion, let alone the proposed guideline/policy, should go in now. It seems to have sprouted many related but different discussions. If anyone can pull all the threads together, or tackle things step by step, that would be good. What DGG wrote above might be a good starting point. The "Resources" section I wrote above should probably be updated as well. Wikisource and its role (it also has a set of templates used on articles as needed) needs to be brought in somewhere, plus other points that hadn't been raised before. Use of the MediaWiki "diff" function could be integrated to show not just the original text, but also the point at which it entered the page history. A word of caution though - the MediaWiki "diff" generator sometimes produces strange results. Sometimes comparing the text by hand is more reliable. Carcharoth (talk) 11:31, 24 June 2008 (UTC)
Text based on a PD source might be different from the original. "Diff" can only show text which enters Wikipedia, which might be different from the original source, so can't be required as a way to show the original text. -- SEWilco (talk) 16:49, 24 June 2008 (UTC)
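As an illustration of the point about comparing an article against the original source rather than against an earlier revision, here is a minimal sketch using Python's standard `difflib`. It is not the MediaWiki diff engine, and the function name `changed_spans` and the two sample sentences are hypothetical, made up only for this example.

```python
import difflib

def changed_spans(source_text, article_text):
    """Return (tag, source_words, article_words) for every span that differs.

    Comparison is word by word, so re-wrapped lines or added paragraph
    breaks do not show up as changes.
    """
    src = source_text.split()
    art = article_text.split()
    matcher = difflib.SequenceMatcher(a=src, b=art, autojunk=False)
    return [
        (tag, " ".join(src[i1:i2]), " ".join(art[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Hypothetical sample text, not taken from any real source.
source = "The river rises in the mountains and flows into the gulf."
article = "The river rises in the mountains and flows east into the gulf."

for span in changed_spans(source, article):
    print(span)
```

A comparison like this only works if a trustworthy transcription of the source exists to compare against, which is the limitation raised above.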
I don't know which thread to add this observation to, but here goes: in the case of EB1911, the herd of horses are well out of the barn. It's all very well to argue what should be (and I take the point, particularly the hazard of changing the ur-text and interfering with the original moral right) but let's deal with what WP is, not what it should have done.
One guiding principle of the EB1911 transcription was that authors should feel free to modernize the text, both factually and stylistically. The vision was that the original text should be thought of as just like any other initial entry in WP: imperfect and incomplete, and waiting for wiki-evolution to incrementally improve it as knowledgeable and motivated authors work on it bit by bit. One problem with that is that many of the articles are so obscure that they are unlikely to attract anyone with the required knowledge or interest. And one reason that isn't a problem is, pace DGG, that many such articles, primarily biographies, are fine and unlikely to be improved since the state of scholarship in their subject is pretty much unchanged. Medieval French poets come to mind.
Still, we do have the state where there is no guidance on what to do about an article that does get significantly improved while leaving chunks of 1911 text in there. I also bet there are some articles tagged with the 1911 template that have no original text left (I do look for these in the verification step). (You do understand that, by convention, articles that contain 1911 text should have {{1911}} on the article page and {{1911 talk}} on the talk page).
End rambling. Any proposal to deal with articles originally sourced from, or with mix-ins from, 1911 needs to deal with the situation we have, not the one we might have been in had this discussion started 4 years ago. For example, in principle we could go back and mark the original text, either visually or by metadata, and/or insert the identities of the original authors. I'd also observe that the two online versions of 1911, lovetoknow and jrank, are both in the business of "improving" the text by some sort of moderated wiki mechanism. That's not to excuse the practice, but we aren't the only ones potentially violating the moral rights of the authors. David Brooks (talk) 18:33, 24 June 2008 (UTC)
Perhaps a personal history of the WikiProject Missing encyclopedic articles would be in order. Back when WP was enjoying explosive growth, it seemed that most of the articles were about obscure garage bands or every video game on the planet, and it was observed that we wouldn't be a "proper" encyclopedia unless we at least had articles on the same range of topics as more established works. We're going back here to March 2004 or earlier. Initially the focus was on making sure there was at least a stub in place corresponding to the contents of EB2004 and Encarta, seen as the primary rivals, and it was indeed done in the spirit of competition. This ensured breadth, and was relatively quickly achieved, but of course not depth - no copying of content was permissible. Later, other major reference works were added to the tabulation.
To achieve depth, we relied on the evolutionary approach, but in addition we started using EB1911 as a source (and other texts like the Catholic Encyclopedia). The intent was that the text would be used as a basis for evolution, although some of us tried to be mindful of topics where it was unlikely that another editor would show up any time soon, and took extra care with the transcription. The project founders urged care in bringing the articles up to date, especially in geographic topics, and eliminating POV, especially in ethnography and arts, but an enthusiasm for completion took over and some people simply copy-pasted articles from the existing online versions, complete with OCR errors, missing text, and possibly those sites' own post-1911 changes. That's where problems like Anadyr River come from. Although we did have a guideline that not every 1911 (and 2004) article had to match an article of the same name in WP, if the editor could argue that the equivalent content is in a more appropriate place, not everyone had the critical judgment to comply. Maybe it would have been better to allow some articles to say something like "There was a river called Anadyr in 1911 and here is what Encyclopædia Britannica had to say about it, quote".
The verification project is intended to go back over the tracks and correct all these flaws, but naturally there is less enthusiasm for such work. Right now I seem to be the only participant and won’t live long enough. I don’t know a solution other than evolution, and Carcharoth seems to be proving that evolution works at least for Anadyr River. David Brooks (talk) 20:16, 24 June 2008 (UTC)
Oh, yeah, one more thing. At the time of the MISSING project, I don't think Wikisource had a significant amount of EB1911 material, so there was not a standard convention to add {{Wikisource1911Enc}}. It's only just dawned on me that I should have been doing this during verification; I didn't start doing so because I didn't realize how much there is in Wikisource now. It does provide a form of attribution and allows relatively painless source comparison. I'll add that suggestion to the project page. David Brooks (talk) 20:43, 24 June 2008 (UTC)
Medieval poets--let's take Villon. Looking just in WorldCat, there have been 80 books, in English alone, exclusive of PhD theses, devoted solely or primarily to him, published since 1911 (some of them by famous writers and critics). I haven't looked at the French ones yet, but there have been several major revolutionary changes in literary criticism since 1911. Pick a subject, look for books--not even scholarly articles, where often the real research frontier is. There is almost nothing where the knowledge is static. (There won't be as much on other French poets, but take a look--the bulk of the available information is almost always since that period.) Take my actual favorite author, Samuel Johnson--Boswell's diaries weren't even discovered until 20 years after that date. His letters have still not all been printed in full. Look at any modern comprehensive history of anything, and look at the dates of the references. I have the latest vol. of the Cambridge history of the British empire right here; for the period before 1649, 12 of the 14 books listed are after that date. (And these are just non-technical works in English--it's a general undergraduate-level book.) Please feel free to give me some challenges. I can only regret I won't have time to give more than the count of books to be taken into consideration.
And the very thought that people were encouraged to use it and modernize it, without indicating what they were doing--though indeed I had more or less guessed as much--horrifies me anew. I can think of no more irresponsible way to use sources. That some other projects did even worse is not much of a comfort. The situation is not irretrievable. Every article based on EB1911 should be discarded and started over, not just modernized. It's not true we can't possibly find the people. It is true that maintaining a cavalier attitude towards the most basic of academic standards, and an institutional ignorance of modern scholarship in the humanities, will greatly discourage anyone knowledgeable from working on these subjects here. The first step in doing it right is the resolution to do it right. As for the effect of the popular culture emphasis--speaking for myself, if I didn't have to waste time trying to keep the decent content there, even if on topics of no interest to me personally, from disappearing, I'd have time for something I actually thought interesting--which is what I intended to do when I came here. I've noticed some or most of the people working with me to try to rescue the material from deletion are also well capable of serious work on traditional topics.
Let me give a comparison. When Citizendium got started, they decided to make sure that they wouldn't have dramatic gaps at first, and copied over everything then in Wikipedia. A few months later, when they looked at the material, they realized what the effect was--whatever they were trying to do better than Wikipedia would be swamped entirely. They then discarded everything someone wouldn't actively vouch for and promise to work on. They'll still take text from us, but the policy is they won't actually approve it until it is completely rewritten, and they advise people to rather start afresh. My apologies, David, I don't mean this to be personal. I know that one has the feeling at first of needing to improvise. But that time is over. I took a look at your project. I'd like to identify and check a list of the articles you looked at and didn't find to need improvement. DGG (talk) 04:25, 25 June 2008 (UTC)
Well, the comment about the poets was remembered from a claim made by another 1911 transcriber and I may have that wrong. There is nothing wrong with what you say, but I think you underestimate the difficulty of what you propose and of finding volunteers. And trying to delete the generated content will generate protest from those who still work in WP:MISSING. Frankly, there is a lot of much worse content in Wikipedia.
As to the last sentence - verification of articles does by no means imply they don't need improvement. The bar is very low: the article has been checked for transcription errors, POV, and basic categorization and wikification, and particularly egregious out-of-date ones have been tagged with one of the specialized templates. Some of them have had images and categories added. That's all. Those working on the project don't have the polymath abilities to update. David Brooks (talk) 06:16, 25 June 2008 (UTC)
I am not unaware of the difficulties. All the same, I would like to know how to find a list of the articles that have been passed as satisfactory. It would help me understand the project better and perhaps make some more positive suggestions. DGG (talk) 03:38, 26 June 2008 (UTC)
A random sample of pages using {{1911}}. I don't know if these were passed as satisfactory, but they all begin with "A" and I think I recall someone saying that the verification project is now working on Bs:
From my experience of looking at Wikipedia articles based on EB1911 (excluding the ones that have been added to Wikisource and noted on the Wikipedia article), they give no clues as to which facts are from the original article. I guess that we could give the reader the opportunity to read the EB1911 text, to confirm that the cited fact did appear there, by citing the very first revision of the Wikipedia page as a reference in the most recent revision.
Tonight I copied the original edition of Anbar and André Marie Ampère over to the Wikisource articles. I had intended to do the same for Apocrypha, except that it doesn't exist in the EB1911 pagescans, as far as I can tell, as EB1911 has "Apocryphal Literature", and it is very different to our base article here.
Also note that the EB1911 page on Wikisource about Abbey contains 21 subpages, focusing on different abbeys. Some of these abbeys do not have articles on Wikipedia.
I started on the B's because someone else had started on the A's, and in general because I didn't want to bump into another editor. But that someone else soon gave up, so the A's are mostly still raw.
As I've said elsewhere in this thread, I regard it as more urgent to do a superficial fixup of the hastily included text, than to research each of, what, 40,000 articles in depth, so long as the reader is minimally aware that this is material a century old. Since, quite frankly, some of the original editors dumped text from the online OCR'ed articles, the first order of business should be to remove artefacts like additional material and OCR errors, either occasional or systematic (Greek characters are replaced by visually similar Roman characters; for example Π tends to become II). These online copies contain other dangers; we know that the LoveToKnow article Mormons was changed substantially, removing any material critical of Joseph Smith; fortunately we already had Mormons of course. I tagged the most egregiously obsolete articles with {{update-eb}} or {{1911POV}}. To be honest, I've been comfortable with the result. If the topic is mainstream, there will probably be a modern article on it already and no need to rely on EB material. Much of what's left is biography of somewhat notable people throughout the second millennium, and those bios are actually surprisingly decent if you pull out the occasional POV, mainly artistic judgments.
On providing visibility to the original source, which is discussed elsewhere in this page: given the desire not to rip up all the work that was done four years ago, it might seem that the right fix would be to link to the Wikisource page. But that runs into another problem, which I don't think is sufficiently highlighted in these discussions. At least in the case of EB1911, the Gutenberg transcription, and the Wikisource project that seems to be dependent on it, has gone essentially nowhere. Perhaps there is little motivation to work on what is basically an endeavor with zero creativity. Generalizing, I doubt the sense of relying on someone else to bulk up Wikisource for you. So the lesson here is: it's OK to refer to existing PD/free online content, but you can't wish a Wikisource into existence as a way of solving the attribution problem. People need to put effort into it, and people aren't. Back in October 2006, Keith Edkins processed two quarter-volumes for WS and tagged each corresponding WP article with {{Wikisource1911Enc}}; since then I don't think the WS has grown at all.
I don't have a solution for the very real problems that you are highlighting, and you do acknowledge that 4 years ago was a different time. I can just tell you that, right now, the current solutions of completing the WS EB1911 and verifying the WP 1911 absorptions are not happening, and won't happen without a significant technological boost. David Brooks (talk) 19:59, 27 June 2008 (UTC)
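The systematic OCR artefact described above (a Greek Π coming out as a Roman "II") is the sort of thing a script could flag for a human to check. What follows is a rough heuristic sketch only: the pattern, the name `suspect_tokens`, and the sample string are assumptions made for illustration, not part of any existing verification tool. It flags words that begin with "II" followed by a lowercase letter, and leaves regnal numbers such as "Henry II" or "George III" alone.

```python
import re

# Heuristic: a word starting with "II" and continuing in lowercase is
# unlikely to be a Roman numeral and may be a mis-read Greek capital pi.
SUSPECT = re.compile(r"\bII[a-z]\w*")

def suspect_tokens(text):
    """Return words that look like OCR damage of the 'Π became II' kind."""
    return SUSPECT.findall(text)

# Hypothetical sample string, invented for this example.
sample = "Henry II and George III are fine, but IIavra needs a look."
print(suspect_tokens(sample))
```

It will miss other substitutions and may raise false alarms, so it could only ever supplement reading the text against the page scans.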
Oh Lord, I just realized what you said here: "Tonight I copied the original edition of Anbar and André Marie Ampère over to the Wikisource articles".
You can't assume that the original saved edition of a WP article, even one closely based on 1911, is identical to the 1911 text. There was no such guidance for WP:MISSING (four years ago, different era, remember). Some editors did dump the text uncritically, perhaps throwing in a few wikilinks and categories. Some did save the ur-text before editing it. But some (myself included) worked more efficiently, and did their first saves after restructuring the text to conform better to WP standards, and maybe even adding material from other sources. Restructuring may have been as trivial as inserting paragraph breaks (1911 loved long paragraphs), but I remember also making biographies conform to our then standard bio format (firstname lastname (dob — dod) was a nationality occupation. He did this.), while 1911 bios are more variable in their intros.
In your specific examples, you seem to be close with Anbar, but you missed the close quote on "granaries" (what is the WS convention on curly quotes?) and have spurious italics on "the founder". For Ampere, apart from all the links (what is the WS standard on internal links?) the differences start with those I just outlined: changed the heading from "AMPÈRE, ANDRÉ MARIE" to the WP ordering. Left in two spurious sentences that were added to the first para and inserted one of those para breaks I talked about. Spelled Ampere without the accent, a mistake that may indicate a copy from LoveToKnow. I stopped here at line 5 because I've made two points: not that I want to criticize your effort (I'm glad you are being provocative) but that your assumption is incorrect, and that it is hard but boring to make a WS entry correct, without the discipline implied by the Gutenberg project. Someone should fix those entries, by the way. Not looking, but I'd hazard a guess that Apocrypha contains textual inclusions from Apocryphal Literature. David Brooks (talk) 21:31, 27 June 2008 (UTC)
As far as I can see from the samples entered by Gutenberg people, conventions include: an EB1911 template with previous, next, and a WP backlink parameters; topic in bold with large caps (using <big>), ASCII hyphens, ASCII quote marks (which is unfortunate; the printed EB is clearly curly), no internal links, sometimes a ===See=== before the biblio, and sometimes "Endnotes", and one Category: link. David Brooks (talk) 22:57, 27 June 2008 (UTC)
Of course I realise that the original articles on Wikipedia may not match the EB1911 text. That is why I found the pagescans for them, in order that they can be proofread. I did not create the Apocrypha entry because the Wikipedia article isn't even a good starting point. The EB1911 project on WS is alive and well, and is growing slowly but surely. Wikisource does transcription; it does proofreading; it does everything that Gutenberg and Distributed Proofreaders combined do, but does it better than both of them. The only thing we lack is manpower. John Vandenberg(chat)02:34, 28 June 2008 (UTC)
I stand corrected on the continued existence of the EB1911 WS project, but the fact remains that it is still a small part of the whole text and won't be ready for a long time to stand in as the original-text reference for WP.
And I'm a little confused about your recent action, since you recognize the problem. Copying known invalid WP text to WS has corrupted the latter; were you planning to go back and fix the WS text, and make it conform to WS style? Actually, there doesn't seem to be a style guide for WS EB1911, and there are notable inconsistencies, but at least the text and punctuation should be correct, surely? (this isn't the place to go into more detail on the style issues). David Brooks (talk) 18:01, 28 June 2008 (UTC)
There is a style manual for WS EB1911, and the Wikipedia text I copied is higher quality. As I said, I found pagescans for each and I did quickly check what I copied to WS was close to the printed original. I do intend to fix the WS pages, but it is a wiki... fingers crossed someone might beat me to it. John Vandenberg(chat)13:18, 1 July 2008 (UTC)
I was very surprised at how few WP articles transclude {{1911}} and start with "B". I presume that another tag has been put on the B's instead. Here are five that transclude the "1911" template.
I think we are considerably offtopic here, except as a window into the inaccuracies and incompleteness of the Wikisource EB1911 (and I don't intend to start taking responsibility for them!). But your observation may be based on a misunderstanding. The original "1911 parity" punchlist excluded articles that already had an adequate equivalent in WP, whether or not they were derived from EB1911. It was necessary to reduce the scope of the task and focus it on where it would be of most use. Perhaps I didn't make that clear above. Thus there may be some 1911-sourced articles that didn't get tagged, despite best efforts (this would have been more than 4 years ago, when there was little discipline over acknowledgement of PD sources), but the majority simply have the same topic but are completely modern articles. The "expansion" list is now the "verification" list, and if you look at the first B index page, you'll see a few that are tagged "no1911" because they now have no 1911 text and hence no {{1911}} template. Some of them have {{include-eb}} on the talk page in case another editor felt like padding the article a little more. The rest do indeed have {{1911}} and {{1911 talk}}. David Brooks (talk) 17:36, 1 July 2008 (UTC)
There should always be some notice on the article indicating that the article may be derived from 1911 text if there was an import of 1911 text at any point. Simply removing the 1911 tag is not good enough. At the minimum, there should be a link to the 1911 Wikisource text, and if it doesn't exist, then Wikisource should be updated before the tag is replaced on Wikipedia. Ideally, there would be both: an indication that there is a 1911-import in the article's page history, and a link to Wikisource. Carcharoth (talk) 19:45, 1 July 2008 (UTC)
I'll address that comment in its own narrow context: to be clear, the tag removals are very few and far between; I don't remember removing any specifically, but would only have removed one if the article clearly owes nothing to 1911 text or structure (e.g. in the case of a complete blank and rewrite by an expert on top of an obsolete EB1911 import). The idea of linking to wikisource under those circumstances seems counter-intuitive. Why not link to 1911 WS for every article that owes nothing to EB1911? Also, creating an accurate WS article is something of a specialized operation; even the currently implemented small subset has obvious stylistic inconsistencies and unforced errors.
On the broader topic of what the relationship should be between WS and WP: see comments on this page passim. Your proposal, which may make sense today, cannot restart four years of work that was based on another set of proposals, well thought-through at the time but without the benefit of understanding 2008's attitude to plagiarism and attribution. The crew finished long ago and has moved on. David Brooks (talk) 20:59, 1 July 2008 (UTC)
Quotation marks
People seem to think that I can copy Stephen King's latest book, post it on Wikipedia, and if I have a footnote listing King in the bottom it's not plagiarism. Integrity requires that more than a few words, a unique phrase, and longer passages have quotation marks. Can this be put on the beginning of the page? It's not clear from the WP:Plagiarism that quotation marks are needed. --Blechnic (talk) 06:55, 25 June 2008 (UTC)
Well, if Stephen King released his latest book into the public domain, to a certain extent you could post it to Wikipedia, couldn't you? And to complicate that, what if he used a CC license that doesn't require attribution? One of the problems with using quotations is that it does not allow improvements to the text. In the case of King, we would end up with a very strange-looking and -reading article as we add new facts and references about the supernatural. The solution would seem to be: post King's book to wikisource and reference it from our article on creepy-ass people in weird little towns, however we already have a large cohort of articles where that was not done in the past. So we really have two problems, our response to existing "plagiarism" (which has been good-faith article-building); and guidelines for the future. Franamax (talk) 07:15, 25 June 2008 (UTC)
It's so hard to communicate in cyberspace. If you pull sentences, in their entirety, from a source, and credit the source with a footnote, but you haven't put quotation marks around the sentence, you are plagiarizing, because you are making it seem as if you've synthesized or rewritten information from another source, when all you've done is copy. --Blechnic (talk) 07:20, 25 June 2008 (UTC)
I agree, but please go ahead and put something like that on the main page. The only way it will improve is if people dive in and edit the main text to express some of the things being discussed on this talk page. Carcharoth (talk) 07:34, 25 June 2008 (UTC)
One solution that occurs to me is to keep a separation of activity, insert the copied text with an edit summary of "copied from (source)", then make subsequent edits to that text in the normal fashion. This now preserves the true history of whose ideas are being incorporated, and would be the preferred method of copying intra-wiki too.
Re: "If you pull sentences, in their entirety, from a source, and credit the source with a footnote, but you haven't put quotation marks around the sentence, you are plagiarizing, because you are making it seem as if you've synthesized or rewritten information from another source, when all you've done is copy."
Copying free content with attribution is perfectly acceptable, and isn't "plagiarism". This is, again, a misapplication of the originality standards from academic work to Wikipedia. Our goal is not to rewrite everything - our goal is to compile a free encyclopedia. This means we explicitly permit ourselves to copy free material other people have made.
Compare free images. We take free images other people have made (for example, from flickr) and put them on our articles, with attribution to the source of the image. In the same way that we don't need to recreate free images before we can use them, we don't need to rewrite free text. — Carl (CBM · talk) 11:54, 25 June 2008 (UTC)
I'm not sure that comparison with the way we handle images is the way to go here. In practice, all images on Wikipedia are treated like direct quotations. The boundary of each image is clear – no quotation marks are necessary to set an image off from text, as the distinction is obvious – and each image is 'footnoted' via a click-through to its information page.
Original but free images are fully credited to their original source. Derivative images that we produce from those originals have to identify their origins. The equivalent for text would be if you could click on any bit of text in an article to see a full history of who edited what, and what their original source text was (if the material was originally written elsewhere). TenOfAllTrades(talk) 13:07, 25 June 2008 (UTC)
Derivative text that we take from free content (including public domain) text also must identify its origins. Just like we don't make the derivative image point out which parts are "original", or complain that the person should have made the derivative image from scratch, we don't need to mark which individual sentences were from an original textual work, we just need to give attribution for the starting point from which we made our derived text. And this attribution is what we have always expected in our articles. — Carl (CBM · talk) 03:01, 26 June 2008 (UTC)
Yes. This thread seems to start from the misapprehension that quotation marks can solve the plagiarism problem. If I take an old PD text and change only a few obsolete words it would be inappropriate to suggest that it was a direct quote from the source, but to avoid plagiarism that source would still need to be acknowledged. Eclecticology (talk) 18:11, 26 June 2008 (UTC)
If you're worried about EB1911, come over to 1911 verification
All this hand-wringing about "moral rights" is a waste of time. True, original authors are legally entitled to assert moral rights. However, all the authors of EB1911 are DEAD, so no one can assert moral rights against our plagiarism. It is simply not going to happen, and is not worth continued debate. If you're worried about proper sourcing and accuracy, however, those are valid concerns. If you are interested in addressing those concerns (instead of just talking about them), come to Wikipedia:WikiProject Missing encyclopedic articles/1911 verification. That's where the action is :) Kaldari (talk) 18:17, 25 June 2008 (UTC)
This is not just about EB1911 (and I do care about EB1911); moral rights do not expire, nor can they be waived. Creative Commons has codified some of the moral rights, but the law says the moral rights defined by law cannot be reduced by a CC license. That these people are dead is the worst possible reason to deny them their rights. John Vandenberg(chat)22:10, 25 June 2008 (UTC)
And how do you propose that these dead people would assert their moral rights? Dead people do not have any interest in moral rights, only the living. And since moral rights cannot be transferred, the rights essentially die with the person. The Berne convention defines moral rights as the rights of the original author to claim authorship of the work and to object to certain uses of the work. Unlike copyrights, moral rights do not exist outside of the author (at least in the US). Kaldari (talk) 23:36, 25 June 2008 (UTC)
The death of the author does change things, but not as much as you might think:
"The Berne Convention also recognizes the existence of moral rights after the death of the artist at least until the expiration of the economic rights, except in those countries that do not provide for moral rights protection after the death of the artist (Berne Convention 1978). This exclusionary clause was adopted to offset the disparities in national laws (Françon 1992)."[23]
The Berne Convention provides for moral rights in article 6bis, which states:
(1)Independently of the author's economic rights, and even after the transfer of the said rights, the author shall have the right to claim authorship of the work and to object to any distortion, mutilation or other modification of, or other derogatory action in relation to, the said work, which would be prejudicial to his honour or reputation.
(2)The rights granted to the author in accordance with the preceding paragraph shall, after his death, be maintained, at least until the expiry of the economic rights, and shall be exercisable by the persons or institutions authorised by the legislation of the country where the protection is claimed. However, those countries whose legislation, at the moment of their ratification of or accession to this Act, does not provide for the protection after the death of the author of all the rights set out in the preceding paragraph may provide that some of these rights may, after his death cease to be maintained. - [24]
The whole of the second article is an interesting read about what moral rights are (a legal concept, not a moral one). Carcharoth (talk) 00:39, 26 June 2008 (UTC)
Although there's nothing in there to indicate that the moral rights do extend past the expiry of economic rights, so in the case of 1911 EB at least, they are now probably thoroughly extinguished. I read that to mean that even though Michael Jackson has the rights to Beatles music, he couldn't sell George Harrison's songs for use in a porn flick. Sometime around the end of the century though, all those bets will be off. None of which prevents Wikipedia from asserting more enduring moral rights for its own private ends. Franamax (talk) 01:17, 26 June 2008 (UTC)
Well, it could quite well be true. Tell you what, after you die, if you feel at any time that you're being done wrong, file a report at WP:ANI ;) Ay, what dreams may come, when we have shuffled off this mortal coil... (I plagiarized that!) Franamax (talk) —Preceding comment was added at 05:20, 26 June 2008 (UTC) Thank you! SineBot is my friend :) Franamax (talk) 05:39, 26 June 2008 (UTC)
Yeah, I was just reading and commenting on some of that ANI stuff. I resisted the urge to invite people to comment here, we have a reasonably good signal-to-noise ratio right now. What we need to do is pin down exactly what our terms are as far as moral rights, specific attribution, what has happened up 'til today on wiki, and how we want to define things in the future. Carcharoth asked above for a summing-up, if no-one else does, I might try one tomorrow. Franamax (talk) 07:26, 26 June 2008 (UTC)
Heh heh, I was actually responding to Kaldari rather than you Franamax - curse this clumsy indent-based discussion threading! But as a matter of fact, though I haven't tried my hand at bot programming yet, rigging up something for posting posthumous reports to ANI sounds like an interesting challenge...
I was somewhat serious in my criticism of the "they're dead so it's not wrong to plagiarise" notion. To me that's sort of the same thing as saying "if I plagiarized a living author but that author never found out, it's not wrong".
Another thing I'd note is that there's a difference between "not likely to be legally prosecuted" and "not infringing on moral rights." It seems to me that some people here are talking about the former while their statements are literally referring to the latter. I think that Wikipedia ought to pursue the high road of proper attribution and if we can't we ought to at least have a flag or something to mark articles as "may contain plagiarized material". All the talk is great but to some degree we're just taking a wiki-club to a dead horse that has been beaten for thousands of years. --❨Ṩtruthious ℬandersnatch❩07:52, 26 June 2008 (UTC)
Actually, the indenting is working fine, I just chose to comment on your response to the other editor ;) I made a jocular comment about a serious topic and I made a backhand attribution to the Shakespeare quote, I would have been more specific but I think you and most people would recognize it. Maybe not, if anyone is wondering, it's from the soliloquy in Hamlet.
In fact, I take the attribution thing very seriously. If you copy someone else's work, you must attribute it, otherwise you are doing wrong. It's as simple as that, copyright, CC-license, moral right, dead-men-don't-care be damned - copying without saying so is just plain wrong. That's the principle I think we should be emphasizing here, regardless of the niceties. Copy if it's allowed, but make sure people know that's what you're doing. And to that end, I favour indicating exactly what you've copied.
Where I may depart on the moral rights issue is when it comes to changing the work. Wikipedia is like homeopathy, you put something in, shake the living hell out of it, dilute it constantly (we call those edits), not one molecule of the original comes out the other side, but it's good for you nonetheless :) If we are going to plead moral rights as making the original text inviolable to that process (thus invalidating a huge number of PD-based articles), then I will resort to a strict reading of moral rights, which says that they do not survive beyond the term of copyright. We need that PD text - with the strict proviso that we indicate the exact original text and the history of every change to that text. That position extends to the original structure and content of the PD-text, I think the original provenance should be noted, but we must also be free to move forward from the original author's work. Franamax (talk) 09:13, 26 June 2008 (UTC)
Hmmmm... I'm having difficulty reconciling what you just said with where you said "it could quite well be true" about the dead not having moral rights. But perhaps that was entirely and completely jocular? Or perhaps it's all my own dunderheadedness and I'm not understanding an overall point you're making; I recognized the Shakespeare quote but I thought it was simply a general reference to death or an afterlife, from which I might dream of posting complaints to ANI. (I hope that the afterlife is better than that.)
I too think that indicating exactly what has been copied is the way to go. I'm fiddling around with finding some way to provide an easy and clear diff link to indicate the variance from the original text. --❨Ṩtruthious ℬandersnatch❩10:21, 26 June 2008 (UTC)
Jocular but with a kernel of truth, per my usual style :) The living consider the dead to have surviving moral rights. The dead themselves don't really care anymore, or possibly have bigger problems to worry about. Thus, the moral rights of the dead are only for the benefit of the living, I was making a point about who is really claiming the "rights". But yeah, the undiscovr'd country, from whose bourn a man might yet post to ANI - worth considering, isn't it? We must construct a mechanicon that would give the gods themselves pause to consider. Or keep working on this page, whichever you think :) Franamax (talk) 11:13, 26 June 2008 (UTC)
My kingdom for a zombie-bot that will shamble out of my grave and take wiki-vengeance upon those who besmirch my articles, a feat that shall only be eclipsed by the awarding of a barnstar to a thrice-martyred administrator from a Wikipedian yet unborn. --❨Ṩtruthious ℬandersnatch❩12:03, 26 June 2008 (UTC)
(disindenting) Technically, in serious article space "My kingdom for a zombie-bot" would still plagiarize Shakespeare's Humpty-Dumpty play. Kaldari's view that we are dealing with legally enforceable moral rights puts the whole discussion on the wrong footing. In some countries those rights expire on the author's death, in others they expire with the economic rights, and in yet others they last indefinitely. In some places where economic rights have expired the state may assume those rights based on the principle that defamation of a nationally iconic author is a defamation of the nation. In most cases the legal pursuit of a moral rights case is a complete waste of everybody's time. The French courts did find that a sequel to Les Misérables violated Hugo's moral rights, but only imposed a nominal penalty of 1 Euro.
No, but if there's an easy way to arrange an article to have that kind of link, it seems much more practical to me for policy to mandate that the original PD text get pasted in and the diff link created, rather than mandating that all contributions to Wikipedia must be comprised of completely original text. The latter case to me seems obtusely and unrealistically impractical and to enact something like that with no serious expectation that it would be followed would adulterate the efficacy of Wikipedia policy. --❨Ṩtruthious ℬandersnatch❩08:46, 27 June 2008 (UTC)
Do the original authors also have the moral right to not be insulted by Wikipedia policy defining their public domain work as being worse than stuff written by Wikipedia editors? Text which perfectly describes a topic should not be marked as scarlet letters merely based upon the status of the author's text. -- SEWilco (talk) 18:39, 26 June 2008 (UTC)
Even had I written "My kingdom for a horse", a five-word phrase is a bit short to be copyrighted. If indeed Shakespeare invented that phrase himself and never heard it anywhere else, which seems somewhat doubtful to me - at most he can probably be credited for popularizing it, which doesn't quite reach into the realm of plagiarism should I choose to re-use four of those five words. And as a matter of fact, if someone did not recognize the allusion to Shakespeare and simply took those four words literally in their understanding of my statement I would not have represented any of Shakespeare's thoughts or work as my own.
Now if Shakespeare had trademarked that phrase, and I was competing in the same market as him, say if I'd written and was offering a play incorporating the same phrase...
(Anyways, my point being that 21st-century corporate media notions of intellectual property (or even 21st-century climbing-the-academic-career-ladder notions of intellectual property) don't fit very well with Renaissance works and are actually somewhat obtuse and provincial in the greater scope of human intellectual history.)
Another thing that's worth bringing up is the inverse issue - that simply rewriting something doesn't really get rid of the issue of plagiarism. If you rewrite a passage of text so that your Wikipedia article presents the same information and ideas in the same order as an EB1911 article, and it simply uses your own turns of phrase, failure to acknowledge the antecedent EB article is still plagiarism. There's more to originality in IP, even synthetic originality, than original wording. --❨Ṩtruthious ℬandersnatch❩08:46, 27 June 2008 (UTC)
I don't think there is a minimum length for what is copyright, but this isn't about copyright. Some litigation relating to songs is founded on a single bar of music. As to whether Shakespeare himself plagiarized Shakespeare, I don't think I want to go there. :-) The allusion may be fine in a talk page like this where the context is understood, but may not be so in article space. Your last point is exactly what I am getting at. Eclecticology (talk) 12:00, 27 June 2008 (UTC)
There definitely is a minimum length for a copyright - copyright is not granted to a single word, for example. You might be able to trademark a single word within a particular market but a trademark is an entirely different thing from a copyright. (Which isn't to say you couldn't initiate litigation that demands that one person have exclusive rights over a single word - heck, you can litigate over anything, even "look and feel" - but I've never heard of copyright being granted on a single word or phrase.)
As to my last paragraph there, I don't think we're talking about the same thing, certainly not exactly the same thing. I'm not saying that changing a couple of words is insufficient to avoid plagiarism issues. I'm saying that it's possible to completely rewrite something and use 100% original wording and still be expressing someone else's work and thoughts. There seems to be an assumption above that if a public domain source is rewritten to a great enough degree that diffs would show no overlap with the original text, that takes care of plagiarism and attribution issues, but I would argue that it does not. Even some policy or practice that would force all original text into Wikisource would simply be deferring or deflecting development of proper and thorough attribution practices. --❨Ṩtruthious ℬandersnatch❩18:46, 2 July 2008 (UTC)
If you're going to quote the Berne Convention's moral rights in Article 6, read on to Article 7. All protections under Berne end at the same time. If copyright has expired under Berne, the moral rights also expired. We can be courteous to authors, but they don't have eternal rights. -- SEWilco (talk) 04:12, 8 July 2008 (UTC)
GFDL / Copying from within Wikipedia
I believe that above it has been stated that GFDL doesn't require article history to be preserved and that diff's aren't strictly required. I've read through the GFDL Text and I'm seeing the following:
Under 4. MODIFICATIONS
A. and B. - requirements to list on the Title Page a title (the &oldid provides the "distinct" title), authors responsible for the Modified Version, at least five principal authors of the Document. There are waivers here, but I'm not aware of which waiver I have given implied consent.
(There's also a bit in there about preserving the previous Title, i.e. renames)
I.Preserve the section Entitled "History"... and add...year, new authors.
J.Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. (and this is waived after four years or on permission)
From 5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions
In the combination, you must combine any sections Entitled "History" in the various original documents...
From 8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4.
Obviously others may have different interpretations, but it looks to me as though copying around the wiki, copying interwiki, mergers and renames, all have a strict requirement for attribution and history, and preserving the network location of the original Document means that, thanks to our devs, diff's come along with the package. We now return to our regularly scheduled discussion of 1911 EB. Franamax (talk) 06:45, 26 June 2008 (UTC)
One common point of confusion is that the "History" section described in the GFDL doesn't need to include the previous text or diffs of any sort; it's about attribution rather than tracking changes. The precise requirements are in section 4.I. The "network location" of wikipedia article X is simply http://en.wikipedia.org/wiki/X. — Carl (CBM · talk) 11:21, 26 June 2008 (UTC)
But /wiki/X points to the Modified Version, not the Document. Here is the network location of the Document of which you have created a derived work. Its title is 221851430. I license it for reuse, under the terms of the GFDL. Franamax (talk) 11:46, 26 June 2008 (UTC)
Well respectfully, history/shmistory. The key point here is 4.A. At least for articles I've created, I'm not aware of where I've given permission to use the same title as my version. There are three ways out: shut down the wiki; get every author's permission to use the same title; or use the pgid as the Title, and everything falls perfectly into place - iff we maintain a page history. Note that in the link you provide, it says "You may be able to partially fulfill the latter two obligations by providing a conspicuous direct link back to the Wikipedia article" (emphases mine) but it also says "you are encouraged to provide this authorship information and a transparent copy" - and the transparent copy in this case would indeed be the Document titled &oldid=123456789, since it would be the specific copy of pgid=123456789 and not whatever currently occupied the slot at /wiki/X. Franamax (talk) 12:19, 26 June 2008 (UTC)
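To make the distinction between /wiki/X and a specific revision concrete, here is a minimal sketch of how a permanent link to one revision could be built. It assumes the standard MediaWiki `index.php?title=...&oldid=...` URL form; the function name `revision_permalink`, the title "X" and the revision number 123456789 are placeholders for illustration only, taken from the hypothetical values used in this thread.

```python
from urllib.parse import urlencode

# Assumed base URL for the English Wikipedia script path.
BASE = "https://en.wikipedia.org/w/index.php"


def revision_permalink(title: str, oldid: int, base: str = BASE) -> str:
    """Build a link to one fixed revision rather than the current page.

    /wiki/<title> always shows the latest version; adding oldid pins the
    link to the exact text that was copied from.
    """
    query = urlencode({"title": title, "oldid": oldid})
    return f"{base}?{query}"


if __name__ == "__main__":
    # Placeholder values only; substitute the real title and revision id.
    print(revision_permalink("X", 123456789))
```

A link of this kind could be pasted into an edit summary or talk page note when text is copied between articles, so that the revision actually copied from can be found later.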
This deserves a page all to itself, as it is only after a very long time that Wikipedians come to understand how GFDL is applied within Wikipedia itself, and the Transwiki system. What they learn is that everyone interprets it slightly differently, a lot of people get anal about it, but nobody really minds provided that editors leave plenty of clues behind so that the history can be retraced, even if it means many, many hours of painstaking wiki-archaeology. The point is that when it is required, someone will put in those many many hours in order to ensure we end up with attribution. We should encourage Wikipedians to leave plenty of clues, giving examples of how to do so in edit summaries, or on the talk page. John Vandenberg(chat)12:47, 26 June 2008 (UTC)
Indeed, this area needs to be carefully written. Wiki-copying is common and casual and in many cases to be encouraged. Copying across the article space itself is the focus here, we should probably disavow anything else. I wouldn't expect this guideline to apply to someone copying my template code or monobook.js (though it does strictly apply).
The primary issue is: can the wiki-ologist find it at all? The example here is copying text from a previously-deleted article (GFDL says "transparent" link, not "ask-an-admin") and various copy-paste moves and mergers where text suddenly appears under the name of a non-author, sometimes with a blank or terse edit summary. Franamax (talk) 08:40, 27 June 2008 (UTC)
I am not happy with the premise that it is best practise, or even acceptable, to base an article on a public domain source, and then heavily revise it until it is fully rewritten. I don't mean to disparage wikiprojects that have encouraged this in the past; that was then, and the decisions that have been made in the past have made Wikipedia what it is today. However, we are here now, and it is our responsibility for determining what Wikipedia will be tomorrow.
I hope that we will be part of the global change which is adopting open access. Many publishers are opening up their archives because they realise this change is happening and they are now seeing the good publicity of being part of open access to be more beneficial than the returns from access fees; NYT being the most prominent example. I hope that we will strongly encourage the archiving of PD sources, which fundamentally improves the quality of our articles, as facts can be checked quicker, and context and POV of sources can be analysed.
If a PD source is archived in an accessible manner, Wikisource being one of the best methods I know of, I see no reason why the Wikipedia article should not start as a stub with a prominent link to the PD source, e.g. using {{Wikisource}} or {{Wikisource-inline}}. The pertinent facts can be incorporated, and the article will grow more quickly than it would if it looked complete because it was a revised copy of a PD source.
John Vandenberg(chat)13:38, 26 June 2008 (UTC)
Let's discuss Mali as an example here. Despite being a 7 year-old article about a major African country, this article was crap until someone plagiarized extensive portions of the Library of Congress's "Mali country profile" to expand, improve, and in some cases completely rewrite several sections of the article. The Mali country profile is cited in each of those cases, and a note appears at the bottom of the article stating that the article incorporates text from the profile, which is in the public domain. Obviously, it would have been better for that editor to write original material from scratch using several sources, but given the choice of leaving the article in shambles for who knows how many more years, or plagiarizing the Library of Congress, I support their choice of the latter. Kaldari (talk) 15:18, 26 June 2008 (UTC)
John, I'm not clear on why you consider this practice to be unacceptable. (Possibly I simply have not read the rest of your comments in this page thoroughly enough.) Do your concerns extend beyond the possibility of inadequate attribution?
The way I see it, no work is truly original; everything anyone authors has antecedents and tendrils extending up into the aggregate intellectual cloud of human knowledge, from which we are all educated and informed and inspired. Everyone essentially uses public intellectual property, if only indirectly, in their own work. And the event of a work passing back into the public domain is the reverse of that process. To get airily poetic, private intellectual property devolving into public intellectual property is a parallel process to our bodies decomposing and rejoining the Earth that nourishes us.
So in my book, the properly-attributed addition of public domain content to Wikipedia is as good as, as intellectually valid as, the author's shade sidling up to the edit page and pressing "submit" on their own work. No, it's not voluntary in the same manner, but that's kind of the point - when intellectual property has passed into the public domain that means it's beyond the point where the author has ownership of it and the right to control how it is used. Attribution is deserved, yes, and necessary in the derivative work for intellectual validity - but the work itself has now become part of the big pile of modeling clay that the original author themselves pulled from, which we all get to use. --❨Ṩtruthious ℬandersnatch❩17:11, 26 June 2008 (UTC)
I pointed out several days ago (someplace above) that there is no need to assume that reused public domain text should be rewritten. The need to alter an article should not be based on a mere desire to be different. -- SEWilco (talk) 18:27, 26 June 2008 (UTC)
true, sometimes it is so well done that we can hardly do better, but usually it is better to recast them into our style and format. The identical ones get quotation marks and a footnote, the rewritten ones get a footnote. Personally, I often write by taking an article and going through and rewriting the paragraphs one by one, whether PD or not. The gift for excellent original writing is not all that common here--and when it is, the wiki nature of contributions from multiple parties would soon overwhelm it. DGG (talk) 18:44, 26 June 2008 (UTC)
This is one of the fundamental divides in this discussion. The approaches seem to be: place PD-text within inviolable quotations; place PD-text into the article, thus allowing its full rewriting; place original text into Wikisource and reference from a stub; put the text into both Wikisource and the article; place transcribed/rewritten PD-text directly into the article. Deprecating the strict attribution and diffs issue here, and hopefully bypassing anyone's defence of their past practices, which is best overall going forward? Which is most likely to achieve compliance? And to liven it up a bit, I'll have to note the flurry effect - given a sufficient number of editors watching an article, changes tend to attract more changes, and this argues to some degree toward plopping text right in, where it may be more intensively wikied into shape. Franamax (talk) 08:22, 27 June 2008 (UTC)
I'm glad you have summarised the approaches being discussed here, and I've a bundle of half written thoughts in reply, but to start at the end ..
Regarding the flurry aspect, the same principles can be said to exist in all of the approaches you have presented. i.e. I believe that the dynamics of massive online collaboration (offtopic: these two papers are an interesting read on the topic as it pertains to Wikisource translations) is an orthogonal topic to the approach used to create an article heavily based on a PD-text. For example, if I were to dump the PD-text onto Wikisource and create a stub, and facts are merged in from the PD-text over the course of a few months using copy-paste-rewrite-cite-commit type edits, each of those edits will bring in editors. The dynamics of this will certainly be different to the dynamic of a big-bang PD-text dump followed by irregular fixes and minor rewrites, but the net effect will still be clumps of edits by numerous people.
And, while discussing the group dynamics, I would also like to point out that group dynamics also applies to the adoption of a "best practise". Provided we exclude any fundamentally broken options, any reasonable option can become the common practise due to the positive re-enforcement of incremental adoption. John Vandenberg(chat)10:47, 27 June 2008 (UTC)
My own style in rewriting a text for whatever reason is to start with it, and then systematically rewrite it, usually paragraph by paragraph, merging in new material as needed. Usually the style of the source article is so drastically different from what is desired, that all the wording gets changed around in the process, but I check back to see if anything remaining needs quotation marks. At some point it may need reorganization--that becomes obvious as one goes along. But there are multiple ways of getting there--people write in very different ways. The result should be an article in modern idiom that reflects current knowledge. DGG (talk) 02:27, 29 June 2008 (UTC)
Good progress
As often happens, I've been pulled in a lot of other directions. The How to identify plagiarism section needs development. Suggest listing changes in writing style as a clue to plagiarism, and cutting/pasting phrases into search engines as methods of locating possible plagiarized sources. With the latter, need to explain false positives from mirror sites. I've also supplied a new picture. :) Best, DurovaCharge!10:11, 27 June 2008 (UTC)
Mirror sites
There may be non-obvious websites that copy from wikipedia. The obvious mirror sites are the ones that use some bot to literally copy from wikipedia. When I was correcting stupid mistakes last month in wikipedia's thermodynamics articles (many articles were affected) I was curious about the source of these errors. A correct statement can become wrong if you leave out something. So, I searched the internet for sources to back up the old statement.
I found that there were many websites that had literally copied the wiki page. To my horror, Eric Weisstein's "world of physics" had based some of their articles on the flawed wiki articles. So, I never got to the bottom of the mistakes, rather I found that mistakes had propagated from wikipedia and even made it into sources that one would a priori regard to be very reputable sources independent from wikipedia.
Now, the very term "combined law of thermodynamics" is erroneous. It was never based on an official source, it was invented by the wiki editor User:Sadi Carnot who is serving a one year wiki prison sentence for subverting the thermodynamics articles (on other grounds than what I'm talking about now). More importantly, the statement:
dE <= T dS - P dV
is wrong. This conclusion was reached on the basis of flawed original research published in the old wiki article on this subject, see this version. The fact that this version actually cites Weisstein's site was very misleading: it would have been a circular citation had the editor of Weisstein's site cited where he had copied that formula from :)
It contains flawed statements and derivations. The source of these mistakes was the old version of the wiki article on this subject see here
The text of the world of physics article is different, so we are dealing with a world of physics editor who uses information from wikipedia and then writes it up in his/her own words.
So, when looking for plagiarism, we have to keep in mind that reputable sources may sometimes simply copy from wikipedia. Count Iblis (talk) 17:42, 27 June 2008 (UTC)
Dealing with appropriate or inappropriate overlaps between wikipedia articles and their sources is something that can be aided by having suitable tools for editors.
And we need tools to support such editors. In particular, there's a need for a plagiarism checking support tool. What I have in mind, and suggested in passing somewhere else before, is a program / bot / whatever which would run a diff between a given wikipedia article and any other specified internet webpage (or list of webpages). It is often the case, both in student papers submitted for class assignments, and for wikipedia articles, that plagiarism is from sources cited in a paper or article. For professors to check student papers, there is a commercial website Plagiarism.org / Turnitin.com, which some schools subscribe to, which performs this check (against everything on the internet plus some literature databases content plus everything previously submitted to Plagiarism.org / Turnitin.com). However, the basics are just to run a Unix diff utility and/or other program to match up word sequences in two texts, and then to highlight the overlaps and/or report them in some way. There is freeware available to do this that can be adapted. If such a tool were available for wikipedia articles (or if there were a general website that would allow comparisons between any two internet websites), then a plagiarism checker or two could process the DYK nominations routinely. Such a tool would also be useful for regular editors who are trying to clean up articles and ensure that material adapted from a source is either reworded or put in quotes where it should be credited. There is no such tool available to us now however, and I am not even sure where to suggest it, so I am mentioning it here just to put it out again. doncram (talk) 18:43, 28 June 2008 (UTC)
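As a rough illustration of the "match up word sequences in two texts" idea, here is a minimal sketch using Python's standard `difflib` module. It is not an existing Wikipedia tool; the function name `shared_passages` and the five-word threshold are assumptions chosen for the example, and fetching the article and the source page is left out.

```python
from difflib import SequenceMatcher


def shared_passages(article: str, source: str, min_words: int = 5) -> list[str]:
    """Return runs of at least min_words words found verbatim in both texts.

    Words are compared case-insensitively after splitting on whitespace,
    so punctuation attached to a word counts as part of that word.
    """
    a = article.lower().split()
    b = source.lower().split()
    # autojunk=False stops difflib from ignoring very frequent words.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]


if __name__ == "__main__":
    article = "The plant grows in the rock crevices and water-receiving depressions of the plateau."
    source = "It is found in the rock crevices and water-receiving depressions near streams."
    for passage in shared_passages(article, source):
        print(passage)
```

A report like this only flags verbatim overlap for a human to review; it says nothing about whether the overlap is quoted and credited, and it will not notice a passage that has been fully reworded.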
It's not just a matter of using a tool. Rewording does not necessarily get rid of the plagiarism; giving proper credit does. The tools alone will only give a false sense of security. Eclecticology (talk) 23:15, 28 June 2008 (UTC)
Of course, proper credit should be given. A tool like this would be a help though, when you are trying to sort out how to properly credit something. Many wikipedia articles have passages that are mixed, of reworded and not reworded. Both need to be credited. If there is a long passage of text that is copied verbatim, giving proper credit should usually include putting it in quotes, to give credit for the actual wording to the original author, rather than just including a footnote which usually implies that it has been reworded. It's a requested tool, not a solution. doncram (talk) 23:46, 28 June 2008 (UTC)
Apparently, there is copyscape.com. I know nothing about it; when I get home I'll mess with it some. As Eclecticology says, just changing the wording doesn't get around the issue; however, if there is something that will spot rearranged text fragments, that will catch many of the most blatant violations. Franamax (talk) 06:19, 29 June 2008 (UTC)
When you use it, don't enter the Wikipedia article as the page, enter one of the article's sources and see if you get the Wikipedia page as a return. If the article is brand new, I don't know if there will be a delay. --Blechnic (talk) 23:15, 29 June 2008 (UTC)
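One common way to spot rearranged text fragments, as raised above, is to compare sets of short word sequences ("shingles") rather than running an ordered diff. The sketch below is an assumption about how such a check could work, not a description of how copyscape.com works; the names `shingles` and `containment`, the shingle length of four words, and the sample text are all invented for illustration.

```python
import re


def shingles(text, n=4):
    """Return the set of n-word sequences (shingles) found in the text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(article, source, n=4):
    """Fraction of the article's shingles that also occur in the source."""
    a, s = shingles(article, n), shingles(source, n)
    return len(a & s) / len(a) if a else 0.0


# Invented sample text: the article reuses the source's sentences in reverse order.
source = "The castle was built in 1200. It stands on a steep hill above the river."
article = "It stands on a steep hill above the river. The castle was built in 1200."

print(f"{containment(article, source):.2f}")
```

Because sets ignore order, swapping sentences around barely lowers the score, whereas genuine rewording does. A high score is still only a prompt for a human comparison.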
The pointlessness of this page is made clear by the fact that the first paragraph is just a dictionary definition of "plagiarism", followed by a quotation from Wikipedia:Copyright problems, followed by an unpolished imitation of Wikipedia:Citing sources, followed by a quotation from Wikipedia:Non-free content, etc. The only salvageable potentially unique part of this page is relevant specifically to copying in general, not plagiarism, and is more useful for detection of copyright infringement because the incidence of non-copyright plagiarism is so low. —Centrx→talk • 05:35, 30 June 2008 (UTC)
Cool! I wrote most of that first paragraph ab initio, thanks for the comparison to a dictionary :) I've already heard the Franamax=pointless comparison though, so nothing new there - you should talk with my ex-wife :)
The point of a unique guideline here is to lay out the exact basis for editors to guide themselves as to what is accepted practice and best practice, and to outline a basis for judging other editors' conduct. There have been several discussion threads recently which refer explicitly to plagiarism. Plagiarism is an issue which subsumes copyright; in fact, plagiarism may be more important to our purposes of creating a reputable encyclopedia.
"Copying in general" is actually plagiarism in general, unless the copying is done properly. The specifics of its many forms are what we are trying to pin down. If you can salvage any part of this page, especially the unsalvageable parts, please do so. It needs trimming and expanding and refactoring and lots more discussion - hack away!! Franamax (talk) 08:43, 30 June 2008 (UTC)
OK, maybe I shouldn't have used the term "hack" - I meant slice, not chainsaw. Centrx, can we discuss each of these separately, rather than you just removing the bits you don't like? This is a proposal after all, let's work it through. There's no deadline, and lots of space here on the talk page. Franamax (talk) 08:53, 30 June 2008 (UTC)
The parts I removed were only the most egregious; the remainder is still mostly problematic, and none of it belongs under the title "Plagiarism". —Centrx→talk • 04:52, 1 July 2008 (UTC)
No insult intended; it is good, such as it is, but it is redundant and unnecessary.
Plagiarism does not subsume copyright: a tract of copied text that cites its source is not plagiarism but can be copyright infringement.
Copying in general is not plagiarism. Plagiarism requires specifically that the copy be without acknowledgment and "passed off as one's own", that is "stealing" or "literary theft" (see OED or Webster). This confusion is evident especially in the sentence "Copying the works of others and presenting them as your own is not acceptable": No one presents any article text as their own on Wikipedia; we do not own our articles, and the articles organically evolve and morph away from their previous authors. "Plagiarism" may be relevant to academic scholars whose reputations and authority depends on their originality, but not Wikipedia. —Centrx→talk • 04:52, 1 July 2008 (UTC)
Even if we were to grant the premise that very little of what is going to be said here is not also part of other policies and guidelines, I don't think that I would call a plagiarism guideline pointless. Many of our core policies and guidelines intermesh and overlap. (WP:5P, WP:NPOV, and WP:NOT, for example, all have significant overlap with one another, as do WP:CIV and WP:NPA.) We generate new guideline pages when we feel it is appropriate to emphasize or expand on a specific policy or aspect of a policy, or when we want to bring together ideas from several different guidelines and policies in one place to address a specific issue (for example, WP:TE).
Per Durova, copyright and plagiarism are distinct concepts, and it is possible to have one without the other. (Confusion sometimes arises because we so often see them appear together.) A page on what plagiarism is and how to deal with it would be useful even if it served no other purpose than to explain the distinction between plagiarism and copyright infringement, and how to avoid plagiarism. TenOfAllTrades(talk) 02:59, 1 July 2008 (UTC)
WP:5P is a highest-level summary of the encyclopedia. WP:NPOV and WP:NOT are original concepts and contain overwhelmingly original material and mention each other only in passing in individual sentences (with the singular exception of the section "Wikipedia is not a soapbox"). WP:NPA is useful only because it is a bright-line rule for behavior, much like WP:3RR -- if you want that for "Plagiarism", it requires only a paragraph that states the rule and links to the pertinent information in Wikipedia:Citing sources and Wikipedia:Copyright violations, without copying text back and forth or having diverging imperfections.
However, there is nothing uniquely wrong with "plagiarism" on Wikipedia. Everything wrong with plagiarism on Wikipedia falls under copyright or sourcing. We do not claim to originate ideas in Wikipedia articles--in fact we specifically forbid it--, and just as there is nothing wrong with the many writers who write "there's the rub" or "the pale cast of thought" without citation, copying a well-written tract of text from Britannica 1911 is bad only because it is unsourced, not because it is "stolen". There is no specific issue or different idea to address in a new Guideline. —Centrx→talk • 04:56, 1 July 2008 (UTC)
Centrx, I appreciate some of the points you are making, but without causing offence, can I ask whether you have read the entirety of this talk page? If you have, I apologise, but I think some of the points you are raising have been discussed above, while some haven't, and it is good that you've raised them. The key problem I think is that there are varying standards of plagiarism, and the "edited mercilessly" bit does make it difficult to pin down what plagiarism is on Wikipedia. I'll address some of your points in more detail below, feel free to intersperse comments. Carcharoth (talk) 09:32, 1 July 2008 (UTC)
"the incidence of non-copyright plagiarism is so low" - Centrx, is this anecdotal, or do you have specific examples and surveys in mind here? I think the poorly cited updatings and rewritings of Encyclopedia Britannica 1911 text is an example of widespread text-plagiarism of the "confused sources" type. Not intentional, and fixable, but still imprecise presentation of material obtained from the sources. I also think that any article that started off as PD-text and is slowly being rewritten and updated is in danger of verging into text-plagiarism at any point in the process during which it is being changed from "PD-text quotation" to "paraphrase of PD-text". Carcharoth (talk) 10:37, 1 July 2008 (UTC)
I have spent a substantial amount of time finding and correcting unattributed copied text. Invariably they are copyright violations (or publicists authorized to copy the text), if only because copyright is so encompassing, and careless people are more interested in the present and recent past than in text from long ago. Regarding Britannica: again, no one is intentionally "stealing" the text, and correcting the problem is no different from dealing with Wikipedia:Cite your sources or Wikipedia:Copyright problems. —Centrx→talk • 00:27, 2 July 2008 (UTC)
"a tract of copied text that cites its source is not plagiarism but can be copyright infringement" - this is a very narrow definition of plagiarism, one that says if you cite your source you are not "claiming it is your work". You also fail to make the necessary distinction between quoting a passage from a text and merely citing a piece of text. The use of quote-marks and direct mention of the source in the text is often vital to avoid misleading the reader. Merely citing a source is often not enough. There are also broader definitions of plagiarism that are similar to the concept of "derivative works" in copyright law. These broader definitions focus on copying of the meta-structure of a work (chapter headings and the order of a list, and the style and approach taken when writing about a particular topic), as well as the extent to which a concept has been rewritten in the author's own words. This rewriting often needs to be done, and can be done without original research. I think the confusion here comes between plagiarism of ideas and plagiarism of text. We, as Wikipedia editors, cannot contribute new ideas, and any ideas we write about have to be sourced. What Wikipedia editors can be guilty of is plagiarising the text others have written to describe such ideas. Carcharoth (talk) 10:37, 1 July 2008 (UTC)
Plagiarism requires appropriation without attribution. You may have invented a new meaning for the word, or you may have misunderstood the word, but that is irrelevant to this discussion; there is already a word for what you are talking about: copying. Weak citation also is not plagiarism; standards of citation are already covered in Wikipedia:Cite your sources and related pages. —Centrx→talk • 00:27, 2 July 2008 (UTC)
The concept of "rewriting" is particularly relevant to Wikipedia, because, by "mercilessly editing" text, we are constantly rewriting what others have written on a certain topic, even if the original ideas about that topic are cited to external authorities. The problem arises, in my opinion, when Wikipedia editors submit chunks of public domain text written by others, and then the rewriting (paraphrasing is the technical term, I think) takes place "live" on the wiki, rather than in the editor's head. This is also combined with updating of the text and addition of new material. If done poorly, such "on-wiki" rewriting can lead to a mess where the original attribution of an entire chunk of text to one source becomes an unclear mess where sources are mixed up with no clear distinction being made. The crucial point here is that: "A paraphrase that is not accompanied by adequate acknowledgment of the source is still a form of plagiarism" (taken from paraphrase). And here, "adequate acknowledgement of the source" is critical - what is "adequate" for Wikipedia might not be adequate for others. We need to make clear what is adequate for our purposes. Carcharoth (talk) 10:37, 1 July 2008 (UTC)
What is adequate for our purpose is citation and sourcing and is already covered under those guidelines; this has nothing to do with plagiarism. —Centrx→talk • 00:27, 2 July 2008 (UTC)
In other words, copying PD stuff written by others is OK if you quote them, provide attribution, and cite the source, but once you start to rewrite that same material, you need to remove the quote marks, paraphrase what was written, and update the sources (if you are adding new material), and make the citations more specific for the older material (if you leave that old material there). It is a complex, delicate process, and from what I've seen, Wikipedia bludgeons its way through this process. Carcharoth (talk) 10:37, 1 July 2008 (UTC)
You can also have plagiarism of copyrighted material. If the rewriting is minimal, it can in some cases be considered a derivative work, so I agree that those cases can be considered copyright violations. However, some people will insist that by changing or rearranging the wording, such actions can never be considered a copyright violation. Pointing them here might help. Carcharoth (talk) 10:37, 1 July 2008 (UTC)
Even when extensive rewriting takes place, if the overall plan of the work is copied (eg. biography chapter headings) then you still need to say that explicitly in the text. Just citing that biography as one of the "sources" is not enough. You need to explicitly say "the layout of this version of this article uses the layout of this biography". Failing to say that is using the layout created by someone else, without attributing that layout to them, ie. plagiarising the layout of the article from the biography. Carcharoth (talk) 10:37, 1 July 2008 (UTC)
You have in many words not addressed the arguments I stated above, especially how the "Plagiarism" described in this page is different from copyright and sourcing issues in any way relevant to Wikipedia or requires any different response. —Centrx→talk • 00:27, 2 July 2008 (UTC)
Category:Wikipedia rejected proposals
I give my support to everyone who has contributed to this proposal; you have done a lot of good work and recorded a lot of thoughts, but...
I have tried to read most of the discussions here on why there should be a policy on Plagiarism, but it is challenging to read because the same two things get said over and over, and it makes you want to take a nap.
- We don't have a policy on Plagiarism and Plagiarism is bad so we need a policy
- We already have policy to cover everything bad about Plagiarism
A review of Wikipedia:Policies and guidelines#Sources of Wikipedia policy and the rest of the page shows that none of the three sources for a new policy exists: 1. There is no need to document existing practices, as they are already documented at WP:V, WP:OR, WP:COPY, etc. 2. No one is seeking a change in practice, and no consensus for a change is emerging. 3. There has been no declaration from the board, or legal imperative.
I believe at least one open question has been highlighted by this discussion: there is no policy on how to properly attribute PD sources. In the frontier days of Wikipedia, the assumption was that they could be absorbed without ado, but we now believe that they should be acknowledged and if possible made available. Is there an existing policy that addresses this question, and if not, is there a policy that could be updated to do so after appropriate further refinement? David Brooks (talk) 18:51, 2 July 2008 (UTC)
Jeepday has challenged me on this assertion (thanks for taking it private; no problem) and I guess it is true that the policy on citing PD sources is covered by WP:CITE. I was thinking of the related issues raised by some suggestions on this page: citing a copy of a public domain source is problematic especially if it is on a wiki, and there seems to be a preference for not only citing an online copy but linking to it, which gives it an extra responsibility to be correct. For example, citing wikisource, jrank, or lovetoknow, for 1911 material, can be hazardous to your integrity. David Brooks (talk) 03:57, 3 July 2008 (UTC)
Hold on. Before this became a proposal, it was a redirect to a section somewhere. The links that were pointing here that should be pointing there should be fixed before anything is rejected. I'm also not clear why the proposal can't be given more time to develop. There are a lot of points on this talk page that could be integrated into at least an essay. I'd also like those saying this is covered elsewhere to clearly quote the text they think covers plagiarism, and why that text elsewhere does the job it does without a proper definition of plagiarism. Carcharoth (talk) 21:46, 3 July 2008 (UTC)
How to deal with persistent plagiarism?
I've been dealing with an editor on and off for over a year who habitually commits plagiarism. His articles often contain direct quotes from other webpages or books without quote marks or proper attribution, as well as lightly-rewritten sentences. The editor will usually rewrite sentences when informed of problems, but the results still seem problematic. The Wikipedia article will be different enough from the source(s) that a Google search will not locate the original, but a comparison between the Wikipedia article and the source(s) shows a strong correspondence in sequence of thought, emphasis on particular details, etc.--in other words, the Wikipedia article looks like a slightly altered version of one or more other sources.
What's to be done about this? It's not clear to me that plagiarism of this sort is a copyright violation, as the copy isn't verbatim, but I still think the articles (and the editor's writing practices) are problematic. --Akhilleus (talk) 01:27, 3 July 2008 (UTC)
It sounds like this is an honest person contributing productively to Wikipedia: you can explain why sourcing text and ideas on Wikipedia is important for any reader or future editor who wants to find out more information about the topic, and for helping protect against sneaky vandalism where unsubstantiated statements are added. If he is contributing to Wikipedia because he wants to contribute to a useful encyclopedia and expand human knowledge, he should see the wisdom of the first argument. If he is contributing to Wikipedia because he wants to codify the Truth, or even merely for his own personal reference, he should see the wisdom of the second argument. Also, copyright violations pose a legal problem for Wikipedia, which can threaten the platform to which he is contributing, and which can result in his contributions being eradicated.
If he cannot see the wisdom of these arguments--that is, if he is an irrational nihilist--, he will need to be warned and, if he persists, blocked. —Centrx→talk • 04:32, 3 July 2008 (UTC)
I'm going to revert Centrx's action in pointing this page again to Wikipedia:Copyright problems. There has been good participation here, ongoing attempts to get more people involved, and even specific plagiarism questions directed here. We should continue this effort unless there is consensus to abandon, which I don't see at this moment. Centrx, do you have any ideas for how to flesh out this page into one which provides a comprehensive resource for editors wondering about plagiarism, beyond redirecting them back to the wiki-hunt? Franamax (talk) 00:24, 7 July 2008 (UTC)
Also, the very first line on this page is a pointer to the page to which it previously redirected, so nothing is lost by leaving this nascent page to develop. Franamax (talk) 00:26, 7 July 2008 (UTC)
Thank you. There is room for a guideline here, even if it doesn't become a policy because other policy documents cover the territory better.
There have been no developments for quite some time, and no one has justified having a guideline. —Centrx→talk • 02:59, 7 July 2008 (UTC)
Well, I for one have been letting it sit waiting for other comments before my next work session. It's not exactly being ignored: [25][26] but we would love to have others help to expand and contribute.
As far as your recent deletion of the "most egregious" content, here is the genesis: original wording, another idea, a responsive edit, an objection, and two refinements and a moderation. These are painstaking and hard-won efforts. Can you outline what is particularly egregious here? Is it the statement that "Wikipedia is an encyclopedia"? Or is it "Copying the works of others and presenting them as your own is not acceptable practice here."? One or the other must be bothering you, let's hash it out here on the talk page. Franamax (talk) 04:39, 7 July 2008 (UTC)
Material can be plagiarized from
"Material can be plagiarized from…" has one meaning that it is acceptable from the listed sources. Less ambiguous phrasing is needed. -- SEWilco (talk) 03:43, 8 July 2008 (UTC)
SEW, I changed around your change, trying to incorporate various viewpoints. I don't know if I achieved anything - I was trying to strike a balance, material within quotes is inviolable, material placed in the article is fair game for editing - IMO as long as there is a firm link to the source PD text as inserted.
As far as the section heading above, I might have missed that. Can you elucidate where "material can be plagiarized from" originates? My understanding is that we are outlining how exactly nothing can be plagiarized - possibly you've identified a hole in the wording, I guess I don't quite follow you... Franamax (talk) 04:39, 8 July 2008 (UTC)
Paraphrasing, direct copying, quoting and incremental editing
I made a set of changes here, in an attempt to address what SEWilco changed, and to try and restore the point I was trying to make. I think it would be helpful to lay out here the actual processes that text goes through, and where it comes from. One of the points is that "merciless editing" does not mean "anything can be changed" - there are still limits, set by intellectual honesty, on how the presentation of things can be changed. Talking about this might make things clearer to those who don't see a need for this guideline. I don't have time to lay out what I see as the differences between paraphrasing, direct copying, and quoting, when to use them, and how they are affected by incremental editing, but if others could say what they see as the meaning of these terms, that might help. Anyone? Carcharoth (talk) 11:55, 8 July 2008 (UTC)
If some free content text (either explicitly free or public domain) is incorporated into a WP article, there is no requirement that the "original sense" cannot be edited out. There's no reason to treat it differently than any other text in the article. I don't see the issue of "intellectual dishonesty" at all. We are neither claiming that we have exactly copied the other person's text nor that we are presenting it as they did, so there is no dishonesty. On the other hand, if the text is explicitly marked as a paraphrase but not a direct quote, then of course the paraphrase must be accurate to its source. — Carl (CBM · talk) 16:28, 8 July 2008 (UTC)
Sadly, the change you made from:
"Incremental editing of this last sort of "block copying" may distort what was copied - leaving the text as a direct quote may be the better option."
to:
"Unless the goal is to use text of this last sort as part of the article, which can be mercilessly edited, it may be better to use smaller pieces as direct quotations."
...managed to lose entirely the point I was trying to put across. Look, we all get the "merciless editing" thing drilled into us from day one, but "merciless editing" does not mean "do what you like with the text in front of you". All editing still needs to be careful and accurate and not misleading. If a piece of text is directly attributed in the text (eg. "Smith said in the classic paper Demonstrating that the Sky is Blue (1955) that the sky is in fact not always blue...") then editing has to be done with care to not change the sense in which the source has been paraphrased. Ditto if it was a direct quote: "Jones said in his landmark paper The Dark Side of the Moon (1978): "one of the most pervasive astronomical myths is that the far side of the Moon is always dark, despite the classic album title"..." (forgive my clumsy pseudo-examples), then there are limits to what you can do with this text unless you remove it entirely and start again. The same thing applies to large blocks of PD text placed into, or used to start an article. Great care must be taken, because, for a start, the text may itself contain bits of text that are rigid and inflexible (but are not clearly marked as such - the standards for other PD texts varying from our sourcing standards), and where large changes will end up distorting what was being said. With other Wikipedia editors, you can talk to them about what they meant by what they wrote. With blocks of text from other writers, and crucially from those writers that have not provided sources with the text (and who may now be dead, or otherwise uncontactable), there is less room for manoeuvre. This is actually getting away from plagiarism and towards general writing and research skills, but the point is that poorly done and minimal rewriting can often end up being a loose bit of plagiarising, instead of a carefully rewritten and updated piece of PD-text. That is the key point I've been trying to get across here.
The two situations are: (A) Block of PD-text starts an article and gets incrementally rewritten with lack of rigorous sourcing = mess that may one day recover and be OK; (B) Article written from scratch and the entire PD-text of a source is available on wikisource or its original website, and paraphrases and quotes are used from the PD-source to make small improvements to the article, but overall the article is the work of Wikipedia editors, not a reworked and "improved" version of the PD-text. As I said, ideally the same endpoint, but the dangers of reworking a block of PD-text are legion if not done properly. Carcharoth (talk) 22:09, 8 July 2008 (UTC)
I'm a little confused here (it does seem to be a bone of contention). If we set the standard that the original PD-insertion is clearly indicated, then a diff can always be made for comparison purposes. Does that satisfy all sides?
Of course, push it a little further, where we see editor assertions that they both transcribe and rewrite the text in the same edit, my idea starts to break down a bit. Franamax (talk) 22:21, 8 July 2008 (UTC)
I think, at a very minimum, that it is important to reach a consensus guideline, that a PD text, when copied in, should be copied in cleanly without rewriting in the same edit, so that diffs can be assessed later as to how much wikipedia editing has changed from the original text. doncram (talk) 23:14, 8 July 2008 (UTC)
No. When typing an old text, contributors should be able to replace "f" with "s" when appropriate, correct spellings, and make other changes without having to in effect retype it a second time in order to provide a first Wikipedia version. We don't require editors to provide digital copies of other source materials. Requiring such a rewrite also invites the contribution of only the original text, with cleanup being left to other editors. -- SEWilco (talk) 03:02, 9 July 2008 (UTC)
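The comparison at stake in this exchange, how far later editing has moved an article away from the imported PD text, could be estimated along the following lines once both versions are available as plain text. This is a sketch under stated assumptions, not an existing MediaWiki feature: `retained_fraction` is a hypothetical name, the sample sentences are invented, and obtaining the two revisions is outside its scope.

```python
from difflib import SequenceMatcher


def retained_fraction(original, current):
    """Share of the original's words that survive, in order, in the current text."""
    a, b = original.split(), current.split()
    if not a:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Sum the sizes of all in-order matching word runs between the two versions.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(a)


# Invented example: a PD sentence as imported, and a later edited revision.
imported = "The abbey was founded in 1132 by monks"
revised = "The abbey was established in 1132 by Cistercian monks"

print(retained_fraction(imported, revised))
```

Spelling modernisation of the kind SEWilco mentions would lower the figure slightly without changing the substance, so the number is a rough indicator at best.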
Re Carcharoth, I simply disagree with "The same thing applies to large blocks of PD text placed into, or used to start an article." It makes no difference whether the text is originally from a PD source, another GFDL free content project, or a wikipedia editor. We do need to provide attribution to the original source of any text, but once that attribution is given we can proceed with editing as normal without worry of plagiarism, which is the topic of this page. — Carl (CBM · talk) 02:26, 9 July 2008 (UTC)
But once you start editing, the attribution is wrong. It is no longer "X wrote this", but "X and some Wikipedia editors wrote this". Our attribution system does not make this clear enough, and that is where the plagiarism concerns arise. Carcharoth (talk) 08:10, 9 July 2008 (UTC)
The attribution doesn't say "X wrote this", it says "some of the text here incorporates or derives from text written by X". Just as we don't mark which parts of each article were written by a particular editor, we don't explicitly mark which parts are derived from other free content sources. — Carl (CBM · talk) 13:37, 9 July 2008 (UTC)
And that is what I am trying to say. The attribution is not explicit enough. When an attribution is too vague, you run the danger of not knowing which bits are from where. For Wikipedia editors, we don't need to know who said what. For articles that start off as, or incorporate, complete copies of an entry from a PD-source, then we do need to know which bits have come from where, or at least the bits that shouldn't be changed (or would need drastic changes if they were changed) need to be marked out clearly before further editing takes place. If we don't do that, then we do a disservice to later editors who have to deal with the resulting mess. I've looked at Poincaré recurrence theorem, and it is a mess. It looks like a hybrid Planet Math/Wikipedia article, and it suffers because of that. More explicit attribution and clarity would help there, but as I don't know the maths, I can't really help. Carcharoth (talk) 14:29, 9 July 2008 (UTC)
What are "the bits that shouldn't be changed" is always dependent upon the facts of the article's topic. When describing how hard rocks are, any of the phrasing can be changed as long as the proper meaning exists, whether the definition is (or was) comparisons between materials or quantitative measurements of hardness. The only bits which can't be changed are direct quotations because there the words themselves are the facts. -- SEWilco (talk) 04:13, 10 July 2008 (UTC)
As an example, look at Poincaré recurrence theorem. This is marked as using some material from the GFDL site Planetmath. Editors don't need to be concerned with which parts are from Planetmath and which are from other sources; they can simply edit the article to improve it when desired. That's the fundamental essence of free content. I think the Wikipedia:WikiProject Mathematics/PlanetMath Exchange project is mostly finished now, but they have explicit advice about attribution on their project page. — Carl (CBM · talk) 02:38, 9 July 2008 (UTC)
Free content does not require more care than other text. It can be edited in any way, as long as the information is correct. And it is always up to the editors to know what is correct. If you don't know which way to thread the reins on a harness, don't change that part of the meaning during your edits. There's no need to remove or rewrite content differently based upon its origin. If I contribute something from a religious text which merely describes how to weave a monk's robes, does the holiness of the words need to be preserved? We can create a Template:blessedletter to attach to every holy typographical relic. -- SEWilco (talk) 03:45, 9 July 2008 (UTC)
And I still disagree with this. I think it is a fundamental misunderstanding of the difference between text that you have written, submitted to Wikipedia by pressing "save page", and copying free content from elsewhere and adding it to Wikipedia. Both are allowed and encouraged, but the process is (or should be) different. In the former case, you are writing from your head, drawing on sources and paraphrasing and citing and so on, but in the latter case, you are trying to merge two texts, or start a new article based on an external text. When the whole messy process is all in-house, with Wikipedia editors rewriting each other's text, that is fine. But when the whole messy process uses an external text as a starting point, numerous problems start to appear, and that is why I am saying "great care" is needed. Carcharoth (talk) 14:29, 9 July 2008 (UTC)
Can you explain in more detail why the process should be different, and how this relates to our goal to be a project that both gives and takes free content? What problems appear, assuming that we explicitly say that we have used the other text in writing our article? Not knowing which parts came from which sources is not, on its own, a problem - it's the usual situation when free content from several sources is combined to make a single document. The same thing happens with source code when GPL programs are merged. People improving an article shouldn't start by researching who wrote which sentences in it - they should examine the article as it is and make improvements as they see fit. — Carl (CBM · talk) 14:41, 9 July 2008 (UTC)
"People improving an article shouldn't start by researching who wrote which sentences in it" - not explicitly, no, but after reading the whole article, they should be able get a sense of which bits are from which source, which bits are unsourced, and then proceed from there when improving the article. I don't think it does impact our goal to "be a project that both gives and takes free content" (I think it impacts our goal to be an accurate and verifiable encyclopedia), and I think the GPL computer program analogy breaks down at several points - an encyclopedia article is not a computer program. Ultimately, I suppose, what I am saying is that I think large blocks of PD-text should be rigorously sourced to our standards, before being imported. It might be free, but that doesn't stop it being poorly sourced. And if we are going to rely on the 'source that the PD-text was from' as a source for the information in the PD-text, then the citations need to be alot more explicit than a template at the bottom of the article. Does that make things any clearer? Carcharoth (talk) 15:03, 9 July 2008 (UTC)
What do you mean by "sourced to our standards"? We only require material to be verifiable, not actually to be sourced, and the material we are talking about here is almost all going to be verifiable. There's no requirement for editors to preliminarily give sources for uncontroversial information, we just rely on editorial discretion when adding content. In any case - whether articles meet WP:V doesn't relate closely to plagiarism. — Carl (CBM · talk) 17:45, 9 July 2008 (UTC)
I think I actually meant "rewritten to our style". Piggybacking on the style and text of others is fine, but as soon as we start editing and changing what others wrote, we should have the honesty to completely rewrite and update it, and not do an incremental rewrite that leaves the text for long periods of time as an unclear mish-mash of our writing and their writing. That "mish-mash" style is the current state most Encyclopedia Britannica 1911 articles are in. Carcharoth (talk) 21:36, 9 July 2008 (UTC)
That's fine to say, but a lot of content comes from IPs and casual editors, who won't know and won't look for PD-attrib templates. Similarly, RC patrollers and watchlisters will not catch these incremental changes as being in the PD portions (unless it's a blockquote). Regardless of the validity of your statement about full rewrites, we have a conundrum, as the current mish-mash so clearly indicates. Franamax (talk) 21:47, 9 July 2008 (UTC)
Again, I disagree with "as we start editing and changing what others wrote, we should have the honesty to completely rewrite and update it". It isn't a matter of honesty in any way, as we never had a goal of writing everything ourselves. As an openly free-content project, nobody should be surprised if we use external free content to create our articles. The idea that there is "their content" and "our content" runs directly against our espoused philosophy as a free content project. — Carl (CBM · talk) 23:47, 9 July 2008 (UTC)
I know some people are sensitive to anything perceived to run against the "free content" philosophy, but this is a red herring. This is not about avoiding the use of free content. It is about addressing the inherent tensions between "it's free content, mercilessly edit it and do what you like with it" on the one hand, and "we must be careful when editing to ensure that verifiability, attribution, citation, and appropriate balance between sources, and so on, are maintained, and to ensure that we don't distort what has been said" on the other. To take an extreme example, are you saying that it would be acceptable for someone to copy our article on a topic, to make misleading and distorting changes, and then to say "this article incorporates text from the Wikipedia article"? Or to take another example, would it be acceptable for someone to take the text of one of our featured articles, to rewrite it in a biased and inaccurate way, and to then proudly proclaim "we wrote this based on the Wikipedia article - but we aren't going to tell you who wrote which bits - why don't you try and guess?" There are limits to what it is acceptable (not legal, but acceptable) to do with free content. It might be legal to do anything with the text, but to my mind, something as complex as another encyclopedia article should be used "complete" (that is what mirrors do, after all), and only rewritten for clear reasons and with care and respect for the original. Carcharoth (talk) 00:12, 10 July 2008 (UTC)
(←) "To take an extreme example, are you saying that it would be acceptable for someone to copy our article on a topic, to make misleading and distorting changes, and then to say "this article incorporates text from the Wikipedia article"? "
Yes - anyone can take text I have written here, add false claims to it, distort the wording, and then use it for purposes I find morally reprehensible. I have explicitly granted them permission to do so, provided they follow the terms of the GFDL. If I only wanted to allow people to use the text in ways that reflect my opinion, or ways I approve of, it would not be free content. This is a well-known concept in the open source movement known as "no restriction on field of endeavor". — Carl (CBM · talk) 01:04, 10 July 2008 (UTC)
Agree with what you say, especially because it's true. However, this is not necessarily a discussion on the open-source movement, nor is it about GPL or even GFDL in general. This is about Wikipedia, which is singular or at least a bellwether. So let's semi-invoke Godwin's Law - if you wrote an article on Hebrew grammar and a little later an article appeared about Jews hate Earth and want to kill it, and that article had a template on the lines of "this article incorporates text contributed by CBM" - would that be a plus-good thing to happen on Wikipedia? Franamax (talk) 10:17, 10 July 2008 (UTC)
It's terrible for them to publish antisemitism, but (in the U.S.) they are permitted to do it, and even though I would denounce the publication they are able to use my text when writing it. I don't think this is actually very far-fetched - I'm certain that at some point people will start adding "this takes text from Wikipedia" to try to give credibility to their hate speech. But it's quite unlikely that anyone would view a Wikipedia article as hate speech; any objection to having your content included in Wikipedia would be for some more subtle reason, especially since the person objecting can always edit our article themselves. — Carl (CBM · talk) 13:10, 10 July 2008 (UTC)
What kind of care?
work marked as a paraphrase of other people's actions or words (which can be edited as long as the original sense is not lost); and direct copying of large blocks of free content written by other people (in which case, more care is needed).
(1) When originally importing a block or entire article of PD-text, it should be done cleanly (without any corrections for style and spelling - such corrections can easily be done in the next edit - it is lazy to not separate two different types of edit when doing so makes it easier for others to see what has been done). Ideally, use wikisource to provide a separate record of the original text.
(2) Make it clear what has been imported - where the imported text starts and where it ends and where it came from - in the case of an import creating a whole article, a template at the end does this, though as the rewriting progresses, this template becomes less and less relevant, though it shouldn't be removed altogether.
(3) When making corrections of inaccurate material for old PD-text (eg. outdated geography or outdated history or outdated names), always add in a source for this correction, thus making clear where the text has been corrected. Be aware that sometimes the old outdated stuff is of historical interest, and a bit more research can yield a footnote of the sort "the old name for this area was <insert name>, as given in <cite old PD-text>".
(4) Direct in-text attributions of the sort "Smith said", shouldn't be mangled. They should be checked, though, just like any other source. If you can't check this, make clear that it is not Wikipedia's editors saying "Smith said", but the original author of the PD-text saying "Smith said".
(5) Addition of completely new material should also be carefully sourced and marked to make clear that it is not part of the original text. If images came with the original PD-text, make that clear as well. If the images are new, say so.
(6) By this stage of the rewriting process, you may have a much changed document. Possibly not much will remain of the original, but equally it is possible that the bulk of the document is still the original. If there are only skeletal remnants of the original text, it is important to check for distinctive phrases and paragraphs from the original and to individually cite those to the original, sometimes elevating them up out of the text with quote marks. If the bulk is still the original text, this needs to be made clear somewhere with a note (linking to a specific version) that "this article is still largely based on X, published in 1911, but has been updated for 2008 using the additional sources cited in the text. For the original text, see here.", or maybe "this article is still largely based on X, published in 2005 as part of work for the US government, but has been continually updated and expanded since then, using the additional sources cited in the text. For the original text, see here."
At each stage of the above, compare what would be done if you had been rewriting text submitted by a Wikipedia editor as their own work in writing an encyclopedia article (as opposed to submitting encyclopedia-ready text written by someone else):
(1) You would state in the edit summary where you got the GFDL text from if importing or merging or splitting from another article, and hopefully you would import all relevant footnotes and sources at the same time.
(2) Not relevant here, as you are using GFDL text.
(3) This still applies to any text.
(4) This still applies to any text.
(5) The merging here can be more seamless, because you have more latitude with GFDL text.
(6) Again, the merging here can be more seamless, because you have more latitude with GFDL text.
Rule of thumb: anywhere you might find yourself saying "PD-source said this", then don't mercilessly edit and subsume that bit of text into the whole so that its origins are lost amongst all the other edits, but instead paraphrase it or quote it, and provide a citation to that PD-source to make the fragment of content in question stand out from the rest of the text. It is a simple courtesy to the reader so that they can see where the relevant bits of the article have come from and who (among several sources) is saying what. In other words, don't mix up the Wikipedia "editorial voice" with the "PD-text voice". Is that any clearer? I don't want to go round in circles again, so can we try and get some agreement on this or say where the root of the disagreements are? Carcharoth (talk) 08:06, 9 July 2008 (UTC)
We don't want to simultaneously use the text as our article and as a reference for our article. When PD text is copied into the article, we are claiming its voice as our editorial voice, which means we need to go to other sources to back up claims we make (including claims we make by copying from the text). Dubious statements should be removed, and we can look for other sources as usual. If someone wants to give the PD text a voice of its own, it should not also be copied as part of the article. This applies mostly to #4 above, and also to the last paragraph.
6 seems to go beyond what we actually do on wikipedia; can you give any example of this happening in practice? I don't see the need for it - the attribution template at the bottom remains as correct after heavy editing as it did when the text was originally copied in. Lower down, it isn't true in any way that there is more latitude with GFDL text than PD text. Legally, there is less latitude with GFDL text, but we treat them very similarly in practice. — Carl (CBM · talk) 13:50, 9 July 2008 (UTC)
6's suggestions of a couple alternative descriptions of the use of PD text in wikipedia articles seem very constructive. These could be the basis for several alternative templates that could be used. Accurate labelling of the use of PD text goes a long way towards addressing concerns about plagiarism. Plagiarism, by my understanding, is a discrepancy between the apparent crediting and the actual crediting due, for use of others' material. Labelling in template is part of making clear where and how much credit is due, hence avoiding plagiarism. doncram (talk) 15:19, 9 July 2008 (UTC)
Please let's not get bogged down in arguing "what we actually do in Wikipedia", that has been used in some previous discussions to block change in practices and comes across as a commitment not to ever change practices. Also, practices in Wikipedia vary widely: what is done in WP:SHIPS is very different than generally accepted practices in WP:NRHP, and what was done in the past is not the same as what is done now or what could be done in the future. doncram (talk) 15:19, 9 July 2008 (UTC)
Re SEWilco, I agree that "more care" is too vague. I think that phrase only refers to the need to give attribution; once the content is in the article, there is no additional special care that I know of. — Carl (CBM · talk) 13:52, 9 July 2008 (UTC)
Why are items 2, 5, and 6 different for GFDL text? Sources need to be specified, text has to be edited based upon what the article needs, and if it is necessary to review what has been changed from a non-GFDL source then why not also review what has been changed since a GFDL contribution? -- SEWilco (talk) 04:25, 10 July 2008 (UTC)
I was specifically referring there to "internal-GFDL" changes. The stuff that happens when you rewrite what another Wikipedian has written - assuming that is that the Wikipedian in question has made sufficiently clear the difference between what he or she paraphrased and what he or she copied from elsewhere. When a Wikipedian doesn't do that, then the GFDL-original-paraphrasing-by-a-Wikipedia-editor and copied-in-PD-copyright-expired stuff gets mixed up. My point is that mercilessly editing the phrasing a Wikipedia editor uses is fine - they've agreed to it. Mercilessly editing phrasing copied from elsewhere is less acceptable in my view. It should be either quoted, or paraphrased and completely rewritten. The inbetween state where the phrasing is minimally rewritten, and the resulting text ends up as a joint effort between Wikipedia's editors and the original author, is, in my view, and those of some others (see the quotes below) a form of plagiarism, unless a footnote is used to make clear that the vast majority of the phrasing is copied verbatim from the original source. For an example, look at the footnotes I placed in Anadyr River and Gulf of Anadyr. It is now clear to anyone reading the article that the geography descriptions are from 1911, and that the other stuff was added based on more recent sources. Before, this was not clear. The river length is actually wrong, so I should have updated that, but care has to be taken - sometimes the older measurement is not wrong, just done differently or the river's length has actually changed - without more from the sources, we can't say. Now, this is a good example as far as specific sourcing goes, but not in terms of plagiarism due to insufficient attribution (the template at the bottom is not good enough once the article is far along the road of being rewritten, because it is no longer clear which bits of the text were incorporated verbatim from which source). For that, I will have to find a different example.
Carcharoth (talk) 09:43, 10 July 2008 (UTC)
Here is an example of what I consider an acceptable "internal-GFDL" action: [27]. The authorship has been clearly attributed, the date, name and article allow direct verification of the original text, and I have accepted the "merciless editing" caveat that allows that text to change over time. Everything there is fine and living or dead (no I am not a Shakespeare-bot come to wreak havoc), I have accepted those exact terms. Contrast this with Carch's points above, which deal with modification of works by others who never signed up for this project. Franamax (talk) 10:00, 10 July 2008 (UTC)
If the source is PD the author has no rights; we give them the courtesy of granting them credit (and WP:V requires sourcing their facts). The WP contributor who added the PD text has agreed to merciless editing, so there is no ("6") requirement to hunt down the rotting remnants of the original bits... but in practical terms the identification of distinctive phrases to a source might help avoid confusion of editors who think they discover copyright violations from other sources which also reused the original material. -- SEWilco (talk) 01:09, 19 July 2008 (UTC)
Looking back at previous discussions
Thanks to doncram for providing links to previous discussions at WT:CITE. I had been completely unaware of those. I'm going to copy some of what was said in those discussions, as I found myself nodding in agreement (apologies, given the context, for not attributing these quotes to the individuals concerned - all quotes are from here and here):
Quotes from previous discussions
"Keeping track of which text is copied from where by using quotes is part of sensible, normal practices of good referencing / proper sourcing"
"It is fair to label an article having the general disclaimer "incorporates text from" as poorly referenced, given higher likelihood that OR assertions can/ may well have crept in, given no separation between sourced vs. non-sourced material."
"Unquoted copied text simply looks bad, is disappointing and discouraging to come across, and obscures the verifiability (the specific sourcing) of the material."
"It is too easy for someone to paste in whole blocks of PD text without thinking..."
"when a big text is copied to create an article, it is usually not obvious which assertions might require specific support. And it is being copied in by one editor who, perhaps uniquely, believes the PD material is all true, when it may not be. The PD material is given undue credence, it is pasted in and appears to be the received wisdom (behold, it appears in an encyclopedia, it must be true), when in fact the PD source may be outdated, inappropriate, suspect due to bias, and so on which is obscured by not putting that material in quotes."
"But if an article is a total quote from PD but the quoted part is not indicated, and if someone wants to reword a paragraph, how will the reader know what has come from the original source and what may have been altered and therefore may not represent the original PD author?"
"Does that mean that anyone who alters anything in a PD article MUST reference the alteration?" (I would say yes!)
"if readers wish to know what passages remain unaltered from the original, they will just have to buy the original and compare" (this really annoys me, and I disagree strongly)
"...it is daunting to verify a humongous copy/paste article. Easier to start from scratch and find sources."
"Reused PD text is just text donated by an editor." (uh, no, it is PD text - no donation going on anywhere)
"Perhaps you need to conceptualize "we the collective of wikipedia editors" as one big collective author. "We" wrote one big article, say, and then choose to split it, "we" have still written each of them. To the extent that either/both contain text sourced from some PD source OUTSIDE of wikipedia, that can be / ought to be tracked, but you don't have to put in quotes "our" wording from the other article. It is "we" that wrote it wherever it is."
"the point of Wikipedia [...] is that it's meant to be 100 percent our own work, free for others to use. Not 100 percent someone else's work, copied and pasted by us."
"It's a waste of time to rewrite PD text simply for the sake of rewriting"
"If your priority is to get the material into Wikipedia in whatever format — information over presentation — then I can see your point."
"Perhaps some other sort of legacy DANFS template, like "this article was originally based on a DANFS entry, but no longer incorporates any text from that" would allow you to express your appreciation to DANFS while a) taking proper credit for wikipedia editors and b) avoiding muddying the waters about the extent to which wikipedia is merely copied from public domain texts."
"But if you were writing an article and suddenly introduced part of a copied PD text, you would have to say something like: "The X Encyclopedia writes that ..." and then quote them, or place the text in blockquotes, or something to indicate that this had been lifted from elsewhere. That's just common sense, and it would take seconds to add it, so I can't see why anyone wouldn't want to do it."
"Reused text is not a quotation."
"The only thing worse that long copied in quotes, which require lots of referencing, is copied in text that a reader/editor may sense is copied from somewhere else without proper attribution. It is NOT proper attribution to attach a footnote at the end, or even a template that states "text is incorporated from" somewhere. That is the amount of referencing that is appropriate for crediting substantive content to some source. It is inadequate referencing for showing which actual wording is to be credited (or blamed) to a specific source."
"Isn't the box at the bottom of the article explicit attribution of the source?"
"These templates rarely are specific enough. It's better just to use the PD text as you'd use any other. Read it, summarize it, and give a citation. Doing that avoids all these issues." (this is one quote that really stands out for me)
"...too often not credited is the actual writing done, as opposed to the content [...] There is a distinction between crediting the source for content and crediting the source for the writing that I believe is not adequately understood in wikipedia. To credit a source for content, it is enough to provide a reference (perhaps okay to do this in template form that links to the content). However, in my view that needs to be distinguished from crediting the writing, which can be conveyed by use of quotation marks or block quotes. [...] There are different types of plagiarism that are not adequately understood. One type of plagiarism is not giving attribution at all to a given source. Another type is not giving credit for wording copied, although the source is stated. There are other types of plagiarism. [...] it is perhaps especially abhorrent to copy the bibliography of a given source, which falsely implies that the author of a plagiarized work has consulted those sources."
"...it is not difficult to summarize a text and use quotes for exact reproductions. On wikipedia, where so many editors are involved, it make it possible for other editors to freely edit the article without worrying about what is an exact copy and what is not. Granted that summarizing is not as easy as a straight copy/paste, but wikipedia is supposed to be about editing."
"...it is being copied word for word because people don't want to take the time to write it in their own words. Surely that's not what we're here to do."
"The question is twofold: First, should they be doing it? Why not take the time to read the passage and write it in your own words? Failing to do so can lead to some terrible articles if the PD source is POV or badly written, and they often are. Secondly, a problem arises when only some of the article has been lifted in that way. Do we acknowledge the change of style with quotation marks, or just leave the mishmash of texts?"
"For an encyclopedia, however, the sourcing is important in establishing the validity of assertions made, as well as in crediting either the external authors or implicitly crediting the wikipedia editorship as the writers of the material."
"If it's in the public domain because it was written before 1923 it's very likely the article could benefit from a rewrite. If it's a recent technical article written by the leading authorities in the field, which they generously decided to put in the public domain, the Wikipedia editor may not be able to improve on it."
"PD text should be treated the same as prose written by Wikipedians -- edited mercilessly" (a point I disagree with - editing should be done with an eye to the quality of the original text and the credibility of its authors - something that is not generally done with text written by Wikipedians, since there is no presumption of credibility for Wikipedians)
"The point of free content is that we can use other people's free content, if we like, and they can use ours."
"Likewise, it is perfectly fine for our articles to use traditional referencing which puts copied text into quotes, to show full attribution of wording composed by another author."
"A generic attribution template is, in my view, insufficient to identify which passages are explicit quotes from this one source, vs. willing-to-be-anonymous wikipedia editors' wordings of facts available in 2 or more sources, or vs. what might be new original research creeping into the article."
"narrower view of what plagiarism is (as copying with inadequate attribution plus intent to deceive) and a broader view that plagiarism is copying with inadequate attribution"
"a wikipedia article with long copied passages that are not set aside in quotes is plagiarized. An attribution template at the bottom provides some mitigation, but is inadequate. That degree of attribution is appropriate for pointing to a source, but is not adequate to show which passages are copied. The wikipedia article with long copied passages is in fact deceptive to the average reader, who has a reasonable expectation that unquoted passages are written by the collective wikipedia editorship, not by an external source." (I agree entirely with this)
"When an attribution template is encountered at the bottom, the reader gets to do a double-take, and is given the suggestion that all is not as it seemed. The reader is left unclear as to what degree he/she should reevaluate the implied claim: should the reader assume the entire article is entirely copied, should the reader assume hardly any is copied, should the reader go investigate the original source and compare it to the wikipedia article, etc. This detracts from the reading experience [...] and it undermines the public perception of the general quality of wikipedia..."
"I have seen no other editors advocating your position about plagiarism." (hopefully that attitude is changing)
"I just want to calibrate readers expectations about authorship of wording with the true state."
"I don't think this is quite such an open-and-shut case as some make it out to be; while it is our right under the law to reuse free content pretty liberally, it's often not good intellectual practice or very friendly. [...] The attribution template significantly understates the degree to which this text is the work of someone unaffiliated with Wikipedia, and it does not even name that person."
"The attribution being proposed very much misleads the reader about who produced this content. There's a matter of degree involved and the same template might not be appropriate for every case."
"I don't think readers will understand that in this one instance, the footnote means that you've lifted and copied their text. The implication a normal reader will naturally draw - that the text is supplied by Wikipedia authors but based on the source presented - is false. This is just a consequence of the fact that readers who encounter footnotes, quotation marks, etc. on Wikipedia are likely to assume that we are using them in the same ways as the rest of the English speaking world."
"The discussion here [is] about how PD text is used. We use it by dropping it in an article and letting editors chop it up as need be, not by putting walls around it."
"We use PD text, despite flaws, as fodder to start new articles."
"This isn't about copyright law; it's about fairness and intellectual honesty. Your versions of this article, in my view, falsely imply that you are the author of this text. A reader seeing this article would most likely believe that it is the work of Wikipedia users and look to the history to find author information. In fact the contributions made by Wikipedia users to this text are insignificant. Adding citations to the work you are are claiming as your own is in some ways a slap in the face to the author of the piece, because you are deliberately using footnotes in a way not congruent with normal practice."
"That's not crediting her for writing the text, that's acknowledging that the facts came from her article. You use a citation method which indicates that the text is written in your words, not hers; you know this to be false; therefore it is plagiarism by any widely accepted standard. A reader who is familiar with existing standards will probably assume that we are using one of them, and be misled: since you are claiming the text, they will look in the article history to find the authorship, and mistakenly cite you as the author of these words. I do not believe that misleading our readers promotes the progress of anything."
"I think the concept that different intellectual standards apply to free content simply because they have that status is unique to Wikipedia; the copyright status is irrelevant here. [...] In general, I think that rejecting the normal practices of all other modern publications sets Wikipedia apart in a negative way."
"Is it improper attribution to take it out of quotes and mix it around, and just leave a generic template that this article includes text from an unspecified GFDL document?"
Right. That was probably a bit too much quoting for readability concerns, so I put it in a collapse box, but there is a lot to get through here, and I completely missed those two previous discussions. I am going to ask some of the people in those discussions to join in the discussions here, if they will. SlimVirgin, Christopher Parham, and Dank55, as I thought they made particularly good points. Those who are already here should, of course, feel free to notify any of the others who took part in those discussions. Carcharoth (talk) 21:29, 9 July 2008 (UTC)
Requirement to rewrite
If material inside a quotation is inaccurate or out-of-date, the entire quotation must be replaced with rewritten text.
No. An example in the article is that Jason will measure within 15 miles of shore. If some clever fellow figures out how to use Jason data to measure within 5 miles of shore, the existing text could be modified to include that new piece of information. If the existing text is not PD then care has to be taken with the quotation marks. If the existing text is PD then if the original were "will measure within 15 miles of shore", possible alterations within the rest of the text might be: "was designed to measure within 15 miles of shore but was able to reach within 5 miles[1]", "got data within 15 miles of shore but Doe extracted information to one-third that distance[1]", etc. -- SEWilco (talk) 01:41, 19 July 2008 (UTC)
I'm not sure I follow this. To me, material within quotation marks is inviolate. It is quoted text and must be preserved as is, with the stated exception of wiki-formatting. I routinely revert changes to text inside quotations which no longer match the quoted source, regardless of free-ness of the source. A quote is a quote.
I would however add to this that if only a portion of the quote is inaccurate or out-of-date but the substantive meaning remains correct, the proper action would be to immediately contradict the quote in the following text, so in the example the quote would be immediately followed in plaintext by "Clever fellow extended the Jason data to allow precision of 5 miles[1]" or some such. Same for inaccuracies, if there happened to be a direct quote from the dude who looked down wells long ago to measure the Earth's diameter, the direct quote "Earth's diameter is thus 431,300 cubits", this should immediately be followed in regular text by "modern measurements of the Earth's diameter show it to be 8,226 mi (426,120 cubits)[2]" - as opposed to changing the Greek guy's number.
And if the entirety of the quoted material was just the statement of Earth's diameter, then it would be necessary to completely rewrite as "old Greek dude looked down wells and determined a diameter of 431,400 cubits[3], the modern value is 8,226 mi (426,120 cubits)[4]". Franamax (talk) 03:03, 19 July 2008 (UTC)
Material within quotation marks is inviolate, but the example is within a section about public domain material. If the quotation marks are removed, the text can be updated. There is no need to mandate that it must all be rewritten. If the text is PD then the quotation marks are only mandatory if you're stating exactly what someone said; if the text is not PD then some text might have to be kept as quotations — but that depends upon how an editor blends the corrections with the material. Let the editors edit, don't order them to rewrite a piece of text. -- SEWilco (talk) 16:03, 19 July 2008 (UTC)
So, an encyclopedia...?
Wikipedia is an encyclopedia. Copying the works of others and presenting them as your own is not acceptable practice here.
I don't see anything in the definition of an encyclopedia which involves how the material is created. Maybe instead of "an encyclopedia" there should be mention of being a collective work? But that doesn't necessarily have a connection to copying. Should the first sentence be removed and just use the second sentence, which is long phrasing that means plagiarism is not acceptable practice in Wikipedia? -- SEWilco (talk) 01:52, 19 July 2008 (UTC)
If you choose to read those two sentences as premise-conclusion based on the word "encyclopedia", you may well have a point. Try an alternate reading: the first sentence presents an aspiration; the second sentence presents an ideal that may be common to many aspirational works. You are making a conclusion that because the preceding sentence states that Wikipedia is an encyclopedia, the following sentence refers to "encyclopedia" rather than "Wikipedia". Even so, can you show an example of a (reputable) encyclopedia which has ever been based on plagiarism? I'm not saying that no such has ever existed. My original first sentence was "Wikipedia aspires to be a scholarly work.", which you justifiably tagged. Carcharoth then changed it to the current wording.
The first sentence might have to go, but considering that sentence alone, is it correct? And second sentence alone, is it correct? If so, then the problem must be in the logical inference you draw, so maybe the inference is what should be addressed.
The intent is to provide an introductory aspiration i.e. what sets Wikipedia apart from other websites where it is perfectly acceptable to copy without attribution? What makes us different? This is key to aid the understanding of the new reader of our entire basis of operation. It's a point of confusion for many, many new editors - why exactly shouldn't we plagiarize? And this proposed guideline is where we must answer those questions. Franamax (talk) 02:28, 19 July 2008 (UTC)
This sentence is baseless and useless. Again, unless an editor explicitly states he wrote some text--which is a rare occurrence--, no one ever even has the opportunity to "present the works of others as his own"; and "presenting the works of others as your own" is irrelevant to why plagiarism is bad on the encyclopedia: sourcing and copyright. —Centrx→talk • 17:38, 19 July 2008 (UTC)
"as your own" vs. "as written by wikipedia editors" or other alternatives
Centrx's comment seems overstated but I agree there is a distinction to be made, and the draft guideline seems to err a bit in terms of suggesting that the problem is individuals seeking credit for themselves individually. It seems to me that editors who copy text into Wikipedia without adequate sourcing are not often seeking individual glory. The problem, as I see it, is that copied-in text appears to be written by the collective of Wikipedia editors (implicitly claiming credit for themselves as a group), rather than properly attributing the wording, as well as the substance, to the original authors. I am proud of the editing work I have done in Wikipedia, in a general way that I contributed in a big collective effort, and I do feel that Wikipedia being out there is a presentation of the work as being "our own", broadly. Anyhow, I think some rewordings in several places to tone down the "as your own" type accusations would help this guideline. doncram (talk) 18:17, 24 August 2008 (UTC)
I agree absolutely that a distinction should be drawn between original text written (and rewritten) by Wikipedia editors (based on reliable sources of course) and text copied in from elsewhere. The difference is that the former process goes from "source" (PD or copyrighted) -> "editor's brain" (paraphrase and pick apart the source) -> "wikipedia page" -> "merciless editing"; whereas the latter goes from "PD-source" -> "wikipedia page" (copy directly across) -> "merciless editing". The two processes appear the same, but there are fundamental differences. Not quite sure how to put that in the proposed guideline though. I'm still wary, though, because of the sometimes vehement opposition that this proposal received, despite others welcoming something separate. Still very confused about that. What is the best way forward? Carcharoth (talk) 20:02, 24 August 2008 (UTC)
I commented within this featured article review about seeking to avoid the appearance of plagiarism and putting some value on avoiding different treatment of DANFS public domain text. I hope I expressed it better this time; my comments in a couple of other ship article reviews seemed to anger editors about any suggestion they were not doing the right thing, more than convince them of anything. Perhaps others would like to comment there also. doncram (talk) 21:54, 25 July 2008 (UTC)
FYI, it all seems to have been worked out there, eventually, although some confusing (to me) discussion ended up getting moved to the Talk page of the review article. Franamax responded to this request, and was very helpful there. TomStar81 was knowledgeable enough about the specific reliance upon DANFS sources throughout the article, in order to determine that there were no remaining specific passages of DANFS text in the article, so the generic DANFS disclaimer could be removed. Thanks again to Franamax! doncram (talk) 07:20, 26 July 2008 (UTC)
Close paraphrase
I have recently participated in some discussions that show the need for a clear guideline on plagiarism. One of the issues that came up is the use of close paraphrasing from cited sources. Of course, giving a footnote citation does not allow me to use the text of the source as my own. Something more is needed, such as quotation marks, or a block quotation. I believe that this principle holds not only for exact wording, but also for structure, arrangement, and selection of facts. This makes it unacceptable in my view to copy text from a cited source into an edit window, revise it into a close paraphrase, and click "save" to add it as article text. Doing so is potentially a copyright violation, as there have been successful lawsuits against close paraphrases. I think a good general principle is that a footnote citation is only sufficient in itself to show where facts or information came from; anything that borrows the original author's voice, whether using precisely the same words or a paraphrase, must say so within the text. --Amble (talk) 00:53, 26 July 2008 (UTC)
The problem with both footnote citations and citing a source within the text is that someone else may eventually come along and remove the citation. Or in the opposite case, the citation may stay in even after the passage has been rewritten half a dozen times and now says something completely different than the supposed source. The mass editing nature of Wikis does not lend itself at all well to traditional sourcing of information. Since quotations are generally 'fixed' we can handle them fairly well with either footnotes or references in the text. Paraphrases beyond the length of a single fact / sentence, which can just be referenced to the source normally, should be avoided. Longer paraphrases inevitably start to fall into the realm of plagiarism and/or copyright infringement, have the aforementioned problems with future evolution of the text, and thus would probably be better handled as quotations. In an academic setting with a fixed author or authors, long paraphrases can be handled by properly attributing the source. On a wiki that only holds true until the next editor comes along. Therefore, I don't think we should do it at all. --CBD12:11, 24 August 2008 (UTC)
Is altering a sentence from the source allowed? It's not verbatim since it's only one sentence. Now how little change is minimal? 10%? 20%? Thanks for your feedback.
Bobby fletcher (talk) 06:37, 27 October 2008 (UTC)
It is not uncommon for individuals with other cultural backgrounds to have a different perception of plagiarism than Americans (especially in academia) do. I know it comes up relatively frequently at my university, where international students have to adjust to the different expectations. I'm by no means saying that to disparage non-American users; rather, it's just to point out that differing backgrounds and expectations can lead to misunderstandings. Perhaps this policy page could benefit from having a section explicitly outlining the differences between American rules about plagiarism (I assume those are the rules under which the Wikimedia Foundation operates) and other rules around the world, for the benefit of international users. —Politizertalk/contribs06:42, 27 October 2008 (UTC)
Commenting on the specific example, if I understand the dialogue - copying a sentence verbatim and excising a clause from the middle of it is still copying. A good question to ask is: how much of your article writing is done with the mouse (copy/paste/cut) and how much is done with the keyboard? Unless you're putting it in quotes or references, you should be using the keyboard. Franamax (talk) 07:07, 27 October 2008 (UTC)
(after e/c) That's hard to comment on without an example. If you read "the sky is blue" or "quadrupeds are animals having four feet", it's very hard to reword when you write the article here. On the other hand, if it's a long sentence written in a distinctive style, then just changing "and" to a comma or semi-colon is the wrong thing to do. Also, if the sentence is within quote marks (i.e. a direct quote), changing the wording would be frowned on, unless you put your "new" word inside square brackets. For instance, if you were quoting "They lnaded the ship", you could show the quote as either of "They lnaded [sic] the ship" or "They [landed] the ship".
And Politizer, can you expand a bit on what the differences around the world are when it comes to plagiarism? Perhaps we can improve, but we'll need some more information. Thanks! Franamax (talk) 06:50, 27 October 2008 (UTC)
I'll see if I can find some more specific examples. Right now I'm just talking from personal experience, and of course in a university setting it's hard to get a hold of details regarding plagiarism issues because of students' privacy, but I might be able to get my hands on some more general information. —Politizertalk/contribs07:03, 27 October 2008 (UTC)
Franamax, I can only go by what's currently stated here:
1) The sentence altered is a list of exemption examples
2) It is common knowledge that these exemptions exist, as the article in question has already stated this fact
The plagiarism wiki also mentioned "verbatim", "minimal" - 25% of the sentence was excised. This is my original question - what constitutes minimal change here?
Also, here's a quick survey of the topic in the .edu domain:
I don't think you're working with a good definition of "altered" here. Leaving a part out but copying the rest verbatim doesn't really count as an alteration, rather it's an omission. Using different words than the original is an alteration. Franamax (talk) 20:16, 27 October 2008 (UTC)
Fair enough, let's try the same test again, using the definition you suggested:
1) One sentence with 25% omission fails the "verbatim from the source" test
2) One sentence with 25% omission fails the "Substantive length from the source" test
I hope this is acceptable in reference to the above Skidmore plagiarism guideline. Also it seems Wikipedia's guideline on plagiarism is less strict than academic standards. If you check the Google survey cited, paraphrasing is considered plagiarism, but it's okay on Wikipedia.
I don't know where you're getting your understanding of "verbatim from source" or why you're hanging on so desperately to this ridiculous 25% thing, but as other users have already said, merely removing a few words from the middle of a sentence does not count as your having altered the original. Let's put it this way: the original sentence was XXXYYYZZZ. Instead of putting XXXYYYZZZ in the article, you put XXXZZZ. That doesn't mean that you didn't copy XXXYYYZZZ; it just means you did copy XXX and ZZZ.
Even if you do think you "only" copied 75% of the original text and you altered the other 25%...well, to put it simply, at any legitimate American institute you would be expelled for copying 75% of someone else's work. —Politizertalk/contribs22:17, 27 October 2008 (UTC)
Oh, and PS, since you so love shoving guidelines in people's faces without understanding them...here is the third thing listed from the very Skidmore reference that you just listed: Excision--Material is copied verbatim from the source with one or more words deleted from the middle of sentences.
Please show where in Wikipedia it forbids "excision". Let me again remind you that the text I used met Wikipedia's standard per your cite; please see the "what is not plagiarism" section.
Also, when the Skidmore guideline stated "Substantive length from the source", it is in reference to the entire work, not just one sentence.
Further, not all of this applies in Wikipedia. For example, a common guideline of "paraphrasing" in academia doesn't apply to Wikipedia.
Quit citing Wikipedia:Plagiarism to defend your actions. It says at the very top of the page that the guideline is still under construction and can't be cited as policy. Seriously, stop it.
What you're saying about "substantive length from source" being in reference to an entire work is just something you made up; nowhere on that Skidmore page does it say that plagiarism only applies to copying of entire sources. See the bolded text I cited above: it doesn't say anything about "substantive length." Again, this is not the first time you have made things up and put words in other people's mouths, and it is extremely disrespectful of you.
Copying other people's text is copying other people's text, no matter whether you copy hundreds of sentences or just one; it's still someone else's work. Just because the sentence was short doesn't mean you can do whatever you want with it.
I don't know why you're still fighting over this. Your reaction has been completely out of proportion. If I had been trying to get you blocked or something, then you would be justified in defending yourself like this; but no one here ever said they intended to get you in trouble. I only posted the original message to you as a warning. Your reaction has been the most ridiculous and unnecessary reaction I've ever seen to an innocent and informative warning.
If you want, you can keep fighting over this, perpetuating the argument, and bringing more people in to tell you that you were wrong. But you'd be much better off just saying "ok, it was one mistake, and it won't happen again." Nothing bad will happen to you if you simply drop this; no one is trying to get you in trouble. And the only thing that will happen if you keep fighting over this is more people will get involved and tell you about why what you did was wrong, so I can't imagine why you would want that to happen. You'd really be better off just letting this drop. —Politizertalk/contribs22:46, 27 October 2008 (UTC)
(unindent) Anyway, please do the rest of the editors here a favor and move this discussion back to your own talk page (or my talk page, if you prefer). The discussion is not constructive to this article, and I don't think the other people watching this article want to have to hear you and me going back and forth. If you want to continue this argument, you are welcome to do it at your talk page or mine. —Politizertalk/contribs22:51, 27 October 2008 (UTC)
I disagree. You are the one who cited the plagiarism wiki. The only reason I objected to your warning is because it makes no sense. You mixed up copyright and plagiarism, and all the cites I'm working with came from you. Which is it? If it's copyright, it allows fair use. If it's plagiarism, Wikipedia has its current state of rules. I've on more than one occasion offered to go to the admins, and I'll again offer it here in front of everyone. Bobby fletcher (talk) 23:22, 27 October 2008 (UTC)
The following is a proposed Wikipedia policy, guideline, or process.
The proposal may still be in development, under discussion, or in the process of gathering consensus for adoption. Thus references or links to this page should not describe it as "policy".
Therefore, could plagiarism be used as a rationale for edit/removal? Copyright warning? The beginning of the article also references copyright, but copyright allows fair use. How does that relate to plagiarism when it comes to Wikipedia?
Yes, plagiarism is always grounds for editing or removal. Plagiarism is in the realm of intellectual property, which is different than commercial copyright. You can't make "fair use" of other people's writing.
Finally, regardless of what official Wikipedia policy is, anyone with a whit of common sense will tell you that plagiarism is bad, that everyone hates plagiarism, and that sources that plagiarize have no credibility. If people want Wikipedia to remain (or become, depending on how you see it now) a valued resource, then plagiarism should never be accepted here. —Politizertalk/contribs01:16, 28 October 2008 (UTC)
Please cite any official Wikipedia policy relating to this. I disagree with your opinion. Common sense tells me that modifying one sentence in accordance with the proposed policy in work is not a violation of any in-work Wikipedia policy. Bobby fletcher (talk) 01:30, 28 October 2008 (UTC)
Look at Wikipedia:Use common sense. There doesn't need to be a "policy" telling people not to do things that are stupid and disrespectful. (Plagiarism is stupid and disrespectful). What you are doing is trying to follow the letter of the law and not the spirit of the law. —Politizertalk/contribs01:33, 28 October 2008 (UTC)
Wikipedia's in-work policy on plagiarism doesn't support, and is completely irrelevant to, your academic notion of plagiarism. Common sense would say an editor should not bite another editor with a plagiarism accusation that's fuzzy and hard to prove. Instead, fix it using the copyright rule that is already official. You never gave me an opportunity to simply attribute the work that already has a reference cited. That would've been an equally valid, non-accusatory, palatable resolution. Bobby fletcher (talk) 02:08, 28 October 2008 (UTC)
"Hard to prove"??? Look at the text you inserted:
Source: Official policy recognizes practical difficulties as justification for sanctioning second children - for example, when a first child is mentally or physically handicapped, or is female, or when the father is a disabled serviceman, or when both parents are single children.
Your edit: Official policy recognizes practical difficulties as justification for sanctioning second children - for example, when the father is a disabled serviceman, or when both parents are single children.
The plagiarism there is as clear as the light of day. Other editors have also agreed on this. You just keep trying to find new loopholes to make it look as if I'm the bad guy here.
As for attributing...if you wanted to attribute the work and put it in quotation marks, you should have done that when you first added it to the article. And I doubt you would have done that anyway, as it took you a day to even admit that it was plagiarism (yesterday your defense was "but it wasn't plagiarism," and only now have you changed it to "ok it was plagiarism, but WP doesn't have a policy on that"). —Politizertalk/contribs02:13, 28 October 2008 (UTC)
Quotes around it and a simple attribution would've made it legit on WP. So is it mis-attribution or plagiarism???
If you want to stick to your academic notion of plagiarism, your "fix", paraphrasing, is still plagiarism. This is not relevant to Wikipedia's in-work policy re plagiarism; please see the "what is not plagiarism" section.
Let me put your doubt to rest here in the open. Had you not been such an a$$ and had you given me an opportunity to properly attribute it, I would have been entirely thankful and done so without hesitation. Instead you throw around plagiarism, when the fact is Wikipedia's in-work policy on this isn't the case at all.
If you are quoting something, you need to use quote marks and attribute the source. If you are rewriting something in your own words, you may still need to explicitly mention the source you are paraphrasing from, and you definitely need to cite the source of the information in your paraphrased sentence. Just copying something from somewhere and doing nothing to indicate where the text, the construction of the text, or the information in the text has come from is not acceptable. Is that clear enough? Carcharoth (talk) 02:47, 3 November 2008 (UTC)
Userbox?
I got bored and played around with making an anti-plagiarism userbox. Here it is, if anyone wants to improve it or to use it:
This user removes copied and plagiarized material on sight and will never apologize for doing so.
I'm thinking I should try to find a way to get a link to this project page worked into it. (I tried having the image link there, but {{click}} doesn't seem to work inside userboxes.) —Politizertalk/contribs02:03, 16 November 2008 (UTC)
Well... one of the options for plagiarised material is to rewrite it yourself, so it's not completely cut-and-dried as to simple removal. We haven't really thrashed out here what exactly to do about plagiarised material, nor what to do about serial plagiarisers. (Although it certainly seems that just linking this page is sufficient to send people into fits :( ). Franamax (talk) 02:22, 16 November 2008 (UTC)
True; that's what I've done with some material I find (especially when it's just a couple sentences, such as the issue that sparked the big fight in the section above). It's generally when I come across an entire multi-paragraph section or article that I get mad and remove it (especially if it's an article that I have little interest in and don't want to take the time to clean it up) with a note on the talk page explaining where the material was taken from. In a perfect world I would be able to sit down every time I find plagiarized material and turn it into good prose with references back to the source, but generally I don't have the time or attention span (and I get indignant too easily). I know a lot of people object to simple removal of plagiarism, which is the reason for "will never apologize for it" in the box.
Yes, that's another piece of the puzzle on "what to do about it": the preferred option is to just rewrite the offending material yourself, especially if it can be easily done. For more complex passages or in topic areas with which you're not familiar, plagiarized material can be removed from the article - however, this should only be done if you additionally leave a note on the article talk page pointing to your removal and describing your rationale. This is necessary so that editors more familiar with the article can rescue the material themselves. Now if only that could be said in half the word-count! :) Franamax (talk) 02:49, 16 November 2008 (UTC)