Wikipedia talk:Provenance

WikiProject Essays
This page is within the scope of WikiProject Wikipedia essays, a collaborative effort to organise and monitor the impact of Wikipedia essays. If you would like to participate, please visit the project page, where you can join the discussion. For a listing of essays see the essay directory.
This page has been rated as Low-impact on the project's impact scale.
The above rating was automatically assessed using data on pageviews, watchers, and incoming links.
Welcome to the discussion

This article has been moved here from Provenance at the suggestion of another Wikipedia user.--Pseudo Socrates 20:21, 20 August 2005 (UTC)

From Wikipedia:Village pump (proposals)


I would like to propose a button on the history page of each article that would provide the provenance (i.e. author) of each section of text in an article. There are various user interfaces possible for this functionality. One simple way would be to have a quasi-footnote for each section of text that would link to a table of authors at the bottom.--Pseudo Socrates 19:29, 16 August 2005 (UTC)

Except that each section, like the article, could have been written by a zillion different people over time... ~~ N (t/c) 19:35, 16 August 2005 (UTC)
Yes, indeed. I meant to say each interval of text by a single author would have a link to the table of authors at the end. An interval could conceivably be as short as a single word or as long as a whole article.--Pseudo Socrates 20:44, 16 August 2005 (UTC)
This is an interesting idea, but I'm afraid there is just no good way to do it. Really. To illustrate the problem, suppose an article says, "q q q", and you edit it, making it "q q q q". Now, which "q" is yours? Again, I'm just illustrating the problem; I don't mean this as a serious question. If you solve that one, there are more complex ones that wouldn't work. Doing a "difference" operation on two pieces of text the "right" way every time is simply impossible. Yes, MediaWiki includes code to do such an operation; that's what the "Show changes" button does on edit screens. And sometimes it gets it wrong. When you're just giving someone an idea of what they've done, it's no big deal, but if you're assigning authorship (with varying licenses!) for all the world to see, it becomes a more serious matter. And if we can't get it right, we'd better not do it. Sorry. — Nowhither 06:44, 17 August 2005 (UTC)
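Nowhither's "q q q" example can be reproduced with Python's standard difflib module, used here only as a stand-in for MediaWiki's own diff engine (an assumption; the two need not behave identically). Compared word by word, difflib reports the fourth "q" as the insertion only because its tie-breaking prefers the matching block that starts earliest, so which "q" is "yours" is a convention of the algorithm rather than a fact about who typed what.

```python
import difflib

def word_opcodes(before: str, after: str):
    """Compare two texts word by word and return difflib's edit opcodes."""
    a, b = before.split(), after.split()
    return difflib.SequenceMatcher(None, a, b).get_opcodes()

# The matcher keeps the longest matching block that starts earliest (the
# first three words), so it labels the last "q" as the inserted one.
print(word_opcodes("q q q", "q q q q"))
# [('equal', 0, 3, 0, 3), ('insert', 3, 3, 3, 4)]
```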
That's easy. All the q's are yours. Wikipedia diffs are already line-based. This is a feature that exists in collaboration software like Subversion (svn blame), so it's short-sighted and wrong to say it's not possible. I think the feature would be a good idea. RSpeer 06:39, August 22, 2005 (UTC)
Provenance has been an issue for the Wikipedia since its inception. Numerous authors have pointed to it as a principal deficiency of the Wikipedia. It is difficult to believe that provenance will not be provided once people realize that it can be done. Of course, it will not be perfect. But that will not prevent it from happening. As far as I can determine from trying it out, the "Show changes" button on the MediaWiki does not provide provenance.--Pseudo Socrates 22:38, 17 August 2005 (UTC)
Three points: First, you said, "... once people realize that it can be done." My point is that it cannot be done, at least not without some change in the edit procedure, requiring editors to mark the things they have added as their own. Second, you are right, "Show changes" does not provide provenance. Because it cannot. Automated determination of authorship, given only the "before" and "after" texts for each author, is an impossible task. Third, you said, "Of course, it will not be perfect." You are talking about legal licensing, responsibility for textual accuracy, etc. If we are going to do this, it had better be perfect. If it is not perfect, then you are assigning authorship to the wrong person, possibly licensing with the wrong license, etc. Sounds bad. — Nowhither 14:05, 18 August 2005 (UTC)
Of course what I have proposed can be done. With regard to the legalisms, a notice can be placed at the bottom of a dynamic provenance page as follows:
  The name of each link above was derived from the second
  column source (login name or ip address) of the history
  page of the article for which this
  page was produced.  Clicking on a link will produce a dynamic
  page that shows a version of the article in which the text
  following the link appears.  Of course the source may not be
  the author of any of the text in an article that results from
  their edit.
Note that there is no imperfection involved in the above legalism. Also each provenance page could bear the standard Wikipedia footer which states: "All text is available under the terms of the GNU Free Documentation License (see Copyrights for details)."--Pseudo Socrates 14:48, 20 August 2005 (UTC)
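RSpeer's comparison with svn blame, and the dynamic provenance page described in the notice above, can be sketched as a line-based blame over the page history. This is a minimal Python sketch, assuming the revisions are available oldest-first as (revision id, history-page source, wikitext) records; Revision and blame are hypothetical names rather than MediaWiki functions, and difflib stands in for whatever diff engine a real implementation would use. As the notice says, the "source" it reports is whoever saved the revision, not necessarily the person who wrote the words.

```python
import difflib
from dataclasses import dataclass

@dataclass
class Revision:
    rev_id: int   # hypothetical revision identifier
    source: str   # login name or IP address, as in the history page
    text: str     # full wikitext of this revision

def blame(revisions):
    """Credit each line of the latest revision to the revision that last
    changed it. Returns (source, rev_id, line) tuples in page order."""
    attributed = []  # attribution for the text as of the previous revision
    for rev in revisions:  # oldest first
        old_lines = [line for _, _, line in attributed]
        new_lines = rev.text.splitlines()
        matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier attribution.
                updated.extend(attributed[i1:i2])
            else:
                # Inserted or replaced lines are credited to this revision;
                # for deletions j1 == j2, so nothing is added.
                updated.extend((rev.source, rev.rev_id, line) for line in new_lines[j1:j2])
        attributed = updated
    return attributed

history = [
    Revision(1, "Alice", "Dogs are fun.\nAll breeds are dogs."),
    Revision(2, "Bob", "Dogs are fun.\nAll breeds are dogs.\nRover is a name for a dog."),
]
for source, rev_id, line in blame(history):
    print(f"{source} (r{rev_id}): {line}")
# Alice (r1): Dogs are fun.
# Alice (r1): All breeds are dogs.
# Bob (r2): Rover is a name for a dog.
```

Because lines are compared whole, fixing one typo in a long paragraph re-credits the entire line to the person who fixed it, which is exactly the weakness raised earlier in this thread.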

User Interface

One possible way to provide provenance would be to place the link before the text. The name of the link would be the author name followed by a colon. The link would point to the version in which the text was introduced. Initially it might be better to place the provenance button on the history page for an article rather than on the article page.--Pseudo Socrates 22:38, 17 August 2005 (UTC)

Whoa, are you proposing that a block of text look like this?: Dog's are fun to provenance play with. provenance All breeds are dogs. Rover is a name for dogeprovenance, so is spot. That is too unofficial to be worth it. Maybe a history for section edits, that would be good. Howabout1 Talk to me! 01:45, August 18, 2005 (UTC)
What is a "history for section edits"? Pseudo Socrates 02:12, 18 August 2005 (UTC)
Next to the edit section tabs there would be a history button for all the edits made only to that section. This is related to below. Howabout1 Talk to me! 02:40, August 18, 2005 (UTC)
A history button for all edits made only to a section sounds like an excellent idea. Pseudo Socrates 03:01, 18 August 2005 (UTC)
You didn't answer my question either: would it look like the text above, with even didnprovenance'provenancet, should someone make a mistake? I am sorry if I have been rude. Howabout1 Talk to me! 02:40, August 18, 2005 (UTC)
Pressing the Provenance button for an article might produce a version of the article that would include something like this:
"Pseudo Socrates: This text was authored by Pseudo Socrates."
where the above link Pseudo Socrates: would be to the version of the article in which "This text was authored by Pseudo Socrates." was introduced into the article.
Pseudo Socrates 03:21, 18 August 2005 (UTC)
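The display described here, with each interval of text prefixed by its source and a colon linked to the version that introduced it, could be produced from per-line attributions. A sketch in Python, assuming the attributions are already available as (author, revision id, text) tuples in page order; the function name is hypothetical, the output uses wikitext external-link syntax, and the permanent-link URL pattern (index.php?oldid=) and revision number 42 are illustrative assumptions.

```python
from itertools import groupby

# Assumed permanent-link pattern for an old version of a page.
OLDID_URL = "https://en.wikipedia.org/w/index.php?oldid="

def provenance_markup(attributed):
    """attributed: (author, rev_id, text) tuples in page order.

    Consecutive pieces from the same author and revision are merged into
    one interval, prefixed by "Author:" linked to that revision.
    """
    intervals = []
    for (author, rev_id), pieces in groupby(attributed, key=lambda t: (t[0], t[1])):
        text = " ".join(piece for _, _, piece in pieces)
        intervals.append(f"[{OLDID_URL}{rev_id} {author}:] {text}")
    return "\n".join(intervals)

print(provenance_markup([
    ("Pseudo Socrates", 42, "This text was authored by Pseudo Socrates."),
]))
# [https://en.wikipedia.org/w/index.php?oldid=42 Pseudo Socrates:] This text was authored by Pseudo Socrates.
```

Merging consecutive pieces keeps the number of links down to one per contiguous contribution, though, as Howabout1 points out below, a heavily edited article would still produce many of them.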
I'm sorry, you really are proposing having hundreds of links in the text? That is ludicrous, my friend. This is an encyclopedia. That would drastically affect the readability of the site. I also appreciate your moving of my text, it is placed better now. Howabout1 Talk to me! 03:25, August 18, 2005 (UTC)
Yes, indeed there could be a lot of links in the dynamic page produced by pressing the Provenance button. However, all of these links would be to versions of the article that already exist on the Wikipedia. Pseudo Socrates 03:37, 18 August 2005 (UTC)

It doesn't matter where the links go, it would make the text hard to read. Above all things, this is a source for information. Are you suggesting a separate version of the page that could be obtained by clicking one link at the top? Even if that is your idea, I really hate this idea. The information is important as a whole. And again I say, you are the only one who wants this. Howabout1 Talk to me! 23:37, August 19, 2005 (UTC)

It seems that Rspeer also thinks that making provenance readily available is a good idea. (See above.)--Carl Hewitt 13:31, 23 August 2005 (UTC)

Temporal Provenance

Let's call the above proposal author provenance. There is another kind of provenance, having to do with the time that a piece of text was introduced into an article, that could be called temporal provenance.

For example, temporal provenance could be implemented by providing a Provenance button which when pressed would display a dynamic page with the vintage of each interval of text indicated. E.g., text less than 24 hours old could be displayed in red font, text older than 24 hours but younger than a week could be displayed in green, and older text could remain in black.
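The colour rule above is simple enough to state directly in code. A minimal Python sketch, assuming the introduction time of each interval of text is already known (for example from a blame-style pass over the history); vintage_color and render are hypothetical names, and inline HTML spans are just one possible way to display the colours.

```python
from datetime import datetime, timedelta, timezone
from html import escape

def vintage_color(introduced: datetime, now: datetime) -> str:
    """Colour for a text interval, using the thresholds proposed above."""
    age = now - introduced
    if age < timedelta(hours=24):
        return "red"    # less than 24 hours old
    if age < timedelta(weeks=1):
        return "green"  # older than 24 hours but younger than a week
    return "black"      # older text stays black

def render(intervals, now):
    """intervals: (introduced_at, text) pairs in page order."""
    return "".join(
        f'<span style="color:{vintage_color(t, now)}">{escape(text)}</span>'
        for t, text in intervals
    )

now = datetime(2005, 8, 18, 13, 32, tzinfo=timezone.utc)
print(render([(now - timedelta(days=30), "Old text. "),
              (now - timedelta(hours=2), "New text.")], now))
# <span style="color:black">Old text. </span><span style="color:red">New text.</span>
```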

I would like to amend my proposal to be that we implement temporal provenance instead of author provenance. Pseudo Socrates 13:32, 18 August 2005 (UTC)

Tom Cross' "Puppy smoothies" proposes measuring the number of edits, not time. Obviously, the two approaches could be combined.
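One possible way to combine the two measures, sketched in Python under illustrative assumptions: the 24-hour and five-edit thresholds, and the rule that text stays highlighted until it passes both tests, are choices made for this example and are not taken from either proposal. Positions are indexes into the page history (0 for the first revision), so their difference counts the later edits the text has survived.

```python
from datetime import datetime, timedelta

def is_fresh(introduced_at: datetime, introduced_index: int,
             now: datetime, current_index: int,
             max_age: timedelta = timedelta(hours=24),
             min_edits: int = 5) -> bool:
    """Hypothetical combined vintage rule.

    Text counts as "fresh" (not yet vetted) until it is both older than
    max_age and has survived at least min_edits later revisions.
    """
    edits_survived = current_index - introduced_index
    too_recent = now - introduced_at < max_age
    too_few_edits = edits_survived < min_edits
    return too_recent or too_few_edits

now = datetime(2005, 8, 18)
print(is_fresh(now - timedelta(days=3), 10, now, 12))   # True: only 2 later edits
print(is_fresh(now - timedelta(days=3), 10, now, 20))   # False: old and well edited
print(is_fresh(now - timedelta(hours=1), 10, now, 20))  # True: under 24 hours old
```

Requiring both conditions means a burst of rapid edits cannot "age" brand-new text, and a quiet page cannot age text that nobody else has looked at.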

Social Effects

Providing provenance is likely to have important social effects, both positive and negative. You might think that these effects could be avoided by simply voting down the inclusion of provenance in the Wikipedia. However, if some people strongly desire provenance and its inclusion is voted down, then it is likely that provenance will be provided via a browser plug-in. If this happens then there will be two classes of user of the Wikipedia: those with the plug-in and sufficiently powerful computers to quickly compute provenance, and those without. Also, once one Wikipedia plug-in is created, others are likely to follow quickly. In this way the community could lose significant control over the user interface of the Wikipedia. Pseudo Socrates 01:42, 18 August 2005 (UTC)

First of all, this strikes me more as a description of a technical fix to the social problem that may arise given some people wanting this and some not. So before even dealing with the technical issue, let's deal with whether it's a good idea from a social standpoint. I think it's a bad idea because an article is really a joint effort. Just because I put in a short phrase today after someone else has put in a paragraph doesn't mean that their input did not lead to mine and vice versa. I stand on the shoulders of giants, and so forth. Editing is really an interactive joint enterprise and should not be divided up into minuscule fragments that will not reflect each editor's real contribution. It seems to go against the spirit of the way articles have been developed in the past, where we're all in it together and no one lays claim to the work.
Secondly, if it is divided up into minuscule fragments with individual ownership of each comma and tilde, it will affect the process of cooperation on the article, maybe adversely. It allows some users to take prominence over others, which again goes against the spirit in which Wikipedia operates now.
And thirdly, you still don't really know who contributed each segment because people can choose any user id they want.
So I'm really against it, and at the very least, strongly suggest that it should be taken up for a vote before it's adopted.
Fsm 02:17, 18 August 2005 (UTC)
You (User:Pseudo Socrates) seem to be the only one wanting this, yet you present it as something strongly desired and inevitable. This isn't wanted or needed; if a vote turns up many supporters, then let's do it. I have thought that it would be nice to find out who wrote each block of text, but as was said above, it goes against the collaboration of the wiki. Howabout1 Talk to me! 02:40, August 18, 2005 (UTC)
It seems that Rspeer also thinks that making provenance readily available is a good idea. (See above.)--Carl Hewitt 13:32, 23 August 2005 (UTC)

Xiong's Goggles

There are no individual authors of individual segments of an article. I'm not speaking of some vague collective ownership. I mean to say that the entire text of each article is the entire responsibility of the last editor.

Once I push "Edit this page" and do anything, and save the result, I have assumed responsibility for and authorship of the whole thing. If I don't endorse what existed before, I don't retain it -- I delete or rewrite it.

I don't see that anyone is entitled to weasel out of this responsibility; the very furthest I might go in that direction is to say that, since section editing is allowed, then perhaps I only assume responsibility for the entire section in which I edited.

But should anybody, anywhere ask "who wrote that" about any article, I say there are only three good answers, and none of them link to individual chunks of text:

  • Jointly, by the Wikipedian Community
  • Jointly, by the editors listed in history
  • Individually, by the last editor to touch the page

For purely internal purposes, I might notice that Somebody 7 edits back made such-and-such a change, and although it was retained by subsequent editors, if I wish to question the change, I might well contact that Somebody.

But if the question is accuracy, credit, or blame in the outside world, I say that from one angle, we all share; from another, all share who edited; from yet another, the burden rests firmly on the last editor.

Think about that when you touch a page. — Xiongtalk* 23:35, 2005 August 22 (UTC)

  • Interesting suggestion, but what about truly minor edits? Suppose I'm reading an article on a subject that I know little or nothing about, and I notice a typo. If I correct that typo then, under your suggestion, I am assuming responsibility for the totality of that article (or section), even though it is a topic I don't know much about. If you write an article and accidentally type "teh" instead of "the", and I fix your typo, should I really be blamed for inaccuracies in what you wrote? Surely that would just discourage copyediting, which would make Wikipedia worse. AJR 02:12, 28 August 2005 (UTC)
According to WP:HEP, that is the meaning of the minor edit flag: don't mind me, this is just a small copyedit; I really have no comment on the content of the article/section. Otherwise Xiong is correct -- you touched it, you bought it. The "provenance" of the article is the author of the last non-minor edit to each section. Fool 17:34, 15 November 2005 (UTC)
  • While this is an admirable guideline for editors to follow, I feel it is contrary to the guideline Be Bold, which I believe is a hallmark of Wikipedia. There are numerous edits that, while not minor edits like a typo fix, are tweaks to the current article. These may be minor additions, clarifications, rewordings, changes to layout, etc., which may even be made by someone with little or no knowledge of the subject, but which improve the article for all users. As an example, a recent edit I made involved adding a short paragraph at the beginning of the History heading of Accountancy and rewording the opening part of the next paragraph; I didn't have the time to research or edit the entire section, but felt I had something important to add. As a new user, I then looked around and found the history page; it turns out the rewording I made was to material quoted from the Project Gutenberg Encyclopedia, but the citation information had been deleted many revisions ago. Under the "last editor is the responsible editor" concept, I am now "responsible" for this significant breach of encyclopedic standards, when all I wanted to do was correct wording that was questionable ("The art of accountancy on a scientific principle ..."). So I have addressed that on the Talk:Accountancy page, and plan to do further editing as time permits. Should I revert to the old version or delete all the now-uncited information until I can thoroughly address the entire section (as your guidelines would suggest), or do I leave the new version and allow other users to decide whether my changes are beneficial, harmful, innocuous, misleading, etc.? I choose the latter option; while this muddles the concept of "provenance" significantly, I believe it more closely reflects the underlying philosophy of a collaborative work. --EMU CPA 11:30, 6 January 2006 (UTC)
Still, you not only flagged the problem on the talk page, but also plan to do further editing as time permits. This suggests that you have, in fact, taken responsibility, in the sense that I think is most important. Plus, all philosophy of collaboration aside, in the case of simple provenance below for example, I would still want your name on the section header. Fool 15:57, 6 January 2006 (UTC)


If I understand this correctly, the proposal is for nothing more, and nothing less, than a useful editing tool, meaning that it will save you time when looking for vintage vandalism. For the reader, it wouldn't make any difference. For the reader, all that matters is WP:CITE, not which IP added which comma. The concept may be useful in some contexts, but with extremely frequently edited articles, the concept breaks down: at some point the diff algorithm cannot decide which stopword originated with which edit. Baad 08:56, 1 November 2005 (UTC)

See the Controversies section of this article's project page. Note that despite all that, I think source provenance is a great idea. Like you are saying, it would help Wikipedians find out vandals' usernames. --Unforgettableid | Talk to me 10:44, 25 November 2005 (UTC)
References and some sort of assurance that the references are reliable, relevant, and that the text of the article is in accordance to them. This is sort of the point, I guess. If the reader was going to go check up on the references emself, e wouldn't need Wikipedia in the first place. Fool 21:31, 15 November 2005 (UTC)

Simple provenance

What about something like this:

TITLE [ Last major edit: INTERVAL by AUTHOR ]





SECTION HEADING [ Last major edit: INTERVAL by AUTHOR ] [ edit ]



SUBSECTION HEADING [ Last major edit: INTERVAL by AUTHOR ] [ edit ]



AUTHOR would be the last one to submit an edit (without the minor flag) to the article/section/subsection, and would link to that user. INTERVAL would be something like "1 month ago", "3 days ago", or "23 secs ago" and would link to a diff of that change. Fool 20:56, 15 November 2005 (UTC)
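Fool's header can be sketched in Python as follows. Assumptions, all illustrative: revisions arrive oldest-first as dicts with author, timestamp, and minor keys; for a section or subsection, the list would contain only the revisions that touched it (how to find those is not addressed here); a "month" is approximated as 30 days; and the user and diff links are left out, so the output is plain text. The names relative_interval, last_major_edit, provenance_header, and "Typofixer" are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def relative_interval(then: datetime, now: datetime) -> str:
    """Coarse "N units ago" text such as "1 month ago" or "23 secs ago"."""
    seconds = int((now - then).total_seconds())
    units = (("month", 30 * 86400), ("day", 86400), ("hour", 3600), ("min", 60))
    for name, size in units:
        if seconds >= size:
            n = seconds // size
            return f"{n} {name}{'' if n == 1 else 's'} ago"
    return f"{seconds} sec{'' if seconds == 1 else 's'} ago"

def last_major_edit(revisions):
    """Newest revision not flagged as minor, or None if every edit was minor."""
    for rev in reversed(revisions):
        if not rev["minor"]:
            return rev
    return None

def provenance_header(heading, revisions, now):
    """Render "HEADING [ Last major edit: INTERVAL by AUTHOR ]"."""
    rev = last_major_edit(revisions)
    if rev is None:
        return heading
    interval = relative_interval(rev["timestamp"], now)
    return f"{heading} [ Last major edit: {interval} by {rev['author']} ]"

now = datetime(2005, 11, 15, 20, 56, tzinfo=timezone.utc)
history = [
    {"author": "Fool", "timestamp": now - timedelta(days=3), "minor": False},
    {"author": "Typofixer", "timestamp": now - timedelta(seconds=23), "minor": True},
]
print(provenance_header("Simple provenance", history, now))
# Simple provenance [ Last major edit: 3 days ago by Fool ]
```

Skipping minor edits is what lets a typo fixer leave the header untouched, in line with the reading of the minor-edit flag given in the thread above.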

Professional or expert recognition

I think there should be a channel through which professionals in some fields could be recognized as such and noted experts could be recognized as such.

For instance, a doctor could request status as a medical professional. This would not give him any additional privileges, it'd only give other Wikipedians warning that this guy knows what he's talking about and his contributions should be respected. This can be written on a user page of course, but outside validation would ensure that it's true and I think would help control a lot of edit disputes. A similar system should be produced for "experts", noted researchers or something else in fields where there cannot be professionals; e.g., there are no longer any samurai, but if a Wikipedian has had his work published, he can be recognized as an authority on samurai.

The practicality of this, with the current setup of the Wikimedia Foundation and MediaWiki software, is questionable. It would probably require some implementation in the MediaWiki code to prevent unauthorized usage of tokens indicating one's authority, and unless we can find a way to perform validation through the internet, it would probably require some employees to investigate credentials. I'm not really sure if it's worth that hassle, but it'd be cool. Cookiecaper 22:41, 6 January 2006 (UTC)

I agree with Cookiecaper on every point apart from "it'd be cool". I wouldn't like the idea anyway; I have absolutely no credibility on this sort of count; in fact, John Seigenthaler Sr made a remark about teenagers that describes me very well. Does this mean that I know nothing and can therefore not contribute? No, it means that I am perceived to know nothing, even though I may be able to help contribute hugely. Have a quote:
  • Now a picture of the body behind the "Hive Mind" of "collective intelligence" begins to take shape. He's 14, he's got acne, he's got a lot of problems with authority ... and he's got an encyclopedia on dar interweb. Watch out.

Find it here —The preceding unsigned comment was added by The1exile (talk · contribs) 12:51, 23 February 2006 (UTC)

Passage Histories as Provenance

I find that when I'm editing a section, I'd like to know its history: How did it get to the present state; will I be reinventing the wheel with my planned changes? Of course I could toggle through the whole history of the article, but when there are 200+ edits, that becomes unreal.

What would deal with my problem, and with the provenance problem, would be an improved history page that could identify only those changes within sections (or ideally, within arbitrarily selected passages). With that you could step from version to version of a section, identifying the changes with the diff function.

Of course there are real software problems here. Following a section as it was edited, rearranged, divided into several parts which were later rearranged represents a real challenge. (Consider how easily the diff function gets lost with just a single edit.) Despite the problems (says someone who doesn't do programming) it would provide a valuable editing tool, and a way to identify the provenance of a crackpot (or brilliant) idea in that passage. --SteveMcCluskey 22:06, 3 June 2006 (UTC)
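The improved history page asked for here can be approximated by re-extracting one section from every revision and keeping only the revisions in which that section's text changed. A minimal Python sketch, assuming revisions arrive oldest-first as (revision id, author, wikitext) tuples; the function names are hypothetical. The section is located by its heading text, so a renamed, split, or moved section is simply lost, which is the very difficulty described above. It also ends a section at the next heading of any level, whereas MediaWiki's section editing includes subsections.

```python
import re

# A wikitext heading line such as "== History ==" (same number of "=" on each side).
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$", re.MULTILINE)

def section_text(wikitext: str, heading: str):
    """Body of the section with this heading, or None if it is absent.
    Simplification: the section stops at the next heading of any level."""
    matches = list(HEADING.finditer(wikitext))
    for k, m in enumerate(matches):
        if m.group(2) == heading:
            end = matches[k + 1].start() if k + 1 < len(matches) else len(wikitext)
            return wikitext[m.end():end]
    return None

def section_history(revisions, heading):
    """Yield (rev_id, author) for each revision that changed the section."""
    previous = None
    for rev_id, author, wikitext in revisions:
        current = section_text(wikitext, heading)
        if current != previous:
            yield rev_id, author
        previous = current

revs = [
    (1, "A", "== Intro ==\nHello.\n== History ==\nOld text.\n"),
    (2, "B", "== Intro ==\nHello there.\n== History ==\nOld text.\n"),
    (3, "C", "== Intro ==\nHello there.\n== History ==\nNew text.\n"),
]
print(list(section_history(revs, "History")))
# [(1, 'A'), (3, 'C')]
```

Each yielded pair could then be shown with the ordinary diff function, stepping through only the versions of the passage that actually changed.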

Validation and Trust

The software world uses some very simple models for validation and trust: versioning and review. It would not be hard to apply similar models to Wikipedia articles. An author should be allowed to version her article and submit it for review. Others in the field pertaining to the article's contents could review the article, rate its accuracy, and leave comments for the author (or the next author). There is a condition for this to work: others in the field must be trusted. Trust can be enforced in two ways: authentication and review. Authentication is already in place with Wikipedia's login (maybe not extremely strong, but usable). Review (or rating) is essentially what can be found on an online auction site: users review (or rate) the expertise of a given user, and users with higher ratings in that field have higher weights applied to their ratings. Credentials and credential verification could be another way to establish trust in an article reviewer. Along with the aforementioned suggestion (a better passage-history user interface), any user of Wikipedia would be able to find a versioned article and have less doubt concerning its validity. --James.cary9 23:20, 19 July 2006 (UTC)
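The weighting idea, where reviewers with higher ratings in a field count for more, can be sketched as a reputation-weighted mean. This is a minimal Python sketch under stated assumptions: reviewer reputations are given as numbers on an arbitrary scale for the article's subject area, unknown reviewers get weight zero, and weighted_rating is a hypothetical name; how reputations themselves would be earned is not modelled.

```python
def weighted_rating(reviews, reputation):
    """Reputation-weighted mean of the ratings given to one article version.

    reviews:    (reviewer, score) pairs.
    reputation: reviewer -> weight in this subject area (hypothetical scale);
                reviewers without an entry get weight 0 and do not count.
    Returns None when no review carries any weight.
    """
    weighted = [(reputation.get(reviewer, 0.0), score) for reviewer, score in reviews]
    total = sum(w for w, _ in weighted)
    if total == 0:
        return None
    return sum(w * s for w, s in weighted) / total

print(weighted_rating([("alice", 4.0), ("bob", 2.0)], {"alice": 0.75, "bob": 0.25}))
# 3.5
```

A plain average would give 3.0 here; weighting pulls the result toward the more trusted reviewer's score.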