Talk:Good Source Use and Documentation Practice


The Educational Approach [14 April 2009]

Since I don't think a very large percentage of pages are going to get rated/starred, I prefer to tell people what is desired of them. To that end, I tried putting together a rough draft/strawman. It needs examples, among other shortcomings. The original presentation and associated discussion are at WeRelate_talk:Junk_Genealogy&oldid=11532367#Your_Contribution_to_Quality_.28Yes.21_You..29_3

Your Contribution to Quality (Yes! You.)

A goal of WeRelate is to become a repository of high-quality, reliable genealogical data. Various organizations have written detailed descriptions of what constitutes high-quality genealogical research. For example, the Standards Manual of the Board for Certification of Genealogists is viewable at BCG Standards Manual.

WeRelate is a collaborative effort. It is not necessary for one person to do all the hard steps in producing high-quality data, such as the exhaustive search of relevant sources. As long as the work that each person does is entered in a way that empowers collaboration, the community will be able to supplement it and bring it closer to BCG standards over time. This is to everybody’s benefit.

The foundation on which this whole process rests is documenting the sources of your information.

  • As a courteous person, you are giving credit to the person who did the hard work.
  • As a collaborative person, you are enabling others to verify the work to ensure its reliability.
  • As a helpful person, you are providing pointers to others who are looking for more information.
  • As a researcher, you are providing a dispassionate argument to support your conclusions.

Whenever possible, source citations should reference items in the Source namespace. These are sources that are publicly available in a repository, such as a library or an Internet website. Whether you add data manually or via GEDCOM upload, you should try to convert all your sources to point to Source pages if they fit these criteria. If there is no page for your Source, create it. Source citations should give enough information, particularly page numbers or current URLs, etc., so that another person can easily and unambiguously locate the relevant material. Supplementing your citation with a brief abstract that honors any copyright protection can be very useful to other users. See Help:Source pages for details of working on sources.

The MySource namespace is used for one-of-a-kind sources that other people will generally not have access to, such as conversations, family Bibles, etc. In citing these sources, you should be prepared to share them to the extent practical, such as e-mailing photographs of Bible pages, or providing transcriptions. It is common to see such sources described as being in the possession of some person. Do not publish the name, address, phone number or email address of such third parties without getting their permission first.

Quality of sources, not quantity.

One of the enemies of clarity is excessive data. Adding sources to WeRelate should aim to increase the quality of what is there, not merely add redundant sources saying the same thing. Genealogical issues are not solved by counting the absolute number of sources on each side, but by thoroughly analyzing the reliability and quality of the sources to decide which is most credible. There are several characteristics that help identify higher-quality sources.

Contemporary sources preferred over after-the-fact reporting. Contemporaneous written records (made at the time the event happened) are usually given more respect than after-the-fact reporting of facts or family tradition. They tend to be freer of myths, faulty memory, and accumulated error. You should attempt to find sources that are contemporaneous when possible, or sources that quote or cite contemporaneous records when not.

Original sources preferred over derivative sources. Derivative sources just pass along data that others have gathered. Often the authority for the data is lost in the process. You should try to provide the original sources when available. When you cite a derivative source, try to cite those that identify the original sources. If a derivative source does not provide the basis for its data, its reliability can only be guessed at.

A special case of derivative data is other people’s GEDCOMs, One World Tree data, Ancestral Files, etc. These do not make good sources. While there are individual cases of these sources that have excellent quality, there are as many or more poor-quality ones in existence. As it is very difficult to assess the quality of these types of sources from a citation, most people simply discount all such electronic family trees as meaningless.

Consistency with other facts. Direct evidence will generally carry more weight than indirect evidence, which only shows the side effect of some event. But every human activity is prone to error, and even the highest-quality, most direct sources can be erroneous. A genealogical analysis will place more stock in a collection of consistent data items, even though indirect in nature, than in a direct item that is inconsistent with other facts. So, when there is doubt about a fact, try to find independent evidence of that fact. For example, consider trying to decide, given several people with the same name, which one a birth record should be applied to. A baptism record, or being mentioned in a will, or the age on a gravestone, can often verify the validity of that decision, even though none of them addresses the birth event directly.

Analyze data within its context. To interpret data correctly, you must have some familiarity with related cultural, historical and geographical details. Don’t be afraid to research unrelated people to rule out alternative suggestions. Find explanations for all discrepancies. Recognize your biases and your assumptions.

Be courteous. If sources are cited for facts with which you disagree, try to start a discussion on the Talk page. Consider the evidence presented by the cited sources fairly. To the extent possible, try to show how the given facts are inconsistent with other known facts. Present evidence for alternative facts, citing sources of equal or higher quality than the ones already cited. Give the watchers of the page a chance to respond. Listen to what they say. If there is not a clear consensus, consider that perhaps the truth is not knowable without more information, and the most useful result may simply be to leave the discussion on the Talk page where it may be seen by other researchers.

The collaborative process of WeRelate is a long-term, two-way interaction. Your participation can be a valuable part. But it is more than simply loading a GEDCOM. It is a continual process of querying, responding, redirecting research, and sharing new data and thoughts. Please do not dump your data and go. Stay involved and watch the magic happen!--Jrich 18:16, 13 April 2009 (EDT)

"One of the enemies of clarity is excessive data." While I appreciate the intent, the above is way too long for the average web-page reader/person. I don't consider myself average and *I* couldn't get through it. I think you're trying to do too much in the above. KIS(s). How about:

Your Contribution to Quality (Yes! You.)

WeRelate seeks to become a repository of high-quality, reliable genealogical data. It does this through collaborative contributions of a community of individuals committed to improving existing compilations of genealogical information. Key to our (and your) success is documenting the sources of your information. This includes:

  • Citing reliable sources for as much of what is posted here as possible.
  • Giving credit to the person who did the hard work.
  • Collaborating with others to verify the work to ensure its reliability.
  • Providing dispassionate cases to support your conclusions.
  • Giving enough information, particularly page numbers or current URLs, etc., so that another person can easily and unambiguously locate the relevant material.

There are several characteristics that help identify higher-quality sources. To understand what constitutes high-quality genealogical research, see:

Be wary of citing such "sources" as:

  • Other people’s GEDCOMs,
  • One World Tree data,
  • Ancestral Files, etc.

These do not make good sources as it is very difficult to assess the quality of these types of sources.

Be courteous. If sources are cited for facts with which you disagree, start a discussion on the Talk page associated with the person or family. Give the watchers of the page a chance to respond. Listen to what they say. If there is not a clear consensus, consider that perhaps the truth is not knowable without more information, and the most useful result may simply be to leave the discussion on the Talk page where it may be seen by other researchers.

The collaborative process of WeRelate is a long-term, two-way interaction. Your participation can be a valuable part. But it is more than simply loading a GEDCOM. It is a continual process of querying, responding, redirecting research, and sharing new data and thoughts. Please do not dump your data and go. Stay involved and watch the magic happen!

-- jillaine 21:26, 13 April 2009 (EDT)

How about we put this on a separate subpage? Then it can be edited without having to repeat the whole thing. I've put a copy at Mission Statement. We can change the name of that as deemed appropriate. Q 22:00, 13 April 2009 (EDT)

Um. I was a little surprised to see a link to the BCG Standards Manual, so I went to Google Books to check. That site has only a sampling of pages, not the whole text. You might want to link instead to the Standards page at the BCG website, since it's easily accessible. --Mike (mksmith) 12:04, 14 April 2009 (EDT)

The problem is that BCG sells the manual, so the webpage you mentioned is just a short summary of GPS. GPS was not the topic I was trying to explain to new users when I first wrote the document. As I mentioned, with WeRelate, satisfying the GPS will be a community achievement over time, and need not be accomplished by each individual's research.

What is needed is to educate people on how to explain where their data came from. Secondarily, they need some appreciation for what makes a good source. They may not see the need to do this in their own family tree, because they have heard their family history told to them their whole lives, but they need to realize how differently things work when you are trying to convince other people of the correctness of your data.

I am probably not a good person to do that, but was just trying to illustrate by example. Personally, I think there should be a whole series of self-paced tutorials with computer-graded tests at the end, and new users should not be allowed to input data until they pass all the tests. At least then you would know they read everything you wanted them to. Whether they follow those instructions is another issue.

--Jrich 14:22, 14 April 2009 (EDT)

Contributing your GEDCOM

WeRelate grows through the submission of individual lineages by users that are combined with the existing lineages into the One WikiTree. If you use a genealogy program of one sort or another, you probably have the capability of converting your family data into a GedCom file. That file can then be submitted to WeRelate, adding your personal branch to the single family tree of WeRelate. Some portions of your personal lineage may already be found on WeRelate, placed there by other users of this site. But everyone's personal lineage has its unique elements, either in the form of your immediate family, or in lines you've researched and built that have not been previously explored by other users. Those unique elements are the ones that will be attached to the tree when you submit your GedCom.

[OK, here's a point where we really do have a problem---the information that's included in a new addition may actually be better than what's there now. But we have no way of seeing that information, since the data for existing portions of the tree will be ignored. I can think of some solutions for this, but that needs to be discussed on the Talk page. Q 15:30, 23 April 2009 (EDT)]

[How will they be ignored? Is there some new policy I missed? Jillaine 13:38, 1 August 2010 (EDT)]
If they are rejected during the upload process rather than added, then they can't be seen by anyone but the original submitter. Q 17:49, 1 August 2010 (EDT)

(Something here about how your contribution of your GEDCOM is a contribution to a single tree; how we are different than other online family tree sites, etc. Basically a one-paragraph explanation.) Before you contribute your GEDCOM, please read the following so that you understand what we're seeking.

[There is an entire page of instructions for "Before you contribute your GEDCOM"; we should just link to that. In fact, why don't we just replace this entire section with a link to that page? Jillaine 13:38, 1 August 2010 (EDT)]
That should work. Q 17:49, 1 August 2010 (EDT)
Thanks. I've amended and abbreviated the article page to direct new users to the new Help:Before you import your GEDCOM help page article and moved the original discussion portion here. --BobC 22:17, 1 August 2010 (EDT)

Not really a mission statement [22 April 2009]

While I support the idea of setting this topic aside in its own space so that we can focus on it, I find the title "Mission Statement" a bit confusing and misleading.

As jrich pointed out, the initial goal of his proposal was to help promote good Source/Documentation practice. It's always wise to ground such conversations in the mission of WeRelate, but WR probably already has a mission statement.

That said, whatever language we do come up with could be added to any "About" page at WeRelate as well as any page that helps visitors understand what we're looking for here in terms of quality.

Where do we go from here?

-- jillaine 17:23, 21 April 2009 (EDT)

Guess it sort of evolved from the previous discussion. [There's a previously existing page-- Source:MISSION STATEMENT--that may or may not signify something. But a) it has no effective content, b) makes no sense in the namespace it's in, and c) was created by the WeRelate agent. I've no idea why it was created, but it probably wasn't a conscious decision.] In any case, we probably DO need a mission statement, but perhaps that's not what's arising out of the previous conversation. Maybe it's a separate problem.
However, it would seem to me that "doing good genealogy" on THIS site is contingent on knowing what the site is trying to do. With merger mania in full play, it's clear to me that the object of the site is NOT simply to collect people's family lineages and preserve them for their use, whatever that use might be. That particular niche is probably being handled well by other sites (like Ancestry), need not be duplicated here, and moreover takes no advantage of the Wiki nature of this site.
Looking at the implications of the mergers going on, coupled with the fact that "merger" IS site policy, I can't help but conclude that the underlying intent is to integrate everything that comes into the site so that you get one huge monolithic tree, combining everyone's data. That is, as people upload their data, that data is merged with the existing tree. Good stuff is kept, bad stuff gets converted into free electrons, with the object being that we get a well-documented, sourced tree for everyone.
In a system like that you really have to emphasize doing good work. You can't get away with what they have on Ancestry, for example, where you find for any given person multiple sets of parents, DOBs, DODs, spouses, etc. There's only one "RIGHT" answer (in the Aristotelian sense), and the others are errors that have to be eliminated; otherwise you don't have a monolithic tree. Of course, the problem here is "Who's got the right answer?" In any given case we might think "I know the right answer", but then someone else can say exactly the same thing. How do you separate them out?
The answer to that is that it depends on what the data shows. And ultimately, that's a question of good sourcing of information. Two different data sets describing the same lineage may show two different sets of data. How you distinguish between them as to which one is more likely to be right or wrong is by looking at the quality of their sourcing. When you've got good original sources to point to, and/or a good logic train to support a view, that takes precedence over someone's "This is the right answer, because it's the right answer!". That's where showing "how you know something" is important. That's what allows others to evaluate two sets of data, and conclude that this one's got the better answer (or maybe neither has a very good answer). That's why sourcing is important in genealogy, and that's why things like the BCG standards of proof are important.
And ultimately, that's also why the site's Mission Statement is directly relevant to this conversation. It's what the site is trying to do, and I think that deciding what's "good genealogy" needs to be interpreted in terms of that objective. Q 18:56, 21 April 2009 (EDT)

Q, that's all well and good but what's our purpose here? What is the purpose for which we are writing this language? And for which audience? Seems like we've got at least three different approaches to it, and until we reach alignment on the purpose of drafting this language, then we'll just be throwing pieces of ones and zeroes into the aether.
My approach? We've got a problem in that people are uploading GEDCOMs with insufficient and often frustrating data; we're trying to encourage good practice in terms of what information is shared here. I concur with you that part of that requires helping people understand the purpose of the site, and yes, as Dallan & Solveig have posted in a variety of places: the goal is a single tree-- what's he call it? Pangea? a single, but extensive, far-reaching root system. So I'm interested in seeing language that helps people understand this BEFORE they decide to upload their GEDCOMs, to make sure they're "in the right room" (as opposed to "next door" over at WorldConnect), and that they're willing to "play" the wiki game with us, and here is the layout of the game board and the "rules of the game." -- jillaine 22:08, 21 April 2009 (EDT)
Yes, that is the purpose of a mission statement. Q 08:31, 22 April 2009 (EDT)

pangea [22 April 2009]


I think the concept of a single, monolithic, unified tree has some advantages, particularly in a Wiki environment. After all, if the idea is to encourage collaboration, then that would indeed lead to a single, monolithic, unified tree. Calling that tree "Pangea" may be a reasonable thing to do, though it would be confused to some extent, I suspect, with the geologists' "Pangea". Be that as it may, I find no reference on the site to Dallan's use of this term, though I do find it in at least two instances where it's used in its geological sense.

Can you point to a location where Dallan, or someone else has used the term in this way? Q 12:57, 22 April 2009 (EDT)

Ooops! Meant "pando" not Pangea. On the home page and elsewhere. Thanks for catching this. jillaine 13:13, 22 April 2009 (EDT)
I'm almost sorry I caught that. "Pando" might be a classy reference to use, but that is one plug-ugly word. Seems like the Vikings had a similar concept, but a better word. I'll check. Of course, it's Dallan's game, so he gets to choose, I suppose. Q 14:26, 22 April 2009 (EDT)

yeah, i'm not too keen on the word either (which might be why I unconsciously changed it to Pangea...) We don't have to use it here. We can describe the effort without using the word. jillaine 15:01, 22 April 2009 (EDT)
Might simply call it "The WeRelate Tree". Would go with "One World Tree", but that's already taken. Q 15:14, 22 April 2009 (EDT)

Done. jillaine 16:15, 22 April 2009 (EDT)

Pando [12 September 2009]

I see where GedCom import notifications now include a reference to "Pando". Repetition makes it no less ugly. I tried out "One WikiTree" in a couple of places. Some improvement, but I guess Dallan likes Pando. Q 13:45, 24 April 2009 (EDT)

Pando is cool because it really exists and is so illustrative of what we are. There really is a "tree" in the Unitas that has many many trunks coming off the same root system. It covers acres and is the largest living organism in the world. Dallan just recently thought about calling it Pando; that's why it's not used throughout the website. I am concerned that if we call it "One WikiTree" it might get confused with "One World Tree" and several other such non-wiki efforts.

As far as a mission statement goes, everyone has great ideas. Less is more. Most users won't read a page of text. I have written page after page of instructions/help pages and most new users never look at them. Is there some way to be more concise and still get the point across? Somehow we have to get across to people that this site is for serious genealogists and not just another place to post gedcoms.  :)--sq 09:46, 27 April 2009 (EDT)

The point about "One WikiTree" being similar to "One World Tree" is something I've thought of. I wasn't particularly pushing that one, partially for that reason. More or less trying it on for size. Simply "WikiTree" would also work. But hey, it's Y'all's site, so if you want to call it "Pando", so be it. It's just that while the name does fit, it's sort of bizarre to western ears. But then, so was "panda" once upon a time. Q 11:50, 27 April 2009 (EDT)
Yes, simplifying is good. Less is more. The footnote table I added does probably overload. We'll work to cut this down. The table probably goes to a separate article, anyway. Q 11:50, 27 April 2009 (EDT)

One thought I had was to relate (??) to the term Genome - from the "Human genome project" which was the project to map the whole of the Human DNA.

Perhaps a term could be the Human Gedome project, whose aim is to map the entire human family tree.--Dsrodgers34 23:18, 10 September 2009 (EDT)

Human Gedome project is cool too. Let's re-visit this early next year when we prepare to come out of beta. I like Pando, but I like Human Gedome project as well.--Dallan 17:02, 12 September 2009 (EDT)

Ambiguity [2 August 2010]

Bob. Having done the bit on ratings, I would like to explore ambiguity a bit. I have one principle: a given source (e.g. a line in a census) can only be linked to one ancestor (as a primary source). Where more than one candidate exists and there is no more evidence to be uncovered, the item should be linked to all possible ancestors but ranked in terms of probability.

In order to explore a little more - do you have links to any pertinent articles or discussion - Dale

Dale, if your question is addressed to me, I didn't make any input on this subject yet, although I changed the page title yesterday to more accurately identify the topic at the recommendation of other responders' remarks.
I generally agree with your statement relating to the census attribution, except for the word "only". I can see circumstances where you may also link a particular census line to parents of that individual where it identifies the birthplace of the parents. That information may be your only clue to the birthplace of the subject's father and mother, and therefore should be referenced as a source for that data on each of the parents' pages. You may have an argument that although that reference overall is an "original, primary source," the record was written substantially after the event described, and therefore, in my opinion, is "secondary" in reliability (or quality level for that data). So although I understand what you mean by the use of the word probability, the more accurate nomenclature would be reliability.
That is an example of why I think the associated article page here needs to be analyzed and discussed further. We need to agree on proper usage of terms and terminology at WeRelate. Although "original," "equivalent," "secondary," "tertiary", and "ephemeral" might be understood by the writer and might be useful for describing the type of record used as a source document, they don't match the accepted terminology used for describing type, quality or reliability of evidentiary sources in most genealogy programs as well as in most publications on the subject. We shouldn't reinvent the wheel.
If interested in following up on this, you may want to review the related comments on the Analyzing your Genealogical Sources talk page and the Genealogy Process chart which is a graphical illustration of the Genealogical Proof Standards encouraged by Elizabeth Shown Mills in her various publications. --BobC 13:04, 1 August 2010 (EDT)

Thanks Bob. I was a bit vague in my first post. Probably what I mean to say is I would like to explore some guidelines for how to consistently present findings where ambiguity exists. I have this idea that if there is ambiguity, each time a new researcher looks at a collection of data/sources, they will face the same issues. The idea is that if a researcher 'solves' some ambiguity and then posts the findings with no reference to it, the record is not as complete as it would be if the ambiguity were noted. Perhaps the discussion could be on the talk page with a single reference on the main page. There will also be cases where the ambiguity is never resolved, and a 'best guess' might need to be proposed. Conventional genealogy says this should not be published until solved. In that case we get the scenario where a case gets looked at over and over. Better that the hypothesized ancestor is given a page, but the ambiguous findings be shown.

I should also mention the cost element. I'm sure many people do as I do and only purchase BMD certificates to solve an ambiguity or where there is a strong likelihood that extra info is contained within. If you place 'people' or 'family' pages in here without the appropriate BMD certificates, there should be a 'standard' by which the relationships can be taken as very likely because all the sources quoted are consistent and no conflicting information exists. I reckon in rural England, census era, that at least 90% fall under this category. (In my area of interest, even though there are 13 Mary Howards in one parish in one census alone, most can be resolved due to birth date etc.) I would say that over time, more BMD info will be available online, and this info can be resolved further.

I think the great feature of WeRelate is that ambiguous elements and data fields will eventually get corrected, modified, proven or disregarded based on the input of multiple users with different reference points, and their access to various sources and official records. If you experience ambiguous results in your own uploaded records, I would suggest you note those ambiguities at every point, and let the community add to, concur with, analyze or contend with those differences. Better stated up-front and brought to the attention of users rather than remain hidden or presupposed as proven, and thereby end up misleading others (and ourselves) with information that may not be totally accurate.
Only you can judge the value of original BMD certificates when confronted with either proven or ambiguous elements within your area of research versus the cost incurred in obtaining them. The value of a primary source document in my view is immense, especially when it may clear up some ambiguous data element that may have been originally obtained in some unreliable GEDCOM download or questionable genealogy record passed through multiple hands and multiple genealogy programs. In the absence of an original, primary source document, then a "best guess" may have to suffice. I disagree that this "best guess" should not be published -- it may be the best information that is available, but should be clearly stated what it is, a questionable best guess or an educated estimate based on information that is readily available to others to review. (I agree with you there - Dale)
Another great thing about WeRelate is that each page has a single discussion page linked to it -- a "Talk Page" where you can post such ambiguities and uncertainties in your data elements, a place to bring up unproven information in the person or family page you are interested in, and a place to request assistance from others in finding more reliable information and sources. Good luck. --BobC 22:49, 1 August 2010 (EDT)