WeRelate talk:Junk Genealogy/Archives 2008-9


Scot's Initial Proposal [4 June 2008]

I am becoming more frustrated, disillusioned and concerned every day. Some 3 years ago it occurred to me that the wiki model could be a great opportunity for genealogy. When I found this site, as well as several others, I was excited that someone else had come to the same conclusion. However, it appears that any collaborative effort is being overwhelmed by the amount of junk genealogy being uploaded to the site. I have been thinking about how to prevent the WeRelate site from becoming just another repository for misinformation like so many others. Downloading GEDCOMs from WorldConnect or the AF and then uploading them for someone else to edit is not genealogy. What we should be doing is compiling data, examining sources, weeding out the trash and creating a credible database. How can we encourage collaboration by those individuals who are serious researchers and eliminate those who are just waving their pedigree to massage their egos?

Some thoughts:

  1. If a person joins WeRelate, uploads a GEDCOM and walks away, he contributes nothing. How about this: if after uploading he does no editing for a certain period, say 90 days, then his upload is purged, except for pages edited by others, just as if he had removed it himself.
  2. For person pages for individuals from before 1500 or so, allow the surname field to remain empty without the unknown tag. The title prefix or suffix can be used to differentiate individuals. Because people feel a need to enter something into the surname field, this creates an incredible number of variations for the same person if he does not have a surname. I have many instances of duplicates because they are entered in different languages. I realize that place names eventually became toponymics and are used as surnames, but in medieval times they simply indicated where a person was from. Often these appear with of, de, van, etc. preceding them, and each variation, if used as a surname, results in a duplicate entry.
  3. Accept no data for individuals prior to 1600 without a source reference.
  4. Screen the sources and reject submissions based on questionable sources or those known to be flawed.
  5. Perhaps have two separate databases with separate rules for submission: one for medieval, royal, historical and celebrity figures, where submissions are restricted as stated above. Don't allow GEDCOM uploads to this section, only individual pages. A second database remains as it is now but with the purge function for inactive users. Allow linkage to individuals in the other database within immediate families.

Maybe this seems rather Draconian, but I feel some kind of control must be implemented to prevent the site from becoming a hopeless morass like most others. Opinions, anyone?--Scot 19:30, 9 April 2008 (EDT)


1. Try to imagine WeRelate is good enough to survive in some form 100 years from now. I believe the next stage for amateur genealogy/history is modelling of families / communities / places / events. Who's to say something contributed yesterday and then abandoned (e.g. the 'only' online copy of someone's wedding or a school photograph) wouldn't be of interest to someone else, even though it has not been looked at for a period of 20 years?

That person may not have sourced it correctly, or identified all the people in the photo, but may have left enough clues for someone else to identify it correctly.

In the UK you have to spend seven pounds for a copy of a certificate - perhaps in 5 years all the info will be online for 7p and it would be affordable to model entire communities.

How do you know what is useful? It's in the eye of the beholder.

Perhaps the junk could be left out there but the WeRelate community should develop a simple quality accreditation system. That which is sourced correctly (as identified by WeRelate accredited officers) can be coded as such.

A page with a lower quality rating cannot then 'damage' one with a higher one.

Perhaps volunteers could adopt a geographic location and ask other interested parties to connect with them. With the right tools a volunteer could keep things in check. One thing I like about the wiki is you can host a family tree, a one-name or a one-place study; theoretically they should be able to coexist.--Dsrodgers34 01:14, 4 June 2008 (EDT)


Some Responses

It has occurred to me that abandoned GEDCOM uploads are very much a mixed bag. It may sometimes be that the person just didn't take to the site and their data is still pretty good. Other times, well.... Anyway, I agree with the thrust of Scot's argument - there need to be some steps taken to prevent WeRelate from becoming a sewer.

GEDCOM genealogy has made it easy for people to accumulate a data set that they would never have any hope of seriously maintaining - even given many lifetimes. We should encourage people to upload only data that they are serious about working on. Maybe the size of a particular GEDCOM upload should be limited unless by special arrangement (5000 or so?). Likewise a GEDCOM that goes back before 1500 AD.

The other problem, of course, is the management of abandoned data. I've got a fairly small tree compared to some (~3000). I try to source things in detail and would like to think the pages are the sort of thing the site would want to host indefinitely. There should be a way to designate trees as a permanent part of the collection (genealogy goes on forever - good research done in the 1850s is still working for us - but I don't think anyone reading this dates to 1850...do they?). On the other hand, if a tree is loaded and worked a little then abandoned for a while, it probably shouldn't automatically persist forever. If someone in the user community wants to adopt the tree, maybe it just goes to them. If no one in the user community wants it, or perhaps if the user community actively requests its removal, then after a time it goes away (if we want to be nice about it, maybe it gets archived to a named GEDCOM and tossed into the digital library?).--Jrm03063 22:04, 9 April 2008 (EDT)

Anything in an abandoned tree can be retained by anyone, simply by editing the page. My 90 day suggestion is only for the period after the initial upload. If they don't do match/merge for any of their data we can assume that they aren't interested in maintaining it. After that a longer period of inactivity could be required before declaring the data abandoned. If the match/merge utility works well, duplicate entries might not hang around so long, so it will be easier to find and evaluate data in recent uploads. Again, any pages that are merged or edited will be retained, and the rest of the tree was not found to be of interest to anyone searching.--Scot 01:12, 10 April 2008 (EDT)
There are several types of users on WeRelate presently. I looked at some of the users who registered in April 2007. First, you have those who only registered; second you have the users who registered and created a profile and listed surnames in place that they are researching but have no other contributions; third you have the users who created a profile, uploaded a gedcom and have no contributions since the initial gedcom upload, and fourth you have the users who have an active file and recent contributions.

I believe that we should eliminate all unsourced GEDCOMs after 6 months, unless their pages are watched by someone other than the user or an agent of WeRelate. --Beth 08:31, 10 April 2008 (EDT)


I agree in principle. The problem I see is that there probably aren't any utterly unsourced GEDCOMs, though there are plenty of essentially unsourced GEDCOMs. The former is a GEDCOM entirely without source records and citations. The latter is a GEDCOM with "OneWorldTree", GEDCOM upload date, and other sorts of essentially useless sources. Trying to make software know the difference wouldn't be all that easy.

I think we need to hook on other criteria to decide that something is both abandoned and useless.

Also, having one or more pages in an otherwise abandoned tree watched probably doesn't say a lot about the quality of the tree generally - though anyone watching a portion of a tree should be consulted before the remainder of the tree goes away. When I merge duplicate families, I don't concern myself with any question about whether an originating page comes from a "good" tree. I only try to understand whether the various pages are talking about the same people (or at least, the same fantasy about people).--Jrm03063 11:43, 10 April 2008 (EDT)


First, this topic should probably be moved to a separate page, because it soon is going to take on a life of its own.

Secondly, I am in 100% agreement with Scot. Becoming "Draconian" in principle is not going to turn away the masses, because from what I can tell, there doesn't seem to be a mad rush taking place to genealogy wikis anyway. Why is that? Becoming a little more picky about what gets uploaded and about what stays uploaded is instead going to attract those researchers who are serious about collaboration and who don't take offense when a bad source or bad information has been revealed. --Ronni 12:41, 10 April 2008 (EDT)


Dallan's Response [10 May 2008]

A few thoughts:

  • It would be pretty difficult for a computer to determine whether a tree is of "good quality." People will have to determine this.
  • Abandoned trees that are of good quality are probably worth keeping around.
  • Deleting a tree carries its own problems. Not too long ago a user deleted a tree that others were interested in (but had not gone to the fair amount of effort to watch all of the pages), and it caused some grief when it was gone. There are people on the other side of the fence arguing that we should be more strict about removing trees. (I assume this applies only to "good quality" trees.)

I don't think we want to automatically remove abandoned trees, because abandoned trees that are of good quality are worth keeping around, and the system can't tell the difference between good quality and junk. So let's focus on removing junk trees. Under what conditions then would we want to remove a junk tree? I can think of four; perhaps there are more?

  1. The junk tree contains a lot of internal duplicates (duplicates within the tree itself)
  2. The junk tree overlaps with existing trees, and the tree uploader didn't merge the pages
  3. The junk tree overlaps with existing trees, and the tree uploader merged the pages but in so doing added a bunch of "bad" data from their tree to existing pages
  4. The junk tree overlaps with a well-sourced tree that I am trying to upload, and merging my tree with it is going to add my well-sourced data to a bunch of pages with "bad" data

I'd like to consider each of these cases in turn.

  1. There's nothing to do here but delete the tree, as has happened already with the tree that contained a large number of internal duplicates for the Normans. If someone finds a tree with a large number of internal duplicates, I think we should contact the submitter and delete the tree.
  2. I think the best way to resolve this is to require the tree submitter to go through a match+merge step (where they are shown the probable-overlapping trees and can choose which pages to merge) within say 7 or 14 days of uploading the tree. If they have not completed the match+merge step within 3 days they get a warning, and the tree is removed if they have not completed it within 7 or 14 days. Trees that aren't determined to overlap any existing trees don't have to go through this step of course.
  3. This is a more difficult problem: The tree submitter merged their pages into an existing tree, but the merger resulted in a bunch of questionable data and sources being copied into otherwise good pages. We may need to have an option in merging to not append data from the new pages onto the existing pages.
  4. This is the opposite of the previous problem. I am trying to submit a new tree, and it overlaps an existing junk tree. But if I merge my pages into the existing tree I don't want my good data appended to a bunch of junk. We may need to have an option in merging to have the data from the new pages replace the data on the existing pages.
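
The merge options raised in items 3 and 4 above could be sketched roughly as follows. This is only an illustration in Python, not how WeRelate's merge works: the `MergeMode` names, the `PersonPage` fields and the `merge_pages` function are all hypothetical, and real pages carry far more structure than two lists.

```python
from dataclasses import dataclass, field
from enum import Enum


class MergeMode(Enum):
    APPEND = "append"        # add incoming data onto the existing page
    KEEP_EXISTING = "keep"   # item 3: do not append data from the new page
    REPLACE = "replace"      # item 4: new page's data replaces the existing data


@dataclass
class PersonPage:
    # Hypothetical, heavily simplified page: a title plus flat lists.
    title: str
    facts: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def merge_pages(existing: PersonPage, incoming: PersonPage, mode: MergeMode) -> PersonPage:
    """Return a new merged page; neither input page is modified."""
    if mode is MergeMode.KEEP_EXISTING:
        return PersonPage(existing.title, list(existing.facts), list(existing.sources))
    if mode is MergeMode.REPLACE:
        return PersonPage(existing.title, list(incoming.facts), list(incoming.sources))
    # APPEND: existing entries first, then incoming entries not already present.
    facts = existing.facts + [f for f in incoming.facts if f not in existing.facts]
    sources = existing.sources + [s for s in incoming.sources if s not in existing.sources]
    return PersonPage(existing.title, facts, sources)
```

Choosing the mode stays a human decision in this sketch; the code only applies whichever option the person merging selects.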

BTW, don't get discouraged. Match+merge is something that should have been implemented a long time ago, but it's not an insurmountable problem. As part of match+merge we'll have a screen that shows all of the probable duplicates between two trees, and lets the tree contributors discuss and select which pages to merge. This will hopefully make merging much easier than it is now.--Dallan 17:08, 10 April 2008 (EDT)
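
The match+merge deadline suggested in item 2 above (a warning after 3 days, removal after 7 or 14) might look something like this as a rule. It is a sketch under assumptions: the function name, the status strings and the choice of 14 days rather than 7 are all made up for illustration, and "within 3 days" is read here as "3 full days have passed".

```python
from datetime import date, timedelta

WARN_AFTER = timedelta(days=3)
REMOVE_AFTER = timedelta(days=14)  # the proposal says 7 or 14; 14 is an arbitrary pick


def upload_status(uploaded: date, today: date,
                  overlaps_existing: bool, merge_done: bool) -> str:
    """Classify an uploaded tree under the proposed match+merge deadline."""
    # Trees that overlap nothing, or whose merge step is finished, are fine.
    if not overlaps_existing or merge_done:
        return "ok"
    age = today - uploaded
    if age >= REMOVE_AFTER:
        return "remove"
    if age >= WARN_AFTER:
        return "warn"
    return "pending"
```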


Question about #3/4 above - at some point in the past, we talked about a function that during GEDCOM upload would identify the duplicates and do a merge if necessary. At that point, the user would have the option of just not uploading the duplicate people. That takes care of 1) those people in well-sourced trees that are placemarkers (like spouse's parents one hasn't pursued); 2) chunks of badly researched trees; and 3) situations in between where you want to see what's there before deciding one way or another. With some instructions, hopefully most offenders will recognize themselves and not upload their junk onto "nice" pages. Is something like that happening? If so, we're talking about people that ignored that instruction, which adds another dimension. But, that said, I also think you do really need to have an option of not appending the data from one page or another to the merged page. I would say, based on hundreds of merges, that far more often than not, one page is either junk or functionally, but not literally, identical (that is, one user says b. Windsor, CN, the other says b. Windsor, CT - this is why I'm still hand-merging, because these are human decisions). So to avoid creating more work and more junk, I would think a "use data from ___ page" option would be highly useful. --Amelia 08:35, 4 May 2008 (EDT)
The duplicate-detection hasn't been implemented yet. I agree that we need to implement it. And a "use data from ___ page" option when merging is also a great idea.--Dallan 15:26, 6 May 2008 (EDT)
See Gen Methods Archives for a recent comment on this problem and wikis, particularly in the context of the LDS site. Q 08:54, 7 May 2008 (EDT)
Interesting discussion. From what I've been told the new family search wiki is primarily a way for them to get their research outlines into a form that others can extend. It should also allow them to more easily post their own material online that is currently available only at the family history library (the "half-sheets" at the reference desks). I think it's a great step for them.--Dallan 10:44, 10 May 2008 (EDT)

Medieval Genealogy

Medieval genealogy (pre-1600) is a slightly different problem than the problem of "junk genealogies" because (a) there are only a few people that we have records for pre-1600, and (b) those people didn't generally have surnames, and the birth dates are often approximated. I'm not unwilling to prohibit people from uploading pre-1600 people, but I'd like to first see if we can merge uploaded pre-1600 people into well-sourced existing pre-1600 pages, and rather than append their probably-lesser-quality information onto the existing pages, we would not modify the existing pages.--Dallan 17:18, 10 April 2008 (EDT)


Speedy Delete [20 April 2008]

While I don't particularly think wholesale deletion of "abandoned" "poor quality" trees is a good idea, a feature that would be good to have is something akin to "Candidate for Speedy Delete" on other wikis. In truth, that capability is already in place in part, in that it's present under the "More" pulldown menu (at least when you are at the article level)---specifically, if you are the only person watching a page you can delete it anytime you want, as per the following guidance:

If you are the only person watching a page, click the More link in the upper right corner of the screen under the blue bar. Select Delete, enter a reason for the deletion and click Delete Page.

But I'll bet there are a lot of pages, such as duplicates, where the author is no longer paying attention, and a duplicated or otherwise unneeded article (kind words for junk) could be removed with no loss to anyone. It's probably not a good idea to allow just anyone to do that, but I think it's something that could be done suitably by an administrator---if they knew that someone thought a particular page could be done away with. If there were a repository where people could nominate candidates for speedy deletion, someone from the admin side would go through the list, review, and make a rational decision about how to handle the article. That might mean notifying the original creator, denying the request, or perhaps immediate deletion if that were appropriate.

Now, in truth, I don't really know of any articles so messed up that I think I'd delete them. But I DO encounter lots of duplicates---usually by the same author. I suspect that there's something in the process of GEDCOM uploads that creates them. Possibly they re-upload their GEDCOM periodically to sweep up any changes that they've entered in their genealogy program, and the upload program can't identify things that haven't changed, and just creates everything anew (hence, lots of duplicates). Don't know why the duplicates are there, but the fact is, they are---and might be candidates for speedy deletion. Q 19:32, 10 April 2008 (EDT)


I like this idea. If a page or set of pages isn't just poor quality, but duplicates pages already in the system and is causing merge work without contributing any new information to those pages, I could see marking them for speedy deletion. Then you have a human being instead of a computer making the final delete decision. What are others' thoughts on this?--Dallan 15:08, 15 April 2008 (EDT)


I agree that human interaction is needed in making these kinds of decisions. I also believe we shouldn't let this issue fall by the wayside. I just came across a GEDCOM uploaded in February that has many duplicate pages in it. The GEDCOM is by no means considered "junk," however. But it was uploaded and the user has not edited it since nor have they contributed another page to WeRelate. --Ronni 22:46, 19 April 2008 (EDT)

One of the things Dallan has indicated would be in place eventually is a search result tabulation similar to the browse function, but including more information than just the name---i.e., DOB/POB/Spouse/Father/Mother type information. Such a tabulation would make it easier to spot duplicates of this sort---especially if it included the identity of the submitter. That way, if you ran a search and found that John Smith had created four separate cards for a "Jeremy Black", all with similar DOBs and DODs, spouses, etc., you'd be fairly sure that some of them were duplicates. Q 10:17, 20 April 2008 (EDT)
Including the identity of the submitter is a good idea. I don't have that readily available, but the list of users watching the page is available. I'll include the watching users in the search results.--Dallan 17:30, 24 April 2008 (EDT)
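
To make the idea concrete, here is a rough Python sketch of spotting probable duplicates in such a tabulation by grouping rows on the watching user, the name and the birth year. The row fields (`page`, `name`, `birth_year`, `watcher`) and the function name are hypothetical, and requiring an exact birth-year match is a simplification of the "similar DOB" test described above.

```python
from collections import defaultdict


def probable_duplicates(rows):
    """Group search-result rows that share watcher, name and birth year.

    Each row is a dict with the hypothetical keys
    'page', 'name', 'birth_year' and 'watcher'.
    Returns only the groups containing more than one page.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["watcher"], row["name"].strip().lower(), row["birth_year"])
        groups[key].append(row["page"])
    return {key: pages for key, pages in groups.items() if len(pages) > 1}
```

A human would still review each group before nominating anything for deletion; the grouping only narrows down where to look.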

Status flags to mark state of data [4 May 2008]

Interesting discussion. The data I have loaded up for my small tree cannot be described as bad data, but it is poorly organized and presented. I am a beginner in genealogy and when I used Family Tree to hold my data I didn't put the data in the correct places. When I uploaded a GEDCOM it looked like a dump of mixed facts. This can be very confusing to anyone that tries to sort through it.

Many people will be new to genealogy as well as new to computers as well as new to wikis. They will not do things correctly at the beginning and will be frustrated and overwhelmed at times about the amount of work to get details into their proper places. There is a learning curve and it can be daunting, especially when you are used to immediate gratification, fast food, and no line-up service. This can be a lot of work.

The advantage of keeping these people active in your/our wiki is that they do bring very good data related to themselves. The closer the family ties the better the data will be. It is intuitive. So, I agree requiring better credentials for "historical" data makes sense.

The advantage for me to have a presence on this wiki is to increase the chances of contacting another person with an interest in the same people. To establish a tree for this purpose requires only the basic name, bmd stats, and locations. The details don't need to be 100% as that is why you want to find other people, to compare notes. I have connected with one person because of this wiki and we were able to share some info.

My suggestion would be to have two classes or statuses for data. Or even multiple status flags: raw, draft, under construction, basic, vitals only, etc. Then when you run merges etc., you could include or exclude based on status flag. You/we will need to develop a set of criteria for assigning a status flag. A disposition rule could also be auto set based on a status. For example, if status = raw, then delete 6 months after last update.

In the records management profession the concept of transitory and official records is well understood. Transitory information is used to create an official or final record. While official records may be kept permanently, transitory records are not. Disposition is driven by a rule for the record series (class). Based on a triggering event, a countdown of a specified time starts. When the end of the time period is reached the record is proposed for destruction. If no one can declare a reason to keep the record it is deleted.

Triggering events could be last date record amended, last date record viewed, and record status = "transitory". And/or include activity of record owner, or persons with interest in record. NO activity, transitory records, nothing happening for x months, then delete.
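
A disposition rule of this kind could be sketched as below. This is only an illustration of the records-management idea in Python: the `Record` fields, the `RETENTION` table and the returned strings are invented for the example, "6 months" is approximated as 183 days, and any status not listed is simply kept.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention period per status flag; None means keep indefinitely.
RETENTION = {"raw": timedelta(days=183), "official": None}


@dataclass
class Record:
    title: str
    status: str
    last_amended: date
    last_viewed: date
    keep_requested: bool = False  # someone has declared a reason to keep it


def disposition(record: Record, today: date) -> str:
    """Apply the status-driven rule: the countdown runs from the latest triggering event."""
    period = RETENTION.get(record.status)
    if period is None:
        return "keep"
    trigger = max(record.last_amended, record.last_viewed)
    if today - trigger < period:
        return "keep"
    # Countdown expired: propose deletion unless someone asked to keep the record.
    return "keep" if record.keep_requested else "propose for deletion"
```

Note that the rule only proposes deletion; as described above, the record goes away only if nobody gives a reason to keep it.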

As an aside, how many people are familiar with the 5 steps to change management? Awareness, understanding, acceptance, commitment, action. It seems to me that there is a lot of change management required in a wiki to get people to move together in an agreeable direction.

Thxs Peter --PeterP 08:48, 13 April 2008 (EDT)


Good points.

I am reminded that just because an article is not being edited, does not mean that it is not being viewed. Just because an article is not being viewed does not mean it is not valued.

Eventually, there is NO user of this site, Dallan included, that will not cease editing their articles. It would not be good for this site if people could not contribute to it with confidence that their contributions would remain.

Also, in the same vein, if the criteria for deletion came to be that they had to be "good" articles (or at least not "bad" articles), you'd need to be able to define criteria for good and bad articles. An obvious criterion would be that they meet BCG standards. How many articles on this site meet those standards? Not mine, I know.

Q 09:24, 13 April 2008 (EDT)


Wikipedia has a set of templates that anyone can add to a page to say for example that it doesn't contain source citations or is biased. These serve as flags for others to improve the articles. But articles not meeting wikipedia's criteria don't get deleted except in special circumstances. I'd be in favor of coming up with a set of templates along these lines to flag articles. But I wouldn't want to delete pages just because they weren't good quality and haven't been edited in awhile. I've uploaded my genealogy and many of those pages haven't been edited in quite awhile, and many don't have good source citations. But I'm hoping that they'll get better over time. I'd personally hate to see them deleted.--Dallan 15:08, 15 April 2008 (EDT)


Hi Dallan,

If one chooses to allow any GEDCOM upload without any criteria, then I certainly vote for some kind of status flags. One could be junk or, more politely, unsourced; a second could be sourced but only with meaningless sources such as WFT #3 or so-and-so's GEDCOM, etc. All of these could be under one status flag.

One could isolate these trees as unavailable for automatic merging until some person chooses to edit the pages.

Then perhaps after a certain time period, perhaps one year, active users on WeRelate could vote whether or not to keep the pages or delete the pages. You could place a warning on the registration page that possibly one's pages could be deleted in the future. --Beth 10:10, 4 May 2008 (EDT)


Something along these lines makes sense. I'm not sure exactly how, and I'm not sure whether removing unsourced abandoned trees should be proactive or reactive, but it seems like we should come up with something in this area.--Dallan 15:26, 6 May 2008 (EDT)


GEDCOMs in the digital library? [6 May 2008]

Do we support the upload of GEDCOM files to the digital library? I'm struck that, for some people who are not yet sure about whether they want to commit to the process of wiki genealogy, or if they have an unusually large GEDCOM (say, over 2K people) we should encourage them instead to protect their current GEDCOM by loading it to the digital library with whatever cover material they can muster. Then, instead of uploading their entire GEDCOM into werelate, we give them guidance on how to carve up their work to upload piecemeal.

The GEDCOM standard has been a help for genealogy, but a hindrance as well. Instead of folks focusing on a small set of ancestors that they reasonably have the time and interest to properly research, they become slaves to the maintenance of a large data file that often turns out to contain tremendous amounts of crap. Tell them to abandon that stuff and focus on a more reasonable set of goals and they'll run off screaming. On the other hand, tell them to archive their work in a labelled and maintained repository like the digital library, while carving out the subset that they really want to actively continue work on for use in WeRelate, and we may all be better off.

I know that Dallan is looking at ways to improve the upload process, so that duplication can be suppressed at the start, but that's only part of the challenge. The real challenge is uploads of data that the user never really intends to actively work with.--Jrm03063 15:11, 5 May 2008 (EDT)


You should be able to upload your GEDCOM to the digital library. I haven't tried it, but I've added GEDCOM as an accepted file type to the library. I hadn't thought about having people upload a complete GEDCOM to the digital library and then copy just a portion of it to the wiki, but that seems like a really good idea. We could even have links from the wiki pages that were on the boundary of what was carved out of the GEDCOM pointing back into the GEDCOM file. (I wish there were two of me.)--Dallan 15:26, 6 May 2008 (EDT)


Usage Statistics [29 October 2008]

"Junk" genealogy is a fact of life in genealogy. It's been around for a looong time, and is not a recent phenomenon, but it has taken on a life of its own with the internet. I suspect that for most services, such as Ancestry, there's really no advantage in purging junk. Perhaps the philosophy that rules is "something, anything, is better than nothing". There's a certain amount of truth to that, unpleasant though it is. The more people use a site, the more successful it will be, at least in terms of survival.

With that in mind, here's a small summary of traffic on the main genealogy wikis and some other sites for comparison. These data are from Quantcast.com, and are for the month of April. I've added some interpretive information about each (number of articles and functionality comments).

Datum Type                  | Ancestry | GenCircles.com | Genealogy | WeRelate | WikiTree | FamilySearch | Rodovid
Wiki?                       | No       | No             | Yes       | Yes      | Yes      | Yes          | Yes
Rank                        | 287      | 21268          | 108397    | 136773   | 184445   | 287304       | ND
Unique Hits per Month       | 4.8M     | 102996         | 15192     | 11432    | 7892     | 4550         | ND
Visits per Month            | 44.9M    | 371021         | 3709      | 3524     | 0        | 6533         | ND
Visits/Unique               | 9.30     | 3.60           | 0.24      | 0.31     | 0.00     | 1.44         | ND
Audience Comp (Passerbys)   | 59       | 66             | 85        | 83       | 83       | 75           | ND
Audience Comp (Regulars)    | 38       | 34             | 15        | 17       | 17       | 25           | ND
Audience Comp (Addicts)     | 4        | 0              | 0         | 0        | 0        | 0            | ND
Share of Visits (Passerbys) | 12       | 27             | 71        | 74       | 71       | 53           | ND
Share of Visits (Regulars)  | 41       | 73             | 29        | 26       | 29       | 47           | ND
Share of Visits (Addicts)   | 47       | 0              | 0         | 0        | 0        | 0            | ND
Number of Articles          | AVBN*    | ABN*           | 20K       | 2M       | ?        | ?            | 100K
GEDCOM Support              | Yes      | Yes            | partial   | Yes      | assisted | No           | ?
Guided Data Entry*          | ?        | ?              | templates | Yes      | templates| No           | Yes
  • AVBN: A very big number
  • ABN: A big number
  • Guided data entry: Uses text boxes to input data, keeps track of relationships in some manner
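
For anyone wanting to check the visits/unique row, it is just visits per month divided by unique hits per month. A short Python sketch using the figures from the table follows; the dictionary layout and function name are only for illustration. The Ancestry inputs are rounded (4.8M and 44.9M), so its computed ratio will not exactly match the 9.30 shown, and Rodovid is left out because it has no data.

```python
# (unique hits per month, visits per month), copied from the table above.
traffic = {
    "Ancestry": (4_800_000, 44_900_000),   # rounded figures
    "GenCircles.com": (102_996, 371_021),
    "Genealogy": (15_192, 3_709),
    "WeRelate": (11_432, 3_524),
    "WikiTree": (7_892, 0),
    "FamilySearch": (4_550, 6_533),
}


def visits_per_unique(unique_hits: int, visits: int) -> float:
    """Average number of visits per unique visitor in the month."""
    return visits / unique_hits


for site, (unique_hits, visits) in traffic.items():
    print(f"{site:15s} {visits_per_unique(unique_hits, visits):.2f}")
```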

None of the genealogy wikis shown here are anywhere close to Ancestry or even GenCircles. In terms of traffic Genealogy and WeRelate are about the same. FamilySearch has already grown to more site visits than either G or WR, but that may be because it's new. Rodovid seems to be losing ground; last month it garnered about 1900 hits. It's now dropped off the board (insufficient data to report), though it's still active. The high traffic count for Genealogy is, I believe, due in part to recent changes in layout (much better looking than it used to be), but I don't think that's the real driver. It's being visited more, but actual page creation seems to have dropped off. What's really driving the visitation numbers for Genealogy is its connection to Wikia, and I believe an advertising campaign that's made it somewhat more visible.

Among wikis the major distinction is the total number of articles. WeRelate is clearly the front runner here, with 2M. Its most serious competitor in terms of site activity is Genealogy with 20K articles. WikiTree has 100K articles, but its activity is lower.

The greater number of articles on WeRelate is almost certainly due to its GEDCOM import capability. Genealogy has a similar capability, but it's not been effectively implemented. Rodovid may have this capability (I'm told) but it's not obvious. WikiTree can do it but it requires the operator to insert it---not automatic---that's probably why it has 100K articles, but the fact that it's not automatic is a major barrier for it.

The point of this is that I believe that what is driving WeRelate's success is its GEDCOM import capability coupled with a well thought out manual data entry system. It's the GEDCOM load that brings the useful traffic. None of the other wikis have functioning GEDCOM support.

On this site getting folks to do more than dump their GEDCOM is a challenge, but first you have to get them here. My guess is that much less than 10% of the people who dump a GedCom ever do more on the site---perhaps that's 1% who really stick. What's really going to drive the further success of the site is that small percent---these are the people ANY wiki needs---dedicated users who do more than simply dump a GEDCOM. Ultimately, they are the ones that are going to make the site work. But to get them you have to cast a large net---and ANYTHING that diminishes the number of people trying the site, is also going to diminish the number of users that turn into dedicated users.

Which is why you have to be very careful about doing things that will turn off those who come to the site for its GEDCOM dumping capability. That's a number that you want to increase, not decrease. Otherwise we might find ourselves struggling along like Rodovid with traffic so low it doesn't get picked up in the statistics. It would be nice to encourage people to do well with their genealogy, so that nothing here could be described as junk. No one else (wiki or otherwise) has succeeded with that, and putting up with junk genealogy is a small price to pay in return for persistence.

And finally, I might add that I've personally developed a fondness for "Junk Genealogy". True it is junk, but there's some utility in having about a million people looking for information. Even if they don't understand the need for citing sources, there's usually enough of a clue in their work that, once spotted, you can seek out the original data yourself. I LIKE having lots of folks looking for the same things I'm interested in. The fact that many of them don't know how to report what they find, or make effective use of it, is a small price to pay for having all of those busy hands finding good stuff. Q 10:28, 7 May 2008 (EDT)


I own Family Tree Legends and am a member of GenCircles. My family files were transferred to MyHeritage and I suspect that all of the GenCircles' files have been transferred there also. The transfer was an automatic transfer; without my knowledge. That did not bother me. I have now found the burial place of a great great grandfather and successfully ordered his funeral home records.
The family sites may be public or private; your option. I have not used the new program but I suspect it works similarly to Family Tree Legends. The genealogy software is on your personal computer and you enter data into your program as usual. The data entered is automatically entered on your web page on the MyHeritage site. You receive notification of Smart Matches as one did with GenCircles.
The capability of having a genie program with reports and charts, together with automatic creation of web page entries and no duplicate typing, is unsurpassed in my opinion.
It would be fantastic if WeRelate had a similar capability. WeRelate is not difficult but I have not had much assistance with my data because others seem to think it is difficult to learn how to use the site.
Because you have GenCircles in your chart, I thought you might be interested in the new site. Here is the link for MyHeritage. [1] --Beth 10:51, 7 May 2008 (EDT)

It pains me, but I think this analysis is sound. Most of the "junk" genealogy I've encountered over the last six weeks or so was really just inadequate genealogy - vast wastelands of unsourced names with nothing but dates for birth, death, and marriage. Much more often than not, the information is correct or (at least) flawed in a way that is well known or has been documented as a flaw in the literature. It can be an odious task to work through merging the stuff, but I think that was mostly because we've got a backlog of a couple of years of stuff that was almost entirely unmerged. I noticed that individual trees, added to a reasonably well merged space, can be merged in pretty quickly when you know how to go about it.

I still think that we should encourage folks to think critically about their purposes before uploading a GEDCOM. The paradigm shift from __my__ tree to __our__ shared genealogy space is a serious jump for folks, and it can't be reinforced enough. If they are simply looking for a place to archive their GEDCOM (or perhaps their TMG data base or whatever), then the digital library may be a better choice. If they don't have an interest in working cooperatively, leaving their data base where it can be picked up by another researcher may be the best thing to do. If they have a large GEDCOM but a core set of folks that they are really interested in working, they may want to take a hybrid approach - GEDCOM to the digital library and a subset uploaded to WeRelate. Having offered that guidance we probably have to trust that folks will make good decisions more often than not. When they make very bad decisions, we can always fall back on the recently used informal approach with one notorious upload - deleted by popular demand.--Jrm03063 11:25, 7 May 2008 (EDT)


I agree with everything that's been said. Thank-you for the analysis! These are all great ideas.--Dallan 10:44, 10 May 2008 (EDT)


Junk genealogy? I learned very early to be very careful when uploading anywhere. So I have 'special' gedcoms to upload with hardly any sources named. If I find I like and trust the site, I upload a better file or do as I started here. Add them manually, as I have time. So, my files would be listed in this 'junk' talk, as I have not been able to add a lot lately. Or what constitutes true 'junk'?

WeRelate has as many entries as it does BECAUSE it can take in GEDCOMs. It is more complicated than the other gen wiki sites. And I still dislike the search here.

I have gotten many leads in 'junk genealogy' files. They tell me which direction to go or to just look elsewhere. I don't believe that a file of 10,000 or more can have sources compiled by one person; it has to be a file that was put together by taking other people's work. So what is 'junk'?

Abandoned files? It is a lot harder to understand how this place works. Perhaps they have gone away just frustrated. Perhaps they check back periodically to see if there are changes. People tend to do what is easy when putting up their files. Not many have time to learn a new format.

I don't know if anyone has even expressed an interest in my files. The only person watching is my cousin who I told to join so she could add if she chose. She hasn't. I have never gotten any messages from here.

I'm not sure what I will do now. If I am going to be 'junked' I would prefer to delete my own files. Just my ramblings.--Twigs 11:51, 15 May 2008 (EDT)


I think some people define it as unknown (or unknowable) genealogy, lacking in source support. I think the term is a little more abstract for us however, and it probably is more a function of the contributor's conduct than of any fundamental qualities of their data at any point in time. Whether the space of your interest is ten people or ten thousand, since we're sharing the space and any overlapping research, we hope that anyone jumping in will be interested in improving the quality of their contribution going forward - regardless of where they start.

I suppose it could be put another way. Imagine a group of people doing old-fashioned genealogy collectively. Maybe they share a file cabinet at the local historical society and the group has a set of general conventions for how to record information and sources. The group tries very hard to be diligent about getting their information correct and complete, as well as citing sources so that other researchers can review and expand their work - but of course it still is of uneven quality. Now imagine someone showing up at a meeting of the group, throwing vast chunks of material they don't understand (or plan to understand) into the group's cabinet. Then, they just disappear. What is the group to do with such a contribution? Does someone suspend their own research interests and start wading through the contribution to bring it up to the quality standards of the community? Or do they just extract it, set it aside, and wait for someone with actual interest in that area to adopt the stuff and take responsibility for it?

If what you are doing would seem brusque in a group meeting around a table once a month, then it would probably be received unenthusiastically in this context.

If you have a genuine interest but are simply starting small to see if this all works for you - great! Welcome! Nice to have you here! Ask for help any time! If you're just looking for some place to archive a GEDCOM without any intentions of working the stuff further, then I suggest either the digital library or another site that archives GEDCOMs from any source.

I guess it's all a wordy way to say "play nice".--Jrm03063 13:37, 15 May 2008 (EDT)


One more thing - repeated uploads will not do what you think. The shared data space would wind up with both the old and new information, and someone would have to merge the material. I'm curious - why hold back sources? What is the issue of trust that concerns you?--Jrm03063 13:43, 15 May 2008 (EDT)


I meant no offense by commenting. I did not know that was not playing nice. I am sorry.--Twigs 15:54, 15 May 2008 (EDT)


Oh my goodness! Of course you're "playing nice" - you're talking to folks - that's participating in the group! ....and I'm only one person in this community. I'm only sharing my idea of things, which hopefully is something like the mainstream, but who knows? I was just trying to help you understand how one other person sees this space and what's behind this weird notion of "junk genealogy".--Jrm03063 17:54, 15 May 2008 (EDT)



From a Junkee [22 November 2008]

I realize I'm coming to the party a bit late but I thought you all might be interested in hearing the POV of an abandonner. I don't know how typical I am but I doubt I'm alone.

First I was introduced to that other genealogy wiki and I added some content there but their manual process was so slow and tedious it would have taken a lifetime to get much done.

Then the guy from Australia (Robin?) suggested WR and I came over here around the time you started accepting uploaded GEDs. I decided to try it out; I uploaded a couple files, created a few project pages; contributed elsewhere for awhile BUT I was also continuing to research my lines. With no way to easily update the content already on the wiki with my updated gedcom, I realized I would have to be updating twice. The amount of time required was disheartening. In addition, I sought more of the template support that the other wiki site offered. I left, "abandoning" my tree, other pages and leaving behind what some here would call a "junk tree". In the last 18 months I have had very few notifications that my tree was edited or watched.

My trees are of mixed quality; some pages are really well referenced; others are cr@p.

I came back because the notices of edits and now merges have increased and I was curious to see what was going on. The merge function fascinated me and I started to try it only to realize I better LEARN how to do it right first otherwise I might really screw things up. In search of such help, I stumbled across this page. And thought I would add another perspective.

I will add that you have got my attention for awhile. I have also learned that I am much more likely to focus on those areas of my research where I feel I have strong data, do my research elsewhere, then return here when I have solid info and update accordingly.

I wonder how many other junkees you will be attracting back now that so much merge activity has started?

I would like to help out with that effort, but let me get reacquainted with the system.

Jillaine 17:28, 29 October 2008 (EDT)


I'm sorry for the late reply. We've had a number of users come back due to the merging, but I think that in order for us to take off in a big way, in addition to merging we need to make it easy to sync your online tree with your offline desktop application. Although some people are comfortable doing everything online, most people are not. So a desktop app that synchronizes with WeRelate is in the works for next year. That plus the ability to merge should make WeRelate a really fun place to work.--Dallan 15:40, 22 November 2008 (EST)


Leaving data from gedcoms sourced by WFT etc. [22 July 2008]

Hello everyone,

I have changed my mind. I think that we should leave all of the pages uploaded on WeRelate. Using Dallan's new search engine, I discovered a gedcom that had been uploaded in May of this year. This is a family that I have researched. I have not removed the source for WFT nor have I deleted data sourced by WFT that I do not have. To date there have been no conflicts in the data.

What I have chosen to do is to enter the data that I have and source my data. The user does not have a profile; and I don't intend to contact the user. If she receives a notification via email and contacts me that is fine.

I created a new tree and add the pages to the tree as I edit them. I am researching the Coker line and this person is researching the Meadors line so some of the pages will not be added to my newly created tree. You can view the history of one of the pages here Person:Elijah Coker (1). I use FTE all of the time and am not sure about the navigation if the Family Tree Explorer is ditched.

Trees that have been uploaded via gedcom with no activity that have duplicate pages in another inactive gedcom should be automatically merged.--Beth 20:48, 22 July 2008 (EDT)


Definition of Junk Genealogy [7 February 2009]

I know this seems to be an old topic; but it will keep coming up as more folks participate. I'd like to see a list or step-by-step description of what constitutes 'junk genealogy' without having to read through this whole page.

This is on my mind because I have just been badly bashed (oh, not here; but by email) as having poor judgement, ludicrous guesses and my work is an injustice to researchers. This because I connected children to a father without provable paper documentation. I had accepted the children's connection to this man as 'reasonable conjecture' based on tax records and deeds over a period of years, census records, and because I found no other man by that name located in that locality during that time period and the fact that a father by that name petitioned the legislature to have these children legally renamed as they were his 'naturally born children.' The children's descendants are outraged that I would make this connection as there is 'no proof' that the man who designated them his own is the same man I said he was. They also say that just because I have found no other man who might be their father, doesn't mean there wasn't someone else and that the process of elimination is not good research - it is only a poor guess. I somehow thought that 'reasonable conjecture' was OK; acceptable when no other info seems to be available. But if I decide to upload this to WeRelate, how in the world would it be sourced and would it be acceptable genealogy or should I rethink my position? Is conjecture based on circumstances 'junk'?

I know that this could be put on a talk page for examination but that would require the parents to be connected (and married) but that is their complaint! No amount of explaining how the conclusion was arrived at satisfies these descendants of the 'naturally born children'. The outrage seems to be from the known fact that he was already married to someone else. (Divorce was not lawful at the time.) BTW, what is the difference between 'naturally born children' and 'illegitimate' children under the law? This distinction is made in the legislative language of the petition. --Janiejac 21:32, 7 February 2009 (EST)


Janiejac,

The "junk genealogy" discussion in this forum has been focused on another topic entirely-- namely people uploading unsourced GEDCOMs that they then "abandon." I don't think your situation falls into that category.

I see that you have two issues here:

  1. Appropriate way to document disputed/disputable information
  2. Reactions of descendants

I have had a similar experience so I can empathize. (See User:Jillaine/Spiritual_Wife-ism_in_Late_Colonial_Massachusetts#HIX_.2F_HICKS if you're interested; my longer manuscript goes into greater detail.)

For item #1, one option is to leave the "natural" father disconnected from the children (I did a variation of this with Moses Hicks -- connecting him to "my girl" but not connecting him with his first family), but place the theory, including the steps you've taken and the specific sources you're citing, in the narrative for him, providing hyperlinks to said children without "linking" them formally in the tree. You may also want to include in the narrative that descendants of the children do not accept this evidence as sufficient to make the connection. This way you acknowledge that there is a dispute. (I did that on the Moses Hicks page linked above.)

As for challenge #2, if you believe, as I do in my own example, that you've made a reasonable attempt to find the father and this is a reasonable explanation, then you've done your part until others can provide another reasonable alternative. I remain open to being wrong in my theory, but I do believe that I have done sufficient research to present a reasonable theory. I haven't seen your research, but it sounds like you may have done this as well.

I will add one more piece. I did get a piece of "hate email" from one of the descendants. In my response, I apologized for upsetting them, said that it was not my intent to do so and that I was completely open to being shown a different conclusion, re-iterated, briefly, why I believed what I believed, and asked for any suggestions they might have for additional places for me to look. This did succeed at defusing the anger. We still disagree, but there is no more hate mail.

Best of luck to you. (And I don't have an answer re natural-born vs. illegitimate.)

-- jillaine 22:26, 7 February 2009 (EST)


I am not a professional genealogist, and certainly not a lawyer trying to prove something of a genealogical nature. But there is a thing called the Genealogical Proof Standard (GPS) that sets standards for how research should be done.

  • Reasonably exhaustive search
  • Complete and accurate citation of sources
  • Analysis and correlation of the collected information
  • Resolution of conflicting evidence.
  • Soundly reasoned, coherently written conclusion.

Based on what I have seen on your website, I doubt we need to worry about 2, 3, and 5. "Reasonably" exhaustive is somewhat vague, but if you have searched all sources you are aware of, considered all information you have seen, and are open to any new evidence proffered, no reasonable person would expect more. Further, while you seem to have considered the resolution of conflicting evidence, as evidenced by your even considering your theory, this may be a step you need to point out to your antagonist. --Jrich 22:55, 7 February 2009 (EST)

Yes, I was aware my 'junk' was not the same kind of junk being discussed. But it had been called JUNK and I just needed a bit of reassurance. This hateful email has been going on for a couple of days, including telling me I MUST take this off my web site. I have answered as reasonably as I can but the level of their outrage is unbelievable. I don't like to be bullied. I will not be answering any more of their emails. I have explained my conclusions in my notes and in the emails but I am not changing my site just to meet their need for social propriety. They were looking for their gr-gr-grandma and didn't like what they found. Yet . . . I needed reassurance that I wasn't just being stubborn. Thanks for listening and responding.

--Janiejac 23:59, 7 February 2009 (EST)


A Bit of a Rant Regarding Junk [7 April 2009]

I'm new at WeRelate, but I'm an old hand at genealogy -- researcher, librarian, editor, and workshop teacher, over the years. I use TMG (I was a member of Bob Velke's original design team fifteen years ago) and that's where I keep everything. I have also had a website for about a decade that includes modified reverse register reports of my eight gr-grandparents. That's a preface to saying that I've recently begun uploading to a couple of websites as well. To Ancestry, simply because it's so large, and I can pick up all those census listings I hadn't gotten around to for various collateral lines. To Genie, because the social networking aspect of it has helped me get a large number of relatives interested and involved, and that's always a good thing.

Now I'm poking around in WeRelate, figuring out how it works and what it might be good for. (Like most librarians, I do a lot of work at Wikipedia, so I'm generally familiar with the mechanics.) I had been wishing for years that a well-designed wiki site would appear for genealogy, since it would enable free-form cooperation with other researchers, both within my family and outside it. Especially, my hope was that at such a site junk GEDCOM uploads could somehow be minimized.

Thoughtless, useless uploads are the bane of serious work at Ancestry, for instance, because people will download some mish-mash of a GEDCOM from World Family Tree, merge it with another GEDCOM from an equally specious source, and then re-upload the new GEDCOM to Ancestry -- which only compounds the chaos. As a result, if you look at the "One World Tree" section at Ancestry, it's easy to find trees where all the children from three wives are lumped under a single wife, with the last six kids being born long after her stated death date. And so on. How is this supposed to be useful to anyone?

There are certain instances in my own lineage where an assumption of descent from an entirely unconnected person of the "right" name -- pure wishful thinking -- is claimed by practically every beginner who happens upon a connection to that family -- even though the error of making such an assumption has been demonstrated over and over again, and in detail, by serious researchers in the family. And these people upload the specious linkage in yet another junk GEDCOM, leading others to assume (again) that the information must be correct.

My apologies for ranting on this subject. It's a hobby horse with me, I admit. Every time I teach a class for novices, I emphasize repeatedly the necessity of doing your own research and citing your sources, or at least checking the research of others before you adopt it wholesale. I would like to see WeRelate -- or some such website -- take a principled stand on this. YES, there IS "good" research and "bad" research. YES, there IS such a thing as "junk" genealogy -- which should never see the light of day on a website that wishes to carve out a niche for itself. Not all research is equal, folks. You can encourage good research practice, you can teach family researchers how to do good, useful work. Or you can cater to the lowest common denominator and accept anything someone wants to dump on your server. Accepting everything uncritically actually damages the work of other researchers. Do you want WeRelate to be different from those other sites listed in the table farther up in this thread? "Different" in a meaningful way? Please think about it. --mksmith 17:15, 3 April 2009 (EDT)


We started WeRelate with the idea that, as a wiki, junk could be removed by better researchers. We have a number of users doing this - fixing up and merging information. There are two areas where further improvement is needed:

  • We've got to stop uploading GEDCOMs that create Person and Family pages that are duplicates of what we already have. The new and improved GEDCOM import function currently in final testing at the sandbox will hopefully solve this problem. It makes GEDCOM upload much more labor-intensive because you're required to review the families in your GEDCOM that have potential matches to existing families and link them to those existing families rather than creating duplicate pages, but raising the bar for GEDCOM uploads may not be a bad thing overall. The new GEDCOM import function also requires people to review suspect dates in their GEDCOM and gives them a chance to correct them before import.
  • We need to encourage better sourcing. Until recently I didn't realize how much genealogy was unsourced or where the source was simply another GEDCOM file. I have a lot of theories as to why this is the case, and I believe that at least part of the problem lies with existing tools making sourcing difficult. We've been cleaning up the source database for the past year in preparation for a big effort the latter half of this year to make sourcing easier.
The amount of sourcing going on in WeRelate articles is about the same as on Ancestry. In the case of Ancestry about 95% (literally, I've got data that shows this) of the lineages lack ANY source information. Of that remaining 5% who do source, most are simply citing GEDCOMs or someone else's lineage that they drew from. Only about 1% of the lineages actually provide the underlying sources, either pointing to original sources (in the BCG sense of the word) or pointing to something that's making use of original sources. The numbers might be a little better here on WeRelate (I've only looked casually), but not by much. NOT sourcing is the near universal truth among genealogists on the web---unfortunate, but that's the way it is.
I believe one of the reasons for that is that people don't really understand what's needed. The 5% I mentioned ARE trying to do the right thing---they ARE sourcing their information---in the sense that they are saying where they got the data from: "Why, I got it from Billy Joe's GEDCOM!" What they don't understand is that this actually is not what's intended when professional genealogists say that you need to cite your sources. They really mean you need to point to the underlying sources of information---the original sources, contemporary with the events, that show that "Mary Platt born 1675, was the daughter of Epenetus Platt and wife Jane Wood". And usually, you can't do that with a single source---you usually need multiple interlocking sources to show the truth of a statement like the above.
And that's really too much work for most people. They'd much rather take someone's word for it---"Just tell me what the right answer is so I can get it down in my GEDCOM." And despite the best intent, that is not likely to change much. You can encourage folks to do the right thing, but the right thing is much more than explaining where they got their information. Q 14:49, 6 April 2009 (EDT)
I think Quolla6's analysis is probably pretty close to the truth. There are stages in appreciating sources, as in most other areas of growth. Keeping sources is only the first step. Finding a source that is wrong, perhaps stupidly wrong, helps you appreciate the quality of sources. You have to get burned to learn to avoid fire. You may know it's dangerous, but until you are burned, you don't know what priority to give its avoidance, or how much effort to spend avoiding it.
There's already a reason why genealogy attracts mostly older people, and it's not just because you have to experience life to appreciate what your ancestors did to get to you, but also because it takes time. If you have a job, and young kids waiting when you get off from work, it is hard to find the time to do exhaustive searching of sources. WeRelate, books.google.com, and various websites will help distribute the sources making this easier. WeRelate's great value will be to provide the connections that identify a person more accurately than names which may be common, misspelled, aliased, etc.
Hopefully accumulation of good sources on WeRelate is like a ratchet wrench, only going one way, towards better sources. The great fear of GEDCOM uploads is, of course, the danger to this progressive march --Jrich 16:06, 6 April 2009 (EDT)
Ratcheting. Good point. Perhaps the advantage of WeRelate is that those who are further along on the learning curve can insert the underlying sources as they come to them. Hopefully, bad genealogy does not drive out good, and gradually a corpus of well sourced, well documented family relations will be built. And perhaps, as good stuff builds up, others will see what's needed and move along on the learning curve themselves. There's really no such thing as Junk genealogy---just a lot of sincere, but incomplete work that sincerely needs some TLC. Something that I notice one of the other wikis starting to do is emulate the Wikipedia's "Barnstar" approach of flagging well-done articles. Perhaps that's something we should do here. That way people will be able to spot well done articles, and hopefully, see what's involved in an article that others think is well done. Q 16:24, 6 April 2009 (EDT)
Q, We already have a variation of Wikipedia's "Barnstar" approach-- in addition to the "Nominations" option (which we could certainly all use more actively), the recent portals also serve as a way of promoting both good examples as well as "featured pages". I know that there are plans to seriously improve the help pages -- including examples of "model pages" as part of the help system would also go a long way to encouraging good practice. jillaine 08:28, 7 April 2009 (EDT)
Wikipedia uses both "Barnstars" and Featured articles. Presumably a featured article is thought to be worthy of a barnstar, but not necessarily. I know in Genealogy Wiki they also do both, recently adding the Barnstars to articles. I think they started with those that were featured, but I suspect not every featured article there got a Barnstar, and probably some with Barnstars have not been featured. They also are trying to implement a "Quality Scale", rating articles 1-5, with 5's being candidates for Barnstars. They are trying to use specific criteria for rating articles based on whether they meet certain requirements. I think that is a good approach, as it makes the evaluations objective rather than subjective. I'm not sure that describing someone's work as "garbage" does much to encourage them to improve their articles---but telling them that it's a "1" on a scale of 1 to 5, because it lacks "sources", a "family register", "problems with English", etc. makes it the fault of the article, not the person, and gives them specific objective guidance on how to make it better. Perhaps the point is that since there are no criteria for making an article "featured", there's no way to tell why it was featured. Seeing that a specific article is "featured" doesn't tell you it was genealogy well done--or at least why it is thought to be well done. Q 08:57, 7 April 2009 (EDT)

There's also a line of thinking that says that unsourced material shouldn't be allowed on WeRelate. I don't agree with this; it's like saying that articles without good references shouldn't be allowed on Wikipedia - I think it raises the bar too high for a lot of people who would eventually turn into good contributors. I think it's better to allow unsourced articles to be added and then encourage others to make them better. However, semi-protecting certain pages - say pages for famous or medieval people (people with a link to an article at Wikipedia) or people with more than 5 people watching them - so that they cannot be edited during GEDCOM upload but instead must be edited manually, makes sense. This is also implemented in the new GEDCOM import function.

Finally, the new GEDCOM import function could give administrators a chance to review every GEDCOM, or every GEDCOM above a certain size, before it was uploaded, to make sure that it didn't contain a lot of junk. We haven't decided whether this should happen because it could slow down the turn-around for uploading GEDCOMs, but if we had enough people willing to review incoming GEDCOMs, I'd be in favor of giving it a try.--Dallan 11:38, 6 April 2009 (EDT)
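
The semi-protection rule described above (pages with a Wikipedia link, or with more than 5 watchers, cannot be changed by a GEDCOM upload) can be sketched in a few lines. This is only an illustration, not the actual import code: the `Page` class, its field names, and `is_semi_protected` are hypothetical names made up for the example.

```python
from dataclasses import dataclass

# Threshold taken from the comment above: "more than 5 people watching".
WATCHER_LIMIT = 5


@dataclass
class Page:
    """Hypothetical stand-in for a Person or Family page."""
    title: str
    has_wikipedia_link: bool  # assumed flag: page links to a Wikipedia article
    watcher_count: int        # assumed count of users watching the page


def is_semi_protected(page: Page) -> bool:
    """Return True if a GEDCOM upload should not be allowed to edit the page.

    Such a page would have to be edited manually instead.
    """
    return page.has_wikipedia_link or page.watcher_count > WATCHER_LIMIT
```

Under this sketch a page with exactly 5 watchers and no Wikipedia link is still editable by upload; only the sixth watcher tips it into manual-only editing.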


Quality Scale [8 April 2009]

Above, Q mentioned that the Genealogy wiki is considering something like a quality scale of 1-5, with criteria identified for each level. It appears to be being discussed here. They also appear to be basing it on Wikipedia's own article assessment process. Seems like a good idea and that WeRelate should come up with our own variation of the charts on the latter page. jillaine 09:18, 7 April 2009 (EDT)


I like the idea, but what about basing it on something like the genealogical proof standard? Perhaps something like: no sources vs. some sources vs. every fact is sourced vs. facts are sourced and conflicting evidence is presented and analyzed? I don't know the standard well enough to say what the levels ought to be. I do believe that one of our long-term goals ought to be to encourage and make it easy for people to cite sources.--Dallan 09:51, 7 April 2009 (EDT)

To start, how about a barnstar for pages with a non-gedcom source? Something that simple might even be automatable, so that the system could add the barnstar automatically?--Dallan 09:56, 7 April 2009 (EDT)
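
As a rough illustration of how automatable the "barnstar for a non-GEDCOM source" check might be, here is a minimal sketch. It assumes each source on a page carries a simple kind label such as "gedcom", "book" or "census"; that labelling scheme and the function name `earns_barnstar` are assumptions for the example, not how WeRelate actually stores sources.

```python
def earns_barnstar(source_kinds):
    """Return True if at least one source on the page is not a GEDCOM.

    source_kinds is an assumed list of kind labels, one per cited source.
    A page with no sources at all does not qualify.
    """
    return any(kind.strip().lower() != "gedcom" for kind in source_kinds)
```

For example, `earns_barnstar(["GEDCOM", "census"])` qualifies, while a page citing only GEDCOM files, or nothing, does not.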
Dallan and others, the GPS standards are not a spectrum from good to better, but are five "elements":
  1. a reasonably exhaustive search;
  2. complete and accurate source citations;
  3. analysis and correlation of the collected information;
  4. resolution of any conflicting evidence; and
  5. a soundly reasoned, coherently written conclusion.
We could do some sort of "star" system, five of which would have to hit all five of the above.
And yes, most of what we've got would have zero stars at this point.
jillaine 10:53, 7 April 2009 (EDT)
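
One possible reading of the star idea above, sketched in code: a page gets one star for each GPS element it satisfies, so five stars means all five are met. The one-star-per-element mapping is an assumption (the comment only says five stars would require all five), and `GPS_ELEMENTS` and `star_count` are names invented for the example.

```python
# The five GPS elements as listed above.
GPS_ELEMENTS = (
    "reasonably exhaustive search",
    "complete and accurate source citations",
    "analysis and correlation of the collected information",
    "resolution of any conflicting evidence",
    "soundly reasoned, coherently written conclusion",
)


def star_count(elements_met):
    """Count how many of the five GPS elements a reviewer marked as met.

    elements_met is a set of strings drawn from GPS_ELEMENTS; anything
    else in the set is ignored.
    """
    return sum(1 for element in GPS_ELEMENTS if element in elements_met)
```

Deciding whether an element is actually met still takes a human reviewer; the sketch only tallies the result.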

I probably missed something. The scale talked about appears to communicate an award system more than quality. The ratings were not a scale. They reminded me more of the student of the month awards in elementary school, electronic certificates that this article was once a featured page.

I do not really like rating anything, unless it can be computerized or done entirely by following a flowchart based on simple yes and no answers. In other words, the hard part is writing really good criteria for assigning the rating.

I don't think rating articles is a good idea. I think the quality of an article speaks for itself, and people read genealogy pages because of the topic, not because of the quality. Some people are not well-known, and any page on them will be poor, but the reader may be happy to get anything they can. Setting up some arbitrary criteria for passing out atta-boys could potentially distract people into chasing form instead of substance. I prefer to let the natural diversity of different people's varied interests cover all the bases, rather than artificially favoring one aspect of genealogy over another by setting up a rating system that values one thing, but not another. And I suspect WeRelate pages are more volatile than Wikipedia pages, so how does a rating get maintained in the face of changes to the page?

There may be some value to rating sources based on contemporaneousness and how far removed they are from the original, but any criteria must be very simple and mechanical or they will probably be too subjective. As a side effect, a page could be scored based on having sources for each fact and their relative quality. But, in any event, research must consider all the available sources as a body, and sometimes what looks like the highest quality source is the one that appears to be in error. So trying to come up with a rating system of sources carries the risk of short-circuiting a thorough analysis because it is so easy to just take the highest rated source.

I believe rating sources has been discussed relative to GEDCOM updates, obviously with no commitment to doing it. The difficulty is figuring out a way to automate it, which is the best way to get consistency and thoroughness. --Jrich 10:22, 7 April 2009 (EDT)


Jrich,

We're not speaking of an arbitrary award system, but a method for identifying solidly done research (for which there is a set of existing criteria). This would highlight strong models of such research and encourage others to improve their pages to meet said criteria.

jillaine 10:46, 7 April 2009 (EDT)


Dallan, That would be a good starting point, though I'm not sure how easy it would be to automate this. Perhaps making the criteria "inclusive" rather than "exclusive" would be appropriate. That is, tell people what they should have, rather than what they shouldn't. But that would be harder to automate: Easy to exclude a page that uses a GedCom as the source, hard to identify a page that uses something else as appropriate. As an example of that, giving a barnstar to a page that had no GedCom identified as a source would give a barnstar to pages that had NO sources---at least citing a GedCom gives lip service to the idea of sourcing, so is an improvement over no sources at all.

I suspect that at least at first, you need to keep this as a "people decision" until criteria can be worked out. Perhaps the way to go about this is to focus initially on the featured articles. Looking at them systematically might give us a good idea of what people like in articles. Since the point of departure for this was "encouraging sourcing", the use of "original sources" in the BCG sense of the term, would be one criterion for giving a Barnstar---but "original sources" is probably too stringent to start off with, as not many pages would meet that criterion. Perhaps "effective use of sources" might be a less restrictive criterion that would still encourage good sourcing practice. Others might be effective use of the narrative section of the article, and the effective use of graphics. But limit the field initially to the featured articles. Then you can explore what works as a criterion and what doesn't using a suite of articles already identified as articles people (or at least someone) liked.

In agreement with Jrich, I'm not sure you really want to get into the business of rating every article (computerized or not). Genealogy Wiki can do that because they have relatively few articles to work with. 2M plus here militates against evaluating every article, unless there's a computerized way to do it. If the idea is to give people examples of what's considered a well done article, then Jillaine's point about "featured articles" is very well taken. The problem there is there's no criteria as to why a particular article was featured. Using a barnstar to denote on the page that it is thought to be a good example of effective sourcing (referencing BCG standards), might help make it clear. It could also be used to point people to good examples to emulate for various purposes. Q 10:52, 7 April 2009 (EDT)



Articles vs. People/Family Pages [8 April 2009]

Mm... Perhaps we are having two separate conversations here. Where *I* am coming from is in response to the topic of this page-- i.e., "junk genealogy" -- the lack of evidence provided, the lack of cited sources. If we focus on that, we can come up with some pretty solid non-arbitrary criteria.

But if we're talking about rating other types of articles, that seems to me to be a different topic (and discussion page) altogether.

-- jillaine 11:00, 7 April 2009 (EDT)

Reading the last few comments, it's my impression, too, that we're talking about two different subjects. I don't think you can "rate" Person pages based on how much they present. I have any number of folks in my database on which there simply isn't much to find -- and I've been looking for several decades on some of them. If after thorough searching, you have only a marriage date, say, and one land transaction -- and you've added some thoughtful speculation based on that -- you've pretty much fulfilled the Genealogical Proof Standard. It's unfortunate that there isn't much to find, and you may never learn much about them, but you've done the work in a proper manner.
Now, in the matter of someone uploading a skeletal GEDCOM that includes no "real" sources and zero interpretation, it comes down to only two ratings: Acceptable vs. Unacceptable. And the great majority will be the latter. Which, I guess, is why I would have to question the point of even allowing junk GEDCOMs on the site. With no sources, they're worthless. Worse even than that, they're often misleading to the inexperienced. (I'll stop there before I get started again. . . .) --mksmith 15:37, 7 April 2009 (EDT)
Where is that list of stages of a genealogist again? I would like to see that given a prominent place somewhere as I think all of us can find ourselves somewhere on that list. As someone who has recently experienced being raked over the coals for posting what I thought was 'reasonable conjecture' I can empathize with those who aren't as experienced and are told their work is junk. That will certainly not encourage them to learn how to make it better. They will just decide to post their info on another site which is more welcoming. And which won't help them learn any better either.

Some of my work has very good sources; some of it is from other folks' research and I work to give them credit for it - though I don't know their sources. But I want to be comfortable posting here on this site where someone else with better sources can fix what I don't know. I would hope that would be one of the benefits of a collaborative site. --Janiejac 16:29, 7 April 2009 (EDT)


If I misunderstood above, I apologize, but the link you provided was about rating articles, and the scale included two ratings designed to flag featured articles or almost-featured articles. I have seen plenty of websites where users rush to collect post of the day awards, and don't think that is a good idea. I then shifted in my remarks above to source quality because it seems like the only reasonable way to rate the quality of pages.
Like mksmith, I think the controls need to be put onto GEDCOM updates. Why not put screens on the window, instead of running around the house with a fly swatter? In some ways, the new controls of 5 people watching is so overwhelmingly simple and easy to understand (while it may not be my first choice) that I think/hope it will work, and think we should wait and see. It should mean I only have to fix a page 4 times, and after that, then there will be myself and 4 other people watching it. :-) Maybe I could even get all my family members to sign up so I control five user ids myself. :-)
Scoring pages (a phraseology I like better than rating) would not be too obnoxious if it only measured completeness and quality of sources. It would indicate pages where more help would be beneficial (in case there are people with spare time on their hands), and would provide guidance to new users of what is desired. Any kind of scoring system for sources will depend entirely on the actual criteria, which hasn't been discussed at all. If rules are developed around the scoring (e.g., you can only update data by providing a higher quality source), I suspect there will always be situations where the scoring system gets in the way of doing valid updates. --Jrich 16:52, 7 April 2009 (EDT)

Proposed Rating / Scoring System [8 April 2009]

Desired outcome: high quality, sufficiently researched/sourced data on people and family pages.

0 = no source information at all

1 = Up to 25% of the data is sourced or otherwise a strong case is made

2 = 26-50% of the data is sourced or otherwise a strong case is made

3 = 51-75% of the data is sourced or otherwise a strong case is made

4 = 76-99% of the data is sourced or otherwise a strong case is made

5 = Fully meets the GPS (Genealogical Proof Standard)

-- jillaine 18:14, 7 April 2009 (EDT)
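
The scale above can be written down as a small function, which makes the boundary cases explicit. This is only a sketch under stated assumptions: the function name and parameters are hypothetical, "facts" means whatever data items a page carries, exactly 75% is counted as a 3, and the "strong case is made" and proof-standard judgments are left to a human reviewer who passes in a yes/no flag.

```python
def score_page(facts_total, facts_sourced, meets_proof_standard=False):
    """Return a 0-5 score following the proposed scale.

    All names here are hypothetical. `meets_proof_standard` is a human
    judgment supplied by a reviewer; it is not computed.
    """
    if meets_proof_standard:
        return 5
    if facts_total <= 0 or facts_sourced <= 0:
        return 0  # no source information at all
    pct = 100.0 * facts_sourced / facts_total
    if pct <= 25:
        return 1
    if pct <= 50:
        return 2
    if pct <= 75:
        return 3
    # Anything above 75% that a reviewer has not signed off on stays at 4,
    # including a page where every fact is sourced.
    return 4
```

Note that a fully sourced page still scores 4 here until someone confirms it meets the proof standard, which is one way to read the gap between levels 4 and 5.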


Jillaine out of the 2M articles on WeRelate, how many do you think fall in each of these categories, as defined above? Q 19:32, 7 April 2009 (EDT)


Out of the 2M articles on WeRelate, how many are you going to read to figure out if "a strong case is made"? I tried a couple of times to come up with a scoring system that would involve mostly computer calculation but believe it would be inherently useless. Even the best quality sources have errors and no formula fits every case. Each page needs to be analyzed in-depth by the interested people who have knowledge of the subject. That would be the normal course of things, if the only style of input was field-by-field typing data into the page. I would like to believe that even the greenest of genealogists, if forced to at least look at what was on the page, and who knows, maybe even read it, would bow to high-quality cited sources. That does not happen because GEDCOM update enables an unthinking, unreading, one-way, non-collaborative mass update and so empowers people to change data faster than we can respond with nasty messages on their Talk pages.--Jrich 19:48, 7 April 2009 (EDT) P.S. To be fair to Dallan, the new GEDCOM update really slows down this process and does cause some thinking. However, the display of sources is so cramped on the update/merge screen, that I am not sure people will do a good job. I think immunizing pages from GEDCOM updates by accumulating 5 watchers will be what really ensures that changes are towards higher quality.

A viewpoint I also agree with, though I don't know about that "nasty message" part. Personally, I don't think scoring every page on WeRelate would a) be do-able, b) achieve the desired goal. I presume that the desired goal is to get better pages. I think that purpose is better served by highlighting articles that meet some standard. Sourcing is part of that standard, but not the only thing involved in a high quality page. Denoting a page as a "featured article" is probably part of the solution, but by itself it's incomplete. Also, a high quality page on this site is NOT necessarily a "person page". Q 20:00, 7 April 2009 (EDT)

As I merge duplicate pages, it would not at all be difficult -- assuming we had a set criteria in place -- to take an extra minute and score a page as to how well sourced it is. So it could be done, by humans, while de-duping the already existing pages. In fact, this is where it comes up most for me-- while I'm merging.

And while I concur that high quality pages also include pages that are not person or family pages, the overall point of THIS talk page is about dealing with "junk genealogy" -- i.e., the crap that people upload (i.e., what becomes person and family pages) -- not with other types of articles. The rating / scoring system I'm proposing concerns the person/family pages, not other types of articles. If you want a scoring/rating system for those, feel free to propose one in the appropriate discussion forum.

-- jillaine 21:58, 7 April 2009 (EDT)

Jillaine, think about it. There are 2M articles on this site, most of which are person articles. NO, it's not hard to do one, or ten, or 100... 2M is an entirely different matter. But more to the point, roughly how many of those two million do you think meet your various criteria? Q 22:03, 7 April 2009 (EDT)


Probably 95% do not meet the criteria. But so what? The point is to put something in place that encourages people to have something to shoot for. A bar. A standard of excellence. A standard that says: this is what we're shooting for here at WeRelate. Help us make that happen. jillaine 22:14, 7 April 2009 (EDT)

I think you are being overly optimistic. After you cast out the cards for people who show NO sources, plus those whose source is simply a GedCom file, or another unverifiable source, you'll probably have less than 1% of the cards getting into your second category. Scoring each card is a) not possible by hand, b) unlikely by automated means. The approach suggested by Dallan is plausible, but as Jrich pointed out, it's probably a more complicated problem than can be done by machine. Might use the machine to winnow it down. But there are easier ways to show the way than grading each and every card. Q 22:32, 7 April 2009 (EDT)

please propose something, Q. jillaine 22:42, 7 April 2009 (EDT)
I believe I did, though I didn't elaborate. There are two elements that are needed.
  • First, a presentation of what a "good" person article would entail---i.e., criteria, and the criteria definitely go well beyond sourcing. The BCG standards of proof, for example, do include sourcing, but they include other elements as well.
  • Second, a set of examples selected to illustrate "good" articles--articles that meet the criteria to varying degrees.
One might also include examples of articles that have problems to be resolved in terms of their quality, but I think you'd get further by not going the critical route. Positive examples could be featured articles, but it's not usually clear why they are featured, or what makes them good. If the issue is "good genealogy done here", then you need something to say "this is what doing good genealogy requires, and here's an example of that." It would probably be useful to mark such articles so they would be easily recognizable when people came across them. That's why the Barnstar approach works well on Wikipedia. I think Wikipedia gets carried away with Barnstars a bit---seems like they have Barnstars for just about everything, following the motto that everybody likes to be praised, so everybody gets an award, even if we have to make one up. (I believe that was what Jrich found annoyingly gradeschoolish about this approach.) But a limited set of Barnstars to highlight certain features well done might be appropriate. Then again, maybe a single barnstar to mark "good genealogy done here" would keep it more focused. Q 23:08, 7 April 2009 (EDT)

Mmm... Perhaps we're not so far apart as it was feeling yesterday. (And apologies if I got a bit snippy yesterday; a couple of you pushed a few old buttons of mine related to how feedback is provided and I let it get to me.)

Just out of curiosity, what buttons got pushed? I try to keep things objective, and have no interest in pushing folks' buttons. So it helps to know when there's an inadvertent button pushing. Q 09:04, 8 April 2009 (EDT)
Sigh. I don't want to get too sidetracked on this, but basically, my buttons get pushed when someone (anyone) makes a proposal and then others come along and only point out what's wrong with it. MY part in this is that sometimes that's all I see-- the criticism-- and not the good points. I'm working on it. -- jillaine 09:26, 8 April 2009 (EDT)

I'm not proposing that we go through and rate/score 2M "cards" (interesting alternative to the term "pages"),

Traditionally, genealogy programs have referred to the pages that contain basic family information as "cards". I believe the term arose out of Apple's original "HyperCard", which eventually was superseded by the web. (Not sure how much HyperCard contributed to the web development process, but it's awfully similar in style to what's characteristic of the web.) "Reunion" is a program that fairly clearly shows its HyperCard roots. Q 09:04, 8 April 2009 (EDT)

but I am suggesting that we have some sort of criteria that we can use so that when we do find a really great person or family page that is well sourced that we can rate/score it. And that doing so would then somehow "lift it up" -- either as a featured page or in some other way -- so that people can see examples of good research. And if they want their pages to be so rated/scored, they know what they have to do to get there.

We could similarly see a potentially great page and give it a lesser score/rating such that people watching it could see what is still needed to be done to bring it up to excellence. So perhaps the rating might be more like this:

  • 5-star - Meets BCG standards of excellence
  • 3-star - On its way to excellence; needs one or more of the following (and list the BCG requirements for excellence)
  • red-flag (or something) - Could possibly be automatically added to all un-sourced pages as a notice/warning that said page has no sourced information at all and should be relied upon with caution. I realize this is a more negative than positive approach but would certainly get people's attention.
as in ? Q 09:04, 8 April 2009 (EDT)
Dallan, on this last, I wonder if there's a way to do some sort of scan during upload, that comes back to the user with a warning message: "Our scan of your file indicates that your data has no (or insufficient) source information included; currently WeRelate is not accepting such files. For more information, please visit... etc."
I think something like that was what Dallan originally suggested. Certainly seems do-able, though it could only be the jumping off place. The problem with this is that virtually every dang page on this site is going to have a red flag! Q 09:04, 8 April 2009 (EDT)
Well, I'm not sure I concur that virtually every page would have a red flag, but even if that was true, then so be it. It's incentive to encourage people to do what's needed to bring data up to standards.
I've looked fairly closely at the quality of work produced on Ancestry. Ancestry is well suited for looking at this because you can use their search engine to pull up the number of cards that include notes, sources, and both notes and sources. I used that as an index to whether folks were documenting their data, and attempting to present an analysis of it. There are problems with this approach, but as a first approximation, it gives a fairly decent picture of how things are being done by the users of Ancestry. This is more difficult to do on WeRelate, but I think the quality on WeRelate is probably comparable.
On Ancestry a study of a 0.1% (I didn't do the statistics on this but given the size of the sample, over 600,000 cards, I think it's reasonably accurate) sampling of the ten most common surnames showed that 20% of the cards included "notes", and about 25% showed sources. Cards that showed both notes and sources (which I took to be evidence of looking at things in more depth) amounted to about 5% of the sample. When I manually looked more closely at that 5%, I saw that probably only about 1% of the total were showing what I would consider legitimate sources (most were GedComs and other unverifiable sources.)
The significance is that if you applied your red flag to the cards on this site, virtually all (99%) would probably get a flag. I could be wrong, but I don't think that's going to encourage people to do better work---more likely it's going to turn them off, and they'll go away altogether. Q 10:12, 8 April 2009 (EDT)

And how many pages can you point to that you think meet all 5 BCG proof standards? I've seen some nicely done pages here, but I don't know that I've seen any that meet the BCG criteria. Brag on yourself if you think you've got some of your own that meet those standards, and point them out. I personally have none that I think measure up to that. Q 09:04, 8 April 2009 (EDT)

There may be NO pages that currently meet all 5 BCG proof standards, but it would be great to set goals towards which to strive. I could imagine seeing a scorecard on the home page-- a chart-- that shows the increase in the number of 5-stars. Research indicates that when you give a team of people data about how well (or not) that they're doing-- just hard data-- they are more likely to engage in behavior that improves practice. Okay, now let me go find the citation for that research. ;-) jillaine 09:26, 8 April 2009 (EDT)

-- jillaine 08:33, 8 April 2009 (EDT)


If the point is education, just having 100's of five-star pages will be too unfocused. If somebody does not know what is good, they will probably not be able to cull what is good from one example. To figure out what is good, they would need to find what is common among many examples, and probably will not take the time because the people covered don't interest them. What is needed is one, or a very small number of example pages, with annotations explaining explicitly what features of that page are good, and why.

If the point is to let people know their work is appreciated, how about adding a button that says "This helped me". When somebody pushes that button it sends an email to all the people that contributed to that page saying "Thank you. Your work has helped me." But please, give us the option to turn off receiving these emails.

I have no problem with some monthly set of featured pages. I think this should be done for general interest rather than quality, the way NEHGS does their stories of interest in their newsletter. I have no connection with Abraham Lincoln but was interested in a story detailing how he no longer has any living descendants. In general, I find that reading about people with no connection to me isn't all that interesting. So the featured article needs to have broad relevance due to history, or some interesting genealogical issue.

Anything else that isn't applied equally to all 2M pages (someday to be 2B pages) strikes me as arbitrary, an accident of what page is read by whomever is empowered to award 5 stars. --Jrich 10:15, 8 April 2009 (EDT)


Featured pages aren't necessarily examples of genealogical thoroughness. They're generally chosen because someone happened to notice the article and add it to the nomination page, and they have a picture and an interesting story. I hope eventually we'll develop a more formalized process for choosing featured articles, but it's pretty ad-hoc at present.

Thinking more about this, if we tried to rate a reasonable number of articles we would have to use an automated approach rather than manual. There are too many articles to rate manually. If we were to use an automated approach, it would need to be pretty simple. I don't think we could automatically rate articles based upon the genealogical proof standard. It's probably even asking too much to automatically rate articles on a 1-5 scale. A yes-no scale would be do-able, like displaying an icon if the article contained at least one source, possibly requiring the source to not be a gedcom.
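
As a rough illustration of how simple the yes/no check could be, here is a minimal sketch. It assumes a page's citations are available as a list of dictionaries with a "title" field, and that a GEDCOM citation can be recognized by the word "gedcom" in its title; both are assumptions made for the example, not a description of how WeRelate stores sources.

```python
def has_non_gedcom_source(citations):
    """Return True if at least one citation is something other than a GEDCOM.

    `citations` is a hypothetical list of dicts such as
    {"title": "1850 U.S. Census"}; the "title" key and the substring
    test for "gedcom" are illustrative assumptions.
    """
    for citation in citations:
        title = citation.get("title", "").strip()
        if title and "gedcom" not in title.lower():
            return True
    return False
```

A page with no citations, or with only GEDCOM citations, gets no icon; any other titled citation is enough to earn one. The check says nothing about whether the source is a good one.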

Alternatively or in addition to an automated approach that attempted to give a simple yes/no rating to person/family pages, people could be encouraged to give a human rating to pages. Not many pages would get rated this way, but the criteria could be more involved. Having said that, adding a barnstar to a page (another yes/no rating) might be more encouraging for people than getting a 1-5 rating. I'm not wild about putting red flags on 99% of the pages here. I like the "this helped me" button. We could keep track of the number of times that button was pressed (by different users) for each page and display that. We could even look at pages with a high "this helped me" count as featured page candidates.
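
The "this helped me" count by different users could be kept with one set of user names per page, so repeat presses by the same person do not inflate the number. A minimal sketch, with a hypothetical class and method names and an in-memory store standing in for whatever the site would actually use:

```python
from collections import defaultdict


class HelpedMeCounter:
    """Hypothetical in-memory tally of "this helped me" presses per page."""

    def __init__(self):
        # page title -> set of distinct users who pressed the button
        self._users_by_page = defaultdict(set)

    def press(self, page, user):
        self._users_by_page[page].add(user)

    def count(self, page):
        return len(self._users_by_page.get(page, ()))

    def top_candidates(self, n):
        """Pages with the most distinct presses, ties broken by title."""
        ranked = sorted(
            self._users_by_page.items(),
            key=lambda item: (-len(item[1]), item[0]),
        )
        return [page for page, _users in ranked[:n]]
```

The `top_candidates` list is the piece that would feed featured-page nominations.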

On a related thought, I've been thinking that maybe one way to encourage people to cite sources is to let people see a list of all of the source citations they have added to their pages, and show the total number next to their user name wherever their signature appears. (I'd omit the number if it was zero.) This wouldn't happen right away of course. But maybe there's a better approach.--Dallan 10:16, 8 April 2009 (EDT)


Alternative Approach [23 April 2009]

In response to Dallan's comment (and I think I'm repeating myself, but just to be clear)

yes, it's not possible to manually rate all of those pages
See above, I am not suggesting we rate ALL pages. jillaine 10:45, 8 April 2009 (EDT)
yes, the only way that could be done is automating the process.
I still don't agree with this; especially if we are highlighting well-done pages that illustrate the kind of quality we're seeking, this could absolutely be done manually as Dallan suggests above. jillaine 10:45, 8 April 2009 (EDT)
Since we're in agreement on this, I probably should not comment. However, the point you were disagreeing with was that it was not possible to rate ALL pages manually. I think we are actually in agreement that rating ALL pages MANUALLY is not possible. Identifying (and perhaps rating, if people see a need) well done pages would be doable and very helpful to show what the site is striving for.---(You just couldn't do that and be sure that every worthy page was so evaluated). The Barnstar approach seems like a reasonable way to do this, though there are other ways this could be done. Lead by example. Celebrate the good, not convict the evil....(or in this case, the mis-tutored.) Q 08:57, 11 April 2009 (EDT)

The problem is, the tools the machines would likely need to evaluate the quality of the pages, using anything like the complete BCG standards, just aren't there. Eventually, maybe we'll have that kind of capability...but not now. So, rather than try to climb the whole mountain all at once, perhaps we should be satisfied with smaller steps.

And I'm proposing smaller steps; see above. jillaine 10:45, 8 April 2009 (EDT)

Going back to Dallan's initial proposal to identify pages that lack any sources whatsoever could be the way to go. Perhaps rather than slapping a red flag on them, a note could be posted to the (talk page perhaps?)---something along the lines of "WeRelate is attempting to help its users improve the quality of their pages. This page would be improved by indicating the original sources on which the information (such as Dates of Birth) is based. Can you help us improve this article by adding sources for such information?"--Then the message could point to appropriate pages for guidance in what is needed in the way of sources. Q 10:33, 8 April 2009 (EDT)

I LIKE this last idea. Nice. jillaine 10:45, 8 April 2009 (EDT)
This could be done pretty easily the same way Wikipedia does it with pages that are stubs, or which lack citations -- by having a template that says "The page has *NO SOURCES*! Please add sources! Otherwise, this page will be considered for deletion!" . . . or something (possibly less shrill) along those lines. Since it's a template, in addition to a bot adding it to source-less pages, anyone cruising the site and examining pages could add it as needed, either to the top of the main page (more highly visible) or to the talk page (possibly more polite). Those pages would be automatically added to the list in a category: Pages Without Sources. Users interested in those individuals would then be warned to hunt for sources themselves if they want them to survive, and possibly to merge them to the user's own pages. (I'm already starting to do some of this with the Hatfield pages I'm creating, and I've seen several other family groups that I'm familiar with and which need to be merged, or sourced, or both.)
Certainly, that would be one way to do it, though 99% of the pages would end up getting tagged. It would be less intrusive, less in-your-face than a "red Flag" appearing on the page. There's no distinction to be made when virtually everything is tagged. Perhaps this could be done with a selected set, or as people come across pages they'd like to see better documented. A common template, perhaps to be placed on the article's talk page, could be effective. But to do something like this, we still need the underpinnings---what are the standards that should be met. If we can't point to a set of standards, how can we tell people what they should be doing? Q 10:08, 11 April 2009 (EDT)
Perhaps we flag pages of people born within a certain timeframe to begin with, say 1600 to 1700 in the US. There are lots of sources for that time period, and those people tend to be the ones with the most watchers, thereby the ones that might most benefit from a flag. And I would bet that the percentage of pages with multiple watchers without sources from that time period is actually much lower than generally, so it wouldn't be quite so overwhelming. If it has the intended effect, we can expand the project.--Amelia 12:00, 11 April 2009 (EDT)
Speaking of stubs: Would ya'll consider a page that has only the name and nothing else -- no dates, places, or relationships -- a "stub"? I've seen some like that. They're just floating out there, isolated. Should they be similarly marked? (Well, they have no real sources anyway, only the GEDCOM or "OneWorldTree".) --mksmith 09:42, 11 April 2009 (EDT)
If the pages are truly isolated, and linked to nothing, contain nothing, and if they've been floating around in e-space for some time, they probably are just flotsam and jetsam, worthy of having their electrons freed to be put to more useful purposes. On the other hand, that's a function a machine could perform more efficiently. Perhaps the criteria for deletion would be A) No data other than title, B) No link to any other page, C) been in existence for X months, without activity. Q 10:08, 11 April 2009 (EDT)
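
The three criteria (A, B, C) are mechanical enough to sketch as a single test. Everything about the page record here is an assumption for illustration: a dictionary with hypothetical "facts", "links" and "last_edited" fields, and a whole-month count measured from the last edit.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def is_deletion_candidate(page, today, min_idle_months):
    """Apply criteria A, B and C to a hypothetical page record.

    `page` is assumed to look like:
      {"facts": [...], "links": [...], "last_edited": date(...)}
    """
    no_data = not page["facts"]    # A) nothing beyond the title
    no_links = not page["links"]   # B) linked to no other page
    idle = months_between(page["last_edited"], today) >= min_idle_months  # C)
    return no_data and no_links and idle
```

The value of X (`min_idle_months`) is left as a parameter, since the comment above does not settle on one. A page that fails any one of the three tests is left alone.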
And Dallan, what about my question above about some sort of scanning upon GEDCOM upload that displays a message and does not accept the unsourced GEDCOM? jillaine 10:45, 8 April 2009 (EDT)

I want to set a minimum standard for uploaded GEDCOMs, especially large ones, but I don't want to set the bar so high that it discourages people from participating in the community and thereby learning how they can improve over time. I think about this like learning to play chess -- you wouldn't want to tell people that they had to play at a certain level before they could join an on-line chess playing group. What I need to do is analyze how many of our current GEDCOMs didn't contain sources, or didn't contain either sources or notes.--Dallan 11:14, 8 April 2009 (EDT)
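
The kind of analysis described here, and the upload-time scan asked about earlier, could start from a simple pass over the GEDCOM text. The sketch below is an assumption-laden illustration rather than the site's import code: it relies only on the basic GEDCOM line layout (a level number, an optional @xref@, then a tag), counts a person as sourced if any SOUR tag appears anywhere inside that person's record, and ignores family records entirely. The function name and the keys of the returned dictionary are made up for the example.

```python
def gedcom_source_stats(text):
    """Count individuals lacking sources, and lacking both sources and notes."""
    people = []      # one set of sub-record tags per INDI record
    current = None   # tag set of the INDI record being read, if any
    for raw in text.splitlines():
        parts = raw.strip().split(None, 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        if int(parts[0]) == 0:
            # A new top-level record, e.g. "0 @I1@ INDI" or "0 HEAD"
            is_person = len(parts) == 3 and parts[2].strip() == "INDI"
            current = set() if is_person else None
            if current is not None:
                people.append(current)
        elif current is not None:
            current.add(parts[1])  # e.g. NAME, BIRT, SOUR, NOTE
    return {
        "individuals": len(people),
        "without_sources": sum(1 for tags in people if "SOUR" not in tags),
        "without_sources_or_notes": sum(
            1 for tags in people if "SOUR" not in tags and "NOTE" not in tags
        ),
    }
```

Run over the existing uploads, the two "without" counts would give the figures needed before picking any minimum standard. The same numbers could drive a warning message at upload time, with the threshold left as a policy decision.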

Well, . . . to carry out your chess club analogy, Dallan: If your club meetings usually attracted fifty regular players who knew the game (never mind that some players were better than others), and the meetings were suddenly flooded by a thousand new people who didn't even know what a chessboard looked like but insisted on having a place at the table, . . . how long would your club survive?
I'm definitely not saying beginners or "name-collectors" shouldn't be allowed in the playground. My wife and I teach classes regularly in which we try to teach those folks how to do "good" genealogy. And she's the Examining Genealogist for the First Families of Louisiana Program, which is also about encouraging better research standards and reporting methods. But a lot of people who simply accumulate GEDCOMs have no interest whatever in doing actual genealogy. They'll upload here (and at every other site they come across) and walk away. They're not teach-able. So the question becomes, does WeRelate simply allow anyone who passes by to dump their garbage on its front lawn and drive away? How many family researchers who discover this site look around, note the very high proportion of OneFamilyTree clones, shake their heads, and write WeRelate off as yet another dumping ground? --mksmith 12:32, 11 April 2009 (EDT)
It's a good point, but as I look down the corridor into the future, I think this is going to become less troublesome. Despite my concerns about the willy nilly merging going on, there's real merit in having only a single card per person. Eventually, that will tie everything into one large integrated tree. What that means is that folks will no longer be able to simply add a new branch or build an entire tree here, as the persons they are interested in will already be embedded in the tree. At some point, people are going to have to start working on the data that's associated with each card---finding supporting evidence, building narrative descriptions, etc. Good genealogy is going to start to drive out the less good. That means better sourcing, better articles, and a better educated user community that understands the value of sources etc. So if some folks today are dumping poor quality work onto the site, I'm confident that such work will eventually be scrubbed away. In the meantime, I'm not worrying about it too much.
That's not to say I don't think we should encourage people to do good genealogy here. We Absolutely Should. Finding the right way to make that happen is what we're really talking about. That may be with a rating system, or perhaps with a barnstar approach. Perhaps both at once, carrot and stick, perhaps something else. Whatever it is, I like Amelia's suggestion above, that we might start with a certain time period---say 1600 to 1700. Perhaps Dallan can tell us what fraction of the total number of cards have DOB's between those two dates?
But some groundwork is needed first---there needs to be a statement concerning what's expected in terms of standards that each person page should meet. We need to identify pages that we think meet those standards, so we've got something we can point people to as models to emulate. Perhaps rate or flag the pages that don't meet the standards, perhaps barnstar or feature especially good articles that do meet those standards. Then either rate the pages in the target period, or otherwise identify pages that need improvement and send the appropriate users an appropriate message asking them for their help.
Would someone like to take a shot at identifying the standards that should be met? Strictly BCG, or do we need something broader, less intimidating, that can be achieved by many people? Q 17:33, 11 April 2009 (EDT)
Excellent points, well made. In fact, most of this should be tucked away for a Statement of Purpose page. --Mike (mksmith) 20:37, 11 April 2009 (EDT)
The fastest way to teach people is, surprise, lecture. In other words, tell them what you want. So yes, the first step is putting together the criteria. Flagging pages is feedback to the author, but is not very effective for teaching the wider audience, or for educating new users. Once you have criteria written, then find a small number of clear examples of each point, and maybe another small collection of counter-examples to show what not to do, pointed to by links where the criteria are explained. It adds to the authenticity to use real pages for examples, instead of making them up, at a slight risk of diluting the clarity of the example. --Jrich 21:18, 11 April 2009 (EDT)
The concept of "Mentoring" comes to mind. "Better genealogy one person at a time." Q 21:35, 11 April 2009 (EDT)

I just want to mention one thing to note, when I uploaded my GEDCOM, I purposefully omitted my sources and notes. I wanted to avoid creating a MySource mess. I wanted to put my "shell" of information online, then figured I would take the time to add my sources correctly afterward. I would hate to "judge" GEDCOMs based upon that criteria.--Jennifer (JBS66) 11:25, 8 April 2009 (EDT)


Some stats: of the roughly 2400 GEDCOM's uploaded, just over 500 of them (22%) don't list any sources, and just over 300 (12%) don't list any sources or notes.

As an alternative to disallowing GEDCOMs without sources or notes entirely, we could flag them for administrator review and ask admins to contact the uploaders to ask if they plan to add sources/notes once the GEDCOM is uploaded. This is more work for the admins, but maybe not too much effort since relatively few GEDCOM's don't have any sources or notes.

As an aside: Once we get source matching working later this year, I'm thinking about not creating MySources anymore for uploaded GEDCOM's. Instead, the information for unmatched GEDCOM sources would be added to an expanded source-citation section directly on the person and family pages.--Dallan 12:06, 8 April 2009 (EDT)

Since you're talking about "sources" that are just the date of the GEDCOM upload, etc, you could almost just put that info in the page's text box, rather than creating source statements for what are actually non-sources. --mksmith 12:32, 11 April 2009 (EDT)
To clarify, if there were 300 GEDCOM's that didn't include sources or notes---that means a substantial number did---but those stats are for an entire submission, not for individual pages. The distinction is significant. If a submission contained just one source and one note, then the entire GEDCOM would get scored as having sources and notes---To get numbers comparable to those I cited for Ancestry, you'd need to examine specific cards, not the entire GEDCOM. Q 12:24, 8 April 2009 (EDT)
That's right -- my guess is that about 1% of the pages here have real (non-gedcom) sources. This discouraging number is a big reason why I don't want to be too hard on people that don't source their work. I'd rather have them become part of the community where peer-review will help encourage some percentage of them to start sourcing than raise the bar so high that most people won't want to participate.--Dallan 22:14, 10 April 2009 (EDT)
Which is another reason why wholesale rating of pages might not be such a good idea. Since the vast majority of folks are not citing useful sources, most pages would get red flagged, or whatever. Highlighting that probably sends the wrong message, and would make the site less welcoming. Would it be helpful to create a set of standards that are to be strived for? That could include, of course, sourcing, but it could also include other aspects of genealogy as well. Perhaps a list of example pages that embody the site's goals? Of course, first we have to establish what those goals are. Q 08:57, 11 April 2009 (EDT)
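To make the submission-versus-page distinction in this thread concrete, here is a minimal sketch. It assumes each uploaded page can be represented as a dictionary with a "sources" list; that representation and both function names are hypothetical and are not taken from WeRelate's software.

```python
def submission_has_sources(pages):
    """Submission-level score: True if ANY page in the GEDCOM cites a source."""
    return any(page["sources"] for page in pages)


def sourced_page_fraction(pages):
    """Page-level score: fraction of pages that cite at least one source."""
    if not pages:
        return 0.0
    sourced = sum(1 for page in pages if page["sources"])
    return sourced / len(pages)


# Hypothetical GEDCOM: 100 pages, only the first one cites anything.
example_gedcom = [{"title": f"Person {i}", "sources": []} for i in range(100)]
example_gedcom[0]["sources"].append("Example parish register")

print(submission_has_sources(example_gedcom))  # counted as a "sourced" GEDCOM
print(sourced_page_fraction(example_gedcom))   # yet only a small share of pages are sourced
```

Under these assumptions the same upload looks fine when scored as a whole and poor when scored card by card, which is why the two kinds of numbers cannot be compared directly.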

In terms of trying to alert new users to good practices, the idea of putting a message on their Talk page about (lack of) sources would be good. Make for a pretty long message if 5000 people are uploaded with zero sources. Maybe if the GEDCOM has 10 or fewer pages without sources, you list the pages, otherwise you just say that 50% or whatever percent of the GEDCOM lacked sources, could you please review it and add any sources you can, and point to a help page that gives a brief explanation of what kind of sources are most valued.
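The rule suggested in the paragraph above could be sketched roughly as follows. This is only an illustration: the function name, the wording of the message, and the use of "Help:Source pages" as the help link are all assumptions, and the threshold of 10 is simply the figure floated above.

```python
HELP_PAGE = "Help:Source pages"  # assumed help link; any suitable help page would do


def build_source_notice(unsourced_titles, total_pages, list_threshold=10):
    """Build Talk-page text about unsourced pages in an uploaded GEDCOM.

    Returns None when every page has at least one source.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    missing = len(unsourced_titles)
    if missing == 0:
        return None
    if missing <= list_threshold:
        # Few enough pages to name them individually.
        listing = "\n".join(f"* {title}" for title in unsourced_titles)
        return (
            "The following pages from your GEDCOM have no sources:\n"
            f"{listing}\n"
            f"Could you please add any sources you can? See {HELP_PAGE}."
        )
    # Too many to list: report a percentage instead.
    percent = round(100 * missing / total_pages)
    return (
        f"About {percent}% of the pages in your GEDCOM lack sources. "
        f"Could you please review it and add any sources you can? See {HELP_PAGE}."
    )
```

Keeping the threshold as a parameter means the cut-off between "list the pages" and "give a percentage" can be tuned without touching the message logic.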

GEDCOMs without sources are not without value. If forced to vote, I personally might say block them in the hopes that eventually somebody else will come along with sourced data, and in the long run, we'll be better off. But that takes a lot of faith and patience, and I have to admit that as long as they are only adding new people, the information and connections they provide may give a clue to some other researcher who then adds the sources. And certainly, there is a catch-22 for WeRelate: to attract people you have to have people, so at this stage, scaring off people has its drawbacks.

The biggest worry about Junk Genealogy is protecting data that is already there. The hard part of Junk Genealogy is that most of it has sources, they are just outdated by newer research. So people can input Junk Genealogy with sources, and it is hard to think of a way to catch them. Again, if there is no data there previously, even outdated data has value, as it starts the process. But if the data has already been refuted in a detailed explanation, then we don't want the GEDCOM update overwriting things. It would be nice if there was something like the nomerge template that caused updating of pages by GEDCOM updates to be blocked, but I think the 5 watcher rule is a start, and probably less prone to misuse.

I am not so much worried about manual updating, though as mentioned in a different place, I feel it would be nice if the presence of a discussion on the Talk page was more prominent, and even if you could add a flag T1 to facts that says be sure to review Topic 1 on the Talk page before changing this birth date or something along those lines. --Jrich 13:25, 8 April 2009 (EDT)


I too believe that the more important concern is that unsourced GEDCOM's don't degrade existing pages; adding unsourced pages for new people that we don't already have in the system isn't as bad because those pages will hopefully be improved later, especially if we can make adding sources easy. I'll make sure that the nomerge template works for GEDCOM uploads as well as regular merges.

Yeah, this is a major concern, I think. I would hate to see properly-done, well-sourced pages being auto-merged with "junk" pages where a mere statement of GEDCOM upload date is given equal weight as a "source." --mksmith 12:32, 11 April 2009 (EDT)

There has been a lot said on this topic that I have not read, and I have flip-flopped on my opinions about allowing gedcoms with no sources or sourced only by another tree. The new merge capability with the gedcom upload should help with the duplicate pages. I believe that an uploaded gedcom that has no sources, or only sources from another tree such as WFT # such and such, should be flagged for deletion after a period of 6 months. We could notify the user and allow the user to request an arbitration on the decision if they so desire. We could also state this on the gedcom upload page. I don't support a rating system as such, but I don't oppose giving a gold star to super pages as a feature to encourage users to adhere to certain standards. Some of my research is only on WeRelate, so please don't decide to delete my pages; it is a work in progress. There are sources, but some are indexes from Ancestry. The proof is in the sources, but I don't have the time now to write the proofs for the article. If you do decide to delete pages, I would like you to consider whether the user only has the data entered on WeRelate or also has it entered in another database. When the gedcom download is activated that will not be a problem, but now it is a major problem. --Beth 20:11, 12 April 2009 (EDT)


This is a tough problem. We could try to filter bad GEDCOMs, but really, the problem isn't so much bad GEDCOMs as much as it is bad GEDCOMs that are then abandoned. Anyone, and maybe everyone, starts somewhere, and often that place is sort of feeble. If someone uploads a "junk" GEDCOM, but demonstrates a commitment to improving the data, it's really not junk - just a work in progress. On the other hand, a weak GEDCOM, that isn't purely junk, could be abandoned here and create just about as much hassle and chaos as pure junk. We're often looking at data of the latter sort, wondering what the better choice might be. Semi-ok data may actually be the harder problem, since it's less clear what to do with it.

For these reasons, my proposal has long been a small GEDCOM "newbie" limit. Until someone generates a track record of individual hand-edits, demonstrating a commitment to really using werelate, they really shouldn't be able to load more than a few dozen - or perhaps a few hundred - total pages via GEDCOM.

Dallan has heard this suggestion from me a bunch of times, and those of you who have been working the site for a while probably have too. I'm not sure to what extent Dallan may have adopted elements of the idea or not, but I toss it out again in hopes that it may be useful...--Jrm03063 09:34, 13 April 2009 (EDT)
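For anyone who wants to see the "newbie" limit proposed above in concrete terms, here is a minimal sketch. The two constants are placeholders picked from the ranges mentioned in the proposal ("a few dozen - or perhaps a few hundred" pages); the function names and the idea of counting hand-edits as a single number are assumptions, not a description of anything WeRelate has implemented.

```python
NEWBIE_PAGE_LIMIT = 200  # assumed cap on total GEDCOM pages for new users
MIN_HAND_EDITS = 50      # assumed number of hand-edits that lifts the cap


def remaining_gedcom_allowance(hand_edits, pages_already_uploaded):
    """Return how many more GEDCOM pages a user may load, or None if unlimited."""
    if hand_edits >= MIN_HAND_EDITS:
        return None  # track record established, no cap applies
    return max(0, NEWBIE_PAGE_LIMIT - pages_already_uploaded)


def may_upload(gedcom_page_count, hand_edits, pages_already_uploaded):
    """Decide whether a GEDCOM of the given size fits within the user's allowance."""
    allowance = remaining_gedcom_allowance(hand_edits, pages_already_uploaded)
    return allowance is None or gedcom_page_count <= allowance
```

Because the cap is on total pages rather than on a single file, a new user cannot get around it by splitting one large GEDCOM into several small ones.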


And I've suggested including "before you upload" text that stresses what kind of place WeRelate is and what kinds of GEDCOMs we're seeking. He's incorporated some of this into the new text for uploading, but I'd almost rather they see something before they even get to see the "upload" button. Not quite a user agreement, but a "hey folks, don't upload junk here that you don't plan on maintaining..."

Dallan, I'll volunteer to be on a GEDCOM review committee.

-- jillaine 11:32, 13 April 2009 (EDT)


I think a "read before you upload" text might be talking to the wrong people. It's not a bad idea to include such an admonishment, but I don't imagine it will have much effect on the people who upload and abandon junk GEDCOMs in the first place. I mean, we would pay attention to it, but they wouldn't. On the other hand, the idea of limiting the size of permitted GEDCOM uploads until a new user has established his bona fides might actually work -- although it's a little late now. :)

I still haven't uploaded a GEDCOM. I've done about a hundred pages by hand, in those family groups where my own research is most active, and where I can add useful text in addition to mere vital statistics. (And I'm looking at the occasional duplicates I'm coming across and merging those in as I go along, so my network is also growing quite a bit). I'm not sure uploading a large GEDCOM at this point serves any useful purpose -- for me. I'm actively pursuing perhaps 15%-20% of the people in my database. The rest are grandchildren of siblings, and in-laws, and cousins of cousins, and whatnot -- people whose identity I'm interested in for contextual reasons, but not "real" members of my direct family. Which means I don't have much information about them, so I don't see a point in creating yet another nearly empty page. Not yet, anyway. I'm aware that my attitude toward GEDCOMs is out of step with most people who use them, but I guess I believe there's such a thing as making a worthwhile endeavor too easy. --Mike (mksmith) 12:48, 13 April 2009 (EDT)

But Mike, that 'nearly empty page' you don't upload may be the one thing someone else needs to find to be able to connect other family members. Because I'm a Jackson researcher, I've accumulated a lot of info about folks that are not even my immigrant ancestor's descendants; no relation to me at all. I want to eventually put it all on werelate where someone else can see it and benefit from it, even though I may have nothing but census records to put on their pages.--Janiejac 14:51, 13 April 2009 (EDT)
Mike makes a good point. If the warning is pointed at people submitting GEDCOMs that are not sourced, they won't understand that it's directed toward them. After all "they know their work IS sourced (and very very good too), because they took it from someplace on the web".
If you want to get people to do better genealogy, you probably need to go one on one with them. But that means they have to be in communication, and that means they can't be excluded because the gate was set so high they couldn't get in.
But, if the objective is to have only well sourced person articles, then just go back to the original idea and bleep out every article that doesn't contain at least one non-GEDCOM source. Dallan can do that almost automatically. And there are lots of benefits to that, too. Among other things, it would certainly make the merging process easier, as you'll have eliminated 99% of the person articles per Dallan's statistics. Q 15:42, 13 April 2009 (EDT)

I've been reading this discussion with interest. Junk genealogy has been my "windmill" for 35 years. I'm investing a lot of time in here in helping to clean up the myriad duplicates. It's a lot of work because if you merge them, the "junk" is all tossed together. I have also recruited a few people to the site and have watched their frustration in getting their gedcoms to go well. They have no idea what they did wrong.

My suggestion is that we might consider an "intern" period. A person could not upload a gedcom until he has been active on the site for "x" days and added "Y" people by hand. I have come to realize that the help warnings on the gedcom merge are meaningless to someone who is not familiar with how wikis work and how this site in particular works. There are some great tutorials, but does the newbie know how to find them? Does he have time to go through them? --Judy (jlanoux) 19:06, 22 April 2009 (EDT)
I like your idea Judy. That is exactly what I did when I joined WeRelate.--Beth 19:17, 22 April 2009 (EDT)

Another thought that I had: Collect the person's skill level at registration by asking if they have ever done editing in a wiki before. For those who say "no", have a set of "get acquainted" documents which are sent to them every few days during the intern period designed to draw them in and help them learn. As was mentioned, we need to try to retain members and we won't do that by letting them upload a 30,000 person gedcom the first day they register and then telling them they did it wrong.--Judy (jlanoux) 10:12, 23 April 2009 (EDT)


An additional suggestion [11 April 2009]

What would help me as a researcher, as well as help WeRelate in its goal of quality, sourced data, is an automatically generated list for each of my trees. Perhaps it's under "My Relate" -- but basically, it's a selection that says something along the lines of "View unsourced info in your tree(s)". It's basically an automated "to do" list. I could use that RIGHT NOW.  ;-)

-- jillaine 07:33, 11 April 2009 (EDT)


NGS Standards for Sound Genealogical Research [13 April 2009]

Copied from Standards for Sound Genealogical Research (note the copyright notification below) (I've bolded those items that seem particularly relevant to the topic at hand).

Standards for Sound Genealogical Research As Recommended by the National Genealogical Society

From the National Genealogical Society, for About.com

Remembering always that they are engaged in a quest for truth, family history researchers consistently —

  • record the source for each item of information they collect.
  • test every hypothesis or theory against credible evidence, and reject those that are not supported by the evidence.
  • seek original records, or reproduced images of them when there is reasonable assurance they have not been altered, as the basis for their research conclusions.
  • use compilations, communications and published works, whether paper or electronic, primarily for their value as guides to locating the original records.
  • state something as a fact only when it is supported by convincing evidence, and identify the evidence when communicating the fact to others.
  • limit with words like "probable" or "possible" any statement that is based on less than convincing evidence, and state the reasons for concluding that it is probable or possible.
  • avoid misleading other researchers by either intentionally or carelessly distributing or publishing inaccurate information.
  • state carefully and honestly the results of their own research, and acknowledge all use of other researchers’ work.
  • recognize the collegial nature of genealogical research by making their work available to others through publication, or by placing copies in appropriate libraries or repositories, and by welcoming critical comment.
  • consider with open minds new evidence or the comments of others on their work and the conclusions they have reached.

© 1997, 2002 by National Genealogical Society

Permission is granted to copy or publish this material provided it is reproduced in its entirety, including this notice.

-- jillaine 16:10, 13 April 2009 (EDT)

That's a good set. Similar to the BCG, though with a bit more elaboration. This and the BCG proof standards would be a good starting point for a set of goals for this site. Believe Mike referred to it as a "Mission Statement".
If the goal is to have a single card for each discrete individual, and to create a card for every person who's lived (not really, but perhaps in theory) then that could/should be part of the goal statement. Perhaps we should coin a new term to describe this----"Integrated Family Tree". Is that analogous to Ancestry's "One World Tree"?
Perhaps what should be stated as part of a site Mission Statement is that when people upload their GedCom, it should be done with the realization that this is a wiki, and to anticipate that "Their" tree will eventually be integrated into the "Integrated Family Tree". Q 16:33, 13 April 2009 (EDT)



The above might lead us to this type of criteria. Yes, I've chosen a different word than Barnstar. -- jillaine 16:25, 13 April 2009 (EDT)

Pages that achieve “Source-Star” status at WeRelate are those in which 100% of the information provided is well cited per the NGS standards. This means that all information meets the following criteria:
* each piece of information cites a reliable source document
* where specific citation is unavailable, the data is supported by convincing reasons for conclusions reached
I'm not overly enamored with "Barnstar". It's just a word that people are familiar with if they've been around wikis, and if that's to be done, then another term might be a good idea. Barnstar IS kind of clunky. On the other hand "Source-Star" is probably not the most memorable choice.
Can you point to A page that you'd currently give a star to? Q 16:36, 13 April 2009 (EDT)

The Educational Approach [13 April 2009]

Not thinking that a very large percentage of pages are going to get rated/starred, I prefer to tell people what is desired of them. To that end, I tried putting together a rough draft/strawman. It needs examples, among other shortcomings.


Your Contribution to Quality (Yes! You.)

A goal of WeRelate is to become a repository of high-quality, reliable genealogical data. Various organizations have written detailed descriptions of what constitutes high-quality genealogical research. For example, the Standards Manual for the Board for Certification of Genealogists is viewable at BCG Standards Manual.

WeRelate is a collaborative effort. It is not necessary for one person to do all the hard steps in producing high-quality data, such as the exhaustive search of relevant sources. As long as the work that each person does is entered in a way that empowers collaboration, the community will be able to supplement it and bring it closer to BCG standards over time. This is to everybody’s benefit.

The foundation, upon which this whole process rests, is documenting the sources of your information.

  • As a courteous person, you are giving credit to the person who did the hard work.
  • As a collaborative person, you are enabling others to verify the work to ensure its reliability.
  • As a helpful person, you are providing pointers to others who are looking for more information.
  • As a researcher, you are providing a dispassionate argument to support your conclusions.

Whenever possible, source citations should reference items in the Source namespace. These are sources that are publicly available in a repository, such as a library or an Internet website. Whether you add data manually, or via GEDCOM upload, you should try to convert all your sources to point to Source pages, if they fit this criteria. If there is no page for your Source, create it. Source citations should give enough information, particularly page numbers or current URLs, etc., so that another person can easily and unambiguously locate the relevant material. Supplementing your citation with a brief abstract that honors any copyright protection can be very useful to other users. See Help:Source pages for details of working on sources.

The MySource namespace is used for one of a kind sources that other people will generally not have access to, such as conversations, family Bibles, etc. In citing these sources, you should be prepared to share them to the extent practical, such as e-mailing photographs of Bible pages, or providing transcriptions. It is common to see such sources described as being in the possession of some person. Do not publish the name, address, phone number or email address of such third parties without getting their permission first.

Quality of sources, not quantity.

One of the enemies of clarity is excessive data. Adding sources to WeRelate should strive to increase the quality of what is there, not merely adding redundant sources saying the same thing. Genealogical issues are not solved by counting the absolute number of sources on each side, but by thoroughly analyzing the reliability and quality of the sources to decide which is most credible. There are several characteristics that help identify higher-quality sources.

Contemporary sources preferred over after-the-fact reporting. Contemporaneous written records (made at the time the event happened) are usually given more respect than after-the-fact reporting of facts or family tradition. They tend to be freer of myths, faulty memory, and accumulated error. You should attempt to find sources that are contemporaneous when possible, or sources that quote or cite contemporaneous records when not.

Original sources preferred over derivative sources. Derivative sources just pass along data that others have gathered. Often the authority for the data is lost in the process. You should try to provide the original sources when available. When you cite a derivative source, try to cite those that identify the original sources. If a derivative source does not provide the basis for its data, its reliability can only be guessed at.

A special case of derivative data are other people’s GEDCOMs, One World Tree data, Ancestral Files, etc. These do not make good sources. While there are individual cases of these sources that have excellent quality, there are as many or more poor quality ones in existence. As it is very difficult to assess the quality of these types of sources from a citation, most people simply discount all such electronic family trees as meaningless.

Consistency with other facts. Direct data will generally carry more weight than indirect evidence, the latter only showing the side-effect of some event. But every human activity is prone to error, and even the highest quality, most direct sources can be erroneous. A genealogical analysis will place more stock in a collection of consistent data items, even though indirect in nature, than in a direct item that is inconsistent with other facts. So, when there is doubt about a fact, try to find independent evidence of that fact. For example, consider trying to decide, given several people with the same name, which one a birth record should be applied to. A baptism record, or being mentioned in a will, or the age on a gravestone, can often verify the validity of that decision, even though none of them addresses the birth event directly.

Analyze data within its context. To interpret data correctly, you must have some familiarity with related cultural, historical and geographical details. Don’t be afraid to research unrelated people to rule out alternative suggestions. Find explanations for all discrepancies. Recognize your biases and your assumptions.

Be courteous. If sources are cited for facts with which you disagree, try to start a discussion on the Talk page. Consider the evidence presented by the cited sources fairly. To the extent possible, try to show how the given facts are inconsistent with other known facts. Present evidence for alternative facts, citing sources of equal or higher quality than the ones already cited. Give the watchers of the page a chance to respond. Listen to what they say. If there is not a clear consensus, consider that perhaps the truth is not knowable without more information, and the most useful result may simply be to leave the discussion on the Talk page where it may be seen by other researchers.

The collaborative process of WeRelate is a long-term, two-way interaction. Your participation can be a valuable part. But it is more than simply loading a GEDCOM. It is a continual process of querying, responding, redirecting research, and sharing new data and thoughts. Please do not dump your data and go. Stay involved and watch the magic happen!--Jrich 18:16, 13 April 2009 (EDT)


Very Very Nice. I'm sure there's some fine tuning of this, but this is a really nice start---probably much more than a start. Q 20:17, 13 April 2009 (EDT)



Scots initial Proposal [4 June 2008]

I am becoming more frustrated, disillusioned and concerned every day. Some 3 years ago it occurred to me that the Wiki model could be a great opportunity for genealogy. When I found this site, as well as several others, I was excited that someone else had come to the same conclusion. However, it appears that any collaborative effort is being overwhelmed by the amount of Junk genealogy being uploaded to the site. I have been thinking about how to prevent the Werelate site from becoming just another repository for misinformation like so many others. Downloading gedcoms from world connect or the AF and then uploading them for someone else to edit is not genealogy. What we should be doing is compiling data, examining sources, weeding out the trash and creating a credible database. How can we encourage collaboration by those individuals who are serious researchers and eliminate those who are just waving their pedigree to massage their egos?

Some thoughts:

  1. If a person joins Werelate, uploads a gedcom and walks away, he contributes nothing. How about this: if after uploading he does no editing for a certain period, say 90 days, then his upload is purged, except for pages edited by others, just as if he removed it himself.
  2. For person pages for individuals from before 1500 or so, allow the surname field to remain empty without the unknown tag. The Title prefix or suffix can be used to differentiate individuals. Because people feel a need to enter something into the surname field, this creates an incredible number of variations for the same person if he does not have a surname. I have many instances of duplicates because they are entered in different languages. I realize that place names eventually became toponymics and are used as surnames, but in medieval times they simply indicated where a person was from. Often these appear with of, de, van, etc. preceding them, and each variation, if used as a surname, results in a duplicate entry.
  3. Accept no data for individuals prior to 1600 without source reference.
  4. Screen the sources and reject submissions based on questionable sources or those known to be flawed.
  5. Perhaps have two separate databases with separate rules for submission. One for medieval, royal, historical and celebrity figures where submissions are restricted as stated above. Don't allow GEDCOM uploads to this section, only individual pages. A second database remains as it is now but with the purge function for inactive users. Allow linkage to individuals in the other database within immediate families.

Maybe this seems rather Draconian, but I feel some kind of control must be implemented to prevent the site from becoming a hopeless morass like most others. Opinions, anyone?--Scot 19:30, 9 April 2008 (EDT)


1. Try to imagine werelate is good enough to survive in some form 100 years from now. I believe the next stage for amateur genealogy / history is modelling of families / communities / places / events. Who's to say something contributed yesterday then abandoned (e.g. the 'only' online copy of someone's wedding or a school photograph) wouldn't be of interest to someone else, even though it has not been looked at for a period of 20 years?

That person may not have sourced it correctly, or identified all the people in the photo, but may have left enough clues for someone else to identify it correctly.

In the UK you have to spend seven pounds for a copy of a certificate - perhaps in 5 years all the info will be online for 7p and it would be affordable to model entire communities.

How do you know what is useful? It's in the eye of the beholder.

Perhaps the junk could be left out there but the werelate community should develop a simple quality accreditation system. That which is sourced correctly (as identified by werelate accredited officers) can be coded as such.

A page with a lower quality rating cannot then 'damage' one with a higher one.

Perhaps volunteers could adopt a geographic location and ask other interested parties to connect with them. With the right tools a volunteer could keep things in check. One thing I like about the wiki is you can host a family tree, a one-name or a one-place study - theoretically they should be able to co-exist.--Dsrodgers34 01:14, 4 June 2008 (EDT)


Some Responses

It has occurred to me that abandoned GEDCOM uploads are very much a mixed bag. It may sometimes be that the person just didn't take to the site and their data is still pretty good. Other times, well.... Anyway, I agree with the thrust of scot's argument - there need to be some steps taken to prevent werelate from becoming a sewer.

GEDCOM genealogy has made it easy for people to accumulate a data set that they would never have any hope of seriously maintaining - even given many lifetimes. We should encourage people to upload only data that they are serious about working on. Maybe the size of a particular GEDCOM upload should be limited unless by special arrangement (5000 or so?). Likewise a GEDCOM that goes back before 1500 AD.

The other problem, of course, is the management of abandoned data. I've got a fairly small tree compared to some (~3000). I try to source things in detail and would like to think the pages are the sort of thing the site would want to host indefinitely. There should be a way to designate trees as a permanent part of the collection (genealogy goes on forever - good research done in the 1850s is still working for us - but I don't think anyone reading this dates to 1850...do they?). On the other hand, if a tree is loaded and worked a little then abandoned for a while, it probably shouldn't automatically persist forever. If someone in the user community wants to adopt the tree, maybe it just goes to them. If no one in the user community wants it, or perhaps if the user community actively requests its removal, then after a time it goes away (if we want to be nice about it, maybe it gets archived to a named GEDCOM and tossed into the digital library?).--Jrm03063 22:04, 9 April 2008 (EDT)

Anything in an abandoned tree can be retained by anyone, simply by editing the page. My 90 day suggestion is only for the period after the initial upload. If they don't do match/merge for any of their data we can assume that they aren't interested in maintaining it. After that, a longer period of inactivity could be required before declaring the data abandoned. If the match/merge utility works well, duplicate entries might not hang around so long, so it will be easier to find and evaluate data in recent uploads. Again, any pages that are merged or edited will be retained; the rest of the tree evidently was not of interest to anyone searching.--Scot 01:12, 10 April 2008 (EDT)
There are several types of users on WeRelate presently. I looked at some of the users who registered in April 2007. First, you have those who only registered; second you have the users who registered and created a profile and listed surnames in place that they are researching but have no other contributions; third you have the users who created a profile, uploaded a gedcom and have no contributions since the initial gedcom upload, and fourth you have the users who have an active file and recent contributions.

I believe that we should eliminate all unsourced gedcoms after 6 months; unless they are watched pages by someone other than the user or an agent of WeRelate. --Beth 08:31, 10 April 2008 (EDT)


I agree in principle; the problem I see is that there probably aren't any utterly unsourced GEDCOMs, though there are plenty of essentially unsourced GEDCOMs. The former being a GEDCOM entirely without source records and citations. The latter being a GEDCOM with "OneWorldTree", gedcom upload date, and other sorts of essentially useless sources. Trying to make software know the difference wouldn't be all that easy.

I think we need to hook on other criteria to decide that something is both abandoned and useless.

Also, having one or more pages in an otherwise abandoned tree watched probably doesn't say a lot about the quality of the tree generally - though anyone watching a portion of a tree should be consulted before the remainder of the tree goes away. When I merge duplicate families, I don't concern myself with any question about whether an originating page comes from a "good" tree. I only try to understand whether the various pages are talking about the same people (or at least, the same fantasy about people).--Jrm03063 11:43, 10 April 2008 (EDT)


First, this topic should probably be moved to a separate page, because it soon is going to take on a life of its own.

Secondly, I am in 100% agreement with Scot. Becoming "Draconian" in principle is not going to turn away the masses, because from what I can tell, there doesn't seem to be a mad rush taking place to genealogy wikis anyway. Why is that? Becoming a little more picky about what gets uploaded and about what stays uploaded is instead going to attract those researchers who are serious about collaboration and who don't take offense when a bad source or bad information has been revealed. --Ronni 12:41, 10 April 2008 (EDT)


Dallan's Response [10 May 2008]

A few thoughts:

  • It would be pretty difficult for a computer to determine whether a tree is of "good quality." People will have to determine this.
  • Abandoned trees that are of good quality are probably worth keeping around.
  • Deleting a tree carries its own problems. Not too long ago a user deleted a tree that others were interested in (but had not gone to the fair amount of effort to watch all of the pages), and it caused some grief when it was gone. There are people on the other side of the fence arguing that we should be more strict about removing trees. (I assume this applies only to "good quality" trees.)

I don't think we want to automatically remove abandoned trees, because abandoned trees that are of good quality are worth keeping around, and the system can't tell the difference between good quality and junk. So let's focus on removing junk trees. Under what conditions then would we want to remove a junk tree? I can think of four; perhaps there are more?

  1. The junk tree contains a lot of internal duplicates (duplicates within the tree itself)
  2. The junk tree overlaps with existing trees, and the tree uploader didn't merge the pages
  3. The junk tree overlaps with existing trees, and the tree uploader merged the pages but in so doing added a bunch of "bad" data from their tree to existing pages
  4. The junk tree overlaps with a well-sourced tree that I am trying to upload, and merging my tree with it is going to add my well-sourced data to a bunch of pages with "bad" data

I'd like to consider each of these cases in turn.

  1. There's nothing to do here but delete the tree, as has happened already with the tree that contained a large number of internal duplicates for the Normans. If someone finds a tree with a large number of internal duplicates, I think we should contact the submitter and delete the tree.
  2. I think the best way to resolve this is to require the tree submitter to go through a match+merge step (where they are shown the probable-overlapping trees and can choose which pages to merge) within say 7 or 14 days of uploading the tree. If they have not completed the match+merge step within 3 days they get a warning, and the tree is removed if they have not completed it within 7 or 14 days. Trees that aren't determined to overlap any existing trees don't have to go through this step of course.
  3. This is a more difficult problem: The tree submitter merged their pages into an existing tree, but the merger resulted in a bunch of questionable data and sources being copied into otherwise good pages. We may need to have an option in merging to not append data from the new pages onto the existing pages.
  4. This is the opposite of the previous problem. I am trying to submit a new tree, and it overlaps an existing junk tree. But if I merge my pages into the existing tree I don't want my good data appended to a bunch of junk. We may need to have an option in merging to have the data from the new pages replace the data on the existing pages.
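
The deadline rule in item 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the action labels and the choice of 14 days (rather than 7) are hypothetical and not part of WeRelate's software.

```python
from datetime import date, timedelta

# Thresholds taken from the proposal above; 14 days is one of the two
# suggested removal deadlines (7 or 14), chosen here only for illustration.
WARN_AFTER = timedelta(days=3)
REMOVE_AFTER = timedelta(days=14)


def tree_action(uploaded: date, today: date,
                overlaps_existing: bool, merge_completed: bool) -> str:
    """Return a hypothetical action label for an uploaded tree."""
    # Trees that overlap nothing, or whose submitter finished match+merge,
    # are not subject to the deadline at all.
    if not overlaps_existing or merge_completed:
        return "keep"
    age = today - uploaded
    if age >= REMOVE_AFTER:
        return "remove"
    if age >= WARN_AFTER:
        return "warn"
    return "wait"
```

For example, an overlapping, unmerged tree uploaded on 1 April would be left alone on 2 April, would draw a warning from 4 April, and would be removed from 15 April onward.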

BTW, don't get discouraged. Match+merge is something that should have been implemented a long time ago, but it's not an insurmountable problem. As part of match+merge we'll have a screen that shows all of the probable duplicates between two trees, and lets the tree contributors discuss and select which pages to merge. This will hopefully make merging much easier than it is now.--Dallan 17:08, 10 April 2008 (EDT)


Question about #3/4 above - at some point in the past, we talked about a function that during gedcom upload would identify the duplicates and do a merge if necessary. At that point, the user would have the option of just not uploading the duplicate people. That takes care of 1) those people in well-sourced trees that are placemarkers (like spouse's parents one hasn't pursued); 2) chunks of badly researched trees; and 3) situations in between where you want to see what's there before deciding one way or another. With some instructions, hopefully most offenders will recognize themselves and not upload their junk onto "nice" pages. Is something like that happening? If so, we're talking about people that ignored that instruction, which adds another dimension. But, that said, I also think you do really need to have an option of not appending the data from one page or another to the merged page. I would say, based on hundreds of merges, that far more often than not, one page is either junk or functionally, but not literally, identical (that is, one user says b. Windsor, CN, the other says b. Windsor, CT - this is why I'm still hand-merging, because these are human decisions.) So to avoid creating more work and more junk, I would think a "use data from ___ page" option would be highly useful. --Amelia 08:35, 4 May 2008 (EDT)
The duplicate-detection hasn't been implemented yet. I agree that we need to implement it. And a "use data from ___ page" option when merging is also a great idea.--Dallan 15:26, 6 May 2008 (EDT)
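
The three merge behaviours discussed in this thread (append, leave the existing page untouched, or "use data from ___ page") might look something like the sketch below. It assumes, purely for illustration, that a page is a mapping from field name to a list of values; the function name and mode names are hypothetical.

```python
def merge_pages(existing: dict, incoming: dict, mode: str = "append") -> dict:
    """Merge two person pages, each modelled as {field: [values]}."""
    if mode == "keep_existing":
        # Do not append questionable data onto an otherwise good page.
        return {field: list(values) for field, values in existing.items()}
    if mode == "use_incoming":
        # Replace junk on the existing page with the new page's data.
        return {field: list(values) for field, values in incoming.items()}
    if mode == "append":
        merged = {}
        for field in existing.keys() | incoming.keys():
            values = list(existing.get(field, []))
            for value in incoming.get(field, []):
                if value not in values:  # skip literal duplicates only
                    values.append(value)
            merged[field] = values
        return merged
    raise ValueError(f"unknown merge mode: {mode}")
```

Note that "append" only catches literally identical values. A pair such as "Windsor, CN" and "Windsor, CT" would both be kept, which is exactly the kind of case that still needs a human decision.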
See Gen Methods Archives for a recent comment on this problem and wikis, particularly in the context of the LDS site. Q 08:54, 7 May 2008 (EDT)
Interesting discussion. From what I've been told the new family search wiki is primarily a way for them to get their research outlines into a form that others can extend. It should also allow them to more easily post their own material online that is currently available only at the family history library (the "half-sheets" at the reference desks). I think it's a great step for them.--Dallan 10:44, 10 May 2008 (EDT)

Medieval Genealogy

Medieval genealogy (pre-1600) is a slightly different problem than the problem of "junk genealogies" because (a) there are only a few people that we have records for pre-1600, and (b) those people didn't generally have surnames, and the birth dates are often approximated. I'm not unwilling to prohibit people from uploading pre-1600 people, but I'd like to first see if we can merge uploaded pre-1600 people into well-sourced existing pre-1600 pages, and rather than append their probably-lesser-quality information onto the existing pages, we would not modify the existing pages.--Dallan 17:18, 10 April 2008 (EDT)


Speedy Delete [20 April 2008]

While I don't particularly think wholesale deletion of "abandoned" "poor quality" trees is a good idea, a feature that would be good to have is something akin to "Candidate for Speedy Delete" on other wikis. In truth, that capability is already in place in part, in that it's present under the "More" pulldown menu (at least when you are at the article level)---specifically, if you are the only person watching a page you can delete it anytime you want, as per the following guidance:

If you are the only person watching a page, click the More link in the upper right corner of the screen under the blue bar. Select Delete, enter a reason for the deletion and click Delete Page.

But I'll bet there are a lot of pages, such as duplicates, where the author is no longer paying attention, and a duplicated or otherwise unneeded article (kind words for junk) could be removed with no loss to anyone. It's probably not a good idea to allow just anyone to do that, but I think it's something that could be done suitably by an administrator---if they knew that someone thought a particular page could be done away with. If there were a repository where people could nominate candidates for speedy deletion, someone from the admin side would go through the list, review, and make a rational decision about how to handle the article. That might mean notifying the original creator, denying the request, or perhaps immediate deletion if that were appropriate.

Now, in truth, I don't really know of any articles so messed up that I think I'd delete them. But I DO encounter lots of duplicates---usually by the same author. I suspect that there's something in the process of GEDCOM uploads that creates them. (Possibly they re-upload their GEDCOM periodically to sweep up any changes that they've entered in their genealogy program, and the upload program can't identify things that haven't changed, and just creates everything anew---hence, lots of duplicates.) I don't know why the duplicates are there, but the fact is, they are---and they might be candidates for speedy deletion. Q 19:32, 10 April 2008 (EDT)


I like this idea. If a page or set of pages isn't just poor quality, but duplicates pages already in the system and is causing merge work without contributing any new information to those pages, I could see marking them for speedy deletion. Then you have a human being instead of a computer making the final delete decision. What are others' thoughts on this?--Dallan 15:08, 15 April 2008 (EDT)


I agree that human interaction is needed in making these kinds of decisions. I also believe we shouldn't let this issue fall by the wayside. I just came across a GEDCOM uploaded in February that has many duplicate pages in it. The GEDCOM is by no means considered "junk," however. But it was uploaded and the user has not edited it since nor have they contributed another page to WeRelate. --Ronni 22:46, 19 April 2008 (EDT)

One of the things Dallan has indicated would be in place eventually is a search result tabulation similar to the browse function, but including more information than just the name---i.e., DOB/POB/Spouse/Father/Mother type information. Such a tabulation would make it easier to spot duplicates of this sort---especially if it included the identity of the submitter. That way, if you ran a search and found that John Smith had created four separate cards for a "Jeremy Black", all with similar DOBs and DODs, spouses, etc., you'd be fairly sure that some of them were duplicates. Q 10:17, 20 April 2008 (EDT)
Including the identity of the submitter is a good idea. I don't have that readily available, but the list of users watching the page is available. I'll include the watching users in the search results.--Dallan 17:30, 24 April 2008 (EDT)
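
As a rough illustration of how such a tabulation could surface likely duplicates, the sketch below groups search results by watching user and name, and flags groups whose birth years are close together. The record fields (`title`, `name`, `watcher`, `birth_year`) and the two-year tolerance are assumptions made for the example, not the site's actual search output.

```python
from collections import defaultdict


def likely_duplicates(pages, year_tolerance=2):
    """Return (watcher, name, titles) for groups that look like duplicates."""
    groups = defaultdict(list)
    for page in pages:
        # Same watcher and same (case-insensitive) name go in one group.
        key = (page["watcher"], page["name"].strip().lower())
        groups[key].append(page)

    flagged = []
    for (watcher, name), members in groups.items():
        years = [p["birth_year"] for p in members]
        # More than one page, and all birth years within the tolerance.
        if len(members) > 1 and max(years) - min(years) <= year_tolerance:
            flagged.append((watcher, name, [p["title"] for p in members]))
    return flagged
```

This only nominates candidates; as discussed above, a person would still make the final call on whether the pages really describe the same individual.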

Status flags to mark state of data [4 May 2008]

Interesting discussion. The data I have loaded up for my small tree cannot be described as bad data, but it is poorly organized and presented. I am a beginner in genealogy and when I used Family Tree to hold my data I didn't put the data in the correct places. When I uploaded a gedcom it looked like a dump of mixed facts. This can be very confusing to anyone who tries to sort through it.

Many people will be new to genealogy as well as new to computers as well as new to wikis. They will not do things correctly at the beginning and will be frustrated and overwhelmed at times by the amount of work needed to get details into their proper places. There is a learning curve and it can be daunting, especially when you are used to immediate gratification, fast food, and no-line-up service. This can be a lot of work.

The advantage of keeping these people active in your/our wiki is that they do bring very good data related to themselves. The closer the family ties the better the data will be. It is intuitive. So, I agree requiring better credentials for "historical" data makes sense.

The advantage for me to have a presence on this wiki is to increase the chances of contacting another person with an interest in the same people. To establish a tree for this purpose requires only the basic name, bmd stats, and locations. The details don't need to be 100% as that is why you want to find other people, to compare notes. I have connected with one person because of this wiki and we were able to share some info.

My suggestion would be to have two classes or statuses for data. Or even multiple status flags: raw, draft, under construction, basic, vitals only, etc. Then when you run merges etc., you could include or exclude based on status flag. You/we will need to develop a set of criteria for assigning a status flag. A disposition rule could also be auto set based on a status. For example, if status = raw, then delete 6 months after last update.

In the records management profession the concept of transitory and official records is well understood. Transitory information is used to create an official or final record. While official records may be kept permanently, transitory records are not. Disposition is driven by a rule for the record series (class). Based on a triggering event, a countdown of a specified time starts. When the end of the time period is reached the record is proposed for destruction. If no one can declare a reason to keep the record it is deleted.

Triggering events could be last date record amended, last date record viewed, and record status = "transitory". And/or include activity of record owner, or persons with interest in record. NO activity, transitory records, nothing happening for x months, then delete.
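
A disposition rule of this kind might be sketched as below. The status names, the retention table and the 182-day approximation of "6 months" are illustrative assumptions only; nothing here reflects an existing WeRelate feature.

```python
from datetime import date, timedelta

# Hypothetical retention periods per status flag. Statuses that are not
# listed (e.g. "official") have no disposition rule and are kept indefinitely.
RETENTION = {
    "raw": timedelta(days=182),         # roughly 6 months
    "transitory": timedelta(days=182),
}


def proposed_for_deletion(status: str, last_amended: date, last_viewed: date,
                          today: date, keep_requested: bool = False) -> bool:
    """True if the record should be proposed for destruction."""
    period = RETENTION.get(status)
    if period is None or keep_requested:
        return False
    # The countdown restarts from the most recent triggering event.
    last_activity = max(last_amended, last_viewed)
    return today - last_activity >= period
```

The `keep_requested` flag stands in for the step where anyone can declare a reason to keep the record before it is actually deleted.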

As an aside, how many people are familiar with the 5 steps to change management? Awareness, understanding, acceptance, commitment, action. It seems to me that there is a lot of change management required in a wiki to get people to move together in an agreeable direction.

Thxs Peter --PeterP 08:48, 13 April 2008 (EDT)


Good points.

I am reminded that just because an article is not being edited, does not mean that it is not being viewed. Just because an article is not being viewed does not mean it is not valued.

Eventually, there is NO user of this site, Dallan included, that will not cease editing their articles. It would not be good for this site if people could not contribute to it with confidence that their contributions would remain.

Also, in the same vein, if the criterion for keeping articles came to be that they had to be "good" articles (or at least not "bad" articles), you'd need to be able to define criteria for good and bad articles. An obvious criterion would be that they meet BCG standards. How many articles on this site meet those standards? Not mine, I know.

Q 09:24, 13 April 2008 (EDT)


Wikipedia has a set of templates that anyone can add to a page to say for example that it doesn't contain source citations or is biased. These serve as flags for others to improve the articles. But articles not meeting wikipedia's criteria don't get deleted except in special circumstances. I'd be in favor of coming up with a set of templates along these lines to flag articles. But I wouldn't want to delete pages just because they weren't good quality and haven't been edited in awhile. I've uploaded my genealogy and many of those pages haven't been edited in quite awhile, and many don't have good source citations. But I'm hoping that they'll get better over time. I'd personally hate to see them deleted.--Dallan 15:08, 15 April 2008 (EDT)


Hi Dallan,

If one chooses to allow any gedcom upload without any criteria; then I certainly vote for some kind of status flags. One could be junk or more politely unsourced and second, sourced but only with meaningless sources such as WFT #3 or so and so's gedcom etc. All of these could be under one status flag.

One could isolate these trees as unavailable for automatic merging until some person chooses to edit the pages.

Then perhaps after a certain time period, perhaps one year, active users on WeRelate could vote whether or not to keep the pages or delete the pages. You could place a warning on the registration page that possibly one's pages could be deleted in the future. --Beth 10:10, 4 May 2008 (EDT)


Something along these lines makes sense. I'm not sure exactly how, and I'm not sure whether removing unsourced abandoned trees should be proactive or reactive, but it seems like we should come up with something in this area.--Dallan 15:26, 6 May 2008 (EDT)


GEDCOMs in the digital library? [6 May 2008]

Do we support the upload of GEDCOM files to the digital library? I'm struck that, for some people who are not yet sure about whether they want to commit to the process of wiki genealogy, or if they have an unusually large GEDCOM (say, over 2K people) we should encourage them instead to protect their current GEDCOM by loading it to the digital library with whatever cover material they can muster. Then, instead of uploading their entire GEDCOM into werelate, we give them guidance on how to carve up their work to upload piecemeal.

The GEDCOM standard has been a help for genealogy, but a hindrance as well. Instead of folks focusing on a small set of ancestors that they reasonably have the time and interest to properly research, they become slaves to the maintenance of a large data file that often turns out to contain tremendous amounts of crap. Tell them to abandon that stuff and focus on a more reasonable set of goals and they'll run off screaming. On the other hand, tell them to archive their work in a labelled and maintained repository like the digital library, while carving out the subset that they really want to actively continue work on for use in werelate, and we may all be better off.

I know that Dallan is looking at ways to improve the upload process, so that duplication can be suppressed at the start, but that's only part of the challenge. The real challenge is uploads of data that the user never really intends to actively work with.--Jrm03063 15:11, 5 May 2008 (EDT)


You should be able to upload your GEDCOM to the digital library. I haven't tried it, but I've added GEDCOM as an accepted file type to the library. I hadn't thought about having people upload a complete GEDCOM to the digital library and then copy just a portion of it to the wiki, but that seems like a really good idea. We could even have links from the wiki pages that were on the boundary of what was carved out of the GEDCOM pointing back into the GEDCOM file. (I wish there were two of me.)--Dallan 15:26, 6 May 2008 (EDT)


Usage Statistics [29 October 2008]

"Junk" genealogy is a fact of life in genealogy. Its been around for a looong time, and not a recent phenomenon, but it has taken on a life of its own with the internet. I suspect that for most services, such as Ancestry, there's really no advantage in purging junk. Perhaps the philosophy that rules is "something, anything, is better than nothing". There's a certain amount of truth to that, unpleasant though it is. The more people use a site, the more successful it will be at least in terms of survival.

With that in mind, here's a small summary of traffic on the main genealogy wikis and some other sites for comparison. These data are from Quantcast.com, and are for the month of April. I've added some interpretive information about each (number of articles and functionality comments).

| Datum Type | Ancestry | GenCircles.com | Genealogy | WeRelate | WikiTree | FamilySearch | Rodovid |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wiki? | No | No | Yes | Yes | Yes | Yes | Yes |
| Rank | 287 | 21268 | 108397 | 136773 | 184445 | 287304 | ND |
| Unique Hits per month | 4.8M | 102996 | 15192 | 11432 | 7892 | 4550 | ND |
| Visits Per Month | 44.9M | 371021 | 3709 | 3524 | 0 | 6533 | ND |
| visits/unique | 9.30 | 3.60 | 0.24 | 0.31 | 0.00 | 1.44 | ND |
| Audience Comp (Passersby) | 59 | 66 | 85 | 83 | 83 | 75 | ND |
| Audience Comp (Regulars) | 38 | 34 | 15 | 17 | 17 | 25 | ND |
| Audience Comp (Addicts) | 4 | 0 | 0 | 0 | 0 | 0 | ND |
| Share of Visits (Passersby) | 12 | 27 | 71 | 74 | 71 | 53 | ND |
| Share of Visits (Regulars) | 41 | 73 | 29 | 26 | 29 | 47 | ND |
| Share of Visits (Addicts) | 47 | 0 | 0 | 0 | 0 | 0 | ND |
| Number of Articles | AVBN* | ABN* | 20K | 2M | ? | ? | 100K |
| GedCom support | Yes | Yes | partial | Yes | assisted | No | ? |
| Guided Data Entry* | ? | ? | templates | Yes | templates | No | Yes |
  • AVBN: A very big number
  • ABN: A big number
  • Guided data entry: Uses text boxes to input data, keeps track of relationships in some manner
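
The visits/unique row is simply visits per month divided by unique hits per month. A small sketch of that arithmetic, using the figures quoted in the table (the dictionary and function names are just for illustration):

```python
# (unique hits per month, visits per month) as quoted in the table above.
# Ancestry's figures are rounded (4.8M, 44.9M), so its ratio comes out
# slightly different from the 9.30 shown in the table.
traffic = {
    "Ancestry": (4_800_000, 44_900_000),
    "GenCircles.com": (102_996, 371_021),
    "Genealogy": (15_192, 3_709),
    "WeRelate": (11_432, 3_524),
    "WikiTree": (7_892, 0),
    "FamilySearch": (4_550, 6_533),
}


def visits_per_unique(unique_hits: int, visits: int) -> float:
    """Average number of visits per unique visitor, to two decimals."""
    return round(visits / unique_hits, 2)


ratios = {site: visits_per_unique(u, v) for site, (u, v) in traffic.items()}
```

Rodovid is left out because no data (ND) was reported for it.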

None of the genealogy wikis shown here is anywhere close to Ancestry or even GenCircles. In terms of traffic, Genealogy and WeRelate are about the same. FamilySearch has already grown to more site visits than either G or WR, but that may be because it's new. Rodovid seems to be losing ground; last month it garnered about 1900 hits. It's now dropped off the board (insufficient data to report), though it's still active. The high traffic count for Genealogy is, I believe, due in part to recent changes in layout (much better looking than it used to be), but I don't think that's the real driver. It's being visited more, but actual page creation seems to have dropped off. What's really driving the visitation numbers for Genealogy is its connection to Wikia, and I believe an advertising campaign that's made it somewhat more visible.

Among wikis the major distinction is the total number of articles. WeRelate is clearly the front runner here, with 2M. Its most serious competitor in terms of site activity is Genealogy with 20K articles. WikiTree has 100K articles, but its activity is lower.

The greater number of articles on WeRelate is almost certainly due to its GEDCOM import capability. Genealogy has a similar capability, but it's not been effectively implemented. Rodovid may have this capability (I'm told) but it's not obvious. WikiTree can do it but it requires the operator to insert it---not automatic---that's probably why it has 100K articles, but the fact that it's not automatic is a major barrier for it.

The point of this is that I believe that what is driving WeRelate's success is its GEDCOM import capability coupled with a well thought out manual data entry system. It's the GEDCOM load that brings the useful traffic. None of the other wikis have functioning GedCom support.

On this site getting folks to do more than dump their GEDCOM is a challenge, but first you have to get them here. My guess is that much less than 10% of the people who dump a GedCom ever do more on the site---perhaps that's 1% who really stick. What's really going to drive the further success of the site is that small percent---these are the people ANY wiki needs---dedicated users who do more than simply dump a GEDCOM. Ultimately, they are the ones that are going to make the site work. But to get them you have to cast a large net---and ANYTHING that diminishes the number of people trying the site, is also going to diminish the number of users that turn into dedicated users.

Which is why you have to be very careful about doing things that will turn off those who come to the site for its GEDCOM dumping capability. That's a number that you want to increase, not decrease. Otherwise we might find ourselves struggling along like Rodovid with traffic so low it doesn't get picked up in the statistics. It would be nice to encourage people to do well with their genealogy, so that nothing here could be described as Junk. But no one else (wiki or otherwise) has succeeded with that, and putting up with Junk Genealogy is a small price to pay in return for persistence.

And finally, I might add that I've personally developed a fondness for "Junk Genealogy". True it is junk, but there's some utility in having about a million people looking for information. Even if they don't understand the need for citing sources, there's usually enough of a clue in their work that, once spotted, you can seek out the original data yourself. I LIKE having lots of folks looking for the same things I'm interested in. The fact that many of them don't know how to report what they find, or make effective use of it, is a small price to pay for having all of those busy hands finding good stuff. Q 10:28, 7 May 2008 (EDT)


I own Family Tree Legends and am a member of GenCircles. My family files were transferred to MyHeritage and I suspect that all of the GenCircles' files have been transferred there also. The transfer was an automatic transfer; without my knowledge. That did not bother me. I have now found the burial place of a great great grandfather and successfully ordered his funeral home records.
The family sites may be public or private; your option. I have not used the new program but I suspect it works similarly to Family Tree Legends. The genealogy software is on your personal computer and you enter data into your program as usual. The data entered is automatically entered on your web page on the MyHeritage site. You receive notification of Smart Matches as one did with GenCircles.
The capability of having a genie program with reports and charts and the capability of automatically creating web page entries; no duplicate typing is unsurpassed in my opinion.
It would be fantastic if WeRelate had a similar capability. WeRelate is not difficult, but I have not had much assistance with my data because others seem to think it is difficult to learn how to use the site.
Because you have GenCircles in your chart; I thought you might be interested in the new site. Here is the link for MyHeritage. [2] --Beth 10:51, 7 May 2008 (EDT)

It pains me, but I think this analysis is sound. Most of the "junk" genealogy I've encountered over the last six weeks or so was really just inadequate genealogy - vast wastelands of unsourced names with nothing but dates for birth, death, and marriage. Much more often than not, the information is correct or (at least) flawed in a way that is well known or has been documented as a flaw in the literature. It can be an odious task to work through merging the stuff, but I think that was mostly because we've got a backlog of a couple of years of stuff that was almost entirely unmerged. I noticed that individual trees, added to a reasonably well merged space, can be merged in pretty quickly when you know how to go about it.

I still think that we should encourage folks to think critically about their purposes before uploading a GEDCOM. The paradigm shift from __my__ tree to __our__ shared genealogy space is a serious jump for folks, and it can't be reinforced enough. If they are simply looking for a place to archive their GEDCOM (or perhaps their TMG data base or whatever), then the digital library may be a better choice. If they don't have an interest in working cooperatively, leaving their data base where it can be picked up by another researcher may be the best thing to do. If they have a large GEDCOM but a core set of folks that they are really interested in working, they may want to take a hybrid approach - GEDCOM to the digital library and a subset uploaded to werelate. Having offered that guidance we probably have to trust that folks will make good decisions more often than not. When they make very bad decisions, we can always fall back on the recently used informal approach with one notorious upload - deleted by popular demand.--Jrm03063 11:25, 7 May 2008 (EDT)


I agree with everything that's been said. Thank-you for the analysis! These are all great ideas.--Dallan 10:44, 10 May 2008 (EDT)


Junk genealogy? I learned very early to be very careful when uploading anywhere. So I have 'special' gedcoms to upload with hardly any sources named. If I find I like and trust the site, I upload a better file or do as I started here. Add them manually, as I have time. So, my files would be listed in this 'junk' talk, as I have not been able to add a lot lately. Or what constitutes true 'junk'?

WeRelate has as many entries as it does BECAUSE it can take in GEDCOMs. It is more complicated than the other gen wiki sites. And I still dislike the search here.

I have gotten many leads in 'junk genealogy' files. They tell me which direction to go or to just look elsewhere. I don't believe that a file of 10,000 or more can have sources compiled by one person; it has to be a file that was put together by taking other people's work. So what is 'junk'?

Abandoned files? It is a lot harder to understand how this place works. Perhaps they have gone away just frustrated. Perhaps they check back periodically to see if there are changes. People tend to do what is easy when putting up their files. Not many have time to learn a new format.

I don't know if anyone has even expressed an interest in my files. The only person watching is my cousin who I told to join so she could add if she chose. She hasn't. I have never gotten any messages from here.

I'm not sure what I will do now. If I am going to be 'junked' I would prefer to delete my own files. Just my ramblings.--Twigs 11:51, 15 May 2008 (EDT)


I think some people define it as unknown (or unknowable) genealogy, lacking in source support. I think the term is a little more abstract for us however, and it probably is more a function of the contributor's conduct than of any fundamental qualities of their data at any point in time. Whether the space of your interest is ten people or ten thousand, since we're sharing the space and any overlapping research, we hope that anyone jumping in will be interested in improving the quality of their contribution going forward - regardless of where they start.

I suppose it could be put another way. Imagine a group of people doing old-fashioned genealogy collectively. Maybe they share a file cabinet at the local historical society and the group has a set of general conventions for how to record information and sources. The group tries very hard to be diligent about getting their information correct and complete, as well as citing sources so that other researchers can review and expand their work - but of course it still is of uneven quality. Now imagine someone showing up at a meeting of the group, throwing vast chunks of material they don't understand (or plan to understand) into the group's cabinet. Then, they just disappear. What is the group to do with such a contribution? Does someone suspend their own research interests and start wading through the contribution to bring it up to the quality standards of the community? Or do they just extract it, set it aside, and wait for someone with actual interest in that area to adopt the stuff and take responsibility for it?

If what you are doing would seem brusque in a group meeting around a table once a month, then it would probably be received unenthusiastically in this context.

If you have a genuine interest but are simply starting small to see if this all works for you - great! Welcome! Nice to have you here! Ask for help any time! If you're just looking for some place to archive a GEDCOM without any intentions of working the stuff further, then I suggest either the digital library or another site that archives GEDCOMs from any source.

I guess it's all a wordy way to say "play nice".--Jrm03063 13:37, 15 May 2008 (EDT)


One more thing - repeated uploads will not do what you think. The shared data space would wind up with both the old and new information, and someone would have to merge the material. I'm curious - why hold back sources? What is the issue of trust that concerns you?--Jrm03063 13:43, 15 May 2008 (EDT)


I meant no offense by commenting. I did not know that was not playing nice. I am sorry.--Twigs 15:54, 15 May 2008 (EDT)


Oh my goodness! Of course you're "playing nice" - you're talking to folks - that's participating in the group! ....and I'm only one person in this community. I'm only sharing my idea of things, which hopefully is something like the mainstream, but who knows? I was just trying to help you understand how one other person sees this space and what's behind this weird notion of "junk genealogy".--Jrm03063 17:54, 15 May 2008 (EDT)



From a Junkee [22 November 2008]

I realize I'm coming to the party a bit late but I thought you all might be interested in hearing the POV of an abandoner. I don't know how typical I am but I doubt I'm alone.

First I was introduced to that other genealogy wiki and I added some content there but their manual process was so slow and tedious it would have taken a lifetime to get much done.

Then the guy from Australia (Robin?) suggested WR and I came over here around the time you started accepting uploaded GEDs. I decided to try it out; I uploaded a couple files, created a few project pages; contributed elsewhere for awhile BUT I was also continuing to research my lines. With no way to easily update the content already on the wiki with my updated gedcom, I realized I would have to be updating twice. The amount of time required was disheartening. In addition, I sought more of the template support that the other wiki site offered. I left, "abandoning" my tree, other pages and leaving behind what some here would call a "junk tree". In the last 18 months I have had very few notifications that my tree was edited or watched.

My trees are of mixed quality; some pages are really well referenced; others are cr@p.

I came back because the notices of edits and now merges have increased and I was curious to see what was going on. The merge function fascinated me and I started to try it only to realize I better LEARN how to do it right first, otherwise I might really screw things up. In search of such help, I stumbled across this page and thought I would add another perspective.

I will add that you have got my attention for awhile. I have also learned that I am much more likely to focus on those areas of my research where I feel I have strong data, do my research elsewhere, then return here when I have solid info and update accordingly.

I wonder how many other junkees you will be attracting back now that so much merge activity has started?

I would like to help out with that effort, but let me get reacquainted with the system.

Jillaine 17:28, 29 October 2008 (EDT)


I'm sorry for the late reply. We've had a number of users come back due to the merging, but I think that in order for us to take off in a big way, in addition to merging we need to make it easy to sync your online tree with your offline desktop application. Although some people are comfortable doing everything online, most people are not. So a desktop app that synchronizes with WeRelate is in the works for next year. That plus the ability to merge should make WeRelate a really fun place to work.--Dallan 15:40, 22 November 2008 (EST)


Leaving data from gedcoms sourced by WFT etc. [22 July 2008]

Hello everyone,

I have changed my mind. I think that we should leave all of the pages uploaded on WeRelate. Using Dallan's new search engine, I discovered a gedcom that had been uploaded in May of this year. This is a family that I have researched. I have not removed the source for WFT nor have I deleted data sourced by WFT that I do not have. To date there have been no conflicts in the data.

What I have chosen to do is to enter the data that I have and source my data. The user does not have a profile, and I don't intend to contact the user. If she receives a notification via email and contacts me, that is fine.

I created a new tree and am adding the pages to the tree as I edit them. I am researching the Coker line and this person is researching the Meadors line, so some of the pages will not be added to my newly created tree. You can view the history of one of the pages here: Person:Elijah Coker (1). I use FTE all of the time and am not sure about the navigation if the Family Tree Explorer is ditched.

Trees that have been uploaded via gedcom with no activity that have duplicate pages in another inactive gedcom should be automatically merged.--Beth 20:48, 22 July 2008 (EDT)


Definition of Junk Genealogy [7 February 2009]

I know this seems to be an old topic, but it will keep coming up as more folks participate. I'd like to see a list or step-by-step description of what constitutes 'junk genealogy' without having to read through this whole page.

This is on my mind because I have just been badly bashed (oh, not here; but by email) as having poor judgement, making ludicrous guesses, and doing an injustice to researchers with my work. This is because I connected children to a father without provable paper documentation. I had accepted the children's connection to this man as 'reasonable conjecture' based on tax records and deeds over a period of years, census records, the fact that I found no other man by that name located in that locality during that time period, and the fact that a father by that name petitioned the legislature to have these children legally renamed as they were his 'naturally born children.' The children's descendants are outraged that I would make this connection as there is 'no proof' that the man who designated them his own is the same man I said he was. They also say that just because I have found no other man who might be their father doesn't mean there wasn't someone else, and that the process of elimination is not good research - it is only a poor guess. I somehow thought that 'reasonable conjecture' was OK; acceptable when no other info seems to be available. But if I decide to upload this to WeRelate, how in the world would it be sourced and would it be acceptable genealogy or should I rethink my position? Is conjecture based on circumstances 'junk'?

I know that this could be put on a talk page for examination but that would require the parents to be connected (and married) but that is their complaint! No amount of explaining how the conclusion was arrived at satisfies these descendants of the 'naturally born children'. The outrage seems to be from the known fact that he was already married to someone else. (Divorce was not lawful at the time.) BTW, what is the difference between 'naturally born children' and 'illegitimate' children under the law? This distinction is made in the legislative language of the petition. --Janiejac 21:32, 7 February 2009 (EST)


Janiejac,

The "junk genealogy" discussion in this forum has been focused on another topic entirely-- namely people uploading unsourced GEDCOMs that they then "abandon." I don't think your situation falls into that category.

I see that you have two issues here:

  1. Appropriate way to document disputed/disputable information
  2. Reactions of descendants

I have had a similar experience so I can empathize. (See User:Jillaine/Spiritual_Wife-ism_in_Late_Colonial_Massachusetts#HIX_.2F_HICKS if you're interested; my longer manuscript goes into greater detail.)

For item #1, one option is to leave the "natural" father disconnected from the children (I did a variation of this with Moses Hicks -- connecting him to "my girl" but not connecting him with his first family), but place the theory, including the steps you've taken and the specific sources you're citing, in the narrative for him, providing hyperlinks to said children without "linking" them formally in the tree. You may also want to include in the narrative that descendants of the children do not accept this evidence as sufficient to make the connection. This way you acknowledge that there is a dispute. (I did that on the Moses Hicks page linked above.)

As for challenge #2, if you believe, as I do in my own example, that you've made a reasonable attempt to find the father and this is a reasonable explanation, then you've done your part until others can provide another reasonable alternative. I remain open to being wrong in my theory, but I do believe that I have done sufficient research to present a reasonable theory. I haven't seen your research, but it sounds like you may have done this as well.

I will add one more piece. I did get a piece of "hate email" from one of the descendants. In my response, I apologized for upsetting them, said that it was not my intent to do so and that I was completely open to being shown a different conclusion, re-iterated, briefly, why I believed what I believed, and asked for any suggestions they might have for additional places for me to look. This did succeed at defusing the anger. We still disagree, but there is no more hate mail.

Best of luck to you. (And I don't have an answer re natural-born vs. illegitimate.)

-- jillaine 22:26, 7 February 2009 (EST)


I am not a professional genealogist, and certainly not a lawyer trying to prove something of a genealogical nature. But there is a thing called the Genealogical Proof Standard (GPS) that sets standards for how research should be done.

  • Reasonably exhaustive search
  • Complete and accurate citation of sources
  • Analysis and correlation of the collected information
  • Resolution of conflicting evidence
  • Soundly reasoned, coherently written conclusion

Based on what I have seen on your website, I doubt we need to worry about 2, 3, and 5. "Reasonably" exhaustive is somewhat vague, but if you have searched all sources you are aware of, considered all information you have seen, and are open to any new evidence proffered, no reasonable person would expect more. Further, while you seem to have considered the resolution of conflicting evidence, as evidenced by your even considering your theory, this may be a step you need to point out to your antagonist. --Jrich 22:55, 7 February 2009 (EST)

Yes, I was aware my 'junk' was not the same kind of junk being discussed. But it had been called JUNK and I just needed a bit of reassurance. This hateful email has been going on for a couple of days, including telling me I MUST take this off my web site. I have answered as reasonably as I can but the level of their outrage is unbelievable. I don't like to be bullied. I will not be answering any more of their emails. I have explained my conclusions in my notes and in the emails but I am not changing my site just to meet their need for social propriety. They were looking for their gr-gr-grandma and didn't like what they found. Yet . . . I needed reassurance that I wasn't just being stubborn. Thanks for listening and responding.

--Janiejac 23:59, 7 February 2009 (EST)