WeRelate talk:Merging and downloading trees

Should WeRelate allow downloading GEDCOMs? [5 June 2008]

The question about allowing anyone to download the file needs some serious consideration. I'm concerned about 'harvesters' who gather lots of different charts and then post them as their own work without either checking for errors or giving any credit to the author. An advantage of the tree staying on WeRelate (as opposed to being downloaded by anyone) is that when corrections are needed they can be made on WeRelate where everyone can see them. But if someone else downloads the file and passes it around, if errors are in their downloaded version, they will be perpetuating the errors - they won't know of the corrections made later on WeRelate. I envision pros and cons on this myself so I recognize the need for serious debate and/or consideration of the subject of downloading while it is still in planning stage.--Janiejac 22:47, 13 September 2007 (EDT)

When the question first came up, I didn't see what the big deal was, but you have made an excellent point Janiejac. And yes, there are many pros and cons and even more questions to be asked now. --Ronni 04:13, 14 September 2007 (EDT)

WorldConnect has come up with a very good compromise protocol on this issue of downloading. It gives the author the option of allowing all to be downloaded OR only a couple of generations, or something like that. You might check that out. Thanks for the serious consideration.--Dr. Bill 22:43, 15 September 2007 (EDT)


I hadn't considered Janiejac's point either -- I think it's a good one. Download isn't scheduled until around the end of the year, so we have time for more discussion.--Dallan 13:04, 18 September 2007 (EDT)


Could someone redirect a portion of this exchange to a new subject called 'downloading discussion'? This has sort of evolved from collaboration to downloading.

Done

I want to keep the subject of downloading current and get others' points of view on this while it is still in the planning stages. When I upload a file either to my site or to rootsweb or to WeRelate, I do upload all my notes and sources with it. I do believe in sharing and send anyone who requests it a register starting with the individual they are interested in and including notes and sources. But I don't give away my whole database, notes, sources and all. I want interested folks to contact me with additions/corrections/suggestions and don't want to find all my data and notes posted on someone else's web site.

If I upload to WeRelate and it gets edited by myself or anybody else, I want to be able to download the whole thing back to my computer to continue to work offline. And I do like Rootsweb/WorldConnect's ability to designate just how much of one's chart can be downloaded. But the ethical question comes to mind - if others can add to or edit the chart - should that entitle them to download my whole data base? I'd appreciate input from others on this issue.--Janiejac 12:43, 29 September 2007 (EDT)


Allowing downloads of GEDCOMs is pretty essential, and an opportunity to boot. As has been observed, some folks like to be able to work on things off-line. Others perhaps want to take material to another system to generate different sorts of reports. I take the view that we need a symmetric capability - if you can upload a GEDCOM, you sure ought to be able to reverse the process. One of the reasons I've lost a lot of interest in ancestry.com isn't the expense, but the crappy GEDCOM they produce (and worse, they can't even fully re-import their own GEDCOM - how embarrassing). It seems that they've been intentionally inept in order to strand data under their proprietary control. The result...I'm looking for an alternative. Besides, if someone was really serious about massive harvesting of werelate data bases, they won't be doing it via GEDCOM, so they could probably do it right now.

A GEDCOM download is an opportunity, because a reasonable GEDCOM will be scattered with note/source links back to the werelate site. Skim an ancestry.com GEDCOM and you'll find dozens of links back to ancestry if the GEDCOM has any sources attached. One of the first things I think I would do with a werelate GEDCOM is to replace my ancestry data with a werelate GEDCOM. Then, if people are sniffing around my open tree and source information, they'll find their way to werelate.

The way that werelate gains credibility and preeminence isn't by taking a proprietary view of information, but by making it so totally accessible and free that there is no real advantage to getting it elsewhere. It's the wiki way. The information equivalent of "if you love it, set it free."--Jrm03063 14:38, 8 November 2007 (EST)


That's an interesting idea about providing links back to WeRelate in note fields embedded in the GEDCOM. We would have to do something like that anyway in order to satisfy the attribution requirement of our license. Please keep comments coming on this topic. We won't get to GEDCOM download until after match+merge, so we have some time to get comments from everyone.--Dallan 18:47, 8 November 2007 (EST)


I think downloading a GEDCOM is a very important feature, and should not be restricted. Even though I intend to do my work primarily in WeRelate going forward, I'd like to be able to download GEDCOMs for various reasons, including ability to put it into other software to generate various pretty-printed reports I can't do here, and as a "back up" of the work I do here. While I appreciate the various degrees of control that ancestry.com gives you when you upload a GEDCOM, there's a significant difference between WeRelate and Ancestry (or most other places like it). On Ancestry, when you upload a GEDCOM, it remains your tree. Here, when you upload a GEDCOM, it becomes your contribution to the ongoing wiki, which other people may add to, link to, correct, etc. From the moment you upload a GEDCOM here, it is no longer your tree, and it wouldn't make any sense for you to be able to dictate who could subsequently download it, especially after it has been enhanced by the work of others.

I appreciate the concern about careless people who might download your work, pass it around, and you lose the opportunity for updates. But I'm not sure we can solve the problem of careless people. :-) I for one keep track of where I got valuable information, and always like to keep in touch with those I've collaborated with on common lines. I think the suggestions that the downloaded GEDCOMs have back-links to WeRelate where appropriate are good ones.

That's my $.02. --TomChatt 01:56, 9 November 2007 (EST)


I've gone back and forth on this issue (i.e., no restrictions vs some restrictions). JRM's comment about embedded links to WeRelate is a very good idea. Tom's comment about "my tree" now being "our tree" needs to be reiterated because it is essentially what WeRelate is all about. That idea alone is one that I think still isn't completely understood when someone starts putting their data online here. I have observed that "misunderstanding" several times in the last few months. If we understand the concept of what is mine is now ours in regards to WeRelate, then restrictions on GEDCOMs would be few if any at all. --Ronni 04:38, 9 November 2007 (EST)


I agree with Ronni and TomChatt that the community aspect of the data on WeRelate demands that we have a Gedcom download. If the purpose of wikifying genealogy is to get the best information out there, we must have a way for it to get off of WeRelate into the "wild." But in order to keep supporting the mission of producing high-quality data, it is crucial that downloaded gedcoms be sourced properly. I imagine a download where the sources are all the source page "titles" on WeRelate. That would be bad. It would badly degrade the quality of source citation in any properly sourced database, and would create a tremendous amount of work to replace any links back to the WeRelate pages with the actual publication and date information that would allow me to locate the source. I don't object to links back to the source pages, which do contribute useful information, but the downloaded sources should be as complete as possible (using the fields filled out on the source page, I would imagine).

On a separate but related issue, what do we do about the licensing requirements, particularly if someone chooses not to download (or import) sources? Perhaps some explicit statements and instructions during the process about the attribution requirements if people redistribute (I know they can do this now, but it's going to be a much bigger problem once downloading is permitted).

And that reminds me of a technical issue we (uh, you, Dallan) need to be sure to solve -- embedded links in notes that go to other places on WeRelate need to be rendered as full links that are intelligible when imported into a genealogy program. --Amelia.Gerlicher 14:11, 9 November 2007 (EST)


I'm thinking that a downloaded GEDCOM would include information from the Source/MySource pages on WeRelate as source records in the GEDCOM. I agree that we'll have to include some explicit statements on the download page about needing to attribute. We could put the attribution links to WeRelate on notes attached to each person/family, or on a source record that is cited by every person/family -- any thoughts on which is best? Your comment about turning embedded wiki links to HTML links is a good reminder -- I'll make a note of that.--Dallan 11:29, 16 November 2007 (EST)
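
As an illustration of the link issue raised above, here is a minimal Python sketch of expanding embedded wiki links in a note into full links before the note is written to a GEDCOM. It is only a sketch under stated assumptions: the base URL, the `[[Title|label]]` link syntax, and the function name `expand_wiki_links` are assumed for the example and are not taken from WeRelate's actual export code.

```python
import re
from urllib.parse import quote

# Assumed base URL for wiki pages; adjust to the real site layout.
BASE_URL = "https://www.werelate.org/wiki/"

# Matches [[Title]] and [[Title|label]] (standard wiki link syntax).
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def expand_wiki_links(text, base_url=BASE_URL):
    """Replace each embedded wiki link with 'label <full URL>'."""

    def repl(match):
        title = match.group(1).strip()
        label = (match.group(2) or title).strip()
        # Wiki titles use underscores in URLs; keep ':' and parentheses readable.
        url = base_url + quote(title.replace(" ", "_"), safe=":/()")
        return f"{label} <{url}>"

    return WIKI_LINK.sub(repl, text)


note = "Son of [[Person:John Doe (1)|John Doe]]."
print(expand_wiki_links(note))
```

Plain text with the full URL is used here instead of HTML because not every desktop genealogy program renders HTML in note fields; that choice is also an assumption.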


Hi, new contributor. Beginner level genealogist. Consider this a comment from the man on the street.... Yes you should allow downloads. But people will need "help" to avoid pitfalls, whether it is an upload or a download. For example, I am one of those careless people who hasn't paid proper attention to how I entered information in our Family Tree Maker. My wife and I have bastardized our usage of the fields so that when I load it up into Werelate, data shows up where it should not. If someone were to download what I loaded they will have to sort through some strange stuff. I need to improve my discipline in managing info in the FTW. (sources, events, and notes fields) I also need to convince my wife that her approach to putting data where she wants is not going to work in the long run. (for example I can't get her to not put Rev. or DR. in the name field...)

I plan on maintaining my own database (FTW) as my primary repository on my home computer and "contribute" to Werelate by publishing what I want to share. (Probably everything I have as I like to share) But, I will not use Werelate as my primary repository.

A page on gedcom file format and pros and cons about how people have used genealogy programs incorrectly and the problems this causes as people get more invested into their data repositories would be good...if it doesn't already exist.

(PS I take back any negative comments about my wife, she just handed me tea and home made cookies...) PPs is there a spell checker?

Thxs --PeterP 18:48, 26 November 2007 (EST)


Hi Peter, one of our big challenges is going to be making the GEDCOM export good enough so that you can incorporate the new material that others have added to your tree into your home database on FTW, so that you don't lose what others have added. As you've seen with your GEDCOM, using the fields in FTW for purposes other than what they're for makes the GEDCOM output look funny. I'm not sure about the different oddities that typically occur, but feel free to add any of your observations to this page. And no, there's no spell checker, but Firefox has one built in.--Dallan 17:13, 4 December 2007 (EST)


My vote is a definite yes for allowing downloads of gedcoms; no restrictions. I suggest that you communicate this to new users when they register. Require new users to check a box that the user understands that gedcoms can be downloaded with no restrictions. There are plenty of sites with restrictions; not what I wish for this site.

I would also like the ability to download images or is this already possible? --Beth 10:45, 14 December 2007 (EST)


It sounds like the general consensus is that we should allow GEDCOM downloads. There's already a statement on the GEDCOM import page and on every edit page that "All contributions to WeRelate are released under the GNU Free Documentation License 1.2 (GFDL)." and that "Others can add to, edit, and redistribute your contributions." I just bolded the first part on the GEDCOM import page to highlight it. We could require people to check a box, but unless it becomes a problem it's not as high of a priority as other things.

You can currently download images (one at a time -- right-click on the image to save it to your local disk). Some images are uploaded under fair-use though, so you may not be able to do certain things with those images (possibly not upload them to a commercial site).--Dallan 00:07, 16 December 2007 (EST)


I think that is fantastic news Dallan. Glad to know that I can also download images. I hope every user understands the concept of WeRelate including the GNU Free Documentation License. Call me a pessimist but I envision some users getting upset about this or that and deciding to remove "their" tree as has happened on Ancestry and Rootsweb and probably other sites as well. I removed my tree from a site, but that was because I used the merge feature of their software and the file was so messed up that I gave up and removed it. Anyway just thinking that a statement in "plain English" may save some future woes. Thank you and all of your volunteers for your hard work and dedication to WeRelate.

--Beth 18:22, 17 December 2007 (EST)


I switched the bolding in the gedcom upload text to emphasize the phrase that describes what others can do with your contributions (add to, edit, redistribute) and added "download" as another specific possibility. Hopefully this will make things clearer.--Dallan 17:03, 18 December 2007 (EST)


Thanks Dallan,

I noticed an option to delete one's family tree in the FTE; can the user delete their family tree? --Beth 07:42, 19 December 2007 (EST)


Yes, you can delete the pages in your tree so long as nobody else is "watching" them. If another user is watching one of your pages (which happens if they add the page to their own tree, or if they edit the page and leave the "watch this page" box checked, or if they click on the "Watch" link at the top of the page), then that page does not get deleted.

A problem caused by this approach is what happens when you are watching one member of a family that someone else has uploaded, but have forgotten to watch the other family members, and the original uploader removes the tree. The page that you watched is still there, but the other family members have been deleted. I can restore them if this happens, but one of the things on the todo list for next quarter is a screen that will tell you where your "off-tree" links are -- pages in your tree that link to pages not in your tree -- and give you a chance to add those pages to your tree.--Dallan 12:08, 21 December 2007 (EST)
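
The deletion rule and the "off-tree" link screen described above can be sketched roughly as follows. This is a hedged illustration in Python, not the site's implementation: the data layout (a set of page titles per tree, a dict of watchers per page, a dict of outgoing links per page) and both function names are assumptions.

```python
def pages_removed_with_tree(tree_pages, watchers, owner):
    """Pages that go away when `owner` deletes a tree.

    A page survives if anyone other than the owner is watching it.
    """
    return {
        page
        for page in tree_pages
        if not (watchers.get(page, set()) - {owner})
    }


def off_tree_links(tree_pages, links):
    """Map each page in the tree to the pages it links to that are NOT in the tree."""
    result = {}
    for page in sorted(tree_pages):
        missing = sorted(t for t in links.get(page, ()) if t not in tree_pages)
        if missing:
            result[page] = missing
    return result


# Hypothetical example: Bob watches only the family page of Alice's upload.
alice_tree = {"Person:A", "Person:B", "Family:A and B"}
watchers = {
    "Person:A": {"alice"},
    "Person:B": {"alice"},
    "Family:A and B": {"alice", "bob"},
}
links = {"Family:A and B": ["Person:A", "Person:B"]}

print(pages_removed_with_tree(alice_tree, watchers, "alice"))
print(off_tree_links({"Family:A and B"}, links))
```

In this example the two person pages would be deleted with Alice's tree while the family page Bob watches stays, which is exactly the situation the off-tree report is meant to warn Bob about beforehand.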


I am happy to share. However I am concerned about harvesters who then may put the money on for profit sites. It might be a bit friendlier to have the person just contact the submitter. That way they can make contact, chat and then share information as they wish.--Sheri 20:06, 5 June 2008 (EDT)

Hi Sheri, I am not sure that I understand your statement: It might be a bit friendlier to have the person just contact the submitter. Friendlier than what? What exactly happens if a harvester puts the money on a for profit site? The information will still be on WeRelate and people can view the information here with no charge. Please clarify your concerns. --Beth 22:10, 5 June 2008 (EDT)

Merging trees? [12 March 2008]

Anybody else undertaken merging of trees? I've recently been doing manual work on the Slafter family of colonial New England, creating pages by hand. But I discovered that just a couple days ago, somebody did a GEDCOM upload that duplicates a portion of the tree I'd been working on. I looked at the Help topic for merging pages, and have been following that procedure. But I'm realizing that it's quite complicated if you're talking about whole dup branches of a tree, and not just a duplicate page. When you start redirecting pages that are in a tree, then you start getting disconnected bits of tree and "orphan" pages. I ended up making a manual inventory of all the pages with the same names, and which ones needed to get merged/redirected with which. Not for the faint of heart (nor for the non-methodical)!

I agree, definitely not for the faint of heart! LOL. I've done several merges in the past, but because it does involve a methodical checklist of things to be done, I tend to only handle the "minor" merges now until Dallan finishes the new merge program. Amelia has done several merges and appears to have a method to the madness. --Ronni 06:34, 29 September 2007 (EDT)
I second the idea that merging is definitely complicated. There have been various threads in the past as to the various odd effects of a redirect. The most important thing to remember is that if you redirect a family page, it effectively removes the contents of that page -- so all the individuals connected to the family still have pages, and they are now unconnected to the new merged page. (And vice versa with persons to family pages, but that's not such a big deal). I don't necessarily have a great method, but for what it's worth, my general plan has been to identify the main couple I'm interested in, and open up a tab (window) with each family. Then I navigate to the oldest child on each of them down as far as more than one family goes. The idea is to get to people without spouses to merge, or to get to the point where only one tree has further descendants. Once I reach the "end", I can work up: merge kids (which have no spouses by definition in this scenario), merge spouse, merge family page, merge the descendant of the original couple. If the spouse at any level has parents and siblings, this becomes additionally complicated, because one basically has to repeat the process with the spouse's parents' descendants before even getting around to merging the original spouse. This, so far, seems to be kind of rare on the stuff I've worked on, but there's at least one project I skipped in favor of waiting for the automated merge - there were just too many branches. But I hope that helps. (And Tom, welcome to the group of somewhat crazy people merging New England families!)--Amelia.Gerlicher 15:46, 29 September 2007 (EDT)


One thing I'm wondering about -- if I merge two pages and redirect one to the other, do the people who are "watching" the redirected page get somehow transferred to the watchlist of the other merged page? Seems like that's what you'd want. (And if the redirected pages are in other people's "trees", does that get patched up as it should?) I'm hoping I haven't done anything to break other folks' trees.

Dallan just recently added that function that redirected watchers get added to the new pages. Everyone that is watching the "old" pages should now be automatically added to the watchlist of the "new" pages. And as long as the "old" pages are being redirected to the "new" pages, there shouldn't be any "holes" so to speak. But I gotta tell you, I've almost created a mess or two when I was merging some families. I did get orphan pages or have a family without spouse or two. Don't know how that happened, because I *thought* I was being methodical. LOL. --Ronni 06:34, 29 September 2007 (EDT)

Also, most of the GEDCOM uploads I've noticed are associated with User pages that don't exist. Does that mean they've un-registered from WeRelate? Or does that just mean they never bothered to create a "user home page"? Is it okay to create their User/talk page in order to leave a message there? Will they get notified? ---TomChatt 05:02, 29 September 2007 (EDT)

I think in most cases they have not created a user page and/or are simply not active in the community. Usually if someone "unregisters" they'll delete their tree before leaving. In any case, it's ok to add to their talk page. They'll either see it when/if they log back on or, if they have it setup in their profile, they'll be notified by email that they have new mail on their talk page. You can also check their contributions at the bottom of their user page to get an idea of their activity. --Ronni 06:34, 29 September 2007 (EDT)
I'm guessing a number of people uploaded GEDCOM's to see what would happen, without taking time to create a profile page for themselves. I expect that they'll visit WeRelate more frequently once match-merge is working and they receive messages about invitations to merge their pages, and that they'll create profile pages then. You can be pretty sure if you leave a message on someone's talk page that they'll get an email notifying them about it (because most people haven't turned that option off in preferences). But if you want to be absolutely sure, you can click on the "E-mail this user" link at the bottom of their user page or talk page to send them an email.--Dallan 22:42, 2 October 2007 (EDT)

A follow-up to Amelia's comment, and perhaps by way of clarification, I think that bad/unreliable/flawed sources absolutely have a place on our pages - and an important one at that. They should be cited and noted as bad/unreliable/flawed and, ideally, the research that established them as such should be noted. Otherwise, folks just keep rediscovering previously discredited information. This is particularly true of folks like me, who don't have a huge background so we don't instantly know that some sources are not trustworthy. For example, I understand that the Mayflower passenger Peter Brown is the origin of a large number of discredited genealogical lines. At some point or other he was attributed as having had a son. This was later discredited, but chaos has remained in this area for something beyond 100 years. This site in particular, provides a good chance to document both the accepted and discredited research.--Jrm03063 14:48, 21 February 2008 (EST)

Ronni.. to clarify... the "old" page would be the one with the other person watching.. to be redirected to the "new" page, being my page that shows me watching. Correct? --Msscarlet1957 19:14, 12 March 2008 (EDT)
Well, rereading what I wrote, "old" and "new" were the wrong terms to use. Target and redirected might be better words. During a redirect, watchers on the redirected page will be transferred to the target page. Choosing the best target page though doesn't really have anything to do with whether you are watching it or not, or who created it, or how long it's been on WeRelate. This help file explains some of the criteria that's being used when deciding which page to make the target page. --Ronni 00:47, 13 March 2008 (EDT)
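
For anyone who finds the redirected/target wording easier to follow in code, here is a small Python sketch of the behaviour described above: on a redirect, watchers of the redirected page move to the target page. The dict-based data model and the function names are assumptions for illustration only.

```python
def redirect_page(watchers, redirects, redirected, target):
    """Redirect one page to another and move its watchers to the target."""
    if redirected == target:
        raise ValueError("a page cannot redirect to itself")
    moved = watchers.pop(redirected, set())
    watchers.setdefault(target, set()).update(moved)
    redirects[redirected] = target


def resolve(redirects, title):
    """Follow redirects until reaching a page that is not redirected."""
    seen = set()
    while title in redirects:
        if title in seen:
            raise ValueError("redirect loop")
        seen.add(title)
        title = redirects[title]
    return title


watchers = {"Person:John Doe (2)": {"msscarlet"}, "Person:John Doe (1)": {"ronni"}}
redirects = {}
redirect_page(watchers, redirects, "Person:John Doe (2)", "Person:John Doe (1)")
print(sorted(watchers["Person:John Doe (1)"]))
```

Which page becomes the target is a separate decision; nothing in the sketch depends on who is watching or who created either page.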

Asking Permission [23 December 2007]

While we are on the topic of merging trees, a new user asked me if we should ask permission from the other user we want to merge with before actually merging into their tree. What's everyone's opinion on this? All manners and politeness aside, you could wait weeks or months for a response or never get "permission" to merge with another tree. This particular topic goes along with the "Downloading GEDCOM" topic as well, in that I see it as understanding what happens to your data once it's put online at WeRelate. I realize this has the potential to be a touchy issue with some, so I'm curious as to everyone's thinking or understanding on this. --Ronni 10:50, 17 November 2007 (EST)

I vote no (surprise). Not only would we wait forever, but there's rarely going to be a reason for someone to say no. The only legitimate reason I can think of is if there's a genuine disagreement over whether someone is the same person or not, and I've very rarely ever encountered a case where that was actually true (as opposed to a bad combination of Ancestral File nonsense). If the point of the site is to get the most/best information about a particular person, and the merge ends up with an entry with more information than the previous one, then it's all a win-win. If someone gets upset (assuming the merge doesn't delete anything that's reliable and not duplicated), then they're missing the point of the site. But, I will say that this assumes that the person doing the merging knows enough about the family to use the more reliable information where there's a difference, or to only add parents/spouse when there's adequate support. If you're in doubt about that, better to ask --Amelia.Gerlicher 12:57, 17 November 2007 (EST)

I vote "no" as well, but I do as a practice look to see if the User I'm about to do a "major" merge with is active on WR by looking at their contributions. --Ronni 03:49, 21 November 2007 (EST)


I too vote no. The whole point here is collaboration and there can't be collaboration as long as there are duplicate trees. --Trevorallred 14:40, 23 December 2007 (EST)


Merge Strategy [10 April 2008]

Here's another question along these same lines: merging two overlapping trees might involve merging hundreds (or even thousands) of individual person and family pages. With lots of pages to merge, most people aren't going to take the time to analyze each pair of pages to merge very carefully. So we need to have a pretty reasonable "default" merge strategy. What should that strategy be? For the text we can put the text from one page after the text from the other page. For the events we can list differing birth/marriage/death events from one page as "alternate birth/marriage/death" events on the merged page. Similarly for differing names -- one name can be listed as an "alternate" name. But which events/names should be the "main" events/names and which should be the "alternate" ones? I can think of two possible approaches; maybe there are more?

  1. The earlier-created page always "wins" -- its events/names are the main ones by default. The justification for this approach is that as new people come to the site, upload their trees, and merge their trees into existing trees, it's going to be a bother for existing users if the "main" events in their trees are constantly changing. It will be easier for the new users to adjust to changes to the main events on their pages as the result of the merge, because presumably they'll do most of the merging shortly after they've uploaded their tree. Also, it's likely that the existing pages are more accurate, since they're probably being watched by more people.
  2. We try to come up with rules for which name/event "wins" by default -- the one with the more specific date/place, the one that is sourced, etc.

Any thoughts?--Dallan 22:29, 17 November 2007 (EST)
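
To make approach 1 concrete, here is a minimal Python sketch of an "earlier-created page wins" default merge. It assumes each page is a plain dict with a `created` date in ISO format, a main `name`, an `events` dict, and free `text`; those field names and the function `merge_pages` are hypothetical and chosen only for the example.

```python
def merge_pages(a, b):
    """Merge two pages; the earlier-created page supplies the main name/events.

    Differing names and events from the later page are kept as alternates,
    and the text of the later page is appended after the earlier page's text.
    """
    older, newer = sorted((a, b), key=lambda p: p["created"])  # ISO dates sort as text
    merged = {
        "created": older["created"],
        "name": older["name"],
        "alt_names": list(older.get("alt_names", [])),
        "events": dict(older["events"]),
        "alt_events": {k: list(v) for k, v in older.get("alt_events", {}).items()},
        "text": "\n\n".join(t for t in (older.get("text", ""), newer.get("text", "")) if t),
    }
    if newer["name"] != merged["name"] and newer["name"] not in merged["alt_names"]:
        merged["alt_names"].append(newer["name"])
    for kind, value in newer["events"].items():
        if kind not in merged["events"]:
            merged["events"][kind] = value  # nothing to conflict with
        elif value != merged["events"][kind]:
            merged["alt_events"].setdefault(kind, []).append(value)
    return merged


existing = {"created": "2007-06-01", "name": "Jon Doe",
            "events": {"birth": "abt 1701", "death": "1760"}, "text": "Existing notes."}
uploaded = {"created": "2007-11-01", "name": "John Doe",
            "events": {"birth": "1700"}, "text": "Uploaded notes."}
print(merge_pages(uploaded, existing))
```

Approach 2 would only change how `older` and `newer` are chosen for each name or event (for example, preferring the sourced or more specific value); the rest of the merge could stay the same.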

I like the reasoning behind number 1. As to suggestion number 2, will it really matter if one has more specific data than its merge mate since I'm assuming all the data from one will be copied to the other?
Like you said, merging trees could involve hundreds of pages, but in the merging process, we will have the option to "cut off" the merge where we want? For instance, I'm generally only interested in the parents of an allied spouse and thus would not want to add their grandparents, siblings, etc to my tree. --Ronni 13:45, 18 November 2007 (EST)
I'm thinking that by default you would add only the merged pages to your tree, not relatives that you had not merged into your existing pages. The problem of identifying people who are not in your tree but are related to people in your tree so that you can choose whether you want to add them to your tree is another important problem though, and it needs to be addressed at the same time that we address merging, if not sooner. I'm thinking that we would provide a screen that would list all the people outside of your tree along with the people in your tree that they're related to, and allow you to choose which ones to add to your tree.--Dallan 23:15, 20 November 2007 (EST)

I think that any technique for merging John Doe (i) and John Doe (j), must preserve all the information from both "i" and "j", and must clearly indicate that it is an automatic merge (and the provenance of the contributions). What parts are more "believable" or better sourced is going to be easy for a human to understand but pretty tough to make a program understand. Better to be sure you lose nothing and hope a human will clean things up.

Besides merging existing people, I was also wondering whether there is a way to perform a less than complete GEDCOM upload (thereby avoiding the need to merge common individuals). Can we imagine a reasonable UI that would break GEDCOM import into a two-step process? Instead of a one shot load the whole batch, a two step process that would build up a list of names (from the GEDCOM) that already appear to be present on werelate? The user would then be free to pick whether those names are uploaded as new individuals or whether the existing werelate individual is substituted for a particular person.--Jrm03063 18:44, 18 November 2007 (EST)
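
A rough Python sketch of the two-step import idea above: step one lists GEDCOM names that already appear on the site, step two turns the user's choices into an import plan. The name index, the slash-delimited GEDCOM surname handling, and all function names are assumptions for illustration; real matching would need far more than the name.

```python
def normalize(name):
    """Lower-case a GEDCOM-style name ('John /Doe/') and drop the slashes."""
    return " ".join(name.replace("/", " ").lower().split())


def find_candidates(gedcom_people, existing_index):
    """Step 1: for each GEDCOM person, list existing pages with the same name.

    gedcom_people: dict of GEDCOM id -> name
    existing_index: dict of normalized name -> list of existing page titles
    """
    candidates = {}
    for person_id, name in gedcom_people.items():
        hits = existing_index.get(normalize(name), [])
        if hits:
            candidates[person_id] = list(hits)
    return candidates


def plan_import(gedcom_people, choices):
    """Step 2: turn the user's choices into an import plan.

    choices: dict of GEDCOM id -> existing page title to substitute.
    Anyone without a choice is uploaded as a new page.
    """
    plan = {}
    for person_id in gedcom_people:
        chosen = choices.get(person_id)
        plan[person_id] = ("use_existing", chosen) if chosen else ("create_new", None)
    return plan


people = {"@I1@": "John /Doe/", "@I2@": "Mary /Roe/"}
index = {"john doe": ["Person:John Doe (1)", "Person:John Doe (7)"]}
print(find_candidates(people, index))
print(plan_import(people, {"@I1@": "Person:John Doe (7)"}))
```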

That's an interesting idea. I'll have to think about that some more. Perhaps the tree (multi-person & family) matching + merging could be handled at GEDCOM upload time. For pages that are entered or edited by hand, we could support matching + merging of individual person and family pages by sending an email when the system finds matches and providing a "merge" link at the top of every person and family page. But maybe it would be ok to limit the multi-person & family merge to just the GEDCOM upload process.--Dallan 23:15, 20 November 2007 (EST)
Does that mean that the large overlapping trees that we've been leaving for the automatic merge will still have to be done by hand? Or am I misunderstanding what you're responding to?--Amelia.Gerlicher 23:32, 20 November 2007 (EST)
I thought about that too after I wrote the above. There's no way we'd want to make people merge existing trees by hand. If we use the above approach, we'd have to do something special so that people with existing trees could merge them as if they had just uploaded their tree.--Dallan 10:14, 22 November 2007 (EST)
But we would want to have the ability to "trigger" this merge feature whenever we want, correct? I have been envisioning something similar to the FTE that just sits there waiting for me to use it or not. If we have a merge utility that will scan our trees whenever we want and then give us a list of possible duplicates, then on a case by case basis we could decide whether we want to merge or not. I also envision this program being able to scan our data and compare with others based on certain criteria that is user definable (dob, dod, place, parents, etc) and not just the name of the individual we are comparing. I also envision (if you're gonna dream, dream big <g>) this utility being able to handle people that I can mark as NOT duplicates, in other words, once I've scanned them and decided they are not a match, I can mark them so they aren't included again in future scans. --Ronni 10:47, 22 November 2007 (EST)
That's one of my big questions in matching. Matching your tree against the other trees to see where possible overlaps are is a pretty machine-intensive process, especially if your tree has a lot of people in it. If we supported this I'd want to limit how often people could request that their tree be "re-matched" against the other trees to once a month or something. Another alternative would be to allow people to match a specific person (a family really, since the match will take relatives' information into account) in their tree against the other trees, which would be much less machine-intensive, but probably also a lot less helpful since you'd have to visit a lot of pages if you wanted to re-match your entire tree.
Remembering the people that you have marked as not a match is a good idea.
Why would you want the match criteria to be user-definable? I was thinking that the computer would calculate a score based upon all of the pieces of information (dob, dod, parents, etc.) that matched, not just the name.--Dallan 17:13, 4 December 2007 (EST)
Ah, score based. Hadn't thought of it that way. That would do of course. :) --Ronni 13:09, 6 December 2007 (EST)
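A minimal sketch of the score-based idea discussed above, for illustration only. The record fields, the weights, and the `match_score` name are all assumptions made for this example; they are not WeRelate's actual matching code. The score is the weighted share of fields that agree, counting only fields that both records actually have.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonRecord:
    # Hypothetical, simplified person record; real pages carry far more detail.
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    birth_place: Optional[str] = None
    father: Optional[str] = None
    mother: Optional[str] = None


# Assumed weights, chosen only for illustration.
WEIGHTS = {
    "name": 3.0,
    "birth_year": 2.0,
    "death_year": 2.0,
    "birth_place": 1.0,
    "father": 2.0,
    "mother": 2.0,
}


def match_score(a: PersonRecord, b: PersonRecord) -> float:
    """Return a value in [0, 1]: weighted share of comparable fields that agree."""
    agreed = 0.0
    compared = 0.0
    for field_name, weight in WEIGHTS.items():
        va, vb = getattr(a, field_name), getattr(b, field_name)
        if va is None or vb is None:
            continue  # a missing value is neither evidence for nor against
        compared += weight
        if isinstance(va, str):
            same = va.strip().lower() == vb.strip().lower()
        else:
            same = va == vb
        if same:
            agreed += weight
    return agreed / compared if compared else 0.0
```

A score like this only ranks candidates; the decision to merge would still rest with a person.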
It depends on the way the software is written if we should match BEFORE importing the GEDCOM. I believe we should leave the match code in one location, which means it's actually easier to import the data then delete/merge it into the main trunk after.--Trevorallred 14:53, 23 December 2007 (EST)

I think it's also important to remember that the problem of merging two overlapping trees actually is two problems: matching, then merging. It's much easier to first match the names, then queue them up for merging. At that point we "could" have the computer do the actual merging using some really good heuristics that Dallan has already alluded to. I think it's impossible for the computer to do the matching automatically. There are simply too many variables at this point to trust a machine to match.--Trevorallred 14:53, 23 December 2007 (EST)


I think that matching on a tree-by-tree basis probably isn't a great idea. It's not "work efficient" (for you CS folks out there). Matching shared genealogy is the heart and soul of what werelate is about, and I'm struck that it should be sort of fundamental to the way werelate works. What if we think about matching as more of a re-indexing process, where a set of match candidates is associated with any person or family. Whenever a person or family changes, it gets marked as needing to be recomputed for purposes of matching. When the match index for a person is recomputed, the previous match set becomes a starting point of families and people to rule in and out first. People who drop out of the match set or get added to the match are themselves considered changed so they are marked for being matched again. Of course this means that matching is a continuous and ongoing process, and that changes will have the effect of generating work for the matching engine (or robot, or whatever you call it) - but so what? That's what werelate is here to accomplish. It also means that matching work needs to be queued in a way that prevents any single cluster of related names from hanging up the match robot so that other areas of the werelate data base go begging. Maybe some sort of oldest unindexed page first process...--Jrm03063 17:28, 4 January 2008 (EST)
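The re-indexing idea above could be sketched roughly as follows. This is only an illustration under assumptions: `MatchQueue`, `rematch`, and the `find_matches` callback are hypothetical names, and a simple counter stands in for "oldest unindexed page first".

```python
class MatchQueue:
    """Pages waiting to be re-matched, served oldest-marked first."""

    def __init__(self):
        self._dirty = {}  # page title -> sequence number when first marked
        self._clock = 0

    def mark_changed(self, page):
        # A page already waiting keeps its original place in line.
        if page not in self._dirty:
            self._clock += 1
            self._dirty[page] = self._clock

    def next_page(self):
        if not self._dirty:
            return None
        page = min(self._dirty, key=self._dirty.get)
        del self._dirty[page]
        return page


def rematch(queue, page, old_matches, find_matches):
    """Recompute the match set for one page.

    Pages that dropped out of, or were added to, the match set are
    themselves marked as changed so they get re-matched later.
    """
    new_matches = find_matches(page)
    for other in old_matches ^ new_matches:  # symmetric difference
        queue.mark_changed(other)
    return new_matches
```

Because every page is served in the order it was first marked, one busy cluster of related names cannot starve the rest of the queue.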


As individual people and families change, we'll certainly match just that one person/family, and that will be a continuous process. When I talk about tree matching, I'm thinking about GEDCOM uploads. If you upload a GEDCOM containing say 2,000 people, and 200 of them match someone else's tree, I don't think you want to be presented with 200 different match questions one at a time. It seems that it would be a better experience to present it as a two-step decision: e.g., "200 people appear to match tree A, and 50 people appear to match tree B". In the first step you click on the link that takes you to the list of matching people in tree A, and in the second step you decide which of the 200 match candidates from tree A you want to merge with. As you're checking boxes to determine which pairs of people to merge, the system lets you know which of the remaining matching pairs are related to people you've already decided to merge and which are not. So for GEDCOM uploads, it seems that making the matching decisions up-front will be a better experience than making them one at a time.--Dallan 11:34, 7 January 2008 (EST)
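The first step of the two-step presentation described above amounts to grouping match candidates by the tree they come from. A small sketch, assuming each candidate is a hypothetical `(uploaded_person, existing_person, existing_tree)` tuple:

```python
from collections import defaultdict


def summarize_matches(candidates):
    """Group candidate pairs by the existing tree they were found in."""
    by_tree = defaultdict(list)
    for uploaded, existing, tree in candidates:
        by_tree[tree].append((uploaded, existing))
    return dict(by_tree)


def summary_lines(by_tree):
    """Step one: a one-line summary per tree, before any pair is reviewed."""
    return [
        f"{len(pairs)} people appear to match tree {tree}"
        for tree, pairs in sorted(by_tree.items())
    ]
```

The per-tree lists returned by `summarize_matches` would then feed the second step, where individual pairs are checked off.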


As long as we're talking about GEDCOM upload in particular, I completely agree, a two-step process on a per-tree basis is essential. I can imagine, for example, if everyone who has a Stephen Hopkins Mayflower line were to upload it, after someone had gone to the trouble of creating a really comprehensive and nicely done Stephen Hopkins page - ick!

My remarks were in the context of looking for matches within existing data as it is changed and updated in ordinary use.--Jrm03063 12:28, 7 January 2008 (EST)

It occurs to me based on JRM's Stephen Hopkins' comment, that it might be useful in the GEDCOM merge process to have an option that identifies an identical person/family, but doesn't upload any new information. Say, for example, I have Stephen Hopkins in my db, but he's the father of a wife of a sibling to my line, so all I have is a vague birthdate from the Ancestral File. There's no need to "merge" that information into the page on WeRelate, it would just clutter up the Stephen Hopkins page. But my tree should connect to that page. And more likely, Stephen's daughter is already on WeRelate too, but maybe her husband isn't. So if I had an option like "link tree, do not upload my information", I could choose that, and the result would be a new family for Stephen's daughter, but the information already there on Stephen and his daughter isn't complicated with my not-ready-for-primetime details. I know we theoretically want to gather everyone's wisdom on these people, but the reality is that a lot of people have junk in their files, and the degree to which we can avoid having it poured unfiltered onto pages that people have worked hard to make coherent, the better the experience for everyone. This would also be helpful for people updating their own (or a close relative's) gedcom, and would cut down on both the automated and manual computer work if people can avoid having the computer do a lot of merging that will just have to be edited out by hand later.--Amelia.Gerlicher 13:29, 7 January 2008 (EST)

That's a good point. We ought to have a "disregard my information" merge option for both individual-matches and also for tree-matches.--Dallan 18:20, 8 January 2008 (EST)


Regarding merging, at least in respect to an imported GEDCOM, I think the submittal process ought to require the person to identify one person in the GEDCOM as an existing person in werelate. Then follow the relationships stored in the GEDCOM to match up the others. You can't rely on name matches or birthdate matches. Only persons connected to the anchor person get merged. Dangling trees are ignored.

What if someone's GEDCOM has no matches? They manually enter the anchor person, then import.
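The anchor-person proposal is essentially a graph walk: start at the anchor and follow relationships, so anyone not reachable (a "dangling tree") is left out. A rough sketch, assuming the GEDCOM has already been reduced to a hypothetical dictionary of person id to related ids (parents, spouses, children):

```python
from collections import deque


def add_link(relations, a, b):
    """Record a relationship in both directions."""
    relations.setdefault(a, set()).add(b)
    relations.setdefault(b, set()).add(a)


def connected_to_anchor(relations, anchor):
    """Return everyone reachable from the anchor by following relationships."""
    seen = {anchor}
    queue = deque([anchor])
    while queue:
        current = queue.popleft()
        for neighbor in relations.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```

Only the returned set would be considered for merging; everything else in the file is ignored under this proposal.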

Conflicting data could be scored, though how you score between two different sources is beyond me. A source that is an ancestral file or ancestry.com is hardly a source since half of those are people's opinions, not reflections of real sources as indicated by the number of people out there propagating known errors. Precision might be a valid criterion, for example 24 Mar 1789 might be allowed to replace 1789 though it is not as clear if they are inconsistent as in 24 Mar 1789 and 1792.
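The precision rule can be made concrete: a new date may replace an old one only if it agrees with every part the old date states and adds at least one more part. This sketch is an assumption-laden illustration; it handles only the three plain forms "1789", "Mar 1789", and "24 Mar 1789", not qualifiers such as "abt" or date ranges.

```python
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_date(text):
    """Parse 'D Mon YYYY', 'Mon YYYY', or 'YYYY' into (year, month, day)."""
    parts = text.split()
    year = int(parts[-1])
    month = MONTHS[parts[-2][:3].lower()] if len(parts) >= 2 else None
    day = int(parts[-3]) if len(parts) >= 3 else None
    return (year, month, day)


def is_refinement(new, old):
    """True if `new` is consistent with `old` and strictly more precise."""
    new_parts, old_parts = parse_date(new), parse_date(old)
    for n, o in zip(new_parts, old_parts):
        if o is not None and n != o:
            return False  # contradicts or loses something the old date states
    return new_parts != old_parts
```

With this rule, "24 Mar 1789" refines "1789" but not "1792", which matches the two cases in the comment above.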

So I believe extra weight goes to the first comers. If you want to change the data that is there, you have to do it manually. Not that first comers necessarily represent better data, but it ensures thoughtful overwriting. To err is human, but to really screw up requires a computer.

It means more manual editing, but I think that is necessary to avoid the damage caused by somebody importing a GEDCOM they downloaded from who knows what website or similar scenarios. The goal is accuracy first and as all experienced genealogists know, accuracy is not easy nor straightforward.

--Jrich 13:33, 4 April 2008 (EDT)


Before we ask people to identify the anchor person manually, I'd like to try an idea I have on finding anchor points automatically. I'll hopefully have something ready to try by June/July.--Dallan 11:48, 7 April 2008 (EDT)


I don't think I envy you your task.

Within the scope of a single person, I have much difficulty thinking that a computer can do a reliable match. However you choose to weight the different facts when looking for a match, it will be wrong in some cases. And that's assuming the GEDCOM has enough facts in it to start with. How many times have I seen where many individuals are represented by no more than a name? Or the same person called Mary here, Polly there. Or a town with four grandchildren all named after the same honored grandparent and all born in a very short timespan.

So there is not much hope unless you take into account earlier and later generations. Then it becomes far more likely to be accurate. However, remembering that many disagreements are exactly over who the parents were, or how many children, etc, will this work? What if the GEDCOM being merged has a string of 10 generations but right in the middle of it it has different parents from the person being merged with. So now does the computer create alternate parents and now reanchor the remaining subtree to the newly proposed parents? If the new parents are brand new to the database, then does that imply all their ancestors are new too? Maybe this GEDCOM is proposing a heretofore undocumented parent between matching grandparents and grandchildren. This argues for the previously discredited (by me) person-by-person matching.

How do you decide what scope to use?

I'm almost inclined to suggest that you "punt".

The secret weapon of your website is time. Over time the data will become better and better quality. The potential damage of computer-generated mistakes will get worse and worse. Speeding up data entry is not necessarily the top priority. (Enabling collaboration to arrive at a higher quality of data than is achievable by oneself is, IMHO.) The facts we are entering are no longer changing so there is really no rush to enter them. Over time there will be less data entry and more comparison anyway as the database gets better populated. Computer-aided data entry probably means the user has not taken the time to see if their input is needed, nor have they discovered that, "Heck, look at this! Somebody has some information I wasn't aware of! Who would have thought that was possible?"



However what is needed badly is an easy way to see that I am not entering a duplicate or to find the person I am interested in collaborating on. This is an entirely different search than a Google-like search for any use of the parts of their name anywhere on any page. It is a more structured search. Many sites can take characteristics I enter, such as name, range of birth dates, location, and return results where the matching people come first, then those with slight name variations, then increasingly remote birth dates, etc. The more John Smiths that get entered, the more important this will become.

Addendum: example: I created Mary Wheeler-134 the other day. If I search for the given name Mary and surname Wheeler in Namespace People and Families, I get 1012 pages. If I add Person: Mary Wheeler to the keywords I get 662. If I put "Person: Mary Wheeler" (i.e., with quotes), I get 113, which is probably how many Mary Wheelers there currently are remaining in the system. On the list of the 113, the displayed blurb shows no useful information except for her name in about half the cases. Quite hard to tell if any of them are the one I want. 20 years from now, how many Mary Wheelers will there be?

--Jrich 14:36, 7 April 2008 (EDT)


Perhaps a separate topic, but using the "Browse Pages" function and comparing the other 133 or so Mary Wheelers isn't going to work. It's been discussed in the past about using date ranges in the title to help make distinctions between Persons of the same name and since I use the "Browse Pages" feature quite a bit, I'm inclined to agree more and more that we need to come up with a better way of quickly identifying the Mary Wheelers we are really interested in. --Ronni 19:49, 7 April 2008 (EDT)


This probably does go somewhere else, but it builds on the previous comments. The Browse Pages does help. It is non-intuitive that browsing is more focused than searching, but I guess that's part of the learning curve.

The titles are still a problem. It doesn't seem like it would be hard on the browsing page, assuming it is a most common problem there, to insert something to take the returned title, recognize certain namespaces in the title (i.e., Person:), dig into the page and build a more descriptive replacement. Although maybe digging into the page would be too costly?

It would be nice if the internal link button on the edit page caused a popup version of the Browse Pages page so you could search for and select your link instead of typing it. Am I just in need of more learning here?

Have you tried clicking on the "choose" link that you get when entering people and families on Person and Family pages?--Dallan 15:32, 10 April 2008 (EDT)
Actually I was talking about working on Talk pages. In my discussions, if I refer to a source, I think it would be nice to make that a link to the Source listing, or I mention another person, etc. The internal link button does what I want except it is just boilerplate. I was suggesting it would be nice if it did more. It is easier to type the double brackets so effectively the internal link button is only a reminder of what the format is.
That's a good idea; I'll add it to the todo list (which is getting a little long now so it might be awhile)--Dallan 18:08, 10 April 2008 (EDT)

--Jrich 12:44, 10 April 2008 (EDT)


The new search functionality will have a "match" function that will return results in relevancy-ranked order, and the search result list will include data elements like birth date&place and death date&place. I'm working on this now.

Matching is do-able, it just takes time to develop. Several years ago I worked on a matching algorithm that found 95% of the possible matches and picked the correct match 95% of the time. There will always be cases where the computer guesses wrong. Making the final match decision does not play to a computer's strengths. What a computer is good at is bringing the probable matches to a human's attention, which can significantly reduce the amount of time you have to spend searching for them yourself (unless you want to, which the new search functionality will allow).

Once we get the new match functionality working, I'll list probable matches when people try to add new Person or Family pages so that they can choose to link to an existing page rather than create a new one.--Dallan 15:32, 10 April 2008 (EDT)


Trouble ahead? [1 February 2008]

I am all for merge but I can't help but wonder how all this will work in the best interest of WeRelate and keep users happy. I feel the idea of merge is to MERGE.. and not to be picky about who gets merged or not into "my tree". If there are two individuals that ARE the same person they would be merged, whether they are in fact related or not.. such as the parents of an in-law or the parents of that in-law's parents. Specifically because "we Relate". I don't have a problem with this; however, I can see where others might be offended. It is difficult to explain my point...

On Wikipedia nobody "owns" any wiki page there.. and all members can contribute and edit and those pages are permanent. Here on WeRelate folks are worried about THEIR databases... getting cluttered with unwanted people via a merge. So what happens after a merge and someone deletes their gedcom off of WeRelate??? What happens to pages for those folks that were in that Gedcom that is now gone? Are they thereafter floating out there as orphans?

Example: Sally Snodgrass uploads "Gedcom A"; Bill Smith uploads "Gedcom B" and sees many if not most of his people match with "Gedcom A" and he spends hours merging and making it all look pretty.. then Sally Snodgrass sees that Bill merged the parents of an in-law with one of her Aunt's husbands and doesn't want so many people in her tree.. and in fact decides to just remove her tree altogether because she is miffed about Bill's work.

You can please some of the people some of the time but not all the people all the time. People are protective about their work. I don't think this would be an issue if folks did not want to download their gedcom with edits back onto their home machines, but I think there has been discussion about members hoping to be able to do that.

Why would Wikipedia have three pages that tell about Napoleon Bonaparte?? Why would we choose not to automatically merge John Smith born 1903 died 1940 in the same place and who has a high "score" thus being a match? Just because? So if Sally Snodgrass does have John Smith and chooses NOT to merge him with an exact match.. but along comes Bill Smith who sees the obvious and goes ahead and matches these two up.. and therefore links up all the ancestors to John Smith, people who are NOT of any interest to Sally Snodgrass, what will happen? And even if we call in counselors to have Bill and Sally be nice, what about Mr. Newbie that comes along and sees the same match and starts merging on his own as well? I know some of this is available to happen now, as we can merge, but as it stands it is so daunting to merge that I am guessing few would bother. However once it is automated there could be conflicts.

I myself am excited about the prospect of automated merging.. I feel this will help tremendously because my database has 56,000+ people in it. I have to break it down into SO many small gedcoms. I go to one of my immigrant ancestors and begin with him and include all his descendants, and repeat that process over and over, and thus I have all kinds of duplicates on WeRelate as a result, especially since I myself am in each of the gedcoms.. and so are my parents... and so are most of my grandparents and all their siblings, etc! Once automated I can just merge all the "mini gedcoms" into one big family. But will this BIG family then be too large to work at WeRelate? will a huge merged file cause the FTE to slow to a crawl? --Msscarlet1957 23:12, 24 January 2008 (EST)


I get the impression that merging is going to come in two flavors. One is to simply avoid or suppress upload of portions of a GEDCOM that are already present on werelate. I believe it's been described as a two-step process, where the entire GEDCOM is uploaded to some temporary space, compared against the overall content of werelate, and then somehow the results will be presented to the user allowing him to pick and choose what is actually conveyed into the general werelate space as new person/family pages. The second form is after-the-fact of upload - recognizing the different copies of Napoleon Bonaparte. I believe there's a vision for a tool that will allow a user to say that "Person:Napoleon Bonaparte (27)" and "Person:Napoleon Bonaparte (28)" should be automatically consolidated to "Person:Napoleon Bonaparte (27)", and "Person:Napoleon Bonaparte (28)" becomes a redirection, but you can do that sort of thing right now manually.

As for tree implications, I think that anyone with either Boney (27) or Boney (28) will still have those references, and they would just jump to Boney (27) when they go to view that part of their tree.

I don't think deleting a tree has any real global significance. I think it just amounts to a page of references - the pages for person, family, source - don't know the difference.

I've been doing lots of merges manually. I'm struck that this is really the point of it all. If someone wants to maintain their work in isolation, werelate just isn't a tool that they're going to like.--Jrm03063 07:37, 25 January 2008 (EST)


These are all good points. This is why merging is actually much more difficult to get right than matching. Here are some thoughts.

Merging is going to come in two flavors: a tree-based merge when you first upload your tree (we'll also have to do something like this for existing trees), and after-the-fact mergers for people that have been entered or edited on-line.

I agree that people getting offended because someone merged or edited their tree is going to happen. I also agree that WeRelate isn't the place for people who want to work in isolation. There are other websites for that. Hopefully it won't happen too often, but if someone does delete their tree, the parts of their tree that have been merged into someone else's tree (and so someone else is also watching those pages) won't be deleted.

I'm thinking that we'll also need an "unmerge" function. It would be pretty frustrating if merge were a one-way street.

The reason that we have the concept of a "tree" is so that you can limit the number of people that you care about. People in your tree can link to people that are in someone else's tree but not in yours. So if someone else merges their tree into yours, chances are that some of the newly-merged people link to people in their tree that are not in yours. You should be able to add those people to your tree, but it's your choice who from their tree you want to add into your tree. It's ok to leave them outside of your tree and just have people in your tree link to them.

I'm reluctant to automatically merge anyone, especially at the beginning. I'll keep a log of who people chose to merge and who people chose not to merge, and the score associated with each pair. If after awhile we see that if the score is above X people choose to merge 99% of the time, then we could consider doing an auto-merge in those cases. But I know of other genealogy databases that did auto-merging and people weren't too happy about it.
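The logging idea above could be checked with something as simple as the following. It is a sketch under assumptions: the log is taken to be a list of hypothetical `(score, merged)` pairs, one per decision a person made, and the function name is invented for this example.

```python
def merge_rate_at_or_above(log, threshold):
    """Share of logged decisions with score >= threshold that were merges.

    Returns None when no decision reaches the threshold, so an empty
    sample is never mistaken for a 0% or 100% merge rate.
    """
    decisions = [merged for score, merged in log if score >= threshold]
    if not decisions:
        return None
    return sum(decisions) / len(decisions)
```

If the rate for some threshold X stayed at 99% or higher over a large enough log, that would be the evidence for considering auto-merge above X; how large "large enough" is remains an open question here.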

I'm slowly making the FTE better able to handle large trees. It's much better than it was a few months ago, but I don't think it's ready for a 56,000 person tree yet :-). But it should be by the end of the Summer; certainly by the end of the year.

As I get closer to implementing merge, I'll post more ideas here and ask for feedback.--Dallan 15:25, 28 January 2008 (EST)


I've been assuming that trees are just a table of references, and that deleting a tree has no particular implications for the person, family, source, image, or other pages referenced. Is that incorrect? Is there some sort of implied delete of a person page performed if a particular page is referenced by no other tree??? What if the page is referenced by another person or family page?--Jrm03063 15:37, 28 January 2008 (EST)

Your theory is incorrect. If you delete a tree, it deletes all the associated pages that are not either being "watched" or part of someone else's tree. If, for example, I'm watching your family page for Ann Smith and John Doe, but I'm not watching any of the person pages, all the person pages will go away and the links will turn "red" when you delete your tree. If I click on one of them, I get the page that says "this page has no content, click edit" along with a link that invites me to see that there's a deleted edit. Unfortunately, I can only see who and when deleted it, and not the content that was deleted.--Amelia.Gerlicher 20:05, 28 January 2008 (EST)
Amelia is right. Although it is also possible for a page to not belong to any tree and still be part of WeRelate: if you remove a page from your tree, that doesn't delete it. But if you delete your tree, it deletes all pages that are not being watched by someone else.
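As a rough model of the rule just described (not the actual WeRelate code), deleting a tree removes exactly those of its pages that nobody other than the owner is watching. The `watchers` mapping and the function name are assumptions for illustration.

```python
def pages_deleted_with_tree(tree_pages, watchers, owner):
    """Return the pages that would go away when `owner` deletes the tree.

    tree_pages: iterable of page titles in the tree
    watchers:   dict of page title -> set of user names watching it
    """
    return {
        page
        for page in tree_pages
        if not (watchers.get(page, set()) - {owner})
    }
```

In Amelia's example, a family page watched by a second user survives while the unwatched person pages are deleted.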
Well, I hope you had good reasons for that. It isn't very wiki-tuitive, and it runs rather counter to the notion that we're trying to build a continuous non-proprietary fabric that spans from the space being studied by one user to the spaces being studied by others. I suppose there's a notion that you don't want to have the space filled with junk that no one is explicitly watching, but for every literal delete (individual or by tree) there must be ten or more pages that are just abandoned in place. Page deletes don't recycle the name or the number-permuted name (for people and families) so that doesn't seem real helpful. If abandonment is ultimately treated as a delete, you're going to wind up chucking useful stuff, so I don't imagine you want to do that. What is the summary rationale for this delete behavior? And what's the thinking on abandoned user content - I've been hoping that werelate keeps stuff essentially forever. Well done genealogical research would seem to have a useful shelf-life greater than that of the typical genealogist....User:jrm03063
I know it's not very wiki-like and I'm not completely happy with it, but here's why. First of all, just because nobody is watching a page, that doesn't mean that it gets automatically deleted. The only way a page gets deleted is if the only person watching the page deletes it. So if you delete your tree, all pages in your tree get deleted unless someone else is watching them. Here are my reasons for allowing people to delete pages:
(1) currently, if you've uploaded a GEDCOM and you want to re-upload an updated version of it, the updated pages aren't automatically merged with the original pages. So to avoid creating duplicates, we ask you to delete your original tree first. This problem will go away in the next couple of months because we'll allow people to upload updated GEDCOM's into the same tree.
(2) Early on we did not allow people to delete their trees, and several people complained that they could not delete their tree after deciding that they didn't want to use WeRelate anymore. Rather than arguing that deleting the pages that you had contributed was unwiki-like, I added the delete functionality instead.
Once we get problem (1) solved, I might turn deletion into something that only an administrator can do, and ask people to send one of the administrators an email if they want to delete their tree. This would give us a chance to notify anyone watching at least one of the pages in the tree that the pages nobody else is watching in the tree were about to be deleted, and to give them a chance to watch those pages if they wanted them to be retained.--Dallan 14:48, 1 February 2008 (EST)
Thanks for the reply. I can see that there are some practical concerns driving this, at least until things reach a greater state of completion. Just so I'm sure about it - does the delete occur when the last tree reference I have goes away? Or would it go the first time that a page was referenced in a tree being deleted? I very much hope the former. Also, in order to mess with larger chunks of people-space, set operations (union, intersection, exclusive-or?, copy/assign) might be handy...
Yes, it's the former. At some point I may add set operations, if they turn out to be generally useful.--Dallan 19:24, 4 February 2008 (EST)

I am probably going to be one of the users that causes trouble regarding automatic merges. While I like the concept in general and do wish to link to other families, some of the families on WeRelate are not ones that I consider properly researched and sourced. WFT #233 or research of John Doe tells me nothing. There are at least six trees on RootsWeb that cite one source, which I have proven to be incorrect on the family that I am now entering. These people's trees are incorrect because they did not bother to do actual research. A simple check of census data, in this case, would have eliminated the problem. If I have no choice regarding the merging of my file with another one, I am not sure that I would continue to use WeRelate. Perhaps you can explain to me how you envision this concept of automatic merging to work? --Beth 18:50, 28 January 2008 (EST)

To clarify my comment about a source titled Research of John Doe; you will find that I do use the source, Research of John Doe. But first I must have complete confidence that this person performs quality research and second the researcher has provided me with the sources cited in the research. In my citation you also find the reference to the original source cited. When I view the original source, I change the source to that but still give John Doe credit for the research somewhere in the document. --Beth 18:00, 30 January 2008 (EST)
I think the idea is that when people get "merged" -- assuming that the two entries are really the same -- both sets of sources are retained. And anyone watching either original person gets an email. Then, as the one who cares more/has better information, you can edit out the nonsense appropriately. This is effectively what's done now with a manual merge -- if there's a sloppy Ancestral File version with the wrong parents, and a well documented version, I keep the second one virtually unchanged, and the first one gets a redirect. Usually if I actually delete parent or spouse links, I explain why, or add some sort of "unproven" disclaimer if I keep what I think (but can't prove) is bad data. If anyone gets upset at someone else deleting their unreliable sources or proven wrong research, I really have no sympathy. On the flipside, though, part of using WeRelate is accepting that someone can come along and edit what you add, and you have to accept the responsibility of changing it back if it's wrong.
On the technical issues, I think there are two concepts under consideration:
  1. Gedcom upload that informs you that X people are duplicates and asks if you want to merge or skip the upload of your data(which you should do for the peripheral people on your tree you aren't sure/don't know much about, or to avoid duplicating wholesale a whole bunch of work). Those merges are then accomplished automatically by adding the gedcom information to the existing record, as new "alt" fields, or appended to the existing note field.
  2. Merge approved by a human based on an automated match, that's then merged as above.
As Dallan notes above, true automated merge without human approval is unlikely for a while -- there are way too many cases of people with the same name in the same area born close together to assume that they are the same.

--Amelia.Gerlicher 20:05, 28 January 2008 (EST)

Right. Also, we'll have an "unmerge" button so someone can undo a merge in case that's necessary.--Dallan 22:51, 30 January 2008 (EST)

Merge Video will be needed... [7 February 2008]

Dallan, I just wanted to suggest that once you get the Merge thingy up and running.. we will definitely need a "how-to merge video" :-) --Msscarlet1957 08:38, 7 February 2008 (EST)

Very true! -:)--Dallan 22:46, 9 February 2008 (EST)

Hand-merging isn't so bad...and maybe it could be a whole lot better [25 March 2008]

I've spent the last few days doing a lot of merging, and the process isn't really that awful, especially once you develop a few practices that keep you from losing track of where you are. But it also seemed to me that it could be made a lot easier without anything in the way of UI changes. All that is required is being a bit smarter about what happens when a redirection page is checked in.

Consider a situation of two family pages that nominally represent the same family. At present, before redirecting A to B, I copy record guts from B to A, attach all the children on B to A, then merge the parents. This leaves me with a family that often contains duplicate children, but I haven't lost anything and it's easier to just work on resolving such duplicates on the target family page.

Could the check-in of a "#redirect" on a family page be jiggered such that:

* Any children (not already present on the destination) are reparented to the destination
* Any parent (not already present on the destination) is added as a parent or alternate parent

That leaves you with a consolidated family page that presents the needed merge in a more obvious way. It also prevents the situation of inadvertently cutting off a line in the merge of a family. Finally, it is upwardly compatible with current practice (it would be very strange to use a family redirect as the way to cut away a particular incorrect line).
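The proposed redirect behavior for family pages can be sketched as a union of the two pages' links. This is only an illustration: the `Family` class and `redirect_family` function are hypothetical, and a real implementation would also have to update the person pages that point back at the family.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Family:
    # Hypothetical, simplified family page: just titles of linked person pages.
    title: str
    parents: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)
    redirect_to: Optional[str] = None


def redirect_family(source: Family, target: Family) -> None:
    """Redirect `source` to `target`, carrying over any links the target lacks."""
    for child in source.children:
        if child not in target.children:
            target.children.append(child)  # reparent to the destination
    for parent in source.parents:
        if parent not in target.parents:
            target.parents.append(parent)  # shows up as an alternate parent
    source.parents = []
    source.children = []
    source.redirect_to = target.title
```

Duplicate children and alternate parents end up side by side on the target page, where they can be resolved by hand without any line being cut off.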

There's probably a corollary procedure for merging people - taking the union of the parent and spouse relationships and making sure the unique set is preserved in the redirection target.

Of course this leaves you to merge the page contents proper, but that's the easy bit anyway - just open both pages and keep them both alive until the redirection target has all the content of the source. If you stop in the middle of that, nothing is lost.

Thoughts?--Jrm03063 17:28, 10 March 2008 (EDT)


That seems like a really good idea! I don't see any downsides to it, and it would be pretty easy to implement. So when you redirect a family, the husband, wife, and children would be added automatically to the target family, and when you redirect a person, the parent and spouse families would be added automatically to the target person. What do others think about this? If there are no concerns, I can implement it at the end of this week or early next week.--Dallan 11:07, 12 March 2008 (EDT)


I like it! I had actually started doing it this past week on a couple of merges. No one got lost, everything was tidy and easy to keep track of and it created a "bookmark" of sorts that I could come back to to finish up. I've thought about it for a couple of days now and haven't come up with a downside yet. So, if I understand this correctly, the steps involved for doing a merge this way would be:

  1. pick the target page
  2. copy all data (b, d, m, notes, etc) from duplicate pages onto the target page
  3. redirect dup pages to the target page
  4. note the alt spouses and/or additional children and continue to merge as needed

--Ronni 12:06, 12 March 2008 (EDT)

That's right!--Dallan 01:41, 17 March 2008 (EDT)

I'm glad to hear that this might be easy to do. It's essentially the mechanical practice that I follow in working through a merge so that I don't lose a connection. If the check-in of a "#redirect" had this additional behavior, I could move a lot faster and more safely.--Jrm03063 12:12, 12 March 2008 (EDT)


I like the idea of being able to include the spouse and children. My one attempt at hand merging left me shivering in fear! What a mess one can easily make (I found out to my dismay!) And the file I found to merge with has a LOT of merging to be done! Scary stuff! --Msscarlet1957 19:11, 12 March 2008 (EDT)

Ok, I'll add it in the next couple of days and leave a message here when it's ready. Thanks for the suggestion!--Dallan 01:41, 17 March 2008 (EDT)


It's a week later (I've spent too much time fixing bugs in the digital library), but the #redirect suggestion is working now. If you edit a page and make it a #redirect to another page, the people/families and images that the page links to will be added automatically to the redirect target. I tested it and everything went well, but if you run into problems please let me know. It's a great suggestion!--Dallan 21:56, 24 March 2008 (EDT)


This works wonderfully Dallan! And I agree... great suggestion! --Ronni 23:31, 24 March 2008 (EDT)


I'm glad this is working out. It's working great for me too. It's only possible though because whoever thought about the data structure up front was careful enough to allow for this. Whoever had the wisdom to allow for alternate parent connections and alternate husband/wife connections deserves thanks.

Now everyone, go forth and MERGE!--Jrm03063 13:02, 25 March 2008 (EDT)


Merges lost when GEDCOM updated? [18 March 2008]

Here are questions I have after working 4+ hours doing a HUGE merge (which is still in progress). There has been discussion about the ability to "re-upload" an existing GEDCOM to update it with new information obtained. And the way to do that would be to match the Reference Number created by the GEDCOM itself from past and present uploads.


Scenario:

What now happens with PersonA?

Maybe the "redirect" information should be stored somewhere else? But that would not totally solve the problem. I feel there is still potential for losing any new events John Doe may have added to PersonA at home in his database before an upload to update his GEDCOM. --Msscarlet1957 11:45, 14 March 2008 (EDT)


In order to support GEDCOM re-upload, a new source citation will have to be added to each person & family in their tree. This source citation will contain a "permanent link" (URL) to the specific version of the page that they last uploaded. We'll add these sources directly to the uploaded GEDCOM and make this modified GEDCOM available as a download. People will have to import this modified GEDCOM back into their desktop genealogy software in order to be able to re-upload their GEDCOM later. Since we'll add the sources directly into the uploaded GEDCOM file, there shouldn't be the information loss that you usually get when going from one GEDCOM format to another.

"People will have to import this modified GEDCOM back into their desktop genealogy software in order to be able to re-upload their GEDCOM later." I am going to be one with a big problem with this part. There is no way I can import a GEDCOM from WeRelate back into my TMG (The Master Genealogist software). At present your software (or anyone's software for that matter) does not support all of TMG's abilities. The main difference is the ability to add a "witness" or witnesses to any tag. For example, for a 1920 census I add the information to the Mom and the Dad of a family, with "principal" roles in that event. Their children are added with "witness" roles, and the sentence structure for their participation in that event is totally different when a report is created to be published. For example: "Ralph John Kuhn appeared on the 1910 Federal Census of Hopewell twp., Seneca Co., Ohio in the household of his parents Daniel Charles Kuhn and Lillian Sophie Kuhn." Whereas the principals to this event have this sentence: "Daniel Charles Kuhn and Lillian Sophie Kuhn appeared on the 1910 Federal Census of Hopewell twp., Seneca Co., Ohio, enumerated 06 May 1910, renting their home. They only have two children at this time: Ralph and Gertrude. Lillian lists she is the mother of 3 children with 2 living. They are living next door to John F and Victoria Kuhn." Once I upload a GEDCOM to WeRelate, the principal information is there but all witnesses are lost. This does not bother me, because it happens anywhere I go. However, I am unable to "import" my own GEDCOM back into my program. Instead, as I am doing this huge merge, I am hand entering any additional information I find in John Doe's file into my own database, on my machine, as I go along. Anyone that uses TMG will also be unable to import their own GEDCOM from WeRelate, if they had any witnessed events.
I don't think I was clear on this. The only change that we would make to the GEDCOM you uploaded would be to add a new source citation to every individual and family. Otherwise it is exactly the same GEDCOM that you uploaded. We're not talking about doing a GEDCOM export from the wiki pages here; we're talking about modifying your uploaded GEDCOM directly - inserting source citations but otherwise keeping everything else the same (precisely to avoid the problem you mention). There would be a function for you to download your modified GEDCOM that would be separate from exporting a GEDCOM from the wiki pages. Most genealogy programs (I assume TMG is included) can export a GEDCOM and then re-import that GEDCOM without losing any information. And it shouldn't be that difficult for us to process an uploaded GEDCOM and insert new source citations but otherwise keep every other line the same.--Dallan 12:07, 18 March 2008 (EDT)
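
A rough sketch of "insert source citations but otherwise keep every other line the same", in Python for illustration only. It assumes the GEDCOM 5.5 convention that a record starts with a level-0 line such as `0 @I1@ INDI`, and it writes the citation as an inline `SOUR` line with a `TEXT` sub-line; the citation wording, the function name, and the `example.org` URL are all invented for the example and are not what WeRelate actually emits.

```python
def add_permalink_citations(gedcom_lines, permalinks):
    """Copy every line unchanged, adding a citation after each INDI/FAM header.

    permalinks maps a record id such as "@I1@" to the URL of the page version
    created from that record (a hypothetical mapping for this sketch).
    """
    out = []
    for line in gedcom_lines:
        out.append(line)  # original lines are never altered or dropped
        parts = line.split()
        # A record header looks like "0 @I1@ INDI" or "0 @F1@ FAM".
        if len(parts) == 3 and parts[0] == "0" and parts[2] in ("INDI", "FAM"):
            url = permalinks.get(parts[1])
            if url:
                out.append("1 SOUR WeRelate page as last uploaded")
                out.append("2 TEXT " + url)
    return out


lines = ["0 HEAD", "0 @I1@ INDI", "1 NAME John /Doe/",
         "0 @F1@ FAM", "1 HUSB @I1@", "0 TRLR"]
permalinks = {"@I1@": "http://example.org/page?version=101"}  # placeholder URL
result = add_permalink_citations(lines, permalinks)
print("\n".join(result))
```

Because the function only ever appends lines, anything the desktop program wrote (witness tags included) passes through untouched, which is the property the reply above is relying on.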

So when the person re-uploads the GEDCOM, we'll know which pages they're updating, what each page looked like when they last uploaded their GEDCOM, and what each page looks like now by following any redirects to get the current version. Using this information we can determine

If the changes made by the uploader are to different fields than the changes made by others, we apply the changes made by the uploader to the current version of the page. Changes made by others don't get erased; changes made by the uploader show up as changes to just those specific fields. The uploader must now download the new GEDCOM with source citations containing permanent links to the now-current versions of each page.

Suppose the uploader and others modify the same field. There are two ways we could go with this; I'm thinking about going with the second:

  1. We don't automatically modify these fields, but send the uploader an email telling them about the conflict (i.e., the changes they made and the changes made by others to the same fields), and asking the uploader to modify the conflicting fields by hand.
  2. Instead of modifying the field, we add the uploader's conflicting edit as an "alternate" piece of information, and send the uploader an email telling them about the conflict and that we added their change as an alternate.

Another issue is what happens if the new GEDCOM doesn't contain all of the person & family pages that are in the tree. Rather than trying to delete the missing pages, I'm thinking we should send the uploader an email with links to all of the pages in their tree that weren't in the newly-uploaded GEDCOM, and let them decide if they should be removed or not.
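
The "report, don't delete" idea in the previous paragraph amounts to a set difference. A small Python sketch, with made-up page titles and hypothetical function names:

```python
def pages_missing_from_upload(tree_pages, uploaded_pages):
    """Return tree pages that the newly uploaded GEDCOM no longer mentions."""
    return sorted(set(tree_pages) - set(uploaded_pages))


def build_notice(missing):
    """Compose a plain-text notice; nothing is deleted automatically."""
    if not missing:
        return ""
    lines = ["These pages were not in your latest GEDCOM:"]
    lines += ["  " + title for title in missing]
    lines.append("Please decide whether each should be removed.")
    return "\n".join(lines)


tree = ["Person:Ann Doe (1)", "Person:John Doe (1)",
        "Family:John Doe and Jane Roe (1)"]
upload = ["Person:John Doe (1)", "Family:John Doe and Jane Roe (1)"]
missing = pages_missing_from_upload(tree, upload)
print(build_notice(missing))
```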

I think this covers all the bases. Does this answer all of your questions?--Dallan 01:41, 17 March 2008 (EDT)

I think that at least I can see that the information would not be lost IF John Doe were to follow the directions, but I do not see that happening either. At WorldConnect when you want to update your gedcom, you just check the box that this is an update and all works like a whiz, no additional efforts needed. There needs to be more effort made to cause WeRelate to be "easier", not more difficult than other websites.
So I see two problems ahead in your process:
  1. Any member using TMG will be unable to ever update their GEDCOM; they will have to delete their file and upload again, as is set up now, and thus lose all set up links to photos uploaded into the image section, along with any changes made during any merges.
  2. Members may not want to do updates because of the complicated process.
I know you are working very hard, Dallan, and I appreciate that. I really think WeRelate is a great site. Maybe somehow, someway there could be some other way to implement the update process? --Msscarlet1957 10:04, 17 March 2008 (EDT)

I think I understand where Dallan is going with this. Matching an arbitrary GEDCOM against the huge universe of werelate is really impractical. Trying to make a program smart enough to know what is a good enough match and what isn't is essentially unsolvable. What can be done though, is to attach a source reference that tells werelate specifically that a person somewhere in a gedcom absolutely is a certain person in the werelate universe. Generally speaking, the easy way to get your home system in sync w/werelate would be to obtain a fresh werelate GEDCOM download, which will have the appropriate tags in place for all the people - but you wouldn't have to. I presume that the "werelate" designator source/tag will have a format that allows you to directly enter it into your home genealogy program where appropriate.

There is a place for the sort of guessing/probable matching in werelate - it's when we have a feature that allows the system to browse for potential matches in the werelate universe. That does not result in automatic merging though, but instead, in a set of candidate matches that the next researcher coming along can review. If the human is persuaded by the match, then the human can perform a merge or request a default merge procedure. But combining detection with actual merging logic seems to me extremely perilous (take a look at ancestry.com's "one world tree").

I appreciate that it's not totally "hands off", but it's going to yield a far better data base.--Jrm03063 13:19, 17 March 2008 (EDT)


Yes, you could enter the source citations into your desktop genealogy program yourself. They'll be a human-readable citation with a URL in the citation text field. But with the ability to download what is essentially the same GEDCOM that you uploaded except with source citations added, you shouldn't have to.

As Jrm03063 points out, matching is problematic. Even if we're 99.5% accurate on matching re-uploaded people to previously-uploaded people, it means we'll either incorrectly-match or not match 25 people in a re-upload of a 5,000-person GEDCOM. That's too many.
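
The arithmetic behind that figure, as a one-function Python check (the function name is just for illustration):

```python
def expected_mismatches(people, accuracy):
    """People expected to be wrongly matched or left unmatched."""
    return round(people * (1 - accuracy))


mismatches = expected_mismatches(5000, 0.995)
print(mismatches)  # 0.5% of 5,000 people
```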

There is another approach we could take. Some desktop genealogy programs store a unique identifier (UID) for every person. This identifier is included in the GEDCOM's they export, so that a person has the same UID in the GEDCOM file every time. If the GEDCOM includes UID's, then we could record the person's UID and the page version with which it is associated, so that the next time you upload a GEDCOM with the same UID's we could know what page versions to match. The problem is that only 42% of the people that have been uploaded to WeRelate to date have UID's. But when UID's exist, they could potentially be used in place of downloading a modified GEDCOM.
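
A sketch of how the UID approach could work, in Python. The function name, the shape of the index, and the sample UIDs and version numbers are assumptions made up for the example; the point is only that people with a recorded UID are matched directly and everyone else falls back to the match+merge process.

```python
def match_by_uid(previous_index, reupload):
    """Split re-uploaded people into UID matches and leftovers.

    previous_index: UID -> page version recorded at the last upload.
    reupload: list of (name, uid) pairs; uid is None when the GEDCOM lacks one.
    """
    matched, unmatched = {}, []
    for name, uid in reupload:
        if uid is not None and uid in previous_index:
            matched[name] = previous_index[uid]
        else:
            unmatched.append(name)
    return matched, unmatched


previous_index = {"UID-0001": "John Doe (1), version 101"}
reupload = [("John Doe", "UID-0001"), ("Ann Doe", None), ("Tom Doe", "UID-0099")]
matched, unmatched = match_by_uid(previous_index, reupload)
print(matched)
print(unmatched)
```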

One advantage of downloading a modified GEDCOM is that you could share your modified GEDCOM with a cousin, and if they incorporated your GEDCOM into their genealogy and then generated a combined GEDCOM to upload into WeRelate, the system could recognize that some of the people in their GEDCOM already exist at WeRelate and they wouldn't have to go through the match+merge process for those people. The system would just apply whatever changes they had made to those people, just as if you were re-uploading your GEDCOM. If we were instead relying upon UID's, when your cousin uploaded their GEDCOM they would probably have to go through the match+merge process for the people that were also in your GEDCOM, since I don't know if we could assume that the UID's would remain the same in your cousin's GEDCOM.--Dallan 12:07, 18 March 2008 (EDT)


It's great to be flexible Dallan, but I think you'll make yourself crazy trying to support weird ID/UID stuff. Unless the value is from a reasonable third party (say an ancestral file number, or whatever the successor strategy may be) I don't think an id-based alternative (to the primary werelate url) is wise for trying to figure out who matches who. The url approach that you've mentioned is the sort of thing we want anyway, since a downstream consumer of the GEDCOM may very well be interested in the contents of the associated werelate page. Making the source do double-duty as a tag at re-import time is a really fortuitous coincidence that reinforces good practice. I don't think you want to clutter the story with identifiers that just can't work as well and (often) will not survive a merge. A url to a merged page will usually redirect to somewhere useful...--Jrm03063 13:40, 18 March 2008 (EDT)


Downloading-uploading to update a GEDCOM [10 April 2008]

All genealogy software is not the same. I can only speak for "The Master Genealogist" TMG as I am a user. For me to "import" a GEDCOM into TMG that I had previously uploaded to WeRelate causes me concerns.

  1. I do not upload my whole database, because it is so large (currently at 58,571 members). I break it down, choosing an ancestral surname, beginning with the oldest member of that family and including all descendants, spouses and parents to the spouses to include in a small GEDCOM segment. I then upload that segment to WeRelate. Since I am a descendant of each of these ancestors, I am in each segmented GEDCOM, as are many of my kin. This ends up creating duplication from one segment to another, a necessary evil. I currently have seven segments on WeRelate and plan on adding many more.
  2. TMG does in fact assign a UID to each person created (that is how I always know the number of folks in my database at a glance). TMG also assigns a UID dataset number, my main dataset being 1:1 thru 1:58571. When I import a GEDCOM, all the folks from an import get a UID and a Dataset UID. So if I had uploaded to WeRelate my HELLER GEDCOM of 2703 members, and then have to re-import that same bunch, they will appear in my database as members 2:1 thru 2:2703. This means if I would import even my own gedcom, all those people show up in my database as duplicates with new UID's. I feel this then creates total havoc in my database! Doing a merge within TMG would take me weeks of work to straighten it all out.
  3. I do not include all tags when I create a GEDCOM for upload. TMG has the ability to create an unlimited number of tags, beyond birth, death, burial, baptism, etc. TMG has the ability to filter out any of these which I do not want to include when creating a GEDCOM. I have created tags called "CorrespondenceIn", "CorrespondenceOut", and "Research", none of which is included when I upload. Then if I were to re-import that GEDCOM which did not contain all the tags, I would effectively be losing all data in those tags.

Yes I could hand enter some code into each individual, but even that is unrealistic, as just ONE of my segments contains 2703 people, I have many segments I would like to place on WeRelate as I get them ready. At Rootsweb's WorldConnect, you just click on the name of the GEDCOM you want to update, and choose the gedcom to upload and it's done. Maybe you could create a small Gedcom to upload there, so you could see how the process works and maybe implement something similar at WeRelate. I believe they do use the UID's. I need something more user friendly. If I uploaded my whole Gedcom (once your system can accept such a large file) importation and then upload for updating would still be an issue, because of my filtered out tags upon Gedcom creation. --Msscarlet1957 15:31, 18 March 2008 (EDT)


I feel your pain. There are always data conversion/transfer headaches whenever one system is first hooked up with another. I agree that tagging your home data set would be a daunting process, even on a subset basis. On the other hand, you get a couple of important things for your effort - a very important source for your own data base and a way to safely interact with the werelate community data base.
Even an operation as large as ancestry/TGN, with all their resources, often makes quite a muddle of things. I've already untangled a few of their "one world tree" matches that worked their way into werelate. In fairness as well, I don't think roots web is really a reasonable comparison. I could be wrong, but my skim of their stuff suggested that they aren't really merging the uploaded GEDCOMs into a common data base. I think they are just saving them as discrete sets and offering tools that graze over them without actually creating a permanently merged result.
Maybe dallan can put in a restricted merge/upload that only merges those people that are correctly tagged and produces a report of the names seen but ignored? That might help you in performing a piecemeal update...
The other thing that should be stressed is that we all need to fundamentally change the way we've been doing genealogy. Your large data base is a major accomplishment, but how about a data set of 100K? 1M? More? As highly refined and capable a tool as TMG is, the model of individual research is fundamentally limited. There's just only so much of you to go around! Taking the hit of getting your data synced up with werelate isn't just a chance to transfer your data from system A to system B, but a chance to get lots of help and to help others. Your work also has a real chance to live on and be built upon. You can certainly protect your data from being lost to the world of research by tossing it over the wall to the LDS or world connect, but that's not nearly as "alive" as data that's under constant review and improvement as things are on a wiki.
Please, keep the faith!--Jrm03063 21:01, 18 March 2008 (EDT)


I thank you for your encouragement, Jrm03063, obviously you can read between the lines and see that I feel quite discouraged just now. In fact last night I actually sat in front of the boob tube and watched HGTV all night and could not even think about going to my computer after dinner. (I am one to easily work until midnight on my genealogy). Yes you are correct in your assumption that Rootsweb does no merging, but my reference was to the way they streamline their member's updating process, not the way they work in general. I agree that wiki's and genealogy are a wonderful concept, that is why I had been devoting 3-4 hours a day to improving and merging my pages here on WeRelate. But then we don't know for sure WeRelate will be around for any length of time, as it is so difficult to learn and use. As long as I am here on a regular basis, it becomes easier and easier, but if I am away from it for a length of time, I have to go thru the tutorials again. But that is another topic altogether. For now I am just not willing to "Take the hit of getting my data synced with WeRelate". I am not interested in importing any GEDCOM, not even my own from WeRelate, that will cause me so much work and/or data loss. In fact I rarely import GEDCOMs at all; I open them in a separate TMG database, shown on my second monitor, and decide from there who to hand enter, using my own entry conventions.
For now I shall continue to upload gedcom segments, without thought of updating them; this will at least be "a chance to get lots of help and to help others". However, I have been unsuccessful in convincing any cousin to join WeRelate to collaborate (which contributes to my discouragement). But I do invite them, usually one or two a day. --Msscarlet1957 09:56, 19 March 2008 (EDT)

I hadn't considered that people would upload only a portion of their GEDCOM, but in retrospect it makes sense. I really do want to make the upload+download process easy for people who want to continue to use their desktop genealogy program because I believe that probably half of our users will want to operate that way.

While writing this response I realized there is a flaw in my proposal. After subsequent uploads of a GEDCOM file, we can't assume that the person data in the uploaded GEDCOM is the same as the person data on the updated Person page, because the Person page might have information that you have not incorporated into your GEDCOM file. So to determine the updates you have made we'll compare your current GEDCOM against your previous GEDCOM. And to determine conflicts we'll compare the current version of the page also against your previous GEDCOM. If your current GEDCOM has a different value for a field than your previous GEDCOM, then you have updated that field since your last GEDCOM upload. If the current version of the page also has a different value for that field than your previous GEDCOM, then we'll say there is a conflict, and your updated value will be stored as an "alternate" name/event instead of updating the primary name/event, and you'll get an email telling you about it.
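
The comparison described above is a three-way merge with the previous GEDCOM as the common base. A minimal Python sketch, assuming each person is reduced to a flat dictionary of fields (the field names and the function are hypothetical, not WeRelate code):

```python
def merge_reupload(previous, current_gedcom, current_page):
    """Three-way comparison of one person's fields against the previous GEDCOM.

    Returns the merged page fields plus the uploader's conflicting values,
    which are kept as "alternates" rather than overwriting the page.
    """
    merged = dict(current_page)
    alternates = {}
    for field_name, new_value in current_gedcom.items():
        if new_value == previous.get(field_name):
            continue  # uploader did not touch this field
        if current_page.get(field_name) != previous.get(field_name):
            alternates[field_name] = new_value  # both sides changed: conflict
        else:
            merged[field_name] = new_value  # only the uploader changed it
    return merged, alternates


previous = {"name": "John Doe", "birth": "1850", "death": "1910"}
current_gedcom = {"name": "John Doe", "birth": "12 Mar 1850", "death": "1911"}
current_page = {"name": "John Q. Doe", "birth": "1850", "death": "1912"}
merged, alternates = merge_reupload(previous, current_gedcom, current_page)
print(merged)
print(alternates)
```

Here the uploader's refined birth date is applied, the name change made by someone else on the page survives, and the two competing death dates leave the page value in place with the uploader's value set aside as an alternate (the case that would trigger the email). Note that, following the wording above literally, the sketch also flags a conflict when both sides changed a field to the same value; whether that case deserves special handling is left open.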

We still have to associate each person in your desktop genealogy program with the page title that was generated for that person. We can do this in one of three ways:

This is off-topic, but as you have suggestions for improving usability, or if you find out what keeps your cousins from joining, please send me an email or leave a message on my talk page.


I was using the incremental GEDCOM upload procedure too, but only because there were a few lines that I wanted to flesh out with census records while I still had an ancestry subscription. I've allowed that to expire for the time being, and expect in the future only to download GEDCOMs for backup and reporting purposes. I don't expect to actually record any research off-line.

In fact, if I thought that werelate was going away, without a wiki alternative, I would probably see about putting together my own server to run it.

The only real home-based local stuff that I might do, would be something to keep track of the living. Of course I understand why we need to keep the living off a public genealogy system, but it's a bit of an aggravation to have to think about things in two layers. I have thought about what might be involved in creating a hybrid environment so that I could record information about the living and have it just reside locally on my machine (some sort of pass through for the folks who've shuffled off to werelate), but I just don't have time to do anything with that right now....--Jrm03063 17:09, 1 April 2008 (EDT)

I think the ideal would be a desktop genealogy program with a "synchronize" button that synchronizes the non-living people with WeRelate (upload changes you've made and download changes from the website) but keeps living data private. Someday I plan to write this -- it will make uploading data to WeRelate a lot easier than uploading GEDCOM's -- but not this year.--Dallan 09:45, 2 April 2008 (EDT)
Oh my, weren't you there for the "thou shalt not duplicate data lecture"? Maybe you gave that lecture!  :) Anyway, I actually think that the desktop genealogy program ___is werelate___. I think it's the same code base that you have right now, but with a flag indicating whether it's running stand alone or in distributed (local & backing global server) mode. If you write a special purpose program, at best you duplicate what werelate is already doing at the expense of an additional code base. Ick. It seems like all that should be needed is a piece of code that maps inbound URLs through a table such that locally known PERSON pages are served up and saved locally (maybe you just look for "living" in the person page name). Any page not known locally gets passed on to "actual" werelate. Mind you, I don't know exactly how to do this sort of thing, but it seems like something that ought to be possible... --Jrm03063 Wed Apr 2 10:09:01 EDT 2008
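
One way to picture the table lookup suggested above, as a Python sketch. Everything here is hypothetical: the function, the page titles, and the placeholder base URL are invented for illustration, and a real version would sit inside the wiki software rather than in a standalone function.

```python
def route_page(title, local_pages, remote_base):
    """Serve locally known pages from local storage; pass the rest upstream."""
    if title in local_pages:
        return "local", local_pages[title]
    return "remote", remote_base + title.replace(" ", "_")


local_pages = {"Person:Living Doe (1)": "private notes kept on this machine"}
remote_base = "https://wiki.example.org/wiki/"  # placeholder, not a real address

print(route_page("Person:Living Doe (1)", local_pages, remote_base))
print(route_page("Person:John Doe (1)", local_pages, remote_base))
```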
That could be done, but I think that there are also cases where someone will want to download their tree into a program that they can run disconnected from the Internet on their laptop, make changes in that program, and then upload the changes (and receive any changes made by others) when they re-connect. Not everyone will want to work that way, but many will. Eventually we'll need to make that process easier than downloading and uploading GEDCOM's.
Having said that, it would also be possible to modify the Family Tree Explorer to save living pages locally on your hard disk. If we do that, then it's just a question of whether the Family Tree Explorer always reads pages for non-living people from the server, or if it also had the ability to save non-living pages locally and synchronize them with the server, which would allow people to run disconnected occasionally.--Dallan 15:21, 3 April 2008 (EDT)
I don't think I disagree particularly. What I'm suggesting is a full, yet local, version of werelate. The local version would have some additional smarts to be able to sort out what is local, what is out on the net, what is shadowed on a temporary basis, what is maintained strictly locally (living people), etc. I know it's a tall order to try to figure out how to package all the necessary pieces up so they work nicely together, but it can't be worse than trying to rewrite a local version of werelate. Presumably, what you would want to do (with a local werelate) is to work in terms of wiki pages. That's going to need some level of wiki software support and I'll bet it becomes a slippery slope really quick. I quite agree that upload/download exchange based on GEDCOM is not apt to be very satisfying. Literally and figuratively, I think a lot would get lost in translation.
It's either write a desktop application with limited wiki support (i.e., the app probably wouldn't display tables), or modify the wiki software to make it possible for people to create pages for living people that weren't readable by others (or were readable by only certain people). That's also not a trivial undertaking.--Dallan 15:32, 10 April 2008 (EDT)
My thinking is that this should be able to be pulled together on Linux/Mac first. I've seen wiki servers for windows so there is hope...
The problem is every wiki software app is different -- they have different syntax, and the extensions I've written for MediaWiki would have to be rewritten. This is also not a trivial undertaking.--Dallan 15:32, 10 April 2008 (EDT)
BTW, have you considered dropping source archives on the digital library? --Jrm03063 Mon Apr 7 15:45:38 EDT 2008
Not sure what you mean by "dropping source archives"?--Dallan 15:32, 10 April 2008 (EDT)

Where do we stand on merging? [6 May 2008]

I'm very happy with the improvements in "redirect" behavior, as they make merging a much simpler business. I wanted to give werelate a shove in the direction of the large connected community tree that it's meant to support, so I've spent the last month or so merging through early New England. It's been generally smooth going, and very satisfying too - a lot of information bubbles up when you bring different user contributions together. Still, I'm afraid I'm somewhat alone in this endeavor. I'm struck that until werelate gets a reputation for having a really large and well connected community tree - not just a bunch of GEDCOMs that live in the same pool - werelate won't take off.

So why hasn't merging become important for the masses? I think there are a few reasons:

Without a passion to merge the werelate space, I think we're losing the strongest feature of the site's design. How can we get there?--Jrm03063 12:21, 2 May 2008 (EDT)


I'm there with you, Jrm (as you may have noticed). One thing that strikes me is the almost complete lack of reaction I see when I merge pages -- perhaps there's no need because the merge isn't controversial, but am I really that good? ;-)

I think you're right that usability is an issue. Merging isn't hard if you understand both the general wiki concept and how that applies to genealogy - but that's a small group of people. I think search is certainly a big problem -- the tricks I use to find duplicate people are almost all things I wouldn't want to explain to my grandmother. Luckily that's under construction. I think the current search not only makes merging difficult, but it makes using the site for regular research almost impossible, so hopefully more people will stick around in the future, which will lead them to be more interested in merging.

There's something else that might be a problem, however -- notification. I just found this entry by looking at my Watchlist, something I do every so often to appease my curiosity. I didn't get an email, nor have I gotten one about the pages I'm watching that you changed in the last few days. They're not in my junk filter, either. (But I did get notifications for two other pages, so it's not totally broken). If this is happening on a widespread scale, it means that people are not only unaware of (new! exciting!) changes to their own tree, but they may not even be aware that a merge is possible, neither of which does much for the communal editing. (And, whether it's related or not, I have one page in particular that used to have at least 10 people watching it and now has two. So either notification worked too well and they decided they didn't like getting emails, or something is weird.)--Amelia 08:56, 4 May 2008 (EDT)

It does seem like there might be a notification problem somewhere. Just to make sure, you're saying that when you view your watchlist and click on the "Show all pages changed since last visited" link, you see pages that you haven't gotten an email about - is that correct?--Dallan 15:09, 6 May 2008 (EDT)

Well, I have been busy entering pages and have not checked recently for possible merges. I have too many pages to check on a page by page basis. I am waiting for Dallan to implement his merge feature. This morning I entered the surname Coker and location United States and searched. On result page 131-140 I actually found a duplicate page. However this page has no sources. It looks like the entire tree is sourced by so and so's gedcom. Well I really don't wish to merge my page with an unsourced tree. This tree seems to be on the maternal line of my page not the Coker line and I do not intend to research the maternal line. So how are y'all handling this? If the tree was sourced, I would consider it a wonderful opportunity to combine the pages but since it is not I am less than enthusiastic.

I am not sure that I wish to have unsourced trees on WeRelate. These trees probably already exist on Ancestry and Rootsweb. --Beth 09:10, 4 May 2008 (EDT)


I don't believe many people are using this site because it does not seem ready. I have added some comments to Talk pages suggesting errors and giving sources and have gotten zero response. After two weeks, which seemed like a fair time, given vacations and all, I tried changing one as I had suggested on the Talk page and even that got no response. Assuming that the notification is working, I think a lot of the early people were just trying it out and got discouraged or were just curious, not serious. I personally have stopped entering data (see my previous comments on the difficulties of determining if there are duplications) until I think the effort will be worthwhile (either a reasonable automated merging, about which I have at best a wait and see attitude, but more likely no faith in it working, or better searching that makes identifying likely duplicates easier).

Unsourced trees do not bother me. If I want to merge with an unsourced tree, I will add my sources and the important part will no longer be unsourced. My concern is that someone will come along and merge over, ignorantly overlaying entries refined through years of discussion and collaboration, without providing sources or paying attention to past discussions. This is why I think changes should be "proposed" and then voted on by all people watching a page. A much more conservative, but still democratic approach.

I know it is not a good selling point to suggest that making data entry hard to do is a feature, but I think it is. (Personally, I don't see tons of use in uploading gedcoms because I would have to clean mine up anyway to merge.) I would be happy to gradually enter my data and sources a little bit at a time over years, in exchange for participating in discussions with other *interested* persons. Likewise, if I can only propose a change and then must wait for it to get approved, or rejected, then I can wait. The time is not important, since the data is not going anywhere, and if I disagree, I will continue to keep my version of the truth locally, knowing I must find more evidence to convince the jury.

--Jrich 09:58, 4 May 2008 (EDT)


  1. Hey trust me, Amelia, I'm seeing your merge activity. The changes in my Watchlist are huge! :) I would like to get in there and look over the merges, but time is an issue with me right now and also I am concentrating on a different area right now, which brings me to an idea I've had for a while. To encourage and support collaboration I wonder if we couldn't set aside time to problem solve or feature a particular family or families. Sort of like how we do our "Featured Page" where nominations are added, we could have a list of families to work on together. It doesn't even have to be a family we're related to (although getting familiar with non-related family can be a challenge). It also doesn't have to be anything big .. maybe finding a death date or an obit. Just something to encourage and show how collaboration can work. I don't know that I've actually seen a collaboration take place on WeRelate, so that in itself would be a learning process on how best to share data, conflicting resources, etc. These "problem solving" pages would be featured on the Main Page (which I assume is the main portal into WeRelate) so that others can see what WE as a community are working on.
  2. I haven't looked at it recently, but does the Help page on Merging need to be updated to reflect the new redirect behavior?
  3. I don't have it set in my preferences to be emailed when changes occur. I just check my Watchlist every day, several times a day. I assume I'm catching everything? But I've heard others mention not getting email notifications of changes, so it may be a problem.
  4. I second Jrm's idea on seeing how big the overall tree is after a merge.
  5. Beth's comment about not wanting to merge with an unsourced tree or merging with pages she's not interested in is the biggest obstacle we have to overcome with WeRelate. I don't mean to single you out Beth, because I have to overcome similar feelings too when I work on WeRelate. My Tree is safe in my genealogy program at home. No one but me can touch it. My TREE as a separate, isolated tree should not even be a concept on WeRelate. I think this is the hardest issue to overcome on WeRelate. So what if the page is unsourced. So what if they have an alt date of some kind. Note it, comment on it, etc. (Re: Jrich's comments above on this.) Think of these as working copies, copies that will improve over time, either by you or someone else. Again, MY TREE is in my genealogy program at home. No one but me can touch it. MY TREE does not exist on WeRelate, but I have contributed over 1000 names to the WeRelate Tree.
  6. I am afraid that WeRelate has the potential of being just another dumping ground for GEDCOMs. While patrolling pages, I see the GEDCOMs that have been abandoned and there are many of them. I'm like Jrich, I don't know that uploading GEDCOMs serves that much of a purpose, because they do need to be cleaned up. I'm still cleaning up mine and only uploaded a very small fraction of my entire tree. And I give a hearty "Amen" to Jrich's last paragraph. He said what I was trying to say.

--Ronni 12:35, 4 May 2008 (EDT)



Hey Ronni; I believe I pretty well have the concept of it is not my tree anymore; but I cannot wrap my head around merging with an unsourced tree. It may not be "my tree" but my work will then be associated with it. What I enter on WeRelate will stay; whether I do or not. You misunderstood one point that I attempted to make, I think. I did not intend for one to believe that I was disinterested in the pages. I am interested. I just do not have the time to edit or research this maternal line; a family that married into the Coker line fairly recently. So if I merge the page, and therefore the trees, I don't have time to source the unsourced pages, so unless someone else does they will remain unsourced.

I don't entirely understand your statement about "my work". Each page has a revision log which shows which ID edited it. If you add sources, you will be listed as editing a page, but not solely responsible for its content. Presumably your description will say what you are doing. Further, no other pages will show your ID editing other pages. --Jrich 15:46, 4 May 2008 (EDT)

Maybe I don't understand how downloading gedcoms will work either. I assumed that if one merges a page that connects 2 trees then the ultimate gedcom download will include both. The tree with sources (unless the page is still under construction) and the other tree with a few sources but mostly Ancestry's World Tree. Exactly what information will the downloaded gedcom contain? Do the people involved in the creation of the pages get any credit in the downloaded gedcom? I really do not know. --Beth 18:56, 4 May 2008 (EDT)

I haven't uploaded any gedcoms so I don't really know what happens. The pages I have seen show a user ID and "gedcom upload" in the revision