The question about allowing anyone to download the file needs some serious consideration. I'm concerned about 'harvesters' who gather lots of different charts and then post them as their own work without either checking for errors or giving any credit to the author. An advantage of the tree staying on WeRelate (as opposed to being downloaded by anyone) is that when corrections are needed they can be made on WeRelate where everyone can see them. But if someone else downloads the file and passes it around, if errors are in their downloaded version, they will be perpetuating the errors - they won't know of the corrections made later on WeRelate. I envision pros and cons on this myself so I recognize the need for serious debate and/or consideration of the subject of downloading while it is still in planning stage.--Janiejac 22:47, 13 September 2007 (EDT)
WorldConnect has come up with a very good compromise protocol on this issue of downloading. It gives the author the option of allowing everything to be downloaded OR only a couple of generations, or something like that. You might check that out. Thanks for the serious consideration.--Dr. Bill 22:43, 15 September 2007 (EDT)
I hadn't considered Janiejac's point either -- I think it's a good one. Download isn't scheduled until around the end of the year, so we have time for more discussion.--Dallan 13:04, 18 September 2007 (EDT)
Could someone redirect a portion of this exchange to a new subject called 'downloading discussion'? This has sort of evolved from collaboration to downloading.
I want to keep the subject of downloading current and get others' points of view on this while it is still in the planning stages. When I upload a file either to my site or to rootsweb or to WeRelate, I do upload all my notes and sources with it. I do believe in sharing and send anyone who requests it a register starting with the individual they are interested in and including notes and sources. But I don't give away my whole data base, notes, sources and all. I want interested folks to contact me with additions/corrections/suggestions and don't want to find all my data and notes posted on someone else's web site.
If I upload to WeRelate and it gets edited by myself or anybody else, I want to be able to download the whole thing back to my computer to continue to work offline. And I do like Rootsweb/WorldConnect's ability to designate just how much of one's chart can be downloaded. But the ethical question comes to mind - if others can add to or edit the chart - should that entitle them to download my whole data base? I'd appreciate input from others on this issue.--Janiejac 12:43, 29 September 2007 (EDT)
Allowing downloads of GEDCOMs is pretty essential, and an opportunity to boot. As has been observed, some folks like to be able to work on things off-line. Others perhaps want to take material to another system to generate different sorts of reports. I take the view that we need a symmetric capability - if you can upload a GEDCOM, you sure ought to be able to reverse the process. One of the reasons I've lost a lot of interest in ancestry.com isn't the expense, but the crappy GEDCOM they produce (and worse, they can't even fully re-import their own GEDCOM - how embarrassing). It seems that they've been intentionally inept in order to strand data under their proprietary control. The result...I'm looking for an alternative. Besides, if someone was really serious about massive harvesting of werelate data bases, they won't be doing it via GEDCOM, so they could probably do it right now.
A GEDCOM download is an opportunity, because a reasonable GEDCOM will be scattered with note/source links back to the werelate site. Skim an ancestry.com GEDCOM and you'll find dozens of links back to ancestry if the GEDCOM has any sources attached. One of the first things I think I would do with a werelate GEDCOM is to replace my ancestry data with a werelate GEDCOM. Then, if people are sniffing around my open tree and source information, they'll find their way to werelate.
The way that werelate gains credibility and preeminence isn't by taking a proprietary view of information, but by making it so totally accessible and free that there is no real advantage to getting it elsewhere. It's the wiki way. The information equivalent of "if you love it, set it free."--Jrm03063 14:38, 8 November 2007 (EST)
That's an interesting idea about providing links back to WeRelate in note fields embedded in the GEDCOM. We would have to do something like that anyway in order to satisfy the attribution requirement of our license. Please keep comments coming on this topic. We won't get to GEDCOM download until after match+merge, so we have some time to get comments from everyone.--Dallan 18:47, 8 November 2007 (EST)
I think downloading a GEDCOM is a very important feature, and should not be restricted. Even though I intend to do my work primarily in WeRelate going forward, I'd like to be able to download GEDCOMs for various reasons, including ability to put it into other software to generate various pretty-printed reports I can't do here, and as a "back up" of the work I do here. While I appreciate the various degrees of control that ancestry.com gives you when you upload a GEDCOM, there's a significant difference between WeRelate and Ancestry (or most other places like it). On Ancestry, when you upload a GEDCOM, it remains your tree. Here, when you upload a GEDCOM, it becomes your contribution to the ongoing wiki, which other people may add to, link to, correct, etc. From the moment you upload a GEDCOM here, it is no longer your tree, and it wouldn't make any sense for you to be able to dictate who could subsequently download it, especially after it has been enhanced by the work of others.
I appreciate the concern about careless people who might download your work, pass it around, and you lose the opportunity for updates. But I'm not sure we can solve the problem of careless people. :-) I for one keep track of where I got valuable information, and always like to keep in touch with those I've collaborated with on common lines. I think the suggestions that the downloaded GEDCOMs have back-links to WeRelate where appropriate are good ones.
That's my $.02. --TomChatt 01:56, 9 November 2007 (EST)
I've gone back and forth on this issue (i.e., no restrictions vs some restrictions). JRM's comment about embedded links to WeRelate is a very good idea. Tom's comment about "my tree" now being "our tree" needs to be reiterated because it is essentially what WeRelate is all about. That idea alone is one that I think still isn't completely understood when someone starts putting their data online here. I have observed that "misunderstanding" several times in the last few months. If we understand the concept of what is mine is now ours in regards to WeRelate, then restrictions on GEDCOMs would be few if any at all. --Ronni 04:38, 9 November 2007 (EST)
I agree with Ronni and TomChatt that the community aspect of the data on WeRelate demands that we have a GEDCOM download. If the purpose of wikifying genealogy is to get the best information out there, we must have a way for it to get off of WeRelate into the "wild." But in order to keep supporting the mission of producing high-quality data, it is crucial that downloaded GEDCOMs be sourced properly. I imagine a download where the sources are all the source page "titles" on WeRelate. That would be bad. It would badly degrade the quality of source citation in any properly sourced database, and would create a tremendous amount of work to replace any links back to the WeRelate pages with the actual publication and date information that would allow me to locate the source. I don't object to links back to the source pages, which do contribute useful information, but the downloaded sources should be as complete as possible (using the fields filled out on the source page, I would imagine).
On a separate but related issue, what do we do about the licensing requirements, particularly if someone chooses not to download (or import) sources? Perhaps some explicit statements and instructions during the process about the attribution requirements if people redistribute (I know they can do this now, but it's going to be a much bigger problem once downloading is permitted).
And that reminds me of a technical issue we (uh, you, Dallan) need to be sure to solve -- embedded links in notes that go to other places on WeRelate need to be rendered as full links that are intelligible when imported into a genealogy program. --Amelia.Gerlicher 14:11, 9 November 2007 (EST)
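A minimal sketch of the link rewriting described above, in Python. It assumes wiki links use the usual `[[Target|label]]` form, that exported notes are plain text, and that pages live under the base address shown; `BASE_URL` and `expand_wiki_links` are hypothetical names, not anything WeRelate is known to use.

```python
import re
from urllib.parse import quote

# Assumed site address; adjust to wherever the wiki pages really live.
BASE_URL = "https://www.werelate.org/wiki/"

# Matches [[Target]] and [[Target|label]].
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")

def expand_wiki_links(text, base_url=BASE_URL):
    """Replace each internal wiki link with 'label (full URL)'."""
    def repl(match):
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        # Wiki titles use underscores in URLs; keep ':' and parentheses readable.
        url = base_url + quote(target.replace(" ", "_"), safe=":/()")
        return f"{label} ({url})"
    return WIKI_LINK.sub(repl, text)

print(expand_wiki_links("See [[Person:John Doe (1)|John Doe]] for details."))
```

Writing the address out in full after the label keeps the note readable in a desktop program that knows nothing about wiki markup.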
I'm thinking that a downloaded GEDCOM would include information from the Source/MySource pages on WeRelate as source records in the GEDCOM. I agree that we'll have to include some explicit statements on the download page about needing to attribute. We could put the attribution links to WeRelate on notes attached to each person/family, or on a source record that is cited by every person/family -- any thoughts on which is best? Your comment about turning embedded wiki links to HTML links is a good reminder -- I'll make a note of that.--Dallan 11:29, 16 November 2007 (EST)
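To make the two attribution options easier to compare, here is a rough Python sketch that emits both layouts. The record tags (`INDI`, `NAME`, `NOTE`, `SOUR`, `TITL`, `PAGE`, `TRLR`) are standard GEDCOM; the header is skeletal, and the site address, function names, and wording of the note are assumptions for illustration only.

```python
# Assumed site address, used only to build example links.
BASE_URL = "https://www.werelate.org/wiki/"

def page_url(title):
    """Build a link back to a wiki page from its title."""
    return BASE_URL + title.replace(" ", "_")

def export_with_notes(people):
    """Option 1: every person record carries its own attribution NOTE."""
    lines = ["0 HEAD"]
    for xref, (name, title) in people.items():
        lines += [
            f"0 @{xref}@ INDI",
            f"1 NAME {name}",
            f"1 NOTE From WeRelate: {page_url(title)}",
        ]
    lines.append("0 TRLR")
    return lines

def export_with_shared_source(people):
    """Option 2: one source record, cited by every person record."""
    lines = ["0 HEAD", "0 @S1@ SOUR", "1 TITL WeRelate (content under GFDL 1.2)"]
    for xref, (name, title) in people.items():
        lines += [
            f"0 @{xref}@ INDI",
            f"1 NAME {name}",
            "1 SOUR @S1@",
            f"2 PAGE {page_url(title)}",
        ]
    lines.append("0 TRLR")
    return lines

people = {"I1": ("John /Doe/", "Person:John Doe (1)")}
print("\n".join(export_with_shared_source(people)))
```

One practical difference: a note survives in nearly any program, while a shared source record stays out of the way of the person's own notes but depends on the importing program keeping citations intact.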
Hi, new contributor. Beginner-level genealogist. Consider this a comment from the man on the street.... Yes you should allow downloads. But people will need "help" to avoid pitfalls, whether it is an upload or a download. For example, I am one of those careless people who hasn't paid proper attention to how I entered information in our Family Tree Maker. My wife and I have bastardized our usage of the fields so that when I load it up into Werelate, data shows up where it should not. If someone were to download what I loaded they will have to sort through some strange stuff. I need to improve my discipline in managing info in the FTW. (sources, events, and notes fields) I also need to convince my wife that her approach to putting data where she wants is not going to work in the long run. (for example I can't get her to not put Rev. or DR. in the name field...)
I plan on maintaining my own database (FTW) as my primary repository on my home computer and "contribute" to Werelate by publishing what I want to share. (Probably everything I have as I like to share) But, I will not use Werelate as my primary repository.
A page on the GEDCOM file format, with pros and cons and examples of how people have used genealogy programs incorrectly and the problems this causes as people get more invested in their data repositories, would be good...if it doesn't already exist.
(PS I take back any negative comments about my wife, she just handed me tea and home made cookies...) PPs is there a spell checker?
Thxs --PeterP 18:48, 26 November 2007 (EST)
Hi Peter, one of our big challenges is going to be making the GEDCOM export good enough so that you can incorporate the new material that others have added to your tree into your home database on FTW, so that you don't lose what others have added. As you've seen with your GEDCOM, using the fields in FTW for purposes other than what they're for makes the GEDCOM output look funny. I'm not sure about the different oddities that typically occur, but feel free to add any of your observations to this page. And no, there's no spell checker, but Firefox has one built in.--Dallan 17:13, 4 December 2007 (EST)
My vote is a definite yes for allowing downloads of gedcoms; no restrictions. I suggest that you communicate this to new users when they register. Require new users to check a box that the user understands that gedcoms can be downloaded with no restrictions. There are plenty of sites with restrictions; not what I wish for this site.
I would also like the ability to download images or is this already possible? --Beth 10:45, 14 December 2007 (EST)
It sounds like the general consensus is that we should allow GEDCOM downloads. There's already a statement on the GEDCOM import page and on every edit page that "All contributions to WeRelate are released under the GNU Free Documentation License 1.2 (GFDL)." and that "Others can add to, edit, and redistribute your contributions." I just bolded the first part on the GEDCOM import page to highlight it. We could require people to check a box, but unless it becomes a problem it's not as high of a priority as other things.
You can currently download images (one at a time -- right-click on the image to save it to your local disk). Some images are uploaded under fair-use though, so you may not be able to do certain things with those images (possibly not upload them to a commercial site).--Dallan 00:07, 16 December 2007 (EST)
I think that is fantastic news Dallan. Glad to know that I can also download images. I hope every user understands the concept of WeRelate including the GNU Free Documentation License. Call me a pessimist but I envision some users getting upset about this or that and deciding to remove "their" tree as has happened on Ancestry and Rootsweb and probably other sites as well. I removed my tree from a site, but that was because I used the merge feature of their software and the file was so messed up that I gave up and removed it. Anyway just thinking that a statement in "plain English" may save some future woes. Thank you and all of your volunteers for your hard work and dedication to WeRelate.
--Beth 18:22, 17 December 2007 (EST)
I switched the bolding in the gedcom upload text to emphasize the phrase that describes what others can do with your contributions (add to, edit, redistribute) and added "download" as another specific possibility. Hopefully this will make things clearer.--Dallan 17:03, 18 December 2007 (EST)
Thanks Dallan,
I noticed an option to delete one's family tree in the FTE; can the user delete their family tree? --Beth 07:42, 19 December 2007 (EST)
Yes, you can delete the pages in your tree so long as nobody else is "watching" them. If another user is watching one of your pages (which happens if they add the page to their own tree, or if they edit the page and leave the "watch this page" box checked, or if they click on the "Watch" link at the top of the page), then that page does not get deleted.
A problem caused by this approach is what happens when you are watching one member of a family that someone else has uploaded, but have forgotten to watch the other family members, and the original uploader removes the tree. The page that you watched is still there, but the other family members have been deleted. I can restore them if this happens, but one of the things on the todo list for next quarter is a screen that will tell you where your "off-tree" links are -- pages in your tree that link to pages not in your tree -- and give you a chance to add those pages to your tree.--Dallan 12:08, 21 December 2007 (EST)
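The two rules in this reply (a page is deleted only if nobody else watches it, and an "off-tree" link is a link from a page in your tree to a page outside it) can be sketched in a few lines of Python. The data shapes and function names here are hypothetical; this is not how the site stores trees or watchlists.

```python
def deletable_pages(tree_pages, watchers, owner):
    """Pages in the owner's tree that no other user is watching."""
    return {
        page for page in tree_pages
        if not (watchers.get(page, set()) - {owner})
    }

def off_tree_links(tree_pages, links):
    """Map each tree page to the linked pages that are not in the tree."""
    tree = set(tree_pages)
    result = {}
    for page in tree:
        missing = set(links.get(page, ())) - tree
        if missing:
            result[page] = missing
    return result

tree = {"A", "B"}
watchers = {"A": {"sally", "bill"}, "B": {"sally"}}
links = {"A": ["B", "C"], "B": ["A"]}
print(deletable_pages(tree, watchers, "sally"))
print(off_tree_links(tree, links))
```

In this example Sally can delete page B but not A (Bill watches A), and page A has an off-tree link to C that Sally could choose to add to her tree.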
I am happy to share. However, I am concerned about harvesters who may then put the information on for-profit sites. It might be a bit friendlier to have the person just contact the submitter. That way they can make contact, chat and then share information as they wish.--Sheri 20:06, 5 June 2008 (EDT)
Anybody else undertaken merging of trees? I've recently been doing manual work on the Slafter family of colonial New England, creating pages by hand. But I discovered that just a couple days ago, somebody did a GEDCOM upload that duplicates a portion of the tree I'd been working on. I looked at the Help topic for merging pages, and have been following that procedure. But I'm realizing that it's quite complicated if you're talking about whole dup branches of a tree, and not just a duplicate page. When you start redirecting pages that are in a tree, then you start getting disconnected bits of tree and "orphan" pages. I ended up making a manual inventory of all the pages with the same names, and which ones needed to get merged/redirected with which. Not for the faint of heart (nor for the non-methodical)!
One thing I'm wondering about -- if I merge two pages and redirect one to the other, do the people who are "watching" the redirected page get somehow transferred to the watchlist of the other merged page? Seems like that's what you'd want. (And if the redirected pages are in other people's "trees", does that get patched up as it should?) I'm hoping I haven't done anything to break other folks' trees.
Also, most of the GEDCOM uploads I've noticed are associated with User pages that don't exist. Does that mean they've un-registered from WeRelate? Or does that just mean they never bothered to create a "user home page"? Is it okay to create their User/talk page in order to leave a message there? Will they get notified? ---TomChatt 05:02, 29 September 2007 (EDT)
A follow-up to Amelia's comment, and perhaps by way of clarification, I think that bad/unreliable/flawed sources absolutely have a place on our pages - and an important one at that. They should be cited and noted as bad/unreliable/flawed and, ideally, the research that established them as such should be noted. Otherwise, folks just keep rediscovering previously discredited information. This is particularly true of folks like me, who don't have a huge background so we don't instantly know that some sources are not trustworthy. For example, I understand that the Mayflower passenger Peter Brown is the origin of a large number of discredited genealogical lines. At some point or other he was attributed as having had a son. This was later discredited, but chaos has remained in this area for something beyond 100 years. This site in particular, provides a good chance to document both the accepted and discredited research.--Jrm03063 14:48, 21 February 2008 (EST)
While we are on the topic of merging trees, a new user asked me if we should ask permission from the other user we want to merge with before actually merging into their tree. What's everyone's opinion on this? All manners and politeness aside, you could wait weeks or months for a response or never get "permission" to merge with another tree. This particular topic goes along with the "Downloading GEDCOM" topic as well, in that I see it as understanding what happens to your data once it's put online at WeRelate. I realize this has the potential to be a touchy issue with some, so I'm curious as to everyone's thinking or understanding on this. --Ronni 10:50, 17 November 2007 (EST)
I vote "no" as well, but I do as a practice look to see if the User I'm about to do a "major" merge with is active on WR by looking at their contributions. --Ronni 03:49, 21 November 2007 (EST)
I too vote no. The whole point here is collaboration and there can't be collaboration as long as there are duplicate trees. --Trevorallred 14:40, 23 December 2007 (EST)
Here's another question along these same lines: merging two overlapping trees might involve merging hundreds (or even thousands) of individual person and family pages. With lots of pages to merge, most people aren't going to take the time to analyze each pair of pages to merge very carefully. So we need to have a pretty reasonable "default" merge strategy. What should that strategy be? For the text we can put the text from one page after the text from the other page. For the events we can list differing birth/marriage/death events from one page as "alternate birth/marriage/death" events on the merged page. Similarly for differing names -- one name can be listed as an "alternate" name. But which events/names should be the "main" events/names and which should be the "alternate" ones? I can think of two possible approaches; maybe there are more?
Any thoughts?--Dallan 22:29, 17 November 2007 (EST)
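As a concrete starting point for the "default" strategy described above, here is a small Python sketch. It assumes a page is a dictionary with `name`, `events`, and `text`, and that the caller has already decided which page is the primary one; that choice is exactly the open question, so `primary` is only a placeholder for whatever rule is picked.

```python
def merge_pages(primary, secondary):
    """Merge two pages without discarding anything from either one."""
    merged = {
        "name": primary["name"],
        "alt_names": list(primary.get("alt_names", [])),
        "events": dict(primary["events"]),
        "alt_events": {k: list(v) for k, v in primary.get("alt_events", {}).items()},
        "text": primary["text"],
    }
    # A differing name is kept as an alternate name.
    if secondary["name"] != primary["name"] and secondary["name"] not in merged["alt_names"]:
        merged["alt_names"].append(secondary["name"])
    # Missing events are filled in; differing events become alternates.
    for kind, value in secondary["events"].items():
        if kind not in merged["events"]:
            merged["events"][kind] = value
        elif merged["events"][kind] != value:
            merged["alt_events"].setdefault(kind, []).append(value)
    # Text from the second page goes after text from the first.
    if secondary["text"]:
        merged["text"] = (merged["text"] + "\n\n" + secondary["text"]).strip()
    return merged

a = {"name": "John Doe", "events": {"birth": "24 Mar 1789"}, "text": "From tree A."}
b = {"name": "John Doe Jr.", "events": {"birth": "1789", "death": "1850"}, "text": "From tree B."}
print(merge_pages(a, b))
```

Nothing is dropped in either direction, so swapping the two arguments changes only which values are labelled "main" and which "alternate".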
I think that any technique for merging John Doe (i) and John Doe (j), must preserve all the information from both "i" and "j", and must clearly indicate that it is an automatic merge (and the provenance of the contributions). What parts are more "believable" or better sourced is going to be easy for a human to understand but pretty tough to make a program understand. Better to be sure you lose nothing and hope a human will clean things up.
Besides merging existing people, I was also wondering whether there is a way to perform a less than complete GEDCOM upload (thereby avoiding the need to merge common individuals). Can we imagine a reasonable UI that would break GEDCOM import into a two-step process? Instead of a one shot load the whole batch, a two step process that would build up a list of names (from the GEDCOM) that already appear to be present on werelate? The user would then be free to pick whether those names are uploaded as new individuals or whether the existing werelate individual is substituted for a particular person.--Jrm03063 18:44, 18 November 2007 (EST)
I think it's also important to remember that the problem of merging two overlapping trees actually is two problems: matching, then merging. It's much easier to first match the names, then queue them up for merging. At that point we "could" have the computer do the actual merging using some really good heuristics that Dallan has already alluded to. I think it's impossible for the computer to do the matching automatically. There are simply too many variables at this point to trust a machine to match.--Trevorallred 14:53, 23 December 2007 (EST)
I think that matching on a tree-by-tree basis probably isn't a great idea. It's not "work efficient" (for you CS folks out there). Matching shared genealogy is the heart and soul of what werelate is about, and I'm struck that it should be sort of fundamental to the way werelate works. What if we think about matching as more of a re-indexing process, where a set of match candidates is associated with any person or family. Whenever a person or family changes, it gets marked as needing to be recomputed for purposes of matching. When the match index for a person is recomputed, the previous match set becomes a starting point of families and people to rule in and out first. People who drop out of the match set or get added to the match are themselves considered changed so they are marked for being matched again. Of course this means that matching is a continuous and ongoing process, and that changes will have the effect of generating work for the matching engine (or robot, or whatever you call it) - but so what? That's what werelate is here to accomplish. It also means that matching work needs to be queued in a way that prevents any single cluster of related names from hanging up the match robot so that other areas of the werelate data base go begging. Maybe some sort of oldest unindexed page first process...--Jrm03063 17:28, 4 January 2008 (EST)
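A rough Python sketch of the re-indexing idea above: changed pages are queued, the oldest queued page is processed first, and any page that enters or leaves a match set is queued in turn. The class name, the `find_matches` callback, and the in-memory `match_index` are all hypothetical stand-ins for whatever the real matching engine would use.

```python
from collections import deque

class MatchQueue:
    """First-in, first-out queue of pages whose match set is out of date."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def mark_changed(self, page):
        # A page already waiting keeps its original place in line.
        if page not in self._pending:
            self._pending.add(page)
            self._queue.append(page)

    def process_next(self, find_matches, match_index):
        """Recompute matches for the oldest queued page; return it, or None."""
        if not self._queue:
            return None
        page = self._queue.popleft()
        self._pending.discard(page)
        old = match_index.get(page, set())
        # The previous match set is handed over as the starting point.
        new = set(find_matches(page, old))
        match_index[page] = new
        # Pages that entered or left the match set need re-matching too.
        for other in sorted(old ^ new):
            self.mark_changed(other)
        return page

truth = {"A": {"B"}, "B": {"A"}, "C": set()}
queue, index = MatchQueue(), {}
queue.mark_changed("A")
queue.mark_changed("C")
while queue.process_next(lambda page, previous: truth[page], index) is not None:
    pass
print(index)
```

Because every page waits its turn, a large cluster of related names cannot starve the rest of the database; its follow-up work simply joins the back of the line.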
As individual people and families change, we'll certainly match just that one person/family, and that will be a continuous process. When I talk about tree matching, I'm thinking about GEDCOM uploads. If you upload a GEDCOM containing say 2,000 people, and 200 of them match someone else's tree, I don't think you want to be presented with 200 different match questions one at a time. It seems that it would be a better experience to present it as a two-step decision: e.g., "200 people appear to match tree A, and 50 people appear to match tree B". In the first step you click on the link that takes you to the list of matching people in tree A, and in the second step you decide which of the 200 match candidates from tree A you want to merge with. As you're checking boxes to determine which pairs of people to merge, the system lets you know which of the remaining matching pairs are related to people you've already decided to merge and which are not. So for GEDCOM uploads, it seems that making the matching decisions up-front will be a better experience than making them one at a time.--Dallan 11:34, 7 January 2008 (EST)
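The first step of that two-step presentation amounts to grouping match candidates by the tree they belong to. A minimal Python sketch, assuming each candidate arrives as a hypothetical `(uploaded_person, existing_person, existing_tree)` tuple:

```python
from collections import defaultdict

def group_by_tree(candidates):
    """Group (uploaded, existing, tree) candidates by the existing tree."""
    by_tree = defaultdict(list)
    for uploaded, existing, tree in candidates:
        by_tree[tree].append((uploaded, existing))
    return dict(by_tree)

def summary_lines(by_tree):
    """One summary line per tree, largest overlap first."""
    ordered = sorted(by_tree.items(), key=lambda item: (-len(item[1]), item[0]))
    return [f"{len(pairs)} people appear to match tree {tree}" for tree, pairs in ordered]

candidates = [
    ("u1", "a1", "A"), ("u2", "a2", "A"), ("u3", "a3", "A"),
    ("u4", "b1", "B"), ("u5", "b2", "B"),
]
for line in summary_lines(group_by_tree(candidates)):
    print(line)
```

The grouped pairs behind each summary line are what the second step would display as the checkbox list.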
As long as we're talking about GEDCOM upload in particular, I completely agree, a two step process on a per tree basis is essential. I can imagine, for example, if everyone who has a Stephen Hopkins mayflower line were to upload it, after someone had gone to the trouble of creating a really comprehensive and nicely done Stephen Hopkins page - ick!
My remarks were in the context of looking for matches within existing data as it is changed and updated in ordinary use.--Jrm03063 12:28, 7 January 2008 (EST)
That's a good point. We ought to have a "disregard my information" merge option for both individual-matches and also for tree-matches.--Dallan 18:20, 8 January 2008 (EST)
Regarding merging, at least in respect to an imported GEDCOM, I think the submittal process ought to require the person to identify one person in the GEDCOM as an existing person in werelate. Then follow the relationships stored in the GEDCOM to match up the others. You can't rely on name matches or birthdate matches. Only persons connected to the anchor person get merged. Dangling trees are ignored.
What if someone's GEDCOM has no matches? They manually enter the anchor person, then import.
Conflicting data could be scored, though how you score between two different sources is beyond me. A source that is an ancestral file or ancestry.com is hardly a source, since half of those are people's opinions, not reflections of real sources, as indicated by the number of people out there propagating known errors. Precision might be a valid criterion: for example, 24 Mar 1789 might be allowed to replace 1789, though it is less clear what to do if the dates are inconsistent, as in 24 Mar 1789 and 1792.
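The precision rule can be sketched in Python as follows. This handles only plain `D Mon YYYY`, `Mon YYYY`, and `YYYY` dates (no "abt", "bef", or ranges), and the function names are made up for illustration. An inconsistent pair returns `None`, meaning "leave it for a person to decide".

```python
MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

def parse_date(text):
    """Parse a simple date into (year, month, day); missing parts are None."""
    parts = text.split()
    year = int(parts[-1])
    month = MONTHS[parts[-2].upper()[:3]] if len(parts) >= 2 else None
    day = int(parts[0]) if len(parts) == 3 else None
    return (year, month, day)

def refine(existing, incoming):
    """Return the more precise of two consistent dates, or None on conflict."""
    a, b = parse_date(existing), parse_date(incoming)
    for x, y in zip(a, b):
        if x is not None and y is not None and x != y:
            return None  # inconsistent: needs a human decision
    def precision(date):
        return sum(part is not None for part in date)
    return incoming if precision(b) > precision(a) else existing

print(refine("1789", "24 Mar 1789"))
print(refine("24 Mar 1789", "1792"))
```

So 24 Mar 1789 may refine 1789, but 24 Mar 1789 against 1792 is flagged rather than resolved automatically.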
So I believe extra weight goes to the first comers. If you want to change the data that is there, you have to do it manually. Not that first comers necessarily represent better data, but it ensures thoughtful overwriting. To err is human, but to really screw up requires a computer.
It means more manual editing, but I think that is necessary to avoid the damage caused by somebody importing a GEDCOM they downloaded from who knows what website or similar scenarios. The goal is accuracy first and as all experienced genealogists know, accuracy is not easy nor straightforward.
--Jrich 13:33, 4 April 2008 (EDT)
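The anchor-person proposal in this comment could be sketched roughly as below, in Python. It assumes each data set maps a person ID to that person's `father` and `mother` IDs, and it walks only those two links because they pair up unambiguously by role; matching children or spouses would need something more. It also pairs by position alone, so the results are best treated as candidates for review, not as automatic merges. All names are hypothetical.

```python
from collections import deque

ROLES = ("father", "mother")

def pair_from_anchor(gedcom, wiki, anchor_gedcom, anchor_wiki):
    """Pair GEDCOM people with wiki people by walking out from one anchor pair."""
    pairs = {anchor_gedcom: anchor_wiki}
    queue = deque([anchor_gedcom])
    while queue:
        g = queue.popleft()
        w = pairs[g]
        for role in ROLES:
            g_rel = gedcom.get(g, {}).get(role)
            w_rel = wiki.get(w, {}).get(role)
            # Pair only when both sides have someone in the same role.
            if g_rel and w_rel and g_rel not in pairs:
                pairs[g_rel] = w_rel
                queue.append(g_rel)
    return pairs

gedcom = {
    "g1": {"father": "g2", "mother": "g3"},
    "g2": {"father": "g4"},
    "g9": {"father": "g10"},  # not connected to the anchor: ignored
}
wiki = {"w1": {"father": "w2", "mother": "w3"}, "w2": {}}
print(pair_from_anchor(gedcom, wiki, "g1", "w1"))
```

People not reachable from the anchor (the "dangling trees") never enter the result, and a GEDCOM ancestor with no counterpart on the wiki side is left unpaired, so it would come in as a new person.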
Before we ask people to identify the anchor person manually, I'd like to try an idea I have on finding anchor points automatically. I'll hopefully have something ready to try by June/July.--Dallan 11:48, 7 April 2008 (EDT)
I don't think I envy you your task.
Within the scope of a single person, I have much difficulty thinking that a computer can do a reliable match. However you choose to weight the different facts when looking for a match, it will be wrong in some cases. And that's assuming the GEDCOM has enough facts in it to start with. How many times have I seen where many individuals are represented by no more than a name? Or the same person called Mary here, Polly there. Or a town with four grandchildren all named after the same honored grandparent and all born in a very short timespan.
So there is not much hope unless you take into account earlier and later generations. Then it becomes far more likely to be accurate. However, remembering that many disagreements are exactly over who the parents were, or how many children, etc, will this work? What if the GEDCOM being merged has a string of 10 generations but right in the middle of it it has different parents from the person being merged with. So now does the computer create alternate parents and now reanchor the remaining subtree to the newly proposed parents? If the new parents are brand new to the database, then does that imply all their ancestors are new too? Maybe this GEDCOM is proposing a heretofore undocumented parent between matching grandparents and grandchildren. This argues for the previously discredited (by me) person-by-person matching.
How do you decide what scope to use?
I'm almost inclined to suggest that you "punt".
The secret weapon of your website is time. Over time the data will become better and better quality. The potential damage of computer-generated mistakes will get worse and worse. Speeding up data entry is not necessarily the top priority. (Enabling collaboration to arrive at a higher quality of data than is achievable by oneself is, IMHO.) The facts we are entering are no longer changing so there is really no rush to enter them. Over time there will be less data entry and more comparison anyway as the database gets better populated. Computer-aided data entry probably means the user has not taken the time to see if their input is needed, nor have they discovered that, "Heck, look at this! Somebody has some information I wasn't aware of! Who would have thought that was possible?"
However what is needed badly is an easy way to see that I am not entering a duplicate or to find the person I am interested in collaborating on. This is an entirely different search than a Google-like search for any use of the parts of their name anywhere on any page. It is a more structured search. Many sites can take characteristics I enter, such as name, range of birth dates, location, and return results where the matching people come first, then those with slight name variations, then increasingly remote birth dates, etc. The more John Smiths that get entered, the more important this will become.
Addendum: example: I created Mary Wheeler-134 the other day. If I search for the given name Mary and surname Wheeler in Namespace People and Families, I get 1012 pages. If I add Person: Mary Wheeler to the keywords I get 662. If I put "Person: Mary Wheeler" (i.e., with quotes), I get 113, which is probably how many Mary Wheelers currently remain in the system. On the list of the 113, the displayed blurb shows no useful information except for her name in about half the cases. Quite hard to tell if any of them are the one I want. 20 years from now, how many Mary Wheelers will there be?
--Jrich 14:36, 7 April 2008 (EDT)
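The structured search described in this comment (exact name first, then name variations, then increasingly remote birth dates) might be ranked along these lines. This Python sketch assumes each person is a dictionary with `given`, `surname`, and `birth_year`, and that the caller supplies the list of given-name variants; none of this reflects how the site's search actually works.

```python
def rank_candidates(people, given, surname, birth_year, variants=()):
    """Order candidates: exact names first, then variants, nearest birth year first."""
    wanted_variants = {v.lower() for v in variants}

    def score(person):
        if person["surname"].lower() != surname.lower():
            return None  # different surname: not a candidate
        name = person["given"].lower()
        if name == given.lower():
            name_rank = 0
        elif name in wanted_variants:
            name_rank = 1
        else:
            return None
        year = person.get("birth_year")
        gap = abs(year - birth_year) if year is not None else float("inf")
        return (name_rank, gap)

    scored = [(score(p), p) for p in people]
    scored = [(s, p) for s, p in scored if s is not None]
    scored.sort(key=lambda item: item[0])
    return [p for _, p in scored]

people = [
    {"id": 1, "given": "Polly", "surname": "Wheeler", "birth_year": 1790},
    {"id": 2, "given": "Mary", "surname": "Wheeler", "birth_year": 1820},
    {"id": 3, "given": "Mary", "surname": "Wheeler", "birth_year": 1791},
    {"id": 4, "given": "Mary", "surname": "Smith", "birth_year": 1790},
    {"id": 5, "given": "Mary", "surname": "Wheeler", "birth_year": None},
]
ranked = rank_candidates(people, "Mary", "Wheeler", 1790, variants=["Polly"])
print([p["id"] for p in ranked])
```

People with no birth year are kept but sorted last within their name group, since a bare name is the weakest evidence of a match.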
Perhaps a separate topic, but using the "Browse Pages" function and comparing the other 133 or so Mary Wheelers isn't going to work. It's been discussed in the past about using date ranges in the title to help make distinctions between Persons of the same name and since I use the "Browse Pages" feature quite a bit, I'm inclined to agree more and more that we need to come up with a better way of quickly identifying the Mary Wheelers we are really interested in. --Ronni 19:49, 7 April 2008 (EDT)
This probably does go somewhere else, but it builds on the previous comments. The Browse Pages does help. It is non-intuitive that browsing is more focused than searching, but I guess that's part of the learning curve.
The titles are still a problem. It doesn't seem like it would be hard on the browsing page, assuming it is a most common problem there, to insert something to take the returned title, recognize certain namespaces in the title (i.e., Person:), dig into the page and build a more descriptive replacement. Although maybe digging into the page would be too costly?
It would be nice if the internal link button on the edit page caused a popup version of the Browse Pages page so you could search for and select your link instead of typing it. Am I just in need of more learning here?
--Jrich 12:44, 10 April 2008 (EDT)
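The "more descriptive replacement" for browse titles suggested above could look something like this in Python. It assumes the birth and death details have already been pulled from the page into a small dictionary (the possibly costly step mentioned in the comment); `describe` and the field names are hypothetical.

```python
def describe(title, facts):
    """Append birth/death details to Person: titles; leave other titles alone."""
    if not title.startswith("Person:"):
        return title
    details = []
    if facts.get("birth"):
        details.append("b. " + facts["birth"])
    if facts.get("death"):
        details.append("d. " + facts["death"])
    if not details:
        return title
    return title + " (" + "; ".join(details) + ")"

print(describe("Person:Mary Wheeler-134", {"birth": "1789, Concord", "death": "1850"}))
```

If reading every page at browse time is too slow, the same short description could be computed once when a page is saved and stored alongside the title.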
The new search functionality will have a "match" function that will return results in relevancy-ranked order, and the search result list will include data elements like birth date&place and death date&place. I'm working on this now.
Matching is do-able, it just takes time to develop. Several years ago I worked on a matching algorithm that found 95% of the possible matches and picked the correct match 95% of the time. There will always be cases where the computer guesses wrong. Making the final match decision does not play to a computer's strengths. What a computer is good at is bringing the probable matches to a human's attention, which can significantly reduce the amount of time you have to spend searching for them yourself (unless you want to, which the new search functionality will allow).
Once we get the new match functionality working, I'll list probable matches when people try to add new Person or Family pages so that they can choose to link to an existing page rather than create a new one.--Dallan 15:32, 10 April 2008 (EDT)
I am all for merge but I can't help but wonder how all this will work in the best interest of WeRelate and keep users happy. I feel the idea of merge is to MERGE.. and not to be picky about who gets merged or not into "my tree". If there are two individuals that ARE the same person they would be merged, whether they are in fact related or not.. such as the parents of an in-law or the parents of the parents of the in-law. Specifically because "we Relate". I don't have a problem with this however I can see where others might be offended. It is difficult to explain my point...
On Wikipedia nobody "owns" any wiki page.. all members can contribute and edit and those pages are permanent. Here on WeRelate folks are worried about THEIR databases... getting cluttered with unwanted people via a merge. So what happens after a merge when someone deletes their gedcom off of WeRelate??? What happens to pages for those folks that were in that Gedcom that is now gone? Are they thereafter floating out there as orphans?
Example: Sally Snodgrass uploads "Gedcom A"; Bill Smith uploads "Gedcom B" and sees many if not most of his people match with "Gedcom A" and he spends hours merging and making it all look pretty.. then Sally Snodgrass sees that Bill merged the parents of an in-law with one of her Aunt's husbands and doesn't want so many people in her tree.. and in fact decides to just remove her tree altogether because she is miffed about Bill's work.
You can please some of the people some of the time but not all the people all the time. People are protective about their work. I don't think this would be an issue if folks did not want to download their gedcom with edits back onto their home machines, but I think there has been discussion about members hoping to be able to do that.
Why would Wikipedia have three pages that tell about Napoleon Bonaparte?? Why would we choose not to automatically merge John Smith born 1903 died 1940 in the same place and who has a high "score" thus being a match? Just because? so if Sally Snodgrass does have John Smith and chooses NOT to merge him with an exact match.. but along comes Bill Smith who sees the obvious and goes ahead and matches these two up.. and therefore links up all the ancestors to John Smith, people who are NOT of any interest to Sally Snodgrass what will happen? and even if we call in counselors to have Bill and Sally be nice, what about Mr. Newbie that comes along and sees the same match and starts merging on his own as well? I know some of this is available to happen now, as we can merge, but as it stands it is so daunting to merge that I am guessing few would bother. However once it is automated there could be conflicts.
I myself am excited about the prospect of automated merging.. I feel this will help tremendously because my database has 56,000+ people in it. I have to break it down into SO many small gedcoms. I go to one of my immigrant ancestors and begin with him and include all his descendants, and repeat that process over and over, and thus I have all kinds of duplicates on WeRelate as a result, especially since I myself am in each of the gedcoms.. and so are my parents... and so are most of my grandparents and all their siblings, etc! Once automated I can just merge all the "mini gedcoms" into one big family. But will this BIG family then be too large to work at WeRelate? will a huge merged file cause the FTE to slow to a crawl? --Msscarlet1957 23:12, 24 January 2008 (EST)
I get the impression that merging is going to come in two flavors. One is to simply avoid or suppress upload of portions of a GEDCOM that are already present on werelate. I believe it's been described as a two-step process, where the entire GEDCOM is uploaded to some temporary space, compared against the overall content of werelate, and then somehow the results will be presented to the user allowing him to pick and choose what is actually conveyed into the general werelate space as new person/family pages. The second form is after-the-fact of upload - recognizing the different copies of Napoleon Bonaparte. I believe there's a vision for a tool that will allow a user to say that "Person:Napoleon Bonaparte (27)" and "Person:Napoleon Bonaparte (28)" should be automatically consolidated to "Person:Napoleon Bonaparte (27)", and "Person:Napoleon Bonaparte (28)" becomes a redirection, but you can do that sort of thing right now manually.
As for tree implications, I think that anyone with either Boney (27) or Boney (28) will still have those references, and they would just jump to Boney (27) when they go to view that part of their tree.
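To make the redirection idea concrete, here is a minimal sketch of how a tree reference could be followed to the surviving page. It assumes redirects are kept in a simple title-to-title mapping; the function name, the mapping, and the hop limit are hypothetical and say nothing about how WeRelate actually stores redirects.

```python
def resolve_redirects(title, redirects, max_hops=20):
    """Follow redirects from `title` until a page that is not a redirect is reached.

    `redirects` is a hypothetical dict mapping a redirected page title to its target.
    """
    seen = set()
    while title in redirects:
        # Guard against redirect loops and unreasonably long chains.
        if title in seen or len(seen) >= max_hops:
            raise ValueError("redirect loop or chain too long at " + title)
        seen.add(title)
        title = redirects[title]
    return title


# Illustrative data only: Boney (28) has been merged into Boney (27).
redirects = {"Person:Boney (28)": "Person:Boney (27)"}
current = resolve_redirects("Person:Boney (28)", redirects)
```

A tree that still lists Boney (28) keeps its reference unchanged; only the lookup at viewing time lands on Boney (27).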
I don't think deleting a tree has any real global significance. I think it just amounts to a page of references - the pages for person, family, source - don't know the difference.
I've been doing lots of merges manually. I'm struck that this is really the point of it all. If someone wants to maintain their work in isolation, werelate just isn't a tool that they're going to like.--Jrm03063 07:37, 25 January 2008 (EST)
These are all good points. This is why merging is actually much more difficult to get right than matching. Here are some thoughts.
Merging is going to come in two flavors: a tree-based merge when you first upload your tree (we'll also have to do something like this for existing trees), and after-the-fact mergers for people that have been entered or edited on-line.
I agree that people getting offended because someone merged or edited their tree is going to happen. I also agree that WeRelate isn't the place for people who want to work in isolation. There are other websites for that. Hopefully it won't happen too often, but if someone does delete their tree, the parts of their tree that have been merged into someone else's tree (and so someone else is also watching those pages) won't be deleted.
I'm thinking that we'll also need an "unmerge" function. It would be pretty frustrating if merge were a one-way street.
The reason that we have the concept of a "tree" is so that you can limit the number of people that you care about. People in your tree can link to people that are in someone else's tree but not in yours. So if someone else merges their tree into yours, chances are that some of the newly-merged people link to people in their tree that are not in yours. You should be able to add those people to your tree, but it's your choice who from their tree you want to add into your tree. It's ok to leave them outside of your tree and just have people in your tree link to them.
I'm reluctant to automatically merge anyone, especially at the beginning. I'll keep a log of who people chose to merge and who people chose not to merge, and the score associated with each pair. If after awhile we see that if the score is above X people choose to merge 99% of the time, then we could consider doing an auto-merge in those cases. But I know of other genealogy databases that did auto-merging and people weren't too happy about it.
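As a rough illustration of the logging idea above, the sketch below records each merge decision with its score and looks for the lowest score X at which people chose to merge at least 99% of the time. The record layout, the function names, and the minimum sample size are assumptions made for the example, not a description of the planned implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MergeDecision:
    score: float  # match score shown for the pair (scale is hypothetical)
    merged: bool  # True if the user chose to merge, False if they declined


def merge_rate_at_or_above(log, threshold):
    """Fraction of logged pairs scoring >= threshold that were merged, or None if there are none."""
    relevant = [d for d in log if d.score >= threshold]
    if not relevant:
        return None
    return sum(d.merged for d in relevant) / len(relevant)


def lowest_auto_merge_threshold(log, required_rate=0.99, min_samples=100):
    """Smallest logged score X where pairs scoring >= X were merged at least `required_rate` of the time.

    Returns None if no threshold is backed by at least `min_samples` decisions.
    """
    for x in sorted({d.score for d in log}):
        relevant = [d for d in log if d.score >= x]
        if len(relevant) < min_samples:
            break  # higher thresholds only have fewer samples
        if sum(d.merged for d in relevant) / len(relevant) >= required_rate:
            return x
    return None
```

Requiring a minimum number of logged decisions keeps a handful of lucky merges at a high score from triggering auto-merge too early.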
I'm slowly making the FTE better able to handle large trees. It's much better than it was a few months ago, but I don't think it's ready for a 56,000 person tree yet :-). But it should be by the end of the Summer; certainly by the end of the year.
As I get closer to implementing merge, I'll post more ideas here and ask for feedback.--Dallan 15:25, 28 January 2008 (EST)
I've been assuming that trees are just a table of references, and that deleting a tree has no particular implications for the person, family, source, image, or other pages referenced. Is that incorrect? Is there some sort of implied delete of a person page performed if a particular page is referenced by no other tree??? What if the page is referenced by another person or family page?--Jrm03063 15:37, 28 January 2008 (EST)
I am probably going to be one of the users that causes trouble regarding automatic merges. While I like the concept of the idea in general and do wish to link to other families; some of the families on WeRelate are not ones that I consider properly researched and sourced. WFT #233 or research of John Doe tells me nothing. There are at least six trees on RootsWeb that cite one source; which I have proven to be incorrect on the family that I am now entering. These people's trees are incorrect because they did not bother to do actual research. A simple check of census data, in this case, would have eliminated the problem. If I have no choice regarding the merging of my file with another one; I am not sure that I would continue to use WeRelate. Perhaps you can explain to me how you envision this concept of automatic merging to work? --Beth 18:50, 28 January 2008 (EST)
--Amelia.Gerlicher 20:05, 28 January 2008 (EST)
Dallan, I just wanted to suggest that once you get the Merge thingy up and running.. we will definitely need a "how-to merge video" :-) --Msscarlet1957 08:38, 7 February 2008 (EST)
I've spent the last few days doing a lot of merging, and the process isn't really that awful, especially once you develop a few practices that keep you from losing track of where you are. But it also seemed to me that it could be made a lot easier without anything in the way of UI changes. All that is required is being a bit smarter about what happens when a redirection page is checked in.
Consider a situation of two family pages that nominally represent the same family. At present, before redirecting A to B, I copy record guts from B to A, attach all the children on B to A, then merge the parents. This leaves me with a family that often contains duplicate children, but I haven't lost anything and it's easier to just work on resolving such duplicates on the target family page.
Could the check-in of a "#redirect" on a family page be jiggered such that:
* Any children (not already present on the destination) are reparented to the destination
* Any parent (not already present on the destination) is added as a parent or alternate parent
That leaves you with a consolidated family page that presents the needed merge in a more obvious way. It also prevents the situation of inadvertently cutting off a line in the merge of a family. Finally, it is upwardly compatible with current practice (it would be very strange to use a family redirect as the way to cut away a particular incorrect line).
There's probably a corollary procedure for merging people - taking the union of the parent and spouse relationships and making sure the unique set is preserved in the redirection target.
Of course this leaves you to merge the page contents proper, but that's the easy bit anyway - just open both pages and keep them both alive until the redirection target has all the content of the source. If you stop in the middle of that, nothing is lost.
Thoughts?--Jrm03063 17:28, 10 March 2008 (EDT)
That seems like a really good idea! I don't see any downsides to it, and it would be pretty easy to implement. So when you redirect a family, the husband, wife, and children would be added automatically to the target family, and when you redirect a person, the parent and spouse families would be added automatically to the target person. What do others think about this? If there are no concerns, I can implement it at the end of this week or early next week.--Dallan 11:07, 12 March 2008 (EDT)
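The family half of this proposal can be sketched roughly as follows. The `Family` class and its fields are hypothetical stand-ins for a family page; extra husbands or wives on the target simply play the role of alternate parents. It is only a sketch of the proposed behavior, not the actual WeRelate code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Family:
    title: str
    husbands: List[str] = field(default_factory=list)
    wives: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)
    redirect_to: Optional[str] = None  # set once the page becomes a redirect


def redirect_family(source, target):
    """Turn `source` into a redirect to `target`, carrying over members not already on the target."""
    for role in ("husbands", "wives", "children"):
        destination = getattr(target, role)
        for person in getattr(source, role):
            if person not in destination:
                destination.append(person)  # union: nobody is dropped, nobody is duplicated
        setattr(source, role, [])  # a redirect page no longer holds its own members
    source.redirect_to = target.title


# Illustrative page titles only.
a = Family("Family:A", ["John (1)"], ["Mary (1)"], ["Tom (1)", "Ann (1)"])
b = Family("Family:B", ["John (2)"], ["Mary (1)"], ["Ann (1)", "Sue (1)"])
redirect_family(a, b)
```

After the call, Family:B lists both John pages as husbands, so the remaining person-level duplicate is visible on the target page and can be resolved there.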
I like it! I had actually started doing it this past week on a couple of merges. No one got lost, everything was tidy and easy to keep track of and it created a "bookmark" of sorts that I could come back to to finish up. I've thought about it for a couple of days now and haven't come up with a downside yet. So, if I understand this correctly, the steps involved for doing a merge this way would be:
--Ronni 12:06, 12 March 2008 (EDT)
I'm glad to hear that this might be easy to do. It's essentially the mechanical practice that I follow in working through a merge so that I don't lose a connection. If the check-in of a "#redirect" had this additional behavior, I could move a lot faster and more safely.--Jrm03063 12:12, 12 March 2008 (EDT)
Ok, I'll add it in the next couple of days and leave a message here when it's ready. Thanks for the suggestion!--Dallan 01:41, 17 March 2008 (EDT)
It's a week later (I've spent too much time fixing bugs in the digital library), but the #redirect suggestion is working now. If you edit a page and make it a #redirect to another page, the people/families and images that the page links to will be added automatically to the redirect target. I tested it and everything went well, but if you run into problems please let me know. It's a great suggestion!--Dallan 21:56, 24 March 2008 (EDT)
This works wonderfully Dallan! And I agree... great suggestion! --Ronni 23:31, 24 March 2008 (EDT)
I'm glad this is working out. It's working great for me too. It's only possible though because whoever thought about the data structure up front was careful enough to allow for this. Whoever had the wisdom to allow for alternate parent connections and alternate husband/wife connections deserves thanks.
Now everyone, go forth and MERGE!--Jrm03063 13:02, 25 March 2008 (EDT)
Here are questions I have after working 4+ hours doing a HUGE merge (which is still in progress). There has been discussion about the ability to "re-upload" an existing GEDCOM to update it with new information obtained. And the way to do that would be to match the Reference Number created by the GEDCOM itself from past and present uploads.
Scenario:
What now happens with PersonA?
Maybe the "redirect" information should have a different place to be placed? But that would not totally solve the problem. I feel there is still potential for losing any new events John Doe may have added to PersonA at home in his database before an upload to update his GEDCOM --Msscarlet1957 11:45, 14 March 2008 (EDT)
In order to support GEDCOM re-upload, a new source citation will have to be added to each person & family in their tree. This source citation will contain a "permanent link" (URL) to the specific version of the page that they last uploaded. We'll add these sources directly to the uploaded GEDCOM and make this modified GEDCOM available as a download. People will have to import this modified GEDCOM back into their desktop genealogy software in order to be able to re-upload their GEDCOM later. Since we'll add the sources directly into the uploaded GEDCOM file, there shouldn't be the information loss that you usually get when going from one GEDCOM format to another.
So when the person re-uploads the GEDCOM, we'll know which pages they're updating, what each page looked like when they last uploaded their GEDCOM, and what each page looks like now by following any redirects to get the current version. Using this information we can determine
If the changes made by the uploader are to different fields than the changes made by others, we apply the changes made by the uploader to the current version of the page. Changes made by others don't get erased; changes made by the uploader show up as changes to just those specific fields. The uploader must now download the new GEDCOM with source citations containing permanent links to the now-current versions of each page.
Suppose the uploader and others modify the same field. There are two ways we could go with this; I'm thinking about going with the second:
Another issue is what happens if the new GEDCOM doesn't contain all of the person & family pages that are in the tree. Rather than trying to delete the missing pages, I'm thinking we should send the uploader an email with links to all of the pages in their tree that weren't in the newly-uploaded GEDCOM, and let them decide if they should be removed or not.
I think this covers all the bases. Does this answer all of your questions?--Dallan 01:41, 17 March 2008 (EDT)
I think I understand where Dallan is going with this. Matching an arbitrary GEDCOM against the huge universe of werelate is really impractical. Trying to make a program smart enough to know what is a good enough match and what isn't is essentially unsolvable. What can be done though, is to attach a source reference that tells werelate specifically that a person somewhere in a gedcom absolutely is a certain person in the werelate universe. Generally speaking, the easy way to get your home system in sync w/werelate would be to obtain a fresh werelate GEDCOM download, which will have the appropriate tags in place for all the people - but you wouldn't have to. I presume that the "werelate" designator source/tag will have a format that allows you to directly enter it into your home genealogy program where appropriate.
There is a place for the sort of guessing/probable matching in werelate - it's when we have a feature that allows the system to browse for potential matches in the werelate universe. That does not result in automatic merging though, but instead, in a set of candidate matches that the next researcher coming along can review. If the human is persuaded by the match, then the human can perform a merge or request a default merge procedure. But combining detection with actual merging logic seems to me extremely perilous (take a look at ancestry.com's "one world tree").
I appreciate that it's not totally "hands off", but it's going to yield a far better data base.--Jrm03063 13:19, 17 March 2008 (EDT)
Yes, you could enter the source citations into your desktop genealogy program yourself. Each one will be a human-readable citation with a URL in the citation text field. But with the ability to download what is essentially the same GEDCOM that you uploaded except with source citations added, you shouldn't have to.
As Jrm03063 points out, matching is problematic. Even if we're 99.5% accurate on matching re-uploaded people to previously-uploaded people, it means we'll either incorrectly-match or not match 25 people in a re-upload of a 5,000-person GEDCOM. That's too many.
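The arithmetic behind that figure is simple enough to check, and to rerun for other tree sizes. The function name below is made up for the example.

```python
def expected_mismatches(people, accuracy):
    """Expected number of people wrongly matched or left unmatched at a given matching accuracy."""
    return round(people * (1 - accuracy))


# The case from the comment above: a 5,000-person GEDCOM at 99.5% accuracy.
small_tree = expected_mismatches(5000, 0.995)
# The 56,000-person database mentioned earlier in this thread, at the same accuracy.
large_tree = expected_mismatches(56000, 0.995)
```

Because the number of mistakes grows in proportion to the size of the GEDCOM, even a very accurate matcher leaves a large tree with a lot of manual cleanup.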
There is another approach we could take. Some desktop genealogy programs store a unique identifier (UID) for every person. This identifier is included in the GEDCOM's they export, so that a person has the same UID in the GEDCOM file every time. If the GEDCOM includes UID's, then we could record the person's UID and the page version with which it is associated, so that the next time you upload a GEDCOM with the same UID's we could know what page versions to match. The problem is that only 42% of the people that have been uploaded to WeRelate to date have UID's. But when UID's exist, they could potentially be used in place of downloading a modified GEDCOM.
One advantage of downloading a modified GEDCOM is that you could share your modified GEDCOM with a cousin, and if they incorporated your GEDCOM into their genealogy and then generated a combined GEDCOM to upload into WeRelate, the system could recognize that some of the people in their GEDCOM already exist at WeRelate and they wouldn't have to go through the match+merge process for those people. The system would just apply whatever changes they had made to those people, just as if you were re-uploading your GEDCOM. If we were instead relying upon UID's, when your cousin uploaded their GEDCOM they would probably have to go through the match+merge process for the people that were also in your GEDCOM, since I don't know if we could assume that the UID's would remain the same in your cousin's GEDCOM.--Dallan 12:07, 18 March 2008 (EDT)
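For the UID alternative, a minimal sketch might look like this. It assumes the GEDCOM has already been parsed into one dictionary per person with an optional "uid" key, and that the site keeps an index from UID to the page title and version last uploaded. Both structures and all names are hypothetical.

```python
def split_by_known_uid(people, uid_index):
    """Separate re-uploaded people into those recognized by UID and those needing match+merge.

    `people` is a list of dicts parsed from the GEDCOM (hypothetical layout).
    `uid_index` maps a UID to a (page_title, page_version) pair recorded at the last upload.
    """
    recognized = {}
    needs_matching = []
    for person in people:
        uid = person.get("uid")
        if uid and uid in uid_index:
            recognized[uid] = uid_index[uid]
        else:
            # No UID in the file, or a UID never seen before: fall back to match+merge.
            needs_matching.append(person)
    return recognized, needs_matching


# Illustrative data only.
uid_index = {"ABC123": ("Person:John Doe (1)", 42)}
people = [
    {"name": "John Doe", "uid": "ABC123"},
    {"name": "Jane Doe"},                    # exported without a UID
    {"name": "Sam Doe", "uid": "ZZZ999"},    # UID not recorded on a previous upload
]
recognized, needs_matching = split_by_known_uid(people, uid_index)
```

Anyone who falls into the second list would go through the ordinary match+merge step, which is why UIDs can supplement the modified-GEDCOM approach but not fully replace it.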
It's great to be flexible Dallan, but I think you'll make yourself crazy trying to support weird ID/UID stuff. Unless the value is from a reasonable third party (say an ancestral file number, or whatever the successor strategy may be) I don't think an id-based alternative (to the primary werelate url) is wise for trying to figure out who matches who. The url approach that you've mentioned is the sort of thing we want anyway, since a downstream consumer of the GEDCOM may very well be interested in the contents of the associated werelate page. Making the source do double-duty as a tag at re-import time is a really fortuitous coincidence that reinforces good practice. I don't think you want to clutter the story with identifiers that just can't work as well and (often) will not survive a merge. A url to a merged page will usually redirect to somewhere useful...--Jrm03063 13:40, 18 March 2008 (EDT)
All genealogy software is not the same. I can only speak for "The Master Genealogist" TMG as I am a user. For me to "import" a GEDCOM into TMG that I had previously uploaded to WeRelate causes me concerns.
Yes I could hand enter some code into each individual, but even that is unrealistic, as just ONE of my segments contains 2703 people, I have many segments I would like to place on WeRelate as I get them ready. At Rootsweb's WorldConnect, you just click on the name of the GEDCOM you want to update, and choose the gedcom to upload and it's done. Maybe you could create a small Gedcom to upload there, so you could see how the process works and maybe implement something similar at WeRelate. I believe they do use the UID's. I need something more user friendly. If I uploaded my whole Gedcom (once your system can accept such a large file) importation and then upload for updating would still be an issue, because of my filtered out tags upon Gedcom creation. --Msscarlet1957 15:31, 18 March 2008 (EDT)
I hadn't considered that people would upload only a portion of their GEDCOM, but in retrospect it makes sense. I really do want to make the upload+download process easy for people who want to continue to use their desktop genealogy program because I believe that probably half of our users will want to operate that way.
While writing this response I realized there is a flaw in my proposal. After subsequent uploads of a GEDCOM file, we can't assume that the person data in the uploaded GEDCOM is the same as the person data on the updated Person page, because the Person page might have information that you have not incorporated into your GEDCOM file. So to determine the updates you have made we'll compare your current GEDCOM against your previous GEDCOM. And to determine conflicts we'll compare the current version of the page also against your previous GEDCOM. If your current GEDCOM has a different value for a field than your previous GEDCOM, then you have updated that field since your last GEDCOM upload. If the current version of the page also has a different value for that field than your previous GEDCOM, then we'll say there is a conflict, and your updated value will be stored as an "alternate" name/event instead of updating the primary name/event, and you'll get an email telling you about it.
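The three-way comparison described above can be sketched per person as follows. Each version is modeled as a flat dictionary of field values, which is a simplification, and the function and field names are hypothetical. One detail is an added assumption rather than something stated above: if the page already holds exactly the uploader's new value, the sketch treats that as nothing to do instead of a conflict.

```python
def plan_reupload(previous_gedcom, current_gedcom, current_page):
    """Compare one person's fields across the three versions.

    Returns (updates, alternates): `updates` can be applied to the primary field on the page,
    `alternates` conflict with edits made by others and would be stored as alternate values.
    """
    updates, alternates = {}, {}
    for field_name, new_value in current_gedcom.items():
        old_value = previous_gedcom.get(field_name)
        if new_value == old_value:
            continue  # the uploader did not change this field; page edits by others stand
        page_value = current_page.get(field_name)
        if page_value == old_value:
            updates[field_name] = new_value      # only the uploader changed it
        elif page_value != new_value:
            alternates[field_name] = new_value   # both changed it, to different values
    return updates, alternates


# Illustrative values only.
previous = {"birth_date": "1803", "birth_place": "Boston", "death_date": "1870"}
current = {"birth_date": "12 Mar 1803", "birth_place": "Boston", "death_date": "1871"}
page = {"birth_date": "1803", "birth_place": "Boston, Suffolk, Massachusetts", "death_date": "abt 1870"}
updates, alternates = plan_reupload(previous, current, page)
```

In this example the refined birth date is applied, the birth place edited by someone else on the page is left alone, and the disputed death date becomes an alternate that would trigger the notification email.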
We still have to associate each person in your desktop genealogy program with the page title that was generated for that person. We can do this in one of three ways:
This is off-topic, but as you have suggestions for improving usability, or if you find out what keeps your cousins from joining, please send me an email or leave a message on my talk page.
I was using the incremental GEDCOM upload procedure too, but only because there were a few lines that I wanted to flesh out with census records while I still had an ancestry subscription. I've allowed that to expire for the time being, and expect in the future only to download GEDCOMs for backup and reporting purposes. I don't expect to actually record any research off-line.
In fact, if I thought that werelate was going away, without a wiki alternative, I would probably see about putting together my own server to run it.
The only real home-based local stuff that I might do, would be something to keep track of the living. Of course I understand why we need to keep the living off a public genealogy system, but it's a bit of an aggravation to have to think about things in two layers. I have thought about what might be involved in creating a hybrid environment so that I could record information about the living and have it just reside locally on my machine (some sort of pass through for the folks who've shuffled off to werelate), but I just don't have time to do anything with that right now....--Jrm03063 17:09, 1 April 2008 (EDT)
I'm very happy with the improvements in "redirect" behavior, as they make merging a much simpler business. I wanted to give werelate a shove in the direction of the large connected community tree that it's meant to support, so I've spent the last month or so merging through early New England. It's been generally smooth going, and very satisfying too - a lot of information bubbles up when you bring different user contributions together. Still, I'm afraid I'm somewhat alone in this endeavor. I'm struck that until werelate gets a reputation for having a really large and well connected community tree - not just a bunch of GEDCOMs that live in the same pool - werelate won't take off.
So why hasn't merging become important for the masses? I think there are a few reasons:
Without a passion to merge the werelate space, I think we're losing the strongest feature of the site's design. How can we get there?--Jrm03063 12:21, 2 May 2008 (EDT)
I'm there with you, Jrm (as you may have noticed). One thing that strikes me is the almost complete lack of reaction I see when I merge pages -- perhaps there's no need because the merge isn't controversial, but am I really that good? ;-)
I think you're right that usability is an issue. Merging isn't hard if you understand both the general wiki concept and how that applies to genealogy - but that's a small group of people. I think search is certainly a big problem -- the tricks I use to find duplicate people are almost all things I wouldn't want to explain to my grandmother. Luckily that's under construction. I think the current search not only makes merging difficult, but it makes using the site for regular research almost impossible, so hopefully more people will stick around in the future, which will lead them to be more interested in merging.
There's something else that might be a problem, however -- notification. I just found this entry by looking at my Watchlist, something I do every so often to appease my curiosity. I didn't get an email, nor have I gotten one about the pages I'm watching that you changed in the last few days. They're not in my junk filter, either. (But I did get notifications for two other pages, so it's not totally broken). If this is happening on a widespread scale, it means that people are not only unaware of (new! exciting!) changes to their own tree, but they may not even be aware that a merge is possible, neither of which does much for the communal editing. (And, whether it's related or not, I have one page in particular that used to have at least 10 people watching it and now has two. So either notification worked too well and they decided they didn't like getting emails, or something is weird.)--Amelia 08:56, 4 May 2008 (EDT)
Well, I have been busy entering pages and have not checked recently for possible merges. I have too many pages to check on a page by page basis. I am waiting for Dallan to implement his merge feature. This morning I entered the surname Coker and location United States and searched. On result page 131-140 I actually found a duplicate page. However this page has no sources. It looks like the entire tree is sourced by so and so's gedcom. Well I really don't wish to merge my page with an unsourced tree. This tree seems to be on the maternal line of my page not the Coker line and I do not intend to research the maternal line. So how are y'all handling this? If the tree was sourced, I would consider it a wonderful opportunity to combine the pages but since it is not I am less than enthusiastic.
I am not sure that I wish to have unsourced trees on WeRelate. These trees probably already exist on Ancestry and Rootsweb. --Beth 09:10, 4 May 2008 (EDT)
I don't believe many people are using this site because it does not seem ready. I have added some comments to Talk pages suggesting errors and giving sources and have gotten zero response. After two weeks, which seemed like a fair time, given vacations and all, I tried changing one as I had suggested on the Talk page and even that got no response. Assuming that the notification is working, I think a lot of the early people were just trying it out and got discouraged or were just curious, not serious. I personally have stopped entering data (see my previous comments on the difficulties of determining if there are duplications) until I think the effort will be worthwhile (either a reasonable automated merging, about which I have at best a wait and see attitude, but more likely no faith in it working, or better searching that makes identifying likely duplicates better.)
Unsourced trees do not bother me. If I want to merge with an unsourced tree, I will add my sources and the important part will no longer be unsourced. My concern is that someone will come along and merge over, ignorantly overlaying entries refined through years of discussion and collaboration, without providing sources or paying attention to past discussions. This is why I think changes should be "proposed" and then voted on by all people watching a page. A much more conservative, but still democratic approach.
I know it is not a good selling point to suggest that making data entry hard to do is a feature, but I think it is. (Personally, I don't see tons of use in uploading gedcoms because I would have to clean mine up anyway to merge.) I would be happy to gradually enter my data and sources a little bit at a time over years, in exchange for participating in discussions with other *interested* persons. Likewise, if I can only propose a change and then must wait for it to get approved, or rejected, then I can wait. The time is not important, since the data is not going anywhere, and if I disagree, I will continue to keep my version of the truth locally, knowing I must find more evidence to convince the jury.
--Jrich 09:58, 4 May 2008 (EDT)
--Ronni 12:35, 4 May 2008 (EDT)
Hey Ronni; I believe I pretty well have the concept of it is not my tree anymore; but I cannot wrap my head around merging with an unsourced tree. It may not be "my tree" but my work will then be associated with it. What I enter on WeRelate will stay; whether I do or not. You misunderstood one point that I attempted to make, I think. I did not intend for one to believe that I was disinterested in the pages. I am interested. I just do not have the time to edit or research this maternal line; a family that married into the Coker line fairly recently. So if I merge the page and therefore the trees, I don't have time to source the unsourced pages, so unless someone else does they will remain unsourced.