WeRelate talk:Merging and downloading trees



Merging trees? [12 March 2008]

Anybody else undertaken merging of trees? I've recently been doing manual work on the Slafter family of colonial New England, creating pages by hand. But I discovered that just a couple days ago, somebody did a GEDCOM upload that duplicates a portion of the tree I'd been working on. I looked at the Help topic for merging pages, and have been following that procedure. But I'm realizing that it's quite complicated if you're talking about whole dup branches of a tree, and not just a duplicate page. When you start redirecting pages that are in a tree, then you start getting disconnected bits of tree and "orphan" pages. I ended up making a manual inventory of all the pages with the same names, and which ones needed to get merged/redirected with which. Not for the faint of heart (nor for the non-methodical)!

I agree, definitely not for the faint of heart! LOL. I've done several merges in the past, but because it does involve a methodical checklist of things to be done, I tend to only handle the "minor" merges now until Dallan finishes the new merge program. Amelia has done several merges and appears to have a method to the madness. --Ronni 06:34, 29 September 2007 (EDT)
I second the idea that merging is definitely complicated. There have been various threads in the past as to the various odd effects of a redirect. The most important thing to remember is that if you redirect a family page, it effectively removes the contents of that page -- so all the individuals connected to the family still have pages, and they are now unconnected to the new merged page. (And vice versa with persons to family pages, but that's not such a big deal). I don't necessarily have a great method, but for what it's worth, my general plan has been to identify the main couple I'm interested in, and open up a tab (window) with each family. Then I navigate to the oldest child on each of them down as far as more than one family goes. The idea is to get to people without spouses to merge, or to get to the point where only one tree has further descendants. Once I reach the "end", I can work up: merge kids (which have no spouses by definition in this scenario), merge spouse, merge family page, merge the descendant of the original couple. If the spouse at any level has parents and siblings, this becomes additionally complicated, because one basically has to repeat the process with the spouse's parents' descendants before even getting around to merging the original spouse. This, so far, seems to be kind of rare on the stuff I've worked on, but there's at least one project I skipped in favor of waiting for the automated merge - there were just too many branches. But I hope that helps. (And Tom, welcome to the group of somewhat crazy people merging New England families!)--Amelia.Gerlicher 15:46, 29 September 2007 (EDT)
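The bottom-up order described above can be sketched as a post-order walk over the families involved. This is only an illustration in Python: the `Family` class, its fields, and `merge_order` are hypothetical names and not part of WeRelate, and the sketch assumes each family has already been paired with its duplicate.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    """A hypothetical family page: the descendant on the line of interest,
    that person's spouse, and their children."""
    title: str
    descendant: str
    spouse: str
    # Children with no family of their own are plain names;
    # children who married appear as nested Family objects.
    children: list = field(default_factory=list)


def merge_order(family):
    """Return page titles in the order they would be merged, deepest first."""
    steps = []
    for child in family.children:
        if isinstance(child, Family):
            steps.extend(merge_order(child))  # finish the child's branch first
        else:
            steps.append(child)               # child without spouse: merge directly
    steps.append(family.spouse)      # then the spouse
    steps.append(family.title)       # then the family page itself
    steps.append(family.descendant)  # finally the descendant of the couple above
    return steps
```

The sketch does not cover the complication mentioned above, where a spouse has parents and siblings of their own; that case would need the same walk repeated from the spouse's parents before the spouse is merged.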

One thing I'm wondering about -- if I merge two pages and redirect one to the other, do the people who are "watching" the redirected page get somehow transferred to the watchlist of the other merged page? Seems like that's what you'd want. (And if the redirected pages are in other people's "trees", does that get patched up as it should?) I'm hoping I haven't done anything to break other folks' trees.

Dallan just recently added that function that redirected watchers get added to the new pages. Everyone that is watching the "old" pages should now be automatically added to the watchlist of the "new" pages. And as long as the "old" pages are being redirected to the "new" pages, there shouldn't be any "holes" so to speak. But I gotta tell you, I've almost created a mess or two when I was merging some families. I did get orphan pages or have a family without spouse or two. Don't know how that happened, because I *thought* I was being methodical. LOL. --Ronni 06:34, 29 September 2007 (EDT)

Also, most of the GEDCOM uploads I've noticed are associated with User pages that don't exist. Does that mean they've un-registered from WeRelate? Or does that just mean they never bothered to create a "user home page"? Is it okay to create their User/talk page in order to leave a message there? Will they get notified? ---TomChatt 05:02, 29 September 2007 (EDT)

I think in most cases they have not created a user page and/or are simply not active in the community. Usually if someone "unregisters" they'll delete their tree before leaving. In any case, it's ok to add to their talk page. They'll either see it when/if they log back on or, if they have it setup in their profile, they'll be notified by email that they have new mail on their talk page. You can also check their contributions at the bottom of their user page to get an idea of their activity. --Ronni 06:34, 29 September 2007 (EDT)
I'm guessing a number of people uploaded GEDCOMs to see what would happen, without taking time to create a profile page for themselves. I expect that they'll visit WeRelate more frequently once match-merge is working and they receive messages about invitations to merge their pages, and that they'll create profile pages then. You can be pretty sure if you leave a message on someone's talk page that they'll get an email notifying them about it (because most people haven't turned that option off in preferences). But if you want to be absolutely sure, you can click on the "E-mail this user" link at the bottom of their user page or talk page to send them an email.--Dallan 22:42, 2 October 2007 (EDT)

A follow-up to Amelia's comment, and perhaps by way of clarification, I think that bad/unreliable/flawed sources absolutely have a place on our pages - and an important one at that. They should be cited and noted as bad/unreliable/flawed and, ideally, the research that established them as such should be noted. Otherwise, folks just keep rediscovering previously discredited information. This is particularly true of folks like me, who don't have a huge background so we don't instantly know that some sources are not trustworthy. For example, I understand that the Mayflower passenger Peter Brown is the origin of a large number of discredited genealogical lines. At some point or other he was attributed as having had a son. This was later discredited, but chaos has remained in this area for something beyond 100 years. This site in particular, provides a good chance to document both the accepted and discredited research.--Jrm03063 14:48, 21 February 2008 (EST)

Ronni.. to clarify... the "old" page would be the one with the other person watching.. to be redirected to the "new" page, being my page that shows me watching. Correct? --Msscarlet1957 19:14, 12 March 2008 (EDT)
Well, rereading what I wrote, "old" and "new" were the wrong terms to use. Target and redirected might be better words. During a redirect, watchers on the redirected page will be transferred to the target page. Choosing the best target page though doesn't really have anything to do with whether you are watching it or not, or who created it, or how long it's been on WeRelate. This help file explains some of the criteria that are being used when deciding which page to make the target page. --Ronni 00:47, 13 March 2008 (EDT)

Asking Permission [23 December 2007]

While we are on the topic of merging trees, a new user asked me if we should ask permission from the other user we want to merge with before actually merging into their tree. What's everyone's opinion on this? All manners and politeness aside, you could wait weeks or months for a response or never get "permission" to merge with another tree. This particular topic goes along with the "Downloading GEDCOM" topic as well, in that I see it as understanding what happens to your data once it's put online at WeRelate. I realize this has the potential to be a touchy issue with some, so I'm curious as to everyone's thinking or understanding on this. --Ronni 10:50, 17 November 2007 (EST)

I vote no (surprise). Not only would we wait forever, but there's rarely going to be a reason for someone to say no. The only legitimate reason I can think of is if there's a genuine disagreement over whether someone is the same person or not, and I've very rarely ever encountered a case where that was actually true (as opposed to a bad combination of Ancestral File nonsense). If the point of the site is to get the most/best information about a particular person, and the merge ends up with an entry with more information than the previous one, then it's all a win-win. If someone gets upset (assuming the merge doesn't delete anything that's reliable and not duplicated), then they're missing the point of the site. But, I will say that this assumes that the person doing the merging knows enough about the family to use the more reliable information where there's a difference, or to only add parents/spouse when there's adequate support. If you're in doubt about that, better to ask --Amelia.Gerlicher 12:57, 17 November 2007 (EST)

I vote "no" as well, but I do as a practice look to see if the User I'm about to do a "major" merge with is active on WR by looking at their contributions. --Ronni 03:49, 21 November 2007 (EST)

I too vote no. The whole point here is collaboration and there can't be collaboration as long as there are duplicate trees. --Trevorallred 14:40, 23 December 2007 (EST)

Merge Strategy [10 April 2008]

Here's another question along these same lines: merging two overlapping trees might involve merging hundreds (or even thousands) of individual person and family pages. With lots of pages to merge, most people aren't going to take the time to analyze each pair of pages to merge very carefully. So we need to have a pretty reasonable "default" merge strategy. What should that strategy be? For the text we can put the text from one page after the text from the other page. For the events we can list differing birth/marriage/death events from one page as "alternate birth/marriage/death" events on the merged page. Similarly for differing names -- one name can be listed as an "alternate" name. But which events/names should be the "main" events/names and which should be the "alternate" ones? I can think of two possible approaches; maybe there are more?

  1. The earlier-created page always "wins" -- its events/names are the main ones by default. The justification for this approach is that as new people come to the site, upload their trees, and merge their trees into existing trees, it's going to be a bother for existing users if the "main" events in their trees are constantly changing. It will be easier for the new users to adjust to changes to the main events on their pages as the result of the merge, because presumably they'll do most of the merging shortly after they've uploaded their tree. Also, it's likely that the existing pages are more accurate, since they're probably being watched by more people.
  2. We try to come up with rules for which name/event "wins" by default -- the one with the more specific date/place, the one that is sourced, etc.

Any thoughts?--Dallan 22:29, 17 November 2007 (EST)
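Approach 1 above (the earlier-created page wins, and differing values become alternates) might look roughly like the following. This is a sketch under assumptions, not the site's merge code: the `Page` class, its fields, and `merge_pages` are invented for illustration, and events are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    created: int   # hypothetical creation order; lower means earlier
    events: dict   # e.g. {"birth": "1789", "death": "1850"}
    text: str = ""
    alternates: dict = field(default_factory=dict)


def merge_pages(a, b):
    """Merge two pages; the earlier-created page supplies the main events."""
    older, newer = (a, b) if a.created <= b.created else (b, a)
    merged = Page(
        created=older.created,
        events=dict(older.events),
        text=older.text,
        alternates={k: list(v) for k, v in older.alternates.items()},
    )
    # Text from the newer page goes after the text from the older page.
    if newer.text:
        merged.text = merged.text + "\n\n" + newer.text if merged.text else newer.text
    # Events missing from the older page are filled in; differing ones become alternates.
    for kind, value in newer.events.items():
        if kind not in merged.events:
            merged.events[kind] = value
        elif merged.events[kind] != value:
            merged.alternates.setdefault(kind, []).append(value)
    # Alternates already recorded on the newer page are carried over, so nothing is lost.
    for kind, values in newer.alternates.items():
        for value in values:
            known = merged.alternates.setdefault(kind, [])
            if value != merged.events.get(kind) and value not in known:
                known.append(value)
    return merged
```

Approach 2 would replace the "older page wins" line with a rule that compares the two values, for example by date precision or by whether a value is sourced.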

I like the reasoning behind number 1. As to suggestion number 2, will it really matter if one has more specific data than its merge mate, since I'm assuming all the data from one will be copied to the other?
Like you said, merging trees could involve hundreds of pages, but in the merging process, we will have the option to "cut off" the merge where we want? For instance, I'm generally only interested in the parents of an allied spouse and thus would not want to add their grandparents, siblings, etc to my tree. --Ronni 13:45, 18 November 2007 (EST)
I'm thinking that by default you would add only the merged pages to your tree, not relatives that you had not merged into your existing pages. The problem of identifying people who are not in your tree but are related to people in your tree so that you can choose whether you want to add them to your tree is another important problem though, and it needs to be addressed at the same time that we address merging, if not sooner. I'm thinking that we would provide a screen that would list all the people outside of your tree along with the people in your tree that they're related to, and allow you to choose which ones to add to your tree.--Dallan 23:15, 20 November 2007 (EST)

I think that any technique for merging John Doe (i) and John Doe (j), must preserve all the information from both "i" and "j", and must clearly indicate that it is an automatic merge (and the provenance of the contributions). What parts are more "believable" or better sourced is going to be easy for a human to understand but pretty tough to make a program understand. Better to be sure you lose nothing and hope a human will clean things up.

Besides merging existing people, I was also wondering whether there is a way to perform a less than complete GEDCOM upload (thereby avoiding the need to merge common individuals). Can we imagine a reasonable UI that would break GEDCOM import into a two-step process? Instead of a one shot load the whole batch, a two step process that would build up a list of names (from the GEDCOM) that already appear to be present on werelate? The user would then be free to pick whether those names are uploaded as new individuals or whether the existing werelate individual is substituted for a particular person.--Jrm03063 18:44, 18 November 2007 (EST)

That's an interesting idea. I'll have to think about that some more. Perhaps the tree (multi-person & family) matching + merging could be handled at GEDCOM upload time. For pages that are entered or edited by hand, we could support matching + merging of individual person and family pages by sending an email when the system finds matches and providing a "merge" link at the top of every person and family page. But maybe it would be ok to limit the multi-person & family merge to just the GEDCOM upload process.--Dallan 23:15, 20 November 2007 (EST)
Does that mean that the large overlapping trees that we've been leaving for the automatic merge will still have to be done by hand? Or am I misunderstanding what you're responding to?--Amelia.Gerlicher 23:32, 20 November 2007 (EST)
I thought about that too after I wrote the above. There's no way we'd want to make people merge existing trees by hand. If we use the above approach, we'd have to do something special so that people with existing trees could merge them as if they had just uploaded their tree.--Dallan 10:14, 22 November 2007 (EST)
But we would want to have the ability to "trigger" this merge feature whenever we want, correct? I have been envisioning something similar to the FTE that just sits there waiting for me to use it or not. If we have a merge utility that will scan our trees whenever we want and then give us a list of possible duplicates, then on a case-by-case basis we could decide whether we want to merge or not. I also envision this program being able to scan our data and compare with others based on certain criteria that are user-definable (dob, dod, place, parents, etc.) and not just the name of the individual we are comparing. I also envision (if you're gonna dream, dream big <g>) this utility being able to handle people that I can mark as NOT duplicates; in other words, once I've scanned them and decided they are not a match, I can mark them so they aren't included again in future scans. --Ronni 10:47, 22 November 2007 (EST)
That's one of my big questions in matching. Matching your tree against the other trees to see where possible overlaps are is a pretty machine-intensive process, especially if your tree has a lot of people in it. If we supported this I'd want to limit how often people could request that their tree be "re-matched" against the other trees to once a month or something. Another alternative would be to allow people to match a specific person (a family really, since the match will take relatives' information into account) in their tree against the other trees, which would be much less machine-intensive, but probably also a lot less helpful since you'd have to visit a lot of pages if you wanted to re-match your entire tree.
Remembering the people that you have marked as not a match is a good idea.
Why would you want the match criteria to be user-definable? I was thinking that the computer would calculate a score based upon all of the pieces of information (dob, dod, parents, etc.) that matched, not just the name.--Dallan 17:13, 4 December 2007 (EST)
Ah, score based. Hadn't thought of it that way. That would do of course. :) --Ronni 13:09, 6 December 2007 (EST)
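A score of the kind described above, together with a remembered "not a match" list, could be sketched as follows. The field names, the weights, and the 0.7 threshold are all assumptions made up for this illustration; nothing here reflects how the real matcher weighs its evidence.

```python
# Hypothetical weights: how much agreement on each field counts toward a match.
WEIGHTS = {
    "name": 3,
    "birth_year": 2,
    "death_year": 2,
    "birth_place": 1,
    "father": 2,
    "mother": 2,
}


def match_score(a, b):
    """Score two person records between 0.0 and 1.0 over the fields both have."""
    earned = 0
    possible = 0
    for name, weight in WEIGHTS.items():
        va, vb = a.get(name), b.get(name)
        if va is None or vb is None:
            continue  # missing data neither helps nor hurts
        possible += weight
        if va == vb:
            earned += weight
    return earned / possible if possible else 0.0


def likely_matches(person, others, not_a_match, threshold=0.7):
    """Return (score, id) pairs at or above the threshold, best first,
    skipping pairs a user has already marked as not a match."""
    results = []
    for other in others:
        pair = frozenset((person["id"], other["id"]))
        if pair in not_a_match:
            continue
        score = match_score(person, other)
        if score >= threshold:
            results.append((score, other["id"]))
    return sorted(results, reverse=True)
```

Exact equality on each field is the crudest possible comparison; a real score would presumably give partial credit for near dates and name variants.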
It depends on the way the software is written if we should match BEFORE importing the GEDCOM. I believe we should leave the match code in one location, which means it's actually easier to import the data then delete/merge it into the main trunk after.--Trevorallred 14:53, 23 December 2007 (EST)

I think it's also important to remember that the problem of merging two overlapping trees actually is two problems: matching, then merging. It's much easier to first match the names, then queue them up for merging. At that point we "could" have the computer do the actual merging using some really good heuristics that Dallan has already alluded to. I think it's impossible for the computer to do the matching automatically. There are simply too many variables at this point to trust a machine to match.--Trevorallred 14:53, 23 December 2007 (EST)

I think that matching on a tree-by-tree basis probably isn't a great idea. It's not "work efficient" (for you CS folks out there). Matching shared genealogy is the heart and soul of what werelate is about, and I'm struck that it should be sort of fundamental to the way werelate works. What if we think about matching as more of a re-indexing process, where a set of match candidates is associated with any person or family. Whenever a person or family changes, it gets marked as needing to be recomputed for purposes of matching. When the match index for a person is recomputed, the previous match set becomes a starting point of families and people to rule in and out first. People who drop out of the match set or get added to the match are themselves considered changed so they are marked for being matched again. Of course this means that matching is a continuous and ongoing process, and that changes will have the effect of generating work for the matching engine (or robot, or whatever you call it) - but so what? That's what werelate is here to accomplish. It also means that matching work needs to be queued in a way that prevents any single cluster of related names from hanging up the match robot so that other areas of the werelate data base go begging. Maybe some sort of oldest unindexed page first process...--Jrm03063 17:28, 4 January 2008 (EST)
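The re-indexing idea above can be sketched as a first-in, first-out work queue, which is one simple way to get "oldest unindexed page first". Everything here is an assumption for illustration: `MatchQueue` and its methods are invented names, and `find_matches` stands in for whatever function would actually compute match candidates for a page.

```python
from collections import deque


class MatchQueue:
    """Queue of pages whose match candidates need to be recomputed."""

    def __init__(self, find_matches):
        self.find_matches = find_matches  # hypothetical: page -> iterable of candidate pages
        self.pending = deque()            # oldest request at the left
        self.queued = set()               # pages currently waiting, to avoid duplicates
        self.matches = {}                 # page -> last computed match set

    def mark_changed(self, page):
        """Record that a page was edited and needs re-matching."""
        if page not in self.queued:
            self.queued.add(page)
            self.pending.append(page)

    def process_one(self):
        """Re-match the oldest waiting page; return it, or None if idle."""
        if not self.pending:
            return None
        page = self.pending.popleft()
        self.queued.discard(page)
        old = self.matches.get(page, set())
        new = set(self.find_matches(page))
        self.matches[page] = new
        # Pages that entered or left the match set are treated as changed too.
        for other in old ^ new:
            self.mark_changed(other)
        return page
```

Because a page is only re-queued when a match set actually changes, the queue drains once the match sets stop changing. Taking work strictly from the left also keeps one busy cluster of names from starving the rest, since its follow-up work always joins the back of the line.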

As individual people and families change, we'll certainly match just that one person/family, and that will be a continuous process. When I talk about tree matching, I'm thinking about GEDCOM uploads. If you upload a GEDCOM containing say 2,000 people, and 200 of them match someone else's tree, I don't think you want to be presented with 200 different match questions one at a time. It seems that it would be a better experience to present it as a two-step decision: e.g., "200 people appear to match tree A, and 50 people appear to match tree B". In the first step you click on the link that takes you to the list of matching people in tree A, and in the second step you decide which of the 200 match candidates from tree A you want to merge with. As you're checking boxes to determine which pairs of people to merge, the system lets you know which of the remaining matching pairs are related to people you've already decided to merge and which are not. So for GEDCOM uploads, it seems that making the matching decisions up-front will be a better experience than making them one at a time.--Dallan 11:34, 7 January 2008 (EST)
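The two-step presentation described above amounts to grouping match candidates by the tree they fall in, then flagging the remaining pairs that are related to pairs already chosen. A minimal sketch, with made-up function names and a made-up input shape (triples of uploaded person, existing page, tree name):

```python
from collections import defaultdict


def summarize_by_tree(matches):
    """Group (uploaded_person, existing_page, tree_name) triples by tree."""
    by_tree = defaultdict(list)
    for uploaded, existing, tree in matches:
        by_tree[tree].append((uploaded, existing))
    return dict(by_tree)


def related_to_chosen(pair, chosen, relatives):
    """True if the uploaded person in `pair` is a relative of the uploaded
    person in any pair the user has already decided to merge.

    `relatives` is a hypothetical mapping: person -> collection of relatives.
    """
    person = pair[0]
    return any(person in relatives.get(picked[0], ()) for picked in chosen)
```

Step one would show `len(pairs)` for each tree in the summary; step two would list one tree's pairs and mark each with the result of `related_to_chosen` as boxes are checked.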

As long as we're talking about GEDCOM upload in particular, I completely agree, a two-step process on a per-tree basis is essential. I can imagine, for example, if everyone who has a Stephen Hopkins Mayflower line were to upload it, after someone had gone to the trouble of creating a really comprehensive and nicely done Stephen Hopkins page - ick!

My remarks were in the context of looking for matches within existing data as it is changed and updated in ordinary use.--Jrm03063 12:28, 7 January 2008 (EST)

It occurs to me based on JRM's Stephen Hopkins' comment, that it might be useful in the GEDCOM merge process to have an option that identifies an identical person/family, but doesn't upload any new information. Say, for example, I have Stephen Hopkins in my db, but he's the father of a wife of a sibling to my line, so all I have is a vague birthdate from the Ancestral File. There's no need to "merge" that information into the page on WeRelate, it would just clutter up the Stephen Hopkins page. But my tree should connect to that page. And more likely, Stephen's daughter is already on WeRelate too, but maybe her husband isn't. So if I had an option like "link tree, do not upload my information", I could choose that, and the result would be a new family for Stephen's daughter, but the information already there on Stephen and his daughter isn't complicated with my not-ready-for-primetime details. I know we theoretically want to gather everyone's wisdom on these people, but the reality is that a lot of people have junk in their files, and the degree to which we can avoid having it poured unfiltered onto pages that people have worked hard to make coherent, the better the experience for everyone. This would also be helpful for people updating their own (or a close relative's) gedcom, and would cut down on both the automated and manual computer work if people can avoid having the computer do a lot of merging that will just have to be edited out by hand later.--Amelia.Gerlicher 13:29, 7 January 2008 (EST)

That's a good point. We ought to have a "disregard my information" merge option for both individual-matches and also for tree-matches.--Dallan 18:20, 8 January 2008 (EST)

Regarding merging, at least in respect to an imported GEDCOM, I think the submittal process ought to require the person to identify one person in the GEDCOM as an existing person in werelate. Then follow the relationships stored in the GEDCOM to match up the others. You can't rely on name matches or birthdate matches. Only persons connected to the anchor person get merged. Dangling trees are ignored.

What if someone's GEDCOM has no matches? They manually enter the anchor person, then import.
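The "only persons connected to the anchor person get merged" rule is a reachability question over the relationships in the GEDCOM. A minimal sketch, assuming the relationships have already been read into a plain mapping from each person's identifier to the identifiers of their parents, spouses, and children (the mapping and the function name are inventions for this example):

```python
from collections import deque


def connected_to_anchor(relations, anchor):
    """Return everyone reachable from the anchor by following relationships.

    `relations` maps a person id to the ids of directly related people.
    Anyone not in the returned set belongs to a dangling tree.
    """
    seen = {anchor}
    todo = deque([anchor])
    while todo:
        person = todo.popleft()
        for relative in relations.get(person, ()):
            if relative not in seen:
                seen.add(relative)
                todo.append(relative)
    return seen
```

This only decides who is eligible. Pairing each reachable person with an existing page by walking the same relationship from the anchor's page is a separate step and is not shown.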

Conflicting data could be scored, though how you score between two different sources is beyond me. A source that is an ancestral file or ancestry.com is hardly a source, since half of those are people's opinions, not reflections of real sources, as indicated by the number of people out there propagating known errors. Precision might be a valid criterion: for example, 24 Mar 1789 might be allowed to replace 1789, though it is not as clear what to do if they are inconsistent, as in 24 Mar 1789 and 1792.
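The precision rule in the date example above can be stated as: a new date may replace an existing one only if it agrees with every part the existing date already gives and adds at least one part. A small sketch, assuming dates written as "24 Mar 1789", "Mar 1789", or "1789" with English month abbreviations; qualifiers such as "abt" or "bef" and date ranges are not handled, and both function names are made up here.

```python
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_date(text):
    """Return (year, month, day), with None for any part not given."""
    parts = text.split()
    year = int(parts[-1])
    month = MONTHS[parts[-2][:3].title()] if len(parts) >= 2 else None
    day = int(parts[0]) if len(parts) == 3 else None
    return (year, month, day)


def refines(candidate, existing):
    """True if candidate agrees with every known part of existing and adds detail."""
    cand, exist = parse_date(candidate), parse_date(existing)
    agrees = all(e is None or e == c for c, e in zip(cand, exist))
    adds = any(e is None and c is not None for c, e in zip(cand, exist))
    return agrees and adds
```

Under this rule "24 Mar 1789" refines "1789" but not "1792"; the inconsistent pair would be left for a person to resolve, for instance by keeping both as alternates.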

So I believe extra weight goes to the first comers. If you want to change the data that is there, you have to do it manually. Not that first comers necessarily represent better data, but it ensures thoughtful overwriting. To err is human, but to really screw up requires a computer.

It means more manual editing, but I think that is necessary to avoid the damage caused by somebody importing a GEDCOM they downloaded from who knows what website or similar scenarios. The goal is accuracy first and as all experienced genealogists know, accuracy is not easy nor straightforward.

--Jrich 13:33, 4 April 2008 (EDT)

Before we ask people to identify the anchor person manually, I'd like to try an idea I have on finding anchor points automatically. I'll hopefully have something ready to try by June/July.--Dallan 11:48, 7 April 2008 (EDT)

I don't think I envy you your task.

Within the scope of a single person, I have much difficulty thinking that a computer can do a reliable match. However you choose to weight the different facts when looking for a match, it will be wrong in some cases. And that's assuming the GEDCOM has enough facts in it to start with. How many times have I seen where many individuals are represented by no more than a name? Or the same person called Mary here, Polly there. Or a town with four grandchildren all named after the same honored grandparent and all born in a very short timespan.

So there is not much hope unless you take into account earlier and later generations. Then it becomes far more likely to be accurate. However, remembering that many disagreements are exactly over who the parents were, or how many children, etc, will this work? What if the GEDCOM being merged has a string of 10 generations but right in the middle of it it has different parents from the person being merged with. So now does the computer create alternate parents and now reanchor the remaining subtree to the newly proposed parents? If the new parents are brand new to the database, then does that imply all their ancestors are new too? Maybe this GEDCOM is proposing a heretofore undocumented parent between matching grandparents and grandchildren. This argues for the previously discredited (by me) person-by-person matching.

How do you decide what scope to use?

I'm almost inclined to suggest that you "punt".

The secret weapon of your website is time. Over time the data will become better and better quality. The potential damage of computer-generated mistakes will get worse and worse. Speeding up data entry is not necessarily the top priority. (Enabling collaboration to arrive at a higher quality of data than is achievable by oneself is, IMHO.) The facts we are entering are no longer changing so there is really no rush to enter them. Over time there will be less data entry and more comparison anyway as the database gets better populated. Computer-aided data entry probably means the user has not taken the time to see if their input is needed, nor have they discovered that, "Heck, look at this! Somebody has some information I wasn't aware of! Who would have thought that was possible?"

However what is needed badly is an easy way to see that I am not entering a duplicate or to find the person I am interested in collaborating on. This is an entirely different search than a Google-like search for any use of the parts of their name anywhere on any page. It is a more structured search. Many sites can take characteristics I enter, such as name, range of birth dates, location, and return results where the matching people come first, then those with slight name variations, then increasingly remote birth dates, etc. The more John Smiths that get entered, the more important this will become.
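The ordering described above (exact name matches first, then slight name variations, then increasingly remote birth dates) can be expressed as a sort key. This is a sketch only: the record fields, the `variants` argument, and the function name are assumptions, and a real search would need a proper name-variant table rather than a hand-supplied list.

```python
def rank_results(people, given, surname, birth_year, variants=()):
    """Sort person records: exact names first, then listed given-name
    variants, then everything else; within each group, closest birth year first."""

    def key(person):
        same_surname = person["surname"] == surname
        if same_surname and person["given"] == given:
            name_rank = 0
        elif same_surname and person["given"] in variants:
            name_rank = 1
        else:
            name_rank = 2
        year = person.get("birth_year")
        # Records with no birth year sort after those with one.
        distance = abs(year - birth_year) if year is not None else float("inf")
        return (name_rank, distance)

    return sorted(people, key=key)
```

For a search on Mary Wheeler born about 1700 with "Polly" supplied as a variant, every Mary Wheeler would be listed by closeness of birth year before any Polly Wheeler appears.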

Addendum: example: I created Mary Wheeler-134 the other day. If I search for the given name Mary and surname Wheeler in Namespace People and Families, I get 1012 pages. If I add Person: Mary Wheeler to the keywords I get 662. If I put "Person: Mary Wheeler" (i.e., with quotes), I get 113, which is probably how many Mary Wheelers there currently are remaining in the system. On the list of the 113, the displayed blurb shows no useful information except for her name in about half the cases. Quite hard to tell if any of them are the one I want. 20 years from now, how many Mary Wheelers will there be?

--Jrich 14:36, 7 April 2008 (EDT)

Perhaps a separate topic, but using the "Browse Pages" function and comparing the other 133 or so Mary Wheelers isn't going to work. It's been discussed in the past about using date ranges in the title to help make distinctions between Persons of the same name and since I use the "Browse Pages" feature quite a bit, I'm inclined to agree more and more that we need to come up with a better way of quickly identifying the Mary Wheelers we are really interested in. --Ronni 19:49, 7 April 2008 (EDT)

This probably does go somewhere else, but it builds on the previous comments. The Browse Pages does help. It is non-intuitive that browsing is more focused than searching, but I guess that's part of the learning curve.

The titles are still a problem. It doesn't seem like it would be hard on the browsing page, assuming it is a most common problem there, to insert something to take the returned title, recognize certain namespaces in the title (i.e., Person:), dig into the page and build a more descriptive replacement. Although maybe digging into the page would be too costly?

It would be nice if the internal link button on the edit page caused a popup version of the Browse Pages page so you could search for and select your link instead of typing it. Am I just in need of more learning here?

Have you tried clicking on the "choose" link that you get when entering people and families on Person and Family pages?--Dallan 15:32, 10 April 2008 (EDT)
Actually I was talking about working on Talk pages. In my discussions, if I refer to a source, I think it would be nice to make that a link to the Source listing, or I mention another person, etc. The internal link button does what I want except it is just boilerplate. I was suggesting it would be nice if it did more. It is easier to type the double brackets so effectively the internal link button is only a reminder of what the format is.
That's a good idea; I'll add it to the todo list (which is getting a little long now so it might be awhile)--Dallan 18:08, 10 April 2008 (EDT)

--Jrich 12:44, 10 April 2008 (EDT)

The new search functionality will have a "match" function that will return results in relevancy-ranked order, and the search result list will include data elements like birth date&place and death date&place. I'm working on this now.

Matching is do-able, it just takes time to develop. Several years ago I worked on a matching algorithm that found 95% of the possible matches and picked the correct match 95% of the time. There will always be cases where the computer guesses wrong. Making the final match decision does not play to a computer's strengths. What a computer is good at is bringing the probable matches to a human's attention, which can significantly reduce the amount of time you have to spend searching for them yourself (unless you want to, which the new search functionality will allow).

Once we get the new match functionality working, I'll list probable matches when people try to add new Person or Family pages so that they can choose to link to an existing page rather than create a new one.--Dallan 15:32, 10 April 2008 (EDT)

Trouble ahead? [25 February 2009]

I am all for merge but I can't help but wonder how all this will work in the best interest of WeRelate and keep users happy. I feel the idea of merge is to MERGE.. and not to be picky about who gets merged or not into "my tree". If there are two individuals that ARE the same person they would be merged, whether they are in fact related to me or not.. such as the parents of an in-law or the parents of that in-law's parents. Specifically because "we Relate". I don't have a problem with this; however, I can see where others might be offended. It is difficult to explain my point...

On Wikipedia nobody "owns" any wiki page there.. and all members can contribute and edit and those pages are permanent. Here on WeRelate folks are worried about THEIR databases... getting cluttered with unwanted people via a merge. So what happens after a merge when someone deletes their GEDCOM off of WeRelate? What happens to pages for those folks that were in that GEDCOM that is now gone? Are they thereafter floating out there as orphans?

Example: Sally Snodgrass uploads "Gedcom A"; Bill Smith uploads "Gedcom B" and sees many if not most of his people match with "Gedcom A" and he spends hours merging and making it all look pretty.. then Sally Snodgrass sees that Bill merged the parents of an in-law with one of her Aunt's husbands and doesn't want so many people in her tree.. and in fact decides to just remove her tree altogether because she is miffed about Bill's work.

You can please some of the people some of the time but not all the people all the time. People are protective about their work. I don't think this would be an issue if folks did not want to download their gedcom with edits back onto their home machines, but I think there has been discussion about members hoping to be able to do that.

Why would Wikipedia have three pages that tell about Napoleon Bonaparte?? Why would we choose not to automatically merge John Smith born 1903 died 1940 in the same place and who has a high "score" thus being a match? Just because? So if Sally Snodgrass does have John Smith and chooses NOT to merge him with an exact match.. but along comes Bill Smith who sees the obvious and goes ahead and matches these two up.. and therefore links up all the ancestors to John Smith, people who are NOT of any interest to Sally Snodgrass, what will happen? And even if we call in counselors to have Bill and Sally be nice, what about Mr. Newbie that comes along and sees the same match and starts merging on his own as well? I know some of this can happen now, as we can merge, but as it stands it is so daunting to merge that I am guessing few would bother. However once it is automated there could be conflicts.

I myself am excited about the prospect of automated merging.. I feel this will help tremendously because my database has 56,000+ people in it. I have to break it down into SO many small gedcoms. I go to one of my immigrant ancestors and begin with him and include all his descendants, and repeat that process over and over, and thus I have all kinds of duplicates on WeRelate as a result, especially since I myself am in each of the gedcoms.. and so are my parents... and so are most of my grandparents and all their siblings, etc! Once automated I can just merge all the "mini gedcoms" into one big family. But will this BIG family then be too large to work at WeRelate? will a huge merged file cause the FTE to slow to a crawl? --Msscarlet1957 23:12, 24 January 2008 (EST)

I get the impression that merging is going to come in two flavors. One is to simply avoid or suppress upload of portions of a GEDCOM that are already present on werelate. I believe it's been described as a two-step process, where the entire GEDCOM is uploaded to some temporary space, compared against the overall content of werelate, and then somehow the results will be presented to the user allowing him to pick and choose what is actually conveyed into the general werelate space as new person/family pages. The second form is after-the-fact of upload - recognizing the different copies of Napoleon Bonaparte. I believe there's a vision for a tool that will allow a user to say that "Person:Napoleon Bonaparte (27)" and "Person:Napoleon Bonaparte (28)" should be automatically consolidated to "Person:Napoleon Bonaparte (27)", and "Person:Napoleon Bonaparte (28)" becomes a redirection, but you can do that sort of thing right now manually.

As for tree implications, I think that anyone with either Boney (27) or Boney (28) will still have those references, and they would just jump to Boney (27) when they go to view that part of their tree.
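A small sketch of that "jump" behavior, assuming redirects are kept as a simple title-to-title mapping and a tree is just a list of page titles. Both are simplifications made up for illustration, not a description of how werelate stores them.

```python
from typing import Dict, List


def resolve(title: str, redirects: Dict[str, str]) -> str:
    """Follow #redirect links until reaching a page that is not a redirect."""
    seen = set()
    while title in redirects:
        if title in seen:
            raise ValueError(f"redirect loop at {title!r}")
        seen.add(title)
        title = redirects[title]
    return title


def view_tree(tree: List[str], redirects: Dict[str, str]) -> List[str]:
    """Pages a user lands on when browsing a tree after merges.

    The tree's own list of references is left untouched; merged pages simply
    resolve to the same target, so they show up only once.
    """
    landed: List[str] = []
    for title in tree:
        target = resolve(title, redirects)
        if target not in landed:
            landed.append(target)
    return landed
```

With `redirects = {"Person:Napoleon Bonaparte (28)": "Person:Napoleon Bonaparte (27)"}`, a tree that still lists page (28) would display page (27) in its place.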

I don't think deleting a tree has any real global significance. I think it just amounts to a page of references - the pages for person, family, source - don't know the difference.

I've been doing lots of merges manually. I'm struck that this is really the point of it all. If someone wants to maintain their work in isolation, werelate just isn't a tool that they're going to like.--Jrm03063 07:37, 25 January 2008 (EST)

These are all good points. This is why merging is actually much more difficult to get right than matching. Here are some thoughts.

Merging is going to come in two flavors: a tree-based merge when you first upload your tree (we'll also have to do something like this for existing trees), and after-the-fact mergers for people that have been entered or edited on-line.

I agree that people getting offended because someone merged or edited their tree is going to happen. I also agree that WeRelate isn't the place for people who want to work in isolation. There are other websites for that. Hopefully it won't happen too often, but if someone does delete their tree, the parts of their tree that have been merged into someone else's tree (and so someone else is also watching those pages) won't be deleted.

I'm thinking that we'll also need an "unmerge" function. It would be pretty frustrating if merge were a one-way street.

The reason that we have the concept of a "tree" is so that you can limit the number of people that you care about. People in your tree can link to people that are in someone else's tree but not in yours. So if someone else merges their tree into yours, chances are that some of the newly-merged people link to people in their tree that are not in yours. You should be able to add those people to your tree, but it's your choice who from their tree you want to add into your tree. It's ok to leave them outside of your tree and just have people in your tree link to them.

I'm reluctant to automatically merge anyone, especially at the beginning. I'll keep a log of who people chose to merge and who people chose not to merge, and the score associated with each pair. If after a while we see that if the score is above X people choose to merge 99% of the time, then we could consider doing an auto-merge in those cases. But I know of other genealogy databases that did auto-merging and people weren't too happy about it.
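As a rough illustration of how such a log could be turned into a value for X, the sketch below assumes each log entry is a `(score, chose_merge)` pair. The function name, the log format, and the minimum of 100 decisions are assumptions for the example, not anything that has been implemented.

```python
from typing import Iterable, Optional, Tuple


def auto_merge_threshold(log: Iterable[Tuple[float, bool]],
                         required_rate: float = 0.99,
                         min_decisions: int = 100) -> Optional[float]:
    """Find the lowest score X such that, among logged pairs scoring >= X,
    people chose to merge at least `required_rate` of the time.

    Returns None when no cutoff is backed by at least `min_decisions` entries.
    """
    decisions = sorted(log, key=lambda d: d[0], reverse=True)
    best: Optional[float] = None
    merged = 0
    for count, (score, chose_merge) in enumerate(decisions, start=1):
        merged += bool(chose_merge)
        # Only test a cutoff once every pair with this same score is counted.
        last_of_score = count == len(decisions) or decisions[count][0] != score
        if last_of_score and count >= min_decisions and merged / count >= required_rate:
            best = score
    return best
```

Requiring a minimum number of logged decisions keeps a handful of early merges from justifying an auto-merge rule on thin evidence.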

I'm slowly making the FTE better able to handle large trees. It's much better than it was a few months ago, but I don't think it's ready for a 56,000 person tree yet :-). But it should be by the end of the Summer; certainly by the end of the year.

As I get closer to implementing merge, I'll post more ideas here and ask for feedback.--Dallan 15:25, 28 January 2008 (EST)

I've been assuming that trees are just a table of references, and that deleting a tree has no particular implications for the person, family, source, image, or other pages referenced. Is that incorrect? Is there some sort of implied delete of a person page performed if a particular page is referenced by no other tree??? What if the page is referenced by another person or family page?--Jrm03063 15:37, 28 January 2008 (EST)

Your theory is incorrect. If you delete a tree, it deletes all the associated pages that are not either being "watched" or part of someone else's tree. If, for example, I'm watching your family page for Ann Smith and John Doe, but I'm not watching any of the person pages, all the person pages will go away and the links will turn "red" when you delete your tree. If I click on one of them, I get the page that says "this page has no content, click edit" along with a link that invites me to see that there's a deleted edit. Unfortunately, I can only see who and when deleted it, and not the content that was deleted.--Amelia.Gerlicher 20:05, 28 January 2008 (EST)
Amelia is right. Although it is also possible for a page to not belong to any tree and still be part of WeRelate: if you remove a page from your tree, that doesn't delete it. But if you delete your tree, it deletes all pages that are not being watched by someone else.
Well, I hope you had good reasons for that. It isn't very wiki-tuitive, and it runs rather counter to the notion that we're trying to build a continuous non-proprietary fabric that spans from the space being studied by one user to the spaces being studied by others. I suppose there's a notion that you don't want to have the space filled with junk that no one is explicitly watching, but for every literal delete (individual or by tree) there must be ten or more pages that are just abandoned in place. Page deletes don't recycle the name or the number-permuted name (for people and families) so that doesn't seem real helpful. If abandonment is ultimately treated as a delete, you're going to wind up chucking useful stuff, so I don't imagine you want to do that. What is the summary rationale for this delete behavior? And what's the thinking on abandoned user content - I've been hoping that werelate keeps stuff essentially forever. Well done genealogical research would seem to have a useful shelf-life greater than that of the typical genealogist....--User:jrm03063
I know it's not very wiki-like and I'm not completely happy with it, but here's why. First of all, just because nobody is watching a page, that doesn't mean that it gets automatically deleted. The only way a page gets deleted is if the only person watching the page deletes it. So if you delete your tree, all pages in your tree get deleted unless someone else is watching them. Here are my reasons for allowing people to delete pages:
(1) Currently, if you've uploaded a GEDCOM and you want to re-upload an updated version of it, the updated pages aren't automatically merged with the original pages. So to avoid creating duplicates, we ask you to delete your original tree first. This problem will go away in the next couple of months because we'll allow people to upload updated GEDCOMs into the same tree.
(2) Early on we did not allow people to delete their trees, and several people complained that they could not delete their tree after deciding that they didn't want to use WeRelate anymore. Rather than arguing that deleting the pages that you had contributed was unwiki-like, I added the delete functionality instead.
Once we get problem (1) solved, I might turn deletion into something that only an administrator can do, and ask people to send one of the administrators an email if they want to delete their tree. This would give us a chance to notify anyone watching at least one of the pages in the tree that the pages nobody else is watching in the tree were about to be deleted, and to give them a chance to watch those pages if they wanted them to be retained.--Dallan 14:48, 1 February 2008 (EST)
Thanks for the reply. I can see that there are some practical concerns driving this, at least until things reach a greater state of completion. Just so I'm sure about it - does the delete occur when the last tree reference I have goes away? Or would it go the first time that a page was referenced in a tree being deleted? I very much hope the former. Also, in order to mess with larger chunks of people-space, set operations (union, intersection, exclusive-or?, copy/assign) might be handy...
Yes, it's the former. At some point I may add set operations, if they turn out to be generally useful.--Dallan 19:24, 4 February 2008 (EST)
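The deletion rule discussed in this thread can be sketched as follows. This is only a model of the behavior as described in the comments above; the data structures (`trees` keyed by user and tree name, `watchers` keyed by page title) and the assumption that leaving your last tree also stops you watching the page are made up for the example.

```python
from typing import Dict, List, Set


def delete_tree(user: str, tree_name: str,
                trees: Dict[str, Dict[str, Set[str]]],
                watchers: Dict[str, Set[str]]) -> List[str]:
    """Delete one of `user`'s trees and return the page titles deleted with it.

    A page survives if it is still in another of the user's trees (it was not
    the user's last tree reference) or if someone else is watching it.
    """
    doomed = trees[user].pop(tree_name)
    # Pages that remain in the user's other trees are left alone entirely.
    still_mine: Set[str] = set().union(*trees[user].values())
    deleted: List[str] = []
    for title in sorted(doomed):
        if title in still_mine:
            continue
        watchers[title].discard(user)
        if not watchers[title]:        # nobody else was watching
            del watchers[title]
            deleted.append(title)
    return deleted
```

In Amelia's example, a family page watched by a second user stays, while the person pages that only the tree's owner watched are deleted and their links turn red.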

I am probably going to be one of the users that causes trouble regarding automatic merges. While I like the concept in general and do wish to link to other families, some of the families on WeRelate are not ones that I consider properly researched and sourced. A source such as "WFT #233" or "research of John Doe" tells me nothing. There are at least six trees on RootsWeb that cite one source, which I have proven to be incorrect on the family that I am now entering. These people's trees are incorrect because they did not bother to do actual research. A simple check of census data, in this case, would have eliminated the problem. If I have no choice regarding the merging of my file with another one, I am not sure that I would continue to use WeRelate. Perhaps you can explain to me how you envision this concept of automatic merging to work? --Beth 18:50, 28 January 2008 (EST)

To clarify my comment about a source titled Research of John Doe; you will find that I do use the source, Research of John Doe. But first I must have complete confidence that this person performs quality research and second the researcher has provided me with the sources cited in the research. In my citation you also find the reference to the original source cited. When I view the original source, I change the source to that but still give John Doe credit for the research somewhere in the document. --Beth 18:00, 30 January 2008 (EST)
I think the idea is that when people get "merged" -- assuming that the two entries are really the same -- both sets of sources are retained. And anyone watching either original person gets an email. Then, as the one who cares more/has better information, you can edit out the nonsense appropriately. This is effectively what's done now with a manual merge -- if there's a sloppy Ancestral File version with the wrong parents, and a well documented version, I keep the second one virtually unchanged, and the first one gets a redirect. Usually if I actually delete parent or spouse links, I explain why, or add some sort of "unproven" disclaimer if I keep what I think (but can't prove) is bad data. If anyone gets upset at someone else deleting their unreliable sources or proven wrong research, I really have no sympathy. On the flipside, though, part of using WeRelate is accepting that someone can come along and edit what you add, and you have to accept the responsibility of changing it back if it's wrong.
On the technical issues, I think there are two concepts under consideration:
  1. Gedcom upload that informs you that X people are duplicates and asks if you want to merge or skip the upload of your data (which you should do for the peripheral people on your tree you aren't sure/don't know much about, or to avoid duplicating wholesale a whole bunch of work). Those merges are then accomplished automatically by adding the gedcom information to the existing record, as new "alt" fields, or appended to the existing note field.
  2. Merge approved by a human based on an automated match, that's then merged as above.
As Dallan notes above, true automated merge without human approval is unlikely for a while -- there are way too many cases of people with the same name in the same area born close together to assume that they are the same.

--Amelia.Gerlicher 20:05, 28 January 2008 (EST)

Right. Also, we'll have an "unmerge" button so someone can undo a merge in case that's necessary.--Dallan 22:51, 30 January 2008 (EST)

I am for merging when it is proven to be the same person; if not sure, leave it be. When I began work on my family I only added the in-laws themselves, but then I began to think that this is a family tree, not just my tree, so for the sake of my brothers' and sisters' children and my cousins I began to add anything I could find as long as it connected to someone in my family. I have several thousand names. On my hard drive I keep the trees divided as my mother's family, my father's family, my wife's mother's family, and her father's family; the reason is that my cousins on my mother's side are not interested in my wife's family or my father's family, etc. But on here I love the fact that they are all tied together. This to me is a proving ground for my work, a place where it can be looked over by others and added to or corrected. I am more interested in facts than in thinking I am beyond mistakes, of which I make more than my share. If everyone could realize what this site can bring about: facts, not just guesswork, on our family history. When I began I took everyone's work as being right; it did not take long to find that was wrong. I found one line on another site that connected to my line; it had the father listed, then for his father it had his son again, and then it went back about 6 generations repeating the same father and son over and over. Sometimes it takes others viewing our work to catch our mistakes. I uploaded 6 trees, and at the bottom of each page when I edit, it gives me the choice of checking which tree or trees I want this page added to.--Dlbradley1 14:16, 25 February 2009 (EST)

Merge Video will be needed... [7 February 2008]

Dallan, I just wanted to suggest that once you get the Merge thingy up and running.. we will definitely need a "how-to merge video" :-) --Msscarlet1957 08:38, 7 February 2008 (EST)

Very true! -:)--Dallan 22:46, 9 February 2008 (EST)

Hand-merging isn't so bad...and maybe it could be a whole lot better [25 March 2008]

I've spent the last few days doing a lot of merging, and the process isn't really that awful, especially once you develop a few practices that keep you from losing track of where you are. But it also seemed to me that it could be made a lot easier without anything in the way of UI changes. All that is required is being a bit smarter about what happens when a redirection page is checked in.

Consider a situation of two family pages that nominally represent the same family. At present, before redirecting A to B, I copy record guts from B to A, attach all the children on B to A, then merge the parents. This leaves me with a family that often contains duplicate children, but I haven't lost anything and it's easier to just work on resolving such duplicates on the target family page.

Could the check-in of a "#redirect" on a family page be jiggered such that:

  • Any children (not already present on the destination) are reparented to the destination
  • Any parent (not already present on the destination) is added as a parent or alternate parent

That leaves you with a consolidated family page that presents the needed merge in a more obvious way. It also prevents the situation of inadvertently cutting off a line in the merge of a family. Finally, it is upwardly compatible with current practice (it would be very strange to use a family redirect as the way to cut away a particular incorrect line).
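The proposed check-in behavior might look something like the sketch below. The `Family` structure and `redirect_family` function are hypothetical, invented to illustrate the two rules above; the first husband or wife in each list stands for the primary parent and any later entries for alternates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Family:
    husbands: List[str] = field(default_factory=list)  # first is primary, rest are alternates
    wives: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)
    redirect_to: Optional[str] = None


def _add_missing(dest: List[str], src: List[str]) -> None:
    """Append titles from `src` that `dest` does not already contain."""
    for title in src:
        if title not in dest:
            dest.append(title)


def redirect_family(source: str, target: str, pages: Dict[str, Family]) -> None:
    """Turn `source` into a redirect without dropping any of its links."""
    src, dst = pages[source], pages[target]
    _add_missing(dst.husbands, src.husbands)   # extra parents become alternates
    _add_missing(dst.wives, src.wives)
    _add_missing(dst.children, src.children)   # children are reparented
    pages[source] = Family(redirect_to=target)
```

After the redirect, duplicate children or alternate parents sit side by side on the target page, where they can be merged one at a time.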

There's probably a corollary procedure for merging people - taking the union of the parent and spouse relationships and making sure the unique set is preserved in the redirection target.

Of course this leaves you to merge the page contents proper, but that's the easy bit anyway - just open both pages and keep them both alive until the redirection target has all the content of the source. If you stop in the middle of that, nothing is lost.

Thoughts?--Jrm03063 17:28, 10 March 2008 (EDT)

That seems like a really good idea! I don't see any downsides to it, and it would be pretty easy to implement. So when you redirect a family, the husband, wife, and children would be added automatically to the target family, and when you redirect a person, the parent and spouse families would be added automatically to the target person. What do others think about this? If there are no concerns, I can implement it the end of this week or early next week.--Dallan 11:07, 12 March 2008 (EDT)

I like it! I had actually started doing it this past week on a couple of merges. No one got lost, everything was tidy and easy to keep track of and it created a "bookmark" of sorts that I could come back to to finish up. I've thought about it for a couple of days now and haven't come up with a downside yet. So, if I understand this correctly, the steps involved for doing a merge this way would be:

  1. pick the target page
  2. copy all data (b, d, m, notes, etc) from duplicate pages onto the target page
  3. redirect dup pages to the target page
  4. note the alt spouses and/or additional children and continue to merge as needed

--Ronni 12:06, 12 March 2008 (EDT)

That's right!--Dallan 01:41, 17 March 2008 (EDT)

I'm glad to hear that this might be easy to do. It's essentially the mechanical practice that I follow in working through a merge so that I don't lose a connection. If the check-in of a "#redirect" had this additional behavior, I could move a lot faster and more safely.--Jrm03063 12:12, 12 March 2008 (EDT)

I like the idea of being able to include the spouse and children. My one attempt at hand merging left me shivering in fear! What a mess one can easily make (I found out to my dismay!) And the file I found to merge with has a LOT of merging to be done! Scary stuff! --Msscarlet1957 19:11, 12 March 2008 (EDT)

Ok, I'll add it in the next couple of days and leave a message here when it's ready. Thanks for the suggestion!--Dallan 01:41, 17 March 2008 (EDT)

It's a week later (I've spent too much time fixing bugs in the digital library), but the #redirect suggestion is working now. If you edit a page and make it a #redirect to another page, the people/families and images that the page links to will be added automatically to the redirect target. I tested it and everything went well, but if you run into problems please let me know. It's a great suggestion!--Dallan 21:56, 24 March 2008 (EDT)

This works wonderfully Dallan! And I agree... great suggestion! --Ronni 23:31, 24 March 2008 (EDT)

I'm glad this is working out. It's working great for me too. It's only possible though because whoever thought about the data structure up front was careful enough to allow for this. Whoever had the wisdom to allow for alternate parent connections and alternate husband/wife connections deserves thanks.

Now everyone, go forth and MERGE!--Jrm03063 13:02, 25 March 2008 (EDT)

Where do we stand on merging? [6 May 2008]

I'm very happy with the improvements in "redirect" behavior, as they make merging a much simpler business. I wanted to give werelate a shove in the direction of the large connected community tree that it's meant to support, so I've spent the last month or so merging through early New England. It's been generally smooth going, and very satisfying too - a lot of information bubbles up when you bring different user contributions together. Still, I'm afraid I'm somewhat alone in this endeavor. I'm struck that until werelate gets a reputation for having a really large and well connected community tree - not just a bunch of GEDCOMs that live in the same pool - werelate won't take off.

So why hasn't merging become important for the masses? I think there are a few reasons:

  • Merging got a bad reputation as difficult, and apt to create more problems than it solved, when links could get dropped on the floor in the process of a redirect. That is no longer the case, but I doubt that most people realize it. I've tried to encourage other folks when I encountered them in the merging process, but interest seemed low.
  • People are waiting for something better to help with merges. Like what? Like when? I'm not saying that development ought to occur in secret, but the impression that there will be a better approach "real soon now" stifles progress.
  • Duplication isn't obvious. I suppose folks that stick around for a little while learn that a family page with a sequence number above "1" is a pretty strong indicator that they are in territory already covered. But how many take the next step of clicking on the URL line, dropping a "1" (or "2" or whatever) into the family page name to see what else lurks out there?
  • Search results are not presented in a sequence that would tend to reveal duplication. We really need a search based not just on static criteria, but with results sorted by probable relevance. That sort of search would, I think, tend to cluster common individuals and families together in the results.
  • There's no way to keep score! If you manage to attach your tree of 3000 to a world graph of 50000, that's a very cool thing. Right now, we really don't know how big the overall tree might be.
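On the last point, one way to "keep score" would be to measure the connected groups of pages. The sketch below assumes the links between person and family pages are available as a plain mapping from each page title to the titles it links to, which is a simplification for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def component_sizes(links: Dict[str, Iterable[str]]) -> List[int]:
    """Sizes of the connected groups of pages, largest first.

    Links are treated as two-way, so a child linking to a family connects
    the family back to the child as well.
    """
    graph: Dict[str, Set[str]] = {}
    for page, neighbours in links.items():
        graph.setdefault(page, set())
        for other in neighbours:
            graph[page].add(other)
            graph.setdefault(other, set()).add(page)

    seen: Set[str] = set()
    sizes: List[int] = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first walk over everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            page = queue.popleft()
            size += 1
            for other in graph[page] - seen:
                seen.add(other)
                queue.append(other)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

The first number returned would be the size of the largest connected tree, and watching it grow after each merge is one possible scoreboard.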

Without a passion to merge the werelate space, I think we're losing the strongest feature of the site's design. How can we get there?--Jrm03063 12:21, 2 May 2008 (EDT)

I'm there with you, Jrm (as you may have noticed). One thing that strikes me is the almost complete lack of reaction I see when I merge pages -- perhaps there's no need because the merge isn't controversial, but am I really that good? ;-)

I think you're right that usability is an issue. Merging isn't hard if you understand both the general wiki concept and how that applies to genealogy - but that's a small group of people. I think search is certainly a big problem -- the tricks I use to find duplicate people are almost all things I wouldn't want to explain to my grandmother. Luckily that's under construction. I think the current search not only makes merging difficult, but it makes using the site for regular research almost impossible, so hopefully more people will stick around in the future, which will lead them to be more interested in merging.

There's something else that might be a problem, however -- notification. I just found this entry by looking at my Watchlist, something I do every so often to appease my curiosity. I didn't get an email, nor have I gotten one about the pages I'm watching that you changed in the last few days. They're not in my junk filter, either. (But I did get notifications for two other pages, so it's not totally broken). If this is happening on a widespread scale, it means that people are not only unaware of (new! exciting!) changes to their own tree, but they may not even be aware that a merge is possible, neither of which does much for the communal editing. (And, whether it's related or not, I have one page in particular that used to have at least 10 people watching it and now has two. So either notification worked too well and they decided they didn't like getting emails, or something is weird.)--Amelia 08:56, 4 May 2008 (EDT)

It does seem like there might be a notification problem somewhere. Just to make sure, you're saying that when you view your watchlist and click on the "Show all pages changed since last visited" link, you see pages that you haven't gotten an email about - is that correct?--Dallan 15:09, 6 May 2008 (EDT)

Well, I have been busy entering pages and have not checked recently for possible merges. I have too many pages to check on a page by page basis. I am waiting for Dallan to implement his merge feature. This morning I entered the surname Coker and location United States and searched. On result page 131-140 I actually found a duplicate page. However this page has no sources. It looks like the entire tree is sourced by so and so's gedcom. Well I really don't wish to merge my page with an unsourced tree. This tree seems to be on the maternal line of my page not the Coker line and I do not intend to research the maternal line. So how are y'all handling this? If the tree was sourced, I would consider it a wonderful opportunity to combine the pages but since it is not I am less than enthusiastic.

I am not sure that I wish to have unsourced trees on WeRelate. These trees probably already exist on Ancestry and Rootsweb. --Beth 09:10, 4 May 2008 (EDT)

I don't believe many people are using this site because it does not seem ready. I have added some comments to Talk pages suggesting errors and giving sources and have gotten zero response. After two weeks, which seemed like a fair time, given vacations and all, I tried changing one as I had suggested on the Talk page and even that got no response. Assuming that the notification is working, I think a lot of the early people were just trying it out and got discouraged or were just curious, not serious. I personally have stopped entering data (see my previous comments on the difficulties of determining if there are duplications) until I think the effort will be worthwhile (either a reasonable automated merging, about which I have at best a wait and see attitude, but more likely no faith in it working, or better searching that makes identifying likely duplicates better.)

Unsourced trees do not bother me. If I want to merge with an unsourced tree, I will add my sources and the important part will no longer be unsourced. My concern is that someone will come along and merge over, ignorantly overlaying entries refined through years of discussion and collaboration, without providing sources or paying attention to past discussions. This is why I think changes should be "proposed" and then voted on by all people watching a page. A much more conservative, but still democratic approach.

I know it is not a good selling point to suggest that making data entry hard to do is a feature, but I think it is. (Personally, I don't see tons of use in uploading gedcoms because I would have to clean mine up anyway to merge.) I would be happy to gradually enter my data and sources a little bit at a time over years, in exchange for participating in discussions with other *interested* persons. Likewise, if I can only propose a change and then must wait for it to get approved, or rejected, then I can wait. The time is not important, since the data is not going anywhere, and if I disagree, I will continue to keep my version of the truth locally, knowing I must find more evidence to convince the jury.

--Jrich 09:58, 4 May 2008 (EDT)

  1. Hey trust me, Amelia, I'm seeing your merge activity. The changes in my Watchlist are huge! :) I would like to get in there and look over the merges, but time is an issue with me right now and also I am concentrating on a different area right now, which brings me to an idea I've had for a while. To encourage and support collaboration I wonder if we couldn't set aside time to problem solve or feature a particular family or families. Sort of like how we do our "Featured Page" where nominations are added, we could have a list of families to work on together. It doesn't even have to be a family we're related to (although getting familiar with non-related family can be a challenge). It also doesn't have to be anything big .. maybe finding a death date or an obit. Just something to encourage and show how collaboration can work. I don't know that I've actually seen a collaboration take place on WeRelate, so that in itself would be a learning process on how best to share data, conflicting resources, etc. These "problem solving" pages would be featured on the Main Page (which I assume is the main portal into WeRelate) so that others can see what WE as a community are working on.
  2. I haven't looked at it recently, but does the Help page on Merging need to be updated to reflect the new redirect behavior?
  3. I don't have it set in my preferences to be emailed when changes occur. I just check my Watchlist every day, several times a day. I assume I'm catching everything? But I've heard others mention not getting email notifications of changes, so it may be a problem.
  4. I second Jrm's idea on seeing how big the overall tree is after a merge.
  5. Beth's comment about not wanting to merge with an unsourced tree or merging with pages she's not interested in is the biggest obstacle we have to overcome with WeRelate. I don't mean to single you out Beth, because I have to overcome similar feelings too when I work on WeRelate. My Tree is safe in my genealogy program at home. No one but me can touch it. My TREE as a separate, isolated tree should not even be a concept on WeRelate. I think this is the hardest issue to overcome on WeRelate. So what if the page is unsourced. So what if they have an alt date of some kind. Note it, comment on it, etc. (Re: Jrich's comments above on this.) Think of these as working copies, copies that will improve over time, either by you or someone else. Again, MY TREE is in my genealogy program at home. No one but me can touch it. MY TREE does not exist on WeRelate, but I have contributed over 1000 names to the WeRelate Tree.
  6. I am afraid that WeRelate has the potential of being just another dumping ground for GEDCOMs. While patrolling pages, I see the GEDCOMs that have been abandoned and there are many of them. I'm like Jrich, I don't know that uploading GEDCOMs serve that much of a purpose, because they do need to be cleaned up. I'm still cleaning up mine and only uploaded a very small fraction of my entire tree. And I give a hearty "Amen" to Jrich's last paragraph. He said what I was trying to say.

--Ronni 12:35, 4 May 2008 (EDT)

Hey Ronni; I believe I pretty well have the concept of it is not my tree anymore; but I cannot wrap my head around merging with an unsourced tree. It may not be "my tree" but my work will then be associated with it. What I enter on WeRelate will stay; whether I do or not. You misunderstood one point that I attempted to make, I think. I did not intend for one to believe that I was disinterested in the pages. I am interested. I just do not have the time to edit or research this maternal line; a family that married into the Coker line fairly recently. So if I merge the page, and therefore the trees, I don't have time to source the unsourced pages, so unless someone else does they will remain unsourced.

I don't entirely understand your statement about "my work". Each page has a revision log which shows which ID edited it. If you add sources, you will be listed as editing a page, but not solely responsible for its content. Presumably your description will say what you are doing. Further, no other pages will show your ID editing other pages. --Jrich 15:46, 4 May 2008 (EDT)
Maybe I don't understand how downloading gedcoms will work either. I assumed that if one merges a page that connects 2 trees then the ultimate gedcom download will include both. The tree with sources (unless the page is still under construction) and the other tree with a few sources but mostly Ancestry's World Tree. Exactly what information will the downloaded gedcom contain? Do the people involved in the creation of the pages get any credit in the downloaded gedcom? I really do not know. --Beth 18:56, 4 May 2008 (EDT)
I haven't uploaded any gedcoms so I don't really know what happens. The pages I have seen show a user ID and "gedcom upload" in the revision log. The ID doing the upload is listed as the first watcher of the page, though I believe you can remove the page from your watchlist and erase this if you wanted. I haven't tried, but believe you could do a diff and possibly see exactly what each revision contributed to the page. However, I also believe that uploading creates all new pages (at least so far) so if another page exists for that person, your upload would create a duplicate, and both pages would exist each with their own author identified. I assume a merge would show up in the revision log as above. --Jrich 17:13, 5 May 2008 (EDT)
I suspect that GEDCOM downloads will be specific to particular user "trees". A user tree isn't really what you might think of as a tree in another genealogy system, but rather, a list of wiki pages (the contents of which may or may not represent a connected tree). Merging two pages from two separate trees will only change how the two trees view that particular page - nothing else. Only if you explicitly add or remove a page from a tree will anything be dropped from the list that represents a user "tree".
That's right. A GEDCOM download will include only those pages that you have added to your tree, so if you merge with a page in another tree but don't add those pages to your tree, then your download won't include them. The downloaded pages will contain a link (source or note?) back to the wiki pages, which will list the authors in the page's history.--Dallan 15:09, 6 May 2008 (EDT)
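The "tree as a list of pages" idea can be sketched roughly as below. This is only an illustration, not the actual download code: the function name, the dictionary of redirects, and the choice to follow redirects to the merged page are all assumptions made for the example.

```python
# Hypothetical model: a user "tree" is just a set of wiki page titles.
# Merging turns one page into a redirect; it never adds pages to anyone's tree.

def pages_for_download(tree, redirects):
    """Return the titles a download of `tree` would cover.

    tree      -- set of page titles the user has added to their tree
    redirects -- dict mapping a merged-away title to its merge target
                 (assumed behaviour: a listed page that was redirected
                 resolves to the page it now points at)
    """
    resolved = set()
    for title in tree:
        seen = set()
        # Follow redirect chains, guarding against accidental loops.
        while title in redirects and title not in seen:
            seen.add(title)
            title = redirects[title]
        resolved.add(title)
    return resolved


my_tree = {"Person:A (2)", "Person:B (1)"}
redirects = {"Person:A (2)": "Person:A (1)"}
# "Person:C (1)" belongs to someone else's tree and is never pulled in.
result = pages_for_download(my_tree, redirects)
```

Under these assumptions the merged page replaces the redirected one in the download, and nothing from the other user's tree comes along unless it is added explicitly.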
Thank you for the clarification. Sometimes I am a little slow on the uptake. This changes my view on merging pages; I think. Anyway I will give it a go and see what happens.--Beth 18:15, 6 May 2008 (EDT)

About gedcoms, I initially thought that the gedcom capability was a must, but have since changed my mind. I am not uploading any gedcoms but I have decided to manually enter all of my data. If we disallow gedcoms then most of the problems with quality will disappear. I believe that quality over quantity is desirable. You are less likely to attract serious researchers the more junk you allow.

When the merge feature is enhanced; I suppose one can attempt to merge my page and I will protest and the majority will rule. --Beth 12:54, 4 May 2008 (EDT)

I don't want to stifle enthusiasm for merging, but I don't think that WeRelate is ready yet for large-scale use. There are too many known issues: search, match, merge, etc. that make it difficult to use for most people. So we haven't been actively promoting it, nor have we been doing things like issuing newsletters or whatnot to encourage existing users to come back to the website. (Although the lack of notification does seem to be a bug.) What we have is in some ways right now an ideal situation: a group of dedicated people who care about genealogy and how to make collaboration work, who can help decide how WeRelate ought to function. I know that development progress is slow (believe me), but as I step back and look at where we are in comparison with other websites I think we're headed in the right direction - a much better direction than I or most others could have come up with on their own.

With a couple of notable exceptions :-) we really haven't explored merge. Merge is something that will get more attention this summer. I agree that merge is at the heart of the promise of WeRelate, and it is frustrating to me that we're not seeing more collaboration. But I think that the website is too complex for most people. We've got to make it easier to use before calling most people back. While GEDCOM upload has some benefits, I agree that it also has some downsides. I think we need to add a step to the upload process to help ensure that the upload doesn't just generate a bunch of duplicates - perhaps something that requires the submitter to look at the submitted GEDCOM and make merge decisions or else the GEDCOM gets removed. And I think we should allow people to request that an unsourced, abandoned tree that's getting in their way be deleted. I imagine as things progress this summer we'll have more ideas around how merge ought to work, and how to reduce abandoned GEDCOM's.

For now I'm still working on search. It's obviously taking longer than I anticipated. The good news is that I have hired a student to work over the summer so progress will hopefully pick up.--Dallan 15:09, 6 May 2008 (EDT)

Colonial Merge Wrap-Up [6 May 2008]

I'm just about reaching the end of a personal merge campaign/vendetta involving early New England settlers. I found that I could find stuff to merge more-or-less mechanically, by recognizing that family names with sequence numbers above "1" typically flag a family that is duplicated. Over the course of my work, I would say that I only hit perhaps two or three families where no part of the name was "Unknown" and there were two or more real families present. Even in those cases, I suspect that the trees I was merging actually had errors, but no matter. Over the last six or so weeks I've redirected something over 1900 family pages, so there's something like a 99% chance that a duplicated page name (again, without "unknown" appearing somewhere) represents an actual duplicated family.

Using the knowledge that duplicated family names are hallmarks of probably duplicated tree fragments, I was able to come up with some approaches to systematic detection of duplicates. Starting with my own family tree page, I looked for every example of a family page with a sequence number above "1". I would go to the page in question, skim the contents, then directly change the URL entry to point at the page associated with sequence number "1" (or "2", or whatever). Assuming a duplicate was found, I would always merge the family page down to "1".
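The sequence-number heuristic described above could be sketched as follows. The title format ("name, then a space, then a number in parentheses") and the function name are assumptions for illustration; this is not the script actually used.

```python
import re

# Assumed title shape: "Family:John Slafter and Mary Smith (2)"
TITLE_RE = re.compile(r"^(?P<name>.+) \((?P<seq>\d+)\)$")


def likely_duplicates(titles):
    """Return (title, candidate_target) pairs worth checking by hand.

    A family title is flagged when its sequence number is above 1 and
    no part of the name is "Unknown". The candidate target is always
    the page numbered 1, matching the "merge down to 1" habit.
    """
    flagged = []
    for title in titles:
        m = TITLE_RE.match(title)
        if not m:
            continue  # title does not follow the assumed pattern
        name, seq = m.group("name"), int(m.group("seq"))
        if seq > 1 and "unknown" not in name.lower():
            flagged.append((title, f"{name} (1)"))
    return flagged


candidates = likely_duplicates([
    "Family:John Slafter and Mary Smith (1)",
    "Family:John Slafter and Mary Smith (2)",
    "Family:Unknown Slafter and Mary Smith (3)",
])
```

The output is only a worklist; each flagged page still needs to be skimmed before redirecting, since a small share of same-named families are genuinely different.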

Merging of family pages no longer cuts away family relationships that exist in the redirection source, so the result of merging one or more family pages down to page "1", is to create a superset of the family relationships - from the various duplicated pages - in the single target page. Working within the single consolidated family page, I would then merge duplicated father, mother, and child pages as appropriate. The result of this operation is the creation of superset person pages, that will often contain subsequent duplicate spouses or parents. I recommend avoiding the urge to move on to redirecting the next layer out, before completing the first layer - for a large merge, you can get pretty confused. After merging the various parents and children in a family, if there are more than a trivial number of duplicated family or spousal relationship pages pointed at by the person pages of the starting family, you may wish to write those down (or write them into a "to do" user page). From there, pick the next duplicated family and repeat the process.

After resolving all the duplicate family pages on my family tree page, I moved on to my "watchlist". As you merge family and person pages, your watchlist will grow and additional pages to merge will become apparent.

Finally, after clearing away all the potential merge candidates in my watch list, I moved on to an ad-hoc search for matches (still exploiting the property that duplicated family pages generally indicate actual duplication). I wanted to be able to find family pages that were created after the pages that I was looking at in my tree and watch list. Initially, I did this by opening my watch list and selecting the text associated with all of the family pages found there. I built a table for this material where the first column is the family name, the second contains clickable links to the instances of the page that I know about. The third contains a hypothetical link to the page name that would be "next" after the pages that I know about. By watching the list of hypothetical links, to see if any of the named files are present, I can find possible duplicates across my entire set of family pages. Eventually, creating this table by hand became too much of an aggravation, so I started copying my watchlist of family pages into a local file on my system. Using a script that I had written for the purpose, I then automatically generate the wiki "table" content I need. My current version of this table can be seen at User:Jrm03063/Family Overview Table.--Jrm03063 16:37, 6 May 2008 (EDT)
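A script of the kind described above might look roughly like this. It is a guess at the approach, not the script behind the Family Overview Table: the input format (one page title per line, copied from the watchlist), the column headings, and the function name are assumptions. The output uses ordinary MediaWiki table and link markup.

```python
import re
from collections import defaultdict

# Assumed title shape: "Family:John Slafter and Mary Smith (2)"
TITLE_RE = re.compile(r"^Family:(?P<name>.+) \((?P<seq>\d+)\)$")


def overview_table(watchlist_titles):
    """Build wiki table text: family name, known pages, hypothetical next page."""
    known = defaultdict(list)
    for title in watchlist_titles:
        m = TITLE_RE.match(title.strip())
        if m:  # non-family pages are ignored
            known[m.group("name")].append(int(m.group("seq")))

    lines = ['{| border="1"', "! Family !! Known pages !! Next page"]
    for name in sorted(known):
        seqs = sorted(known[name])
        links = " ".join(f"[[Family:{name} ({n})|{n}]]" for n in seqs)
        nxt = seqs[-1] + 1  # the page that would exist if a new duplicate appeared
        lines.append("|-")
        lines.append(f"| {name} || {links} || [[Family:{name} ({nxt})|{nxt}]]")
    lines.append("|}")
    return "\n".join(lines)


table = overview_table([
    "Family:John Slafter and Mary Smith (1)",
    "Family:John Slafter and Mary Smith (2)",
    "Person:John Slafter (1)",
])
```

Because a wiki link to a page that does not exist is displayed differently from a link to one that does, a "next page" link that starts rendering as an existing page is the signal that a new possible duplicate has been created.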

Wow, that's amazing! I'm really glad to hear that the family-centric match + merge approach is working well, because I've been planning to focus on matching family pages to find duplicates in the automated process as well. Your table is very cool as well!--Dallan 17:44, 6 May 2008 (EDT)

Generalizing Warnings, Including Duplication Detection [30 May 2008]

I've developed this material since it first appeared here, and it can now be found at User:Jrm03063/A Functional Specification for Consistency Verification in WeRelate. I'm viewing warnings about possible duplication as essentially the same thing as warnings about more traditional sorts of genealogy database integrity issues (birth after death, etc...).

--Jrm03063 29 Jun 2008

Seems like a good idea. We could update the warnings when the page was saved, or when it was indexed 10-60 minutes later. I'd probably want to make the warnings list non-editable, so not a regular "wiki" page, but a "special" page that just retrieves the warnings for the page from a database table and displays them. One issue with delaying warning re-generation is that you wouldn't get notified that the page had warnings when you saved the page. It might be better to re-generate the warnings after every page save in order to give people immediate feedback.--Dallan 19:36, 27 May 2008 (EDT)

I'm convinced that anything that moves us toward a larger, more consistent, and more correct data base can only be to the good. Perhaps it doesn't matter how thin or flawed data is at the time it starts life in werelate, as long as the evolutionary path is toward better and more complete data over time. I think that something like a warning infrastructure, that just happens to also suggest merge candidates, would help give things a healthy shove.--Jrm03063 20:04, 27 May 2008 (EDT)
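As a rough illustration of treating duplicate hints as just another kind of warning, here is a minimal sketch. The function, its parameters, and the warning texts are hypothetical and are not taken from the functional specification; years are plain integers to keep the example small.

```python
import re

SEQ_RE = re.compile(r"\((\d+)\)$")  # trailing "(n)" in a page title (assumed format)


def page_warnings(title, birth_year=None, death_year=None):
    """Return a list of warning strings for one page."""
    warnings = []
    # Traditional integrity check.
    if birth_year is not None and death_year is not None and birth_year > death_year:
        warnings.append("birth after death")
    # Duplicate hint: a family page numbered above 1 with no "Unknown" in its name.
    m = SEQ_RE.search(title)
    if (title.startswith("Family:") and m and int(m.group(1)) > 1
            and "unknown" not in title.lower()):
        warnings.append("possible duplicate of an earlier family page with the same name")
    return warnings


w1 = page_warnings("Person:John Slafter (1)", birth_year=1700, death_year=1690)
w2 = page_warnings("Family:John Slafter and Mary Smith (2)")
w3 = page_warnings("Person:John Slafter (1)", birth_year=1690, death_year=1700)
```

Whether such a function runs on every save or at indexing time is the open question discussed above; the sketch itself does not depend on the answer.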

Okay, I see a need for an immediate notice if one is creating a duplicate page with the exact same name. I just created a duplicate family page today and should have noticed that it was number two but did not. Now I need to merge the two pages. The #2 page has more information and the recommendation is to merge to the page with the most info. I wish that one could merge the info and then choose the number for the new page, preferably the lower number. And then I think that #2 should be available for reuse after the two users agree on the merge. If neither number can be reused; it makes no difference whether I merge to #1 or #2. --Beth 21:11, 30 May 2008 (EDT)

Merging pages [8 September 2008]

It would be nice to have the ability to merge the sources also when merging a page. The Help page on merging has not been updated to show the latest changes.--Beth 20:18, 25 August 2008 (EDT)

Please, only if it's optional. Something automatic like the parent/spouse bit would be really annoying. Many of the pages I merge have either junk or duplicative sources that I don't copy over, or heavily edit when I do. Moving them automatically would require twice the editing.--Amelia 21:10, 25 August 2008 (EDT)
There's something to be said for both of these positions. But I don't think you do sources without also doing life events/facts and narrative. If you did everything, then (when you did the editing that Amelia mentions) you would at least only have to do it on a single destination page by cutting away junk and tidying up the stuff you want to keep. I definitely have some concerns about losing stuff when I make a merge.
Indeed, as I think about this a bit more: if you had a complete merge (nothing displayed prettily, just a complete superset of the two pages, with an expectation that the target page will need editing), the page revision history would have a page entry that contained all the merged information. Your editing session that did the initial cull of stuff would be apparent in comparing the two different editions of the page. When the cull occurs in the context of getting a page ready to merge, and then adding the stuff by hand to another page, it's a lot less clear what is happening.
Getting rid of sources that are "pure junk" though, seems like a separate task where we want some sort of administrative/robot support. When a source or mysource page gets designated for purge, we don't want to then have to go to all the pages that may reference the source. Instead, we want a super-delete, that gets rid not only of the junk source, but all the references (not the whole page - just the citation).


I think there are two situations:

  • Merging two pages: In this case I've been thinking of a side-by-side merge screen that by default specifies that everything from one page be added to the other page, except source citations that the system determined were duplicate or pure junk (these would by default not be added). The user could override these defaults before merging the pages. If a source citation wasn't carried over into the merged page, we wouldn't delete the corresponding MySource; occasionally we might go through and remove MySources that weren't linked to by anything.
  • Merging a GEDCOM: In this case the user would be presented with a list of potential pages to merge into before pages had even been created for their tree, and they would specify which pairs of pages to merge. If the user doesn't use the merge screen to modify the default settings for a particular pair, then the default settings would be used to merge that pair of pages: add differing information from their page onto the existing page, except duplicate/junk source citations. My guess is that when merging a GEDCOM, most people will go with the default merge settings most of the time, so the defaults have to be as "reasonable" as we can make them.

After writing this I can see that your approach of one revision including everything from both pages, followed by a second revision omitting certain material would be pretty useful. The side-by-side merge screen would be a convenient way to exclude certain info and what some people have come to expect after using other programs. Perhaps the merge screen should create two revisions if some information from the merging page is excluded: the first adding all info from the merging page, the second removing excluded information.--Dallan 13:35, 26 August 2008 (EDT)

  • A guided merge of two pages certainly would look cool, but I don't think it turns out to be a lot of bang for some serious programming bucks - in fact, I think it's trouble we don't need. The ordinary clicking/cutting and reorganization tools that you have on a single page will do just fine tidying up a page that represents a superset of one or more originating pages. I also find, that as I'm working through a large merge it's more important to get rid of the surplus pages than (at that moment) worrying about what the combined page will look like. I think I would be most satisfied with a dumb merge. Assume that page 'B' is being merged to 'A'. Grab all the information from page 'B' and drop it in a block at the end of the narrative section of 'A'. It could be a section with a header indicating 'data from Person B (nn)'. The section would contain all the facts, connections, sources, and narrative from 'B' in a simplistic layout that will be easy to cut and paste around when you have a chance to tidy the page later on. Make absolutely no effort to interleave sources, facts, or narrative - that only obfuscates the origin of the pre-merge data (besides creating more work for the programmer). I suppose it's also true that it doesn't create anything (or at least much) that is new for users to grasp. This may be a case where "dumb is beautiful".
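The "dumb merge" proposed above is simple enough to sketch in a few lines. This is an illustration of the idea only: the function name and the exact header wording are assumptions, while the section header and redirect lines use standard MediaWiki markup.

```python
def dumb_merge(target_title, target_text, source_title, source_text):
    """Merge page B (source) into page A (target) without interleaving anything.

    Returns (new_target_text, new_source_text): the target gains one clearly
    labelled block at the end, and the source becomes a redirect.
    """
    header = f"== Data from {source_title} =="
    new_target = f"{target_text.rstrip()}\n\n{header}\n{source_text.strip()}\n"
    new_source = f"#REDIRECT [[{target_title}]]"
    return new_target, new_source


new_target, new_source = dumb_merge(
    "Person:A (1)", "Narrative A.\n",
    "Person:B (2)", "Facts from B.",
)
```

Since the appended block is never mixed into the existing fields, the later tidy-up edit shows up in the page history as a plain diff against the block.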

Solveig promises to update the help pages when the kids are back in school. I do want to try merging sources automatically, but I agree that we'll have to do something to drop junk and duplicative sources before merging.--Dallan 23:16, 25 August 2008 (EDT)

From a personal point of view, I am merging my pages because I decided to upload a gedcom; it would be useful to have all events and sources merged; I can then do the cleanup. Alternate names are not merged either. We need to indicate on the help page exactly what is merged and what is not merged.--Beth 02:13, 26 August 2008 (EDT)
So here's what I'm trying to avoid: I would like it to be as easy as possible to avoid corrupting the existing data with junk from the Ancestral File/OneWorldTree/etc.. We have nice merged pages that capture all the known information on many early colonists. People on the Mayflower, for example, who have been the source of much research, including the debunking of earlier theories. I'm not saying these pages have everything possible, but the chances of a new entry adding better information are pretty slim. The more work goes into hand editing existing pages, the bigger this problem becomes.--Amelia 23:40, 26 August 2008 (EDT)
I agree. On the other hand, in a twisted sort of way, the presence of nothing but one world tree and similar sorts of "source" does tell you what you're looking at in a backhanded sort of way. If a merge adds the entire body of the source page as a block, at the end of the target page with literal text that identifies the content as the merged information from some predecessor page, you wind up with a page where it's probably easy to decide what to do next. None of the new data is interleaved with or corrupting the existing page, but the old data from the predecessor page remains intact as an identifiable unit for deletion or integration as appropriate. If it has something of value, that element is retained and the rest dropped. It's easy to see the evolution of the page from the introduction of the merged-in page until the point at which the new data is again a single unified presentation. If the merged in page adds nothing but soft data and one-world-tree sources, then it's very clear why the next page edit drops the merged in block without changing anything. If some nugget was worth saving, that's clear too. But it's deceptive when a shaky page loses its shaky sources, and the remaining data gets added to a merge target as gospel truth.
I don't understand the problem with sources. Perhaps the word merge is the problem. I just wish to have the sources added from the page that you are merging to the new page. I don't want the sources actually merged. If you don't want the source on the new page; isn't it fairly simple to hit the remove button?--Beth 07:30, 27 August 2008 (EDT)
Right now, if you don't want parents or spouse links added to the new page (if, say, they are the wrong parents), you have to edit the to be merged page to remove that link, then edit it again to do the redirect, in addition to any edit to the newly merged page to add info. My first comment was aimed at avoiding that first step, based on my thought that your request does not require the full merge functionality. The other comments I think are aimed more generally at how full merge will work.--Amelia 11:57, 27 August 2008 (EDT)
Hi Amelia, how are you and your baby? You and Jrm are the more experienced mergers and understand the process more than I. Whatever y'all decide on, I am sure, will be the best method. When I merge my personal pages, I try to enter all of the data from the page with the higher ID # that will not be transferred by the redirect, which appears to be alternate names, events (other than birth, marriage and death) and sources and notes, to the page with the lower ID#. Not sure if burial is transferred automatically. I then redirect the page and clean up the new page. I guess if there were people on the old page that I did not wish to include I would remove them before the redirect; but mine are duplicates so I redirect them so I then have all of the duplicates on the new page and that helps me keep track of the pages that I still need to merge. Is this the method that you use?--Beth 15:41, 27 August 2008 (EDT)
That's basically the procedure I would follow. I clip everything from the source page and add it crudely at the end of the target page. I'll then redirect the source page to the combined page. Finally, I edit the combined page to get rid of useless junk and so forth.
On the question of page merge generally, the only things that get transferred automatically (at present) are parent and family connections. This was useful because the old merge technique required you to move every individual connection by hand (from one page to another), and such connections were often lost. I could be just as happy though, if the family connections were not automatically added to the appropriate father, mother, or family member connection field, but rather, if those connections were preserved in a common block, with everything else that came from an originating page in a merge operation. Such connections would be preserved as mere hypertext references - not automatically as parents, children and so forth in the destination/merged page (as is done presently). When you edit the merged page you could move the family / child /etc. connections to their appropriate locations, or delete them if they do not belong.--Jrm03063

An issue is that in the case of merging in a new GEDCOM, where you might have hundreds of to-be-merged pages, I want the "default" settings to do the right thing most of the time, without requiring follow-on edits of the merged pages. So instead of putting the merged information in a block at the end of the narrative text, I'd rather add unique dates and places (if not already recorded on the page) as additional events by default. Same with sources -- if they're not duplicative and not pure junk, add them as well. Same with relationships. The problem is under this inline approach, once they've been added it becomes more challenging to remember which names/events/relationships were added by the merge and which ones were there before if you want to edit the page to remove some of them. This is where I think a merge screen would be useful -- to show people what is going to be merged into the page.

I'm all for keeping this merge screen simple though.

  1. Here's one idea: have you ever clicked the "Show changes" button at the bottom of the page after making changes but before saving the page? One possibility is to have the merge screen be essentially the "show changes" view with the information that by default would be added to the page already added inline to the page fields. From this "show changes" page you could see what was being added and you could edit the page to take out what you didn't want.
  2. A second idea is to have the merge screen be similar to a diff screen, with the to-be-merged page on the left and the merge-target page on the right, with checkboxes next to each line of the to-be-merged page. If the box was checked it means that you want to add the line into the merged page, and if it's not checked it means you don't. This represents a simple interface for the common case of just excluding certain information from the to-be-merged page, but if you wanted to do any other editing, you'd have to edit the merge target after the merge.--Dallan 17:18, 28 August 2008 (EDT)
My experience with merging is that it's just never a single source to a single target. It's one patch of tree onto another patch. It's also often several pages to one - not just one to one. A merge usually is of two or more entire tree segments. As a result, you don't want to focus on a single person or family page, but instead, on the different representations of the family context you're trying to bring together. I'm not sure what it might look like, but if you could instead come up with a page that allowed us to specify how one collection of pages should map to another collection (and then, run the auto-merge/upload-merge on the set of connections) - that would really be useful. Maybe not an entire segment, but instead, two or more family pages.
Here's an idea. Create a tabular orientation of one or more designated family pages. Put all the "husbands" on the top row, all the "wives" on the next row. Make an effort to match up children from the different family pages. Create some controls that let the user define which people are unique and which are duplicates, by moving the people up and down in the columns so that they line up on common rows for common people. You would also need a way to flag existing duplicates in a single family. Hmmm..... --Jrm03063

This seems like a good idea. Let me see what I can come up with. How often does the "duplicates in a single family" come up? If not too often, then I'll probably just handle that case with a "Person" merge screen.--Dallan 18:44, 2 September 2008 (EDT)

It's typical at present, since the merge of families specifically results in duplicated person pages. It would be less typical if we could merge families as a collected set of person merges to be done using the standard/default person merge mechanism. Under that scenario though, an isolated merge of two disparate person-pages would also be quite uncommon. I was thinking that a vertical (as well as horizontal) pairing mechanism would allow both sorts of merges within the same display/paradigm. Two more things - besides providing a default "family" page merge, you would also need to allow the page to express which person and family pages are retained as a result of the merge.
The amount of material needing to be merged, still seems quite large. We need to move a lot faster than any one to one merge mechanism will support. --Jrm03063

I have a couple related ideas to offer:

  • Multiple merging modes
  • Merge to talk page

In reading Dallan's comments, I was struck that he was thinking about pages that are new/unworked/immature. Amelia, on the other hand, was concerned about pages that are highly mature. My own thinking, about adding merged information as a block at the end of a page fell somewhere in the middle. So essentially, we're talking about doing the right sort of merge based on the maturity development of a page. Here's the idea:

  • A page lacking any categories, watched by only one or two users, is considered "immature".
  • A page being watched by fewer than seven users, and lacking a specific "maturity" category, is considered "developed".
  • A page being watched by seven or more users, having a "talk" page, or marked with a category called "mature" (or perhaps, in other categories that imply maturity - mayflower passengers perhaps?) is considered to be "mature".

Given these levels of maturity in a merge target, an appropriate merge strategy could be used.

  • Immature pages could be merged as Dallan seems to have suggested - just go ahead and embed events, sources, etc., in the correct locations.
  • Developed pages are left more alone - the information from a merge source is added as a contiguous block at the end of the page.
  • Mature pages are accorded the most respect - the block of information from a source page is put on the talk page. The actual person page is left alone.
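The maturity levels and their merge strategies could be sketched as below. This is only one reading of the proposal: the class, the field names, the order in which the rules are tested (mature first, then immature, otherwise developed), and the contents of the category set are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PageInfo:
    watchers: int
    categories: set = field(default_factory=set)
    has_talk_page: bool = False


# Categories taken to imply maturity; the actual list would be a policy choice.
MATURE_CATEGORIES = {"mature"}

STRATEGY = {
    "immature": "embed events and sources directly in the page fields",
    "developed": "append the source page as one block at the end of the page",
    "mature": "put the source page block on the talk page only",
}


def maturity(page):
    """Classify a merge target as 'immature', 'developed' or 'mature'."""
    if page.watchers >= 7 or page.has_talk_page or (page.categories & MATURE_CATEGORIES):
        return "mature"
    if not page.categories and page.watchers <= 2:
        return "immature"
    return "developed"


def merge_strategy(page):
    return STRATEGY[maturity(page)]


levels = [
    maturity(PageInfo(watchers=1)),
    maturity(PageInfo(watchers=3)),
    maturity(PageInfo(watchers=2, categories={"Slafter surname"})),
    maturity(PageInfo(watchers=1, has_talk_page=True)),
    maturity(PageInfo(watchers=7)),
]
```

Dropping the "immature" branch gives the simpler two-mode variant, where every target that is not mature just receives an appended block.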

If anyone asked me, I would be totally happy with only two modes - essentially the forms I describe as "developed" and "mature". I think it would be simpler code to write and leave information in a state where a history inspection would yield understandable results. But if there's a determination to do an interleaved page merge, I don't think that creates problems if it's restricted to immature pages.--Jrm03063 11:14, 4 September 2008 (EDT)

I'm currently thinking that an automated merge is not a good idea, so I'm leaning toward as simple a manual merge process as possible:

  • a "Compare" screen where you could compare two or more families side-by-side, select the families to merge, and choose which husbands, wives, and children to merge by moving them up and down. Clicking on a "Merge" button would take you to the following screen:
  • a "Merge" screen that would list family data as well as person data for each of the husbands, wives, and children to merge. The system would choose one of the families to be the merge target (could be the family with the most people watching it, or the earliest-created family); the rest of the family pages and their associated person pages would be merge sources. Merge sources and targets would be displayed in a table, with columns on the left for the merge sources and a column on the right for the merge target. Each data element would be displayed in a separate row. An "Add" checkbox would be next to each data element of the merge sources that differs from the corresponding data element in the merge target. If this box is checked, the data element will be added to the merge target. By default, this box is checked only if the system determines that the data element is not duplicative/junk. (For example, a birthdate of 1805 wouldn't be checked if the merge target's birthdate is "18 Mar 1805", but a birthdate of "18 Mar 1804" would.) Clicking on "Save" at the bottom of this screen would turn the merge sources into redirects and add the selected information to the merge target.
  • In addition to handling entire families, the "Compare" and "Merge" screens could also work to merge individual person pages, but I agree this will be less common.
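The default state of the "Add" checkbox for dates can be illustrated with a deliberately crude sketch. The rule used here (a source date is redundant when every word in it already appears in the target date) is an assumption chosen to reproduce the 1805 example; a real implementation would need proper date parsing.

```python
def add_checked_by_default(source_date, target_date):
    """Decide whether the "Add" box starts out checked for a source date.

    Unchecked (False) when the source date is empty or says nothing the
    target date does not already say; checked (True) otherwise.
    """
    src = set(source_date.lower().split())
    tgt = set(target_date.lower().split())
    if not src:
        return False  # nothing to add
    return not (src <= tgt)  # checked only if the source has a token the target lacks


less_specific = add_checked_by_default("1805", "18 Mar 1805")
conflicting = add_checked_by_default("18 Mar 1804", "18 Mar 1805")
identical = add_checked_by_default("18 Mar 1805", "18 Mar 1805")
target_blank = add_checked_by_default("18 Mar 1805", "")
```

Here `less_specific` and `identical` come out unchecked, while `conflicting` and `target_blank` come out checked, which matches the example given for the merge screen.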

I think that multiple merge modes would make the system more difficult for newcomers; I'd rather come up with an approach that works well for all cases. To satisfy the approaches described by Jrm03063, the merge screen could also add all of the information from the merged pages:

  1. into a block at the end of the merge target,
  2. on the merge target's talk page,
  3. directly to the merged page, creating a revision that would be replaced immediately by a revision without the data elements that the user didn't add (the two revisions could be compared in the history to see what wasn't added).
  4. A fourth option is to not store the un-added information from the merge sources anywhere on the merge target -- the merge source could be viewed by following a link embedded in an automatically-generated summary comment for the merged revision, which would be visible in the merge target's history.

My preference is option four, because I believe that a common case will be merging an immature page into a developed or mature page, where the immature page doesn't have anything different to add. In that case I'd like to not update the merge target at all so that watchers of that page don't get notified every time someone merges a page that doesn't have anything new to add to the page.

I could add an "Edit (expert)" button to the "Merge" screen (in addition to the "Save" button) that would follow option 1 -- add all information from the merge sources into a block at the end of the merge target and open up an edit screen on the merge target so that you could edit it by hand. But I'm not sure how much this would buy you vs. being able to check boxes to say which data elements from the merge sources end up in the merge target. I may be missing something though; you have done much more merging than I have.--Dallan 11:28, 5 September 2008 (EDT)

Are you still planning on supporting some sort of auto-merge (once associations have been established) on upload?--Jrm03063 11:59, 5 September 2008 (EDT)

I've been thinking lately that as much as I try to have the system not add "junk" from GEDCOM pages in an automated merge, it's going to happen, and an automated merge will give GEDCOM uploaders an easy way to update possibly hundreds of pages at once, and most of the updates will be junk. Others will receive emails about these updates, and seeing that the updates are mostly junk they'll tend to stop reading their change notification emails. So I've been thinking about an alternative: make GEDCOM uploaders merge each family one at a time, but give them an option to exclude from the upload matching ancestors of people who have already been merged. People who are excluded from the upload won't have to be merged, which will save the uploader time. This should encourage people who have long lines of ancestors that they don't care that much about to choose to just go with whatever is already at WeRelate for those ancestors rather than taking the time to merge their information into those people.--Dallan 23:42, 6 September 2008 (EDT)

I have a couple of questions/thoughts -

  • Are you (Dallan) still planning to try to support an iterative download->upload->download->.... process?
Yes, that's still the case -- hopefully by the end of the year.
  • I haven't seen GEDCOMs appearing since the warning message was updated. Does anyone have a sense of this?
In May 59 GEDCOM's were uploaded, in June 35 GEDCOM's were uploaded, in July 63 GEDCOM's were uploaded, and in August 77 GEDCOM's were uploaded. So I'd say that the number of GEDCOM's is going up. File size is decreasing though: average file size was 622K in May, 1130K in June, 767K in July, and 381K in August. That's good overall I think.--Dallan 12:25, 8 September 2008 (EDT)
  • Even though I would like a more powerful family merge capability, it may be that a less powerful one that people aren't afraid to use would serve werelate better. The efforts of a few can't scale that much if we're relying on software tricks and techniques to help us. The efforts of a lot of users, well....--Jrm03063 00:20, 7 September 2008 (EDT)

Can it really be true? [6 November 2008]

I don't believe it, but when I went to "My Relate" and selected "Show duplicates," I received this screen message:

No possible duplicates found.

I don't believe it.

Jillaine 19:50, 2 November 2008 (EST)

You were right not to believe it :-). You must have visited during a short period of time when the duplicates list was being re-built. Normally this happens early in the morning, but over the past few days I've been re-building it sometimes during the day as I've discovered bugs. Anyway, you have a little over 100 duplicates.

A quick note: you will notice that a few of the compare-duplicates screens list a "red link" page that doesn't exist, so you can't merge it. This is a result of a page that was either deleted or never got created, but which shows up as an alternate husband/wife to one of your own pages on a family page. You can just ignore these; I'll take care of them.--Dallan 08:15, 5 November 2008 (EST)

Yeah, I figured that something like that had happened because I went back the next day and had a slew of dupes. I knew it was too good to be true. Thanks for the explanation. I'm trying to do a few per night. I hope I'm not messing things up. But it seems like the system you've written is pretty good, Dallan. Nice work. Jillaine 19:30, 5 November 2008 (EST)

Merging blues [20 January 2009]

I don't know what happened but there seems to have been a minor flurry of GEDCOM uploading over the Christmas break. I have spent about an hour a day recently trying to merge away duplicates created by GEDCOM uploads having no sources, obscure references to personal databases, and one child per family. I hate to see the time that will be required when WeRelate starts getting heavy use, all just to tread water basically, since few of these uploads add anything useful.

Merging is a dangerous activity and I am afraid all this activity will cause me to make a serious error, if it hasn't already. One doesn't see/read the Personal History section, and often ends up making snap decisions about whether to save this unsupported date, or that unsupported date, or both. One gets pulled into family members one may not be all that familiar with. Then the propagated changes to the associated family send out all sorts of notifications to other users, all caused by people thinking they are doing a useful service by uploading unsourced family trees.

--Jrich 09:40, 5 January 2009 (EST)

I feel your pain. While progress has been made merge-wise, there is an aspect of "just treading water" that is a bit disheartening. It's unfortunate that the addition of new material, which is something we are generally all hoping for, is also a cause for disappointment.
Dallan is trying to get the merge on upload stuff together, and I seem to recall he thought that was a prerequisite for the "official rollout"/non-beta release (whatever that means).
Please don't let worry about making a mistake bog you down too much though. If you're looking at pages that are basically unsourced (that is, nothing that a reasonable person will have access to), and you make a mistake, the problem will get sorted out when the pages start to get the benefit of auditable/verifiable sources. Even if you just get 95% of the merges right, the situation is probably still better.
I keep proposing upload size limits for newbies, but the idea hasn't gained traction. Perhaps it would not be needed in an environment of merging-on-upload, but I have my doubts that will work. Stuff that's just wrong is apt to be wiped away and then recreated an indefinite number of times. To my mind, it makes sense to force new users to reduce the size of their upload to something that they reasonably might take care of, or at least, gives them enough to work with that they can start to understand what working on werelate is really all about. If they stay the course, accumulate a bunch of hand edits, and then later want to upload larger stuff, they could ask to get their limit size bumped. Anyway, if the merging-on-upload takes too much longer to arrive, perhaps we can persuade Dallan to take an intermediate step which (I think) would be easy to do, while protecting the harried group of active merge volunteers from being overwhelmed by new redundant content.--Jrm03063 10:19, 5 January 2009 (EST)

Merge-on-upload and a merge-review screen with an "unmerge" button are my highest priorities right now. Both projects have been started and should be done by the end of the month. I'm not sure why we've had so many new users this past week. The Allen County Public Library held a class on WeRelate, but that's the only thing I'm aware of. And I did put in upload limits for newbies - it's just that the limit is 5,000 people. Even with that limit I still get people disappointed that they can't upload larger trees. If the merge-on-upload doesn't work out as expected, I'll reduce it further.--Dallan 15:23, 5 January 2009 (EST)

Well, great!
BTW, will the {{source-wikipedia|<wikipedia page>}} template be refreshed more often than quarterly? I do like more immediate gratification...!!!--Jrm03063 15:39, 5 January 2009 (EST)

I plan to update the source-wikipedia templates on a weekly basis, but it has to wait until unmerge, merge-during-upload, and gedcom export are finished. I'm hoping they'll all be done by early February. Then I can modify the wikipedia refresh to do a source-wikipedia search each week.--Dallan 14:46, 8 January 2009 (EST)

Dallan, as far as new users go, I subscribe to hard copy "FamilyTree Magazine" and they had a big article on WeRelate in the last edition, which included "how-to" pages for adding people, and such at WeRelate. So I am sure that is probably one reason for your additional traffic. --Kristy 14:10, 16 January 2009 (EST)
Thank-you for this information! Can you tell me which issue is it in?--Dallan 11:03, 18 January 2009 (EST)
Yes, it is in the March 2009 issue. --Kristy 08:17, 20 January 2009 (EST)

Merges lost when GEDCOM updated? [18 March 2008]

Here are questions I have after working 4+ hours doing a HUGE merge (which is still in progress). There has been discussion about the ability to "re-upload" an existing GEDCOM to update it with new information obtained. And the way to do that would be to match the Reference Number created by the GEDCOM itself from past and present uploads.


  • Say I merge PersonA from GedcomA uploaded by John Doe into PersonB of GedcomB uploaded by me.
    • Once completed, PersonA doesn't "really" exist, but for a redirect.
  • Meanwhile John Doe is busy at home adding a bunch of new goodies to his database. And in fact puts several notes (that will show up in the "history" section) on PersonA, along with some new events as well.
    • He then uploads a new improved GedcomA to replace the now outdated one here at WeRelate.

What now happens with PersonA?

  • Will John Doe's notes overwrite the redirect located in the "history" section?
    • if yes, will that cause a bunch of orphaned pages for his spouse and children?
    • if no, what happens to the new goodies he found and entered there?
  • Will any new "history" information entered at home in a database and then included in future uploads in fact erase every bit of the merge information I worked these past 4+ hours on, where I merged PersonA into PersonB?
  • What happens to the pages where I merged B into A (which I did if PersonA had more events than PersonB)? Will that affect links with pages?
  • What happens to new events added by John Doe to PersonA? Will they not be available since PersonA is a redirect?
  • I know for a fact my John Doe places Obituaries as he finds them in a section that shows up in the "history" section upon upload.

Maybe the "redirect" information should have a different place to be placed? But that would not totally solve the problem. I feel there is still potential of losing any new events John Doe may have added to PersonA at home in his database before an upload to update his GEDCOM --Msscarlet1957 11:45, 14 March 2008 (EDT)

In order to support GEDCOM re-upload, a new source citation will have to be added to each person & family in their tree. This source citation will contain a "permanent link" (URL) to the specific version of the page that they last uploaded. We'll add these sources directly to the uploaded GEDCOM and make this modified GEDCOM available as a download. People will have to import this modified GEDCOM back into their desktop genealogy software in order to be able to re-upload their GEDCOM later. Since we'll add the sources directly into the uploaded GEDCOM file, there shouldn't be the information loss that you usually get when going from one GEDCOM format to another.

"People will have to import this modified GEDCOM back into their desktop genealogy software in order to be able to re-upload their GEDCOM later." I am going to be one with a big problem with this part. There is no way I can import a GEDCOM from WeRelate back into my TMG (The Master Genealogist software). At present your software (or anyone's software for that matter) does not support all of TMG's abilities. The main difference is the ability to add a "witness" or witnesses to any tag. For example: for a 1920 census, I add the information to the Mom and the Dad of a family, with "principal" roles in that event. Their children are added with "witness" roles, and the sentence structure for their participation in that event is totally different when a report is created to be published. For example: "Ralph John Kuhn appeared on the 1910 Federal Census of Hopewell twp., Seneca Co., Ohio in the household of his parents Daniel Charles Kuhn and Lillian Sophie Kuhn." Whereas the principals to this event have this sentence: "Daniel Charles Kuhn and Lillian Sophie Kuhn appeared on the 1910 Federal Census of Hopewell twp., Seneca Co., Ohio, enumerated 06 May 1910, renting their home. They only have two children at this time: Ralph and Gertrude. Lillian lists she is the mother of 3 children with 2 living. They are living next door to John F and Victoria Kuhn" Once I upload a GEDCOM to WeRelate, the principal information is there but all witnesses are lost. This does not bother me, because it happens anywhere I go. However, I am unable to "import" my own GEDCOM back into my program. Instead, as I am doing this huge merge, I am hand entering any additional information I find in John Doe's file into my own database, on my machine, as I go along. Anyone who uses TMG will also be unable to import their own GEDCOM from WeRelate, if they had any witnessed events.
I don't think I was clear on this. The only change that we would make to the GEDCOM you uploaded would be to add a new source citation to every individual and family. Otherwise it is exactly the same GEDCOM that you uploaded. We're not talking about doing a GEDCOM export from the wiki pages here; we're talking about modifying your uploaded GEDCOM directly - inserting source citations but otherwise keeping everything else the same (precisely to avoid the problem you mention). There would be a function for you to download your modified GEDCOM that would be separate from exporting a GEDCOM from the wiki pages. Most genealogy programs (I assume TMG is included) can export a GEDCOM and then re-import that GEDCOM without losing any information. And it shouldn't be that difficult for us to process an uploaded GEDCOM and insert new source citations but otherwise keep every other line the same.--Dallan 12:07, 18 March 2008 (EDT)

So when the person re-uploads the GEDCOM, we'll know which pages they're updating, what each page looked like when they last uploaded their GEDCOM, and what each page looks like now by following any redirects to get the current version. Using this information we can determine

  • exactly what the uploader changed on each page by comparing what the page looks like in their new upload to the version of the page that was current as of their previous upload,
  • exactly what changes others have made to the page by comparing the current version of the page (which may have a different title if the page has been redirected), with the version that was current as of the uploader's previous upload.

If the changes made by the uploader are to different fields than the changes made by others, we apply the changes made by the uploader to the current version of the page. Changes made by others don't get erased; changes made by the uploader show up as changes to just those specific fields. The uploader must now download the new GEDCOM with source citations containing permanent links to the now-current versions of each page.
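The comparison described above is essentially a field-by-field three-way merge. A minimal sketch in Python, assuming each page version can be reduced to a flat dictionary of field names to values; the function name `three_way_merge`, the field names, and the treatment of a field missing from the new upload as a removal are all illustrative assumptions, not WeRelate's actual code:

```python
def three_way_merge(base, uploaded, current):
    """Apply an uploader's changes to the current page, field by field.

    base     -- the page as of the uploader's previous upload
    uploaded -- the page as it appears in the newly uploaded GEDCOM
    current  -- the page as it stands now (after following any redirects)

    Returns (merged, conflicts). conflicts maps a field name to the pair
    (uploader_value, current_value) for fields both sides changed.
    """
    merged = dict(current)
    conflicts = {}
    for field in set(base) | set(uploaded) | set(current):
        old = base.get(field)
        mine = uploaded.get(field)
        theirs = current.get(field)
        if mine == old:
            continue  # the uploader did not touch this field
        if theirs == old or theirs == mine:
            # Only the uploader changed it, or both made the same change.
            if mine is None:
                merged.pop(field, None)  # assumption: missing means removed
            else:
                merged[field] = mine
        else:
            conflicts[field] = (mine, theirs)  # both changed it differently
    return merged, conflicts


# Hypothetical example: the uploader softened the birth date at home,
# while someone else added a burial place on the wiki.
base = {"birth": "1689", "death": "1750"}
uploaded = {"birth": "Abt. 1689", "death": "1750"}
current = {"birth": "1689", "death": "1750", "burial": "Boston"}
merged, conflicts = three_way_merge(base, uploaded, current)
```

Here the burial place added by others survives, the uploader's birth date is applied, and `conflicts` is empty because the two sets of changes touch different fields.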

Suppose the uploader and others modify the same field. There are two ways we could go with this; I'm thinking about going with the second:

  1. We don't automatically modify these fields, but send the uploader an email telling them about the conflict (i.e., the changes they made and the changes made by others to the same fields), and asking the uploader to modify the conflicting fields by hand.
  2. Instead of modifying the field, we add the uploader's conflicting edit as an "alternate" piece of information, and send the uploader an email telling them about the conflict and that we added their change as an alternate.
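The second option can be sketched as follows, assuming a page is a dictionary that keeps alternates in a nested `alternates` mapping. That layout and the function name are hypothetical, chosen only to illustrate the idea:

```python
def add_conflicts_as_alternates(page, uploader_values):
    """Keep each conflicting field as-is and record the uploader's value
    as an alternate. Returns a new page; the input is not modified.

    uploader_values maps a conflicting field name to the uploader's value.
    """
    result = dict(page)
    # Copy the existing alternates so the caller's page is left untouched.
    alternates = {k: list(v) for k, v in page.get("alternates", {}).items()}
    for field, value in uploader_values.items():
        alternates.setdefault(field, []).append(value)
    result["alternates"] = alternates
    return result


page = {"birth": "1689", "alternates": {"birth": ["1690"]}}
updated = add_conflicts_as_alternates(page, {"birth": "Abt. 1689"})
```

The fields that were in conflict would also be the natural contents of the notification email sent to the uploader.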

Another issue is what happens if the new GEDCOM doesn't contain all of the person & family pages that are in the tree. Rather than trying to delete the missing pages, I'm thinking we should send the uploader an email with links to all of the pages in their tree that weren't in the newly-uploaded GEDCOM, and let them decide if they should be removed or not.

I think this covers all the bases. Does this answer all of your questions?--Dallan 01:41, 17 March 2008 (EDT)

I think that at least I can see that the information would not be lost IF John Doe were to follow the directions, but I do not see that happening either. At WorldConnect when you want to update your gedcom, you just check the box that this is an update and all works like a whiz, no additional efforts needed. There needs to be more effort made to cause WeRelate to be "easier", not more difficult than other websites.
So I see two problems ahead in your process:
  1. Any member using TMG will be unable to ever update their GEDCOM; they will have to delete their file and upload again, as is set up now, and thus lose all links set up to photos uploaded into the image section, and any changes made during any merges.
  2. Members may not want to do updates because of the complicated process.
I know you are working very hard, Dallan, and I appreciate that. I really think WeRelate is a great site. Maybe somehow, someway there could be some other way to implement the update process? --Msscarlet1957 10:04, 17 March 2008 (EDT)

I think I understand where Dallan is going with this. Matching an arbitrary GEDCOM against the huge universe of werelate is really impractical. Trying to make a program smart enough to know what is a good enough match and what isn't is essentially unsolvable. What can be done though, is to attach a source reference that tells werelate specifically that a person somewhere in a gedcom absolutely is a certain person in the werelate universe. Generally speaking, the easy way to get your home system in sync w/werelate would be to obtain a fresh werelate GEDCOM download, which will have the appropriate tags in place for all the people - but you wouldn't have to. I presume that the "werelate" designator source/tag will have a format that allows you to directly enter it into your home genealogy program where appropriate.

There is a place for the sort of guessing/probable matching in werelate - it's when we have a feature that allows the system to browse for potential matches in the werelate universe. That does not result in automatic merging though, but instead, in a set of candidate matches that the next researcher coming along can review. If the human is persuaded by the match, then the human can perform a merge or request a default merge procedure. But combining detection with actual merging logic seems to me extremely perilous (take a look at ancestry.com's "one world tree").

I appreciate that it's not totally "hands off", but it's going to yield a far better data base.--Jrm03063 13:19, 17 March 2008 (EDT)

Yes, you could enter the source citations into your desktop genealogy program yourself. They'll be a human-readable citation with a URL in the citation text field. But with the ability to download what is essentially the same GEDCOM that you uploaded except with source citations added, you shouldn't have to.

As Jrm03063 points out, matching is problematic. Even if we're 99.5% accurate on matching re-uploaded people to previously-uploaded people, it means we'll either incorrectly-match or not match 25 people in a re-upload of a 5,000-person GEDCOM. That's too many.
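The arithmetic behind that figure, as a short check in Python (integer math is used only to avoid floating-point rounding):

```python
# Figures from the comment above: 99.5% accuracy on a 5,000-person GEDCOM.
people = 5000
errors_per_thousand = 5  # 100% - 99.5% = 0.5%, i.e. 5 people per 1,000

expected_problems = people * errors_per_thousand // 1000
print(expected_problems)  # 25
```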

There is another approach we could take. Some desktop genealogy programs store a unique identifier (UID) for every person. This identifier is included in the GEDCOM's they export, so that a person has the same UID in the GEDCOM file every time. If the GEDCOM includes UID's, then we could record the person's UID and the page version with which it is associated, so that the next time you upload a GEDCOM with the same UID's we could know what page versions to match. The problem is that only 42% of the people that have been uploaded to WeRelate to date have UID's. But when UID's exist, they could potentially be used in place of downloading a modified GEDCOM.
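A rough sketch of how UIDs might be collected from a GEDCOM file. It assumes the UID is written as a level-1 `_UID` line under each individual, a custom tag some desktop programs emit; programs differ here, so treat the tag name, the helper name `collect_uids`, and the sample data as assumptions:

```python
def collect_uids(gedcom_text):
    """Map each individual's GEDCOM xref id (e.g. '@I1@') to its _UID, if any."""
    uids = {}
    current = None  # xref id of the individual record being read
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        if level == "0":
            # Individual records start with a line like '0 @I1@ INDI'.
            is_person = len(parts) == 3 and parts[2] == "INDI"
            current = second if is_person else None
        elif level == "1" and second == "_UID" and current and len(parts) == 3:
            uids[current] = parts[2]
    return uids


sample = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 _UID ABC123
0 @I2@ INDI
1 NAME Mary /Jones/
0 TRLR"""
uids = collect_uids(sample)  # only @I1@ carries a UID in this sample
```

On a re-upload, a table like this could be compared with the UIDs recorded at the previous upload to find the matching page versions. People without a UID, like `@I2@` here, would still need another way to be matched.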

One advantage of downloading a modified GEDCOM is that you could share your modified GEDCOM with a cousin, and if they incorporated your GEDCOM into their genealogy and then generated a combined GEDCOM to upload into WeRelate, the system could recognize that some of the people in their GEDCOM already exist at WeRelate and they wouldn't have to go through the match+merge process for those people. The system would just apply whatever changes they had made to those people, just as if you were re-uploading your GEDCOM. If we were instead relying upon UID's, when your cousin uploaded their GEDCOM they would probably have to go through the match+merge process for the people that were also in your GEDCOM, since I don't know if we could assume that the UID's would remain the same in your cousin's GEDCOM.--Dallan 12:07, 18 March 2008 (EDT)

It's great to be flexible Dallan, but I think you'll make yourself crazy trying to support weird ID/UID stuff. Unless the value is from a reasonable third party (say an ancestral file number, or whatever the successor strategy may be) I don't think an id-based alternative (to the primary werelate url) is wise for trying to figure out who matches who. The url approach that you've mentioned is the sort of thing we want anyway, since a downstream consumer of the GEDCOM may very well be interested in the contents of the associated werelate page. Making the source do double-duty as a tag at re-import time is a really fortuitous coincidence that reinforces good practice. I don't think you want to clutter the story with identifiers that just can't work as well and (often) will not survive a merge. A url to a merged page will usually redirect to somewhere useful...--Jrm03063 13:40, 18 March 2008 (EDT)

Losing "Watcher" during a merge [26 April 2009]

I am merging pages today, and after the merge, the person who was watching the page I merged is NOT now listed with me on the new merged page. I don't want to lose connections. I think this may be a new bug, as it used to always carry both folks onto the merged pages. Or did I miss it somewhere that this action was dropped? --Kristy 21:52, 15 February 2009 (EST)

I'm pretty sure that the bug is that the list of watchers is correct but is not displayed correctly right away. The merged page still shows the pre-merge list of watchers for several hours after the merge. The problem is that the page gets cached at the server before the watcher is added. You can verify that the list of watchers is out of date by going to the URL line in your browser and adding "?action=purge" (without the quotes) to the end of the URL and pressing enter. This causes the page to be re-cached. Fixing this bug is on my ToDo list for next month. I'm sorry about the confusion it causes. If this does not fix the problem, would you please let me know?--Dallan 11:26, 23 February 2009 (EST)
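For anyone who would rather build the purge URL in a script than edit it by hand, here is a small sketch using Python's standard `urllib.parse`. The page title in the example is made up, and the helper name `purge_url` is only illustrative:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def purge_url(page_url):
    """Return page_url with action=purge set in its query string."""
    parts = urlsplit(page_url)
    # Drop any existing 'action' parameter, then add action=purge.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "action"]
    query.append(("action", "purge"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical page title, for illustration only.
url = purge_url("https://www.werelate.org/wiki/Person:John_Smith_(1)")
```

Opening the resulting URL in a browser has the same effect as typing "?action=purge" on the end of the address by hand.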

The Same and Not the Same [26 April 2009]

During a spate of recent Merges, I noticed that items colored in Green are not always strictly equal. But the green boxes have no check marks and you cannot deselect the default choice to select one of the green choices.

In dates, especially, there are borderline significant differences. If I am merging two unsourced pages, and one says 1689, and the other says Abt. 1689 (these are considered equal), I want to pick Abt. 1689 every time as probably more representative of what is known (given no sources). But if it is not the one in the right column I cannot.

There are some other differences with names that occur. I know it is a common practice to capitalize the last name, but given that there is a separate field for surname in WeRelate, this shouldn't be necessary. So again, if the two names are John Smith and John SMITH, they compare equal, and I am stuck going with the one in the right column.

--Jrich 21:32, 25 April 2009 (EDT)

Agreed. This can also happen with sloppy place entering that the system has recognized as equal, but that doesn't look as nice or complete as the other option.--Amelia 12:31, 26 April 2009 (EDT)
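One way to picture the distinction raised in this thread is to separate values that are strictly identical from values that are equal only after normalization. The sketch below is an assumption about how such a comparison could work, not a description of the rules the merge screen actually uses; the qualifier list and function names are hypothetical:

```python
import re

# Leading date qualifiers that a loose comparison might ignore (assumed list).
_DATE_QUALIFIER = re.compile(
    r"^(about|abt|before|bef|after|aft|est)\.?\s+", re.IGNORECASE)


def normalize_date(text):
    """Strip one leading qualifier such as 'Abt.' and ignore case."""
    return _DATE_QUALIFIER.sub("", text.strip()).casefold()


def normalize_name(text):
    """Collapse whitespace and ignore case, so 'SMITH' matches 'Smith'."""
    return " ".join(text.split()).casefold()


def classify(a, b, normalize):
    """Return 'identical', 'equivalent' (equal only after normalizing),
    or 'different'."""
    if a == b:
        return "identical"
    if normalize(a) == normalize(b):
        return "equivalent"
    return "different"


print(classify("1689", "Abt. 1689", normalize_date))          # equivalent
print(classify("John Smith", "John SMITH", normalize_name))   # equivalent
print(classify("1689", "1690", normalize_date))               # different
```

Under a scheme like this, only "identical" pairs would need to be locked, while "equivalent" pairs could still offer a choice between the two spellings.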


Should WeRelate allow downloading GEDCOM's? [5 February 2009]

The question about allowing anyone to download the file needs some serious consideration. I'm concerned about 'harvesters' who gather lots of different charts and then post them as their own work without either checking for errors or giving any credit to the author. An advantage of the tree staying on WeRelate (as opposed to being downloaded by anyone) is that when corrections are needed they can be made on WeRelate where everyone can see them. But if someone else downloads the file and passes it around, if errors are in their downloaded version, they will be perpetuating the errors - they won't know of the corrections made later on WeRelate. I envision pros and cons on this myself so I recognize the need for serious debate and/or consideration of the subject of downloading while it is still in planning stage.--Janiejac 22:47, 13 September 2007 (EDT)

When the question first came up, I didn't see what the big deal was, but you have made an excellent point Janiejac. And yes, there are many pros and cons and even more questions to be asked now. --Ronni 04:13, 14 September 2007 (EDT)

WorldConnect has come up with a very good compromise protocol on this issue of downloading. It gives the author the options of allowing all to be downloaded OR only a couple of generations, or something like that. You might check that out. Thanks for the serious consideration.--Dr. Bill 22:43, 15 September 2007 (EDT)

I hadn't considered Janiejac's point either -- I think it's a good one. Download isn't scheduled until around the end of the year, so we have time for more discussion.--Dallan 13:04, 18 September 2007 (EDT)

Could someone redirect a portion of this exchange to a new subject called 'downloading discussion'? This has sort of evolved from collaboration to downloading.


I want to keep the subject of downloading current and get others' points of view on this while it is still in the planning stages. When I upload a file either to my site or to rootsweb or to WeRelate, I do upload all my notes and sources with it. I do believe in sharing and send anyone who requests it a register starting with the individual they are interested in and including notes and sources. But I don't give away my whole data base, notes, sources and all. I want interested folks to contact me with additions/corrections/suggestions and don't want to find all my data and notes posted on someone else's web site.

If I upload to WeRelate and it gets edited by myself or anybody else, I want to be able to download the whole thing back to my computer to continue to work offline. And I do like Rootsweb/WorldConnect's ability to designate just how much of one's chart can be downloaded. But the ethical question comes to mind - if others can add to or edit the chart - should that entitle them to download my whole data base? I'd appreciate input from others on this issue.--Janiejac 12:43, 29 September 2007 (EDT)

Allowing downloads of GEDCOMs is pretty essential, and an opportunity to boot. As has been observed, some folks like to be able to work on things off-line. Others perhaps want to take material to another system to generate different sorts of reports. I take the view that we need a symmetric capability - if you can upload a GEDCOM, you sure ought to be able to reverse the process. One of the reasons I've lost a lot of interest in ancestry.com isn't the expense, but the crappy GEDCOM they produce (and worse, they can't even fully re-import their own GEDCOM - how embarrassing). It seems that they've been intentionally inept in order to strand data under their proprietary control. The result...I'm looking for an alternative. Besides, if someone was really serious about massive harvesting of werelate data bases, they won't be doing it via GEDCOM, so they could probably do it right now.

A GEDCOM download is an opportunity, because a reasonable GEDCOM will be scattered with note/source links back to the werelate site. Skim an ancestry.com GEDCOM and you'll find dozens of links back to ancestry if the GEDCOM has any sources attached. One of the first things I think I would do with a werelate GEDCOM is to replace my ancestry data with a werelate GEDCOM. Then, if people are sniffing around my open tree and source information, they'll find their way to werelate.

The way that werelate gains credibility and preeminence isn't by taking a proprietary view of information, but by making it so totally accessible and free that there is no real advantage to getting it elsewhere. It's the wiki way. The information equivalent of if you love it set it free.--Jrm03063 14:38, 8 November 2007 (EST)

That's an interesting idea about providing links back to WeRelate in note fields embedded in the GEDCOM. We would have to do something like that anyway in order to satisfy the attribution requirement of our license. Please keep comments coming on this topic. We won't get to GEDCOM download until after match+merge, so we have some time to get comments from everyone.--Dallan 18:47, 8 November 2007 (EST)

I think downloading a GEDCOM is a very important feature, and should not be restricted. Even though I intend to do my work primarily in WeRelate going forward, I'd like to be able to download GEDCOMs for various reasons, including ability to put it into other software to generate various pretty-printed reports I can't do here, and as a "back up" of the work I do here. While I appreciate the various degrees of control that ancestry.com gives you when you upload a GEDCOM, there's a significant difference between WeRelate and Ancestry (or most other places like it). On Ancestry, when you upload a GEDCOM, it remains your tree. Here, when you upload a GEDCOM, it becomes your contribution to the ongoing wiki, which other people may add to, link to, correct, etc. From the moment you upload a GEDCOM here, it is no longer your tree, and it wouldn't make any sense for you to be able to dictate who could subsequently download it, especially after it has been enhanced by the work of others.

I appreciate the concern about careless people who might download your work, pass it around, and you lose the opportunity for updates. But I'm not sure we can solve the problem of careless people. :-) I for one keep track of where I got valuable information, and always like to keep in touch with those I've collaborated with on common lines. I think the suggestions that the downloaded GEDCOMs have back-links to WeRelate where appropriate are good ones.

That's my $.02. --TomChatt 01:56, 9 November 2007 (EST)

I've gone back and forth on this issue (i.e., no restrictions vs some restrictions). JRM's comment about embedded links to WeRelate is a very good idea. Tom's comment about "my tree" now being "our tree" needs to be reiterated because it is essentially what WeRelate is all about. That idea alone is one that I think still isn't completely understood when someone starts putting their data online here. I have observed that "misunderstanding" several times in the last few months. If we understand the concept of what is mine is now ours in regards to WeRelate, then restrictions on GEDCOMs would be few if any at all. --Ronni 04:38, 9 November 2007 (EST)

I agree with Ronni and TomChatt that the community aspect of the data on WeRelate demands that we have a Gedcom download. If the purpose of wikifying genealogy is to get the best information out there, we must have a way for it to get off of WeRelate into the "wild." But in order to keep supporting the mission of producing high-quality data, it is crucial that downloaded gedcoms be sourced properly. I imagine a download where the sources are all the source page "titles" on WeRelate. That would be bad. It would badly degrade the quality of source citation in any properly sourced database, and would create a tremendous amount of work to replace any links back to the WeRelate pages with the actual publication and date information that would allow me to locate the source. I don't object to links back to the source pages, which do contribute useful information, but the downloaded sources should be as complete as possible (using the fields filled out on the source page, I would imagine).

On a separate but related issue, what do we do about the licensing requirements, particularly if someone chooses not to download (or import) sources? Perhaps some explicit statements and instructions during the process about the attribution requirements if people redistribute (I know they can do this now, but it's going to be a much bigger problem once downloading is permitted).

And that reminds me of a technical issue we (uh, you, Dallan) need to be sure to solve -- embedded links in notes that go to other places on WeRelate need to be rendered as full links that are intelligible when imported into a genealogy program. --Amelia.Gerlicher 14:11, 9 November 2007 (EST)

I'm thinking that a downloaded GEDCOM would include information from the Source/MySource pages on WeRelate as source records in the GEDCOM. I agree that we'll have to include some explicit statements on the download page about needing to attribute. We could put the attribution links to WeRelate on notes attached to each person/family, or on a source record that is cited by every person/family -- any thoughts on which is best? Your comment about turning embedded wiki links to HTML links is a good reminder -- I'll make a note of that.--Dallan 11:29, 16 November 2007 (EST)
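To make the point about embedded wiki links concrete, here is a minimal sketch of how a note might be rewritten on export so the links stay intelligible outside the wiki. It is only an illustration: the base URL, the function name `expand_wiki_links`, and the "label <URL>" output style are assumptions, not a description of how the WeRelate export works.

```python
import re
from urllib.parse import quote

# Assumed base URL for wiki pages; adjust to the real site layout.
BASE_URL = "https://www.werelate.org/wiki/"

# Matches [[Target]] and [[Target|label]] style wiki links.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def expand_wiki_links(note: str) -> str:
    """Replace each wiki link in a note with its label followed by a full URL."""

    def repl(match: re.Match) -> str:
        target = match.group(1).strip()
        # Fall back to the page title when no label is given.
        label = (match.group(2) or target).strip()
        # Wiki URLs conventionally use underscores in place of spaces.
        url = BASE_URL + quote(target.replace(" ", "_"), safe=":/()")
        return f"{label} <{url}>"

    return WIKI_LINK.sub(repl, note)


if __name__ == "__main__":
    print(expand_wiki_links("See [[Person:John Smith (1)|John Smith]] for details."))
```

Plain text with the URL spelled out should survive import into most desktop programs, since it does not rely on the program understanding wiki or HTML markup.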

Hi, new contributor. Beginner-level genealogist. Consider this a comment from the man on the street.... Yes you should allow downloads. But people will need "help" to avoid pitfalls, whether it is an upload or a download. For example, I am one of those careless people who hasn't paid proper attention to how I entered information in our Family Tree Maker. My wife and I have bastardized our usage of the fields so that when I load it up into WeRelate, data shows up where it should not. If someone were to download what I loaded they will have to sort through some strange stuff. I need to improve my discipline in managing info in the FTW. (sources, events, and notes fields) I also need to convince my wife that her approach to putting data where she wants is not going to work in the long run. (for example I can't get her to not put Rev. or DR. in the name field...)

I plan on maintaining my own database (FTW) as my primary repository on my home computer and "contribute" to WeRelate by publishing what I want to share. (Probably everything I have as I like to share) But, I will not use WeRelate as my primary repository.

A page on the gedcom file format, and the pros and cons of how people have used genealogy programs incorrectly and the problems this causes as people get more invested in their data repositories, would be good...if it doesn't already exist.

(PS I take back any negative comments about my wife, she just handed me tea and home made cookies...) PPs is there a spell checker?

Thxs --PeterP 18:48, 26 November 2007 (EST)

Hi Peter, one of our big challenges is going to be making the GEDCOM export good enough so that you can incorporate the new material that others have added to your tree into your home database on FTW, so that you don't lose what others have added. As you've seen with your GEDCOM, using the fields in FTW for purposes other than what they're for makes the GEDCOM output look funny. I'm not sure about the different oddities that typically occur, but feel free to add any of your observations to this page. And no, there's no spell checker, but Firefox has one built in.--Dallan 17:13, 4 December 2007 (EST)

My vote is a definite yes for allowing downloads of gedcoms; no restrictions. I suggest that you communicate this to new users when they register. Require new users to check a box that the user understands that gedcoms can be downloaded with no restrictions. There are plenty of sites with restrictions; not what I wish for this site.

I would also like the ability to download images or is this already possible? --Beth 10:45, 14 December 2007 (EST)

It sounds like the general consensus is that we should allow GEDCOM downloads. There's already a statement on the GEDCOM import page and on every edit page that "All contributions to WeRelate are released under the GNU Free Documentation License 1.2 (GFDL)." and that "Others can add to, edit, and redistribute your contributions." I just bolded the first part on the GEDCOM import page to highlight it. We could require people to check a box, but unless it becomes a problem it's not as high of a priority as other things.

You can currently download images (one at a time -- right-click on the image to save it to your local disk). Some images are uploaded under fair-use though, so you may not be able to do certain things with those images (possibly not upload them to a commercial site).--Dallan 00:07, 16 December 2007 (EST)

I think that is fantastic news Dallan. Glad to know that I can also download images. I hope every user understands the concept of WeRelate including the GNU Free Documentation License. Call me a pessimist but I envision some users getting upset about this or that and deciding to remove "their" tree as has happened on Ancestry and Rootsweb and probably other sites as well. I removed my tree from a site, but that was because I used the merge feature of their software and the file was so messed up that I gave up and removed it. Anyway just thinking that a statement in "plain English" may save some future woes. Thank you and all of your volunteers for your hard work and dedication to WeRelate.

--Beth 18:22, 17 December 2007 (EST)

I switched the bolding in the gedcom upload text to emphasize the phrase that describes what others can do with your contributions (add to, edit, redistribute) and added "download" as another specific possibility. Hopefully this will make things clearer.--Dallan 17:03, 18 December 2007 (EST)

Thanks Dallan,

I noticed an option to delete one's family tree in the FTE; can the user delete their family tree? --Beth 07:42, 19 December 2007 (EST)

Yes, you can delete the pages in your tree so long as nobody else is "watching" them. If another user is watching one of your pages (which happens if they add the page to their own tree, or if they edit the page and leave the "watch this page" box checked, or if they click on the "Watch" link at the top of the page), then that page does not get deleted.
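The rule described above can be sketched in a few lines. This is only an illustration of the logic as stated (a page is removed only when nobody other than the tree owner is watching it); the function name and the data shapes are hypothetical, not the actual WeRelate code.

```python
def pages_to_delete(tree_pages, watchers, owner):
    """Return the pages in the owner's tree that nobody else is watching.

    tree_pages: list of page titles in the tree being deleted
    watchers:   dict mapping page title -> set of user names watching it
    owner:      user name of the person deleting the tree
    """
    deletable = []
    for page in tree_pages:
        # Anyone other than the owner watching the page protects it.
        other_watchers = watchers.get(page, set()) - {owner}
        if not other_watchers:
            deletable.append(page)
    return deletable


if __name__ == "__main__":
    watchers = {
        "Person:A (1)": {"beth"},
        "Person:B (1)": {"beth", "tom"},
    }
    print(pages_to_delete(["Person:A (1)", "Person:B (1)"], watchers, "beth"))
```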

A problem caused by this approach is what happens when you are watching one member of a family that someone else has uploaded, but have forgotten to watch the other family members, and the original uploader removes the tree. The page that you watched is still there, but the other family members have been deleted. I can restore them if this happens, but one of the things on the todo list for next quarter is a screen that will tell you where your "off-tree" links are -- pages in your tree that link to pages not in your tree -- and give you a chance to add those pages to your tree.--Dallan 12:08, 21 December 2007 (EST)
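As a rough idea of what an "off-tree links" report might compute, here is a small sketch. It assumes we already have, for each page, the list of pages it links to; the function name and data shapes are made up for illustration.

```python
def off_tree_links(tree_pages, links):
    """Map each page in the tree to the linked pages that are not in the tree.

    tree_pages: list of page titles in the user's tree
    links:      dict mapping page title -> list of page titles it links to
    """
    tree = set(tree_pages)
    report = {}
    for page in tree_pages:
        missing = sorted(t for t in links.get(page, ()) if t not in tree)
        if missing:
            report[page] = missing
    return report


if __name__ == "__main__":
    links = {
        "Person:A (1)": ["Family:A and B (1)", "Person:B (1)"],
        "Family:A and B (1)": ["Person:A (1)", "Person:B (1)"],
    }
    tree = ["Person:A (1)", "Family:A and B (1)"]
    # Person:B (1) is linked from both pages but is not watched in this tree.
    print(off_tree_links(tree, links))
```

Each page listed in the report is one the user could then be offered the chance to add to their tree.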

I am happy to share. However I am concerned about harvesters who then may put the money on for profit sites. It might be a bit friendlier to have the person just contact the submitter. That way they can make contact, chat and then share information as they wish.--Sheri 20:06, 5 June 2008 (EDT)

Hi Sheri, I am not sure that I understand your statement: It might be a bit friendlier to have the person just contact the submitter. Friendlier than what? What exactly happens if a harvester puts the money on a for profit site? The information will still be on WeRelate and people can view the information here with no charge. Please clarify your concerns. --Beth 22:10, 5 June 2008 (EDT)

Again coming late to the party after a long absence. I initially left WR because I could not easily update my uploaded data -- in part due to lack of gedcom download. But since then I've realized that I was expecting of WR something it is not. It is NOT a genealogy program. It's a wiki -- a place for community-edited content. As someone who works in a sector that is prone to mission-creep, I would encourage WR to develop a clear and shared mission/purpose that then guides decision-making about such topics as downloadable gedcoms.

Another way of saying this is use the right tool for the right job. For me, anyway, WR is not the right tool for me to use to keep MY data updated, but it is the right tool for me to share and to work collaboratively with others on areas of shared interest in order to improve the quality of genealogical info available to all.

From that perspective, downloadability of gedcoms has a different emphasis.

Jillaine 10:25, 2 November 2008 (EST)

Downloading-uploading to update a GEDCOM [10 April 2008]

All genealogy software is not the same. I can only speak for "The Master Genealogist" TMG as I am a user. For me to "import" a GEDCOM into TMG that I had previously uploaded to WeRelate causes me concerns.

  1. I do not upload my whole database, because it is so large (currently at 58,571 members). I break it down, choosing an ancestral surname, beginning with the oldest member of that family and including all descendants, spouses and parents to the spouses to include in a small GEDCOM segment. I then upload that segment to WeRelate. Since I am a descendant of each of these ancestors, I am in each segmented GEDCOM, as are many of my kin. This ends up creating duplication from one segment to another, a necessary evil. I currently have seven segments on WeRelate and plan on adding many more.
  2. TMG does in fact assign a UID to each person created (that is how I always know the number of folks in my database at a glance). TMG also assigns a UID dataset number. My main dataset being 1:1 thru 1:58571. When I import a GEDCOM all the folks from an import get a UID and a Dataset UID. So if I had uploaded to WeRelate my HELLER GEDCOM of 2703 members, and then have to re-import that same bunch, they will appear in my database as members 2:1 thru 2:2703. This means if I would import even my own gedcom, all those people show up in my database as duplicates with new UID's. I feel this then creates total havoc in my database! Doing a merge within TMG would take me weeks of work to straighten it all out.
  3. I do not include all tags when I create a GEDCOM for upload. TMG has the ability to create an unlimited number of tags, beyond birth, death, burial, baptism, etc. TMG has the ability to filter out any of these which I do not want to include when creating a GEDCOM. I have created tags called "CorrespondenceIn", "CorrespondenceOut", and "Research", none of which are included when I upload. Then if I were to re-import that GEDCOM which did not contain all the tags, I would effectively be losing all data in those tags.

Yes I could hand enter some code into each individual, but even that is unrealistic, as just ONE of my segments contains 2703 people, I have many segments I would like to place on WeRelate as I get them ready. At Rootsweb's WorldConnect, you just click on the name of the GEDCOM you want to update, and choose the gedcom to upload and it's done. Maybe you could create a small Gedcom to upload there, so you could see how the process works and maybe implement something similar at WeRelate. I believe they do use the UID's. I need something more user friendly. If I uploaded my whole Gedcom (once your system can accept such a large file) importation and then upload for updating would still be an issue, because of my filtered out tags upon Gedcom creation. --Msscarlet1957 15:31, 18 March 2008 (EDT)

I feel your pain. There are always data conversion/transfer headaches whenever one system is first hooked up with another. I agree that tagging your home data set would be a daunting process, even on a subset basis. On the other hand, you get a couple of important things for your effort - a very important source for your own data base and a way to safely interact with the werelate community data base.
Even an operation as large as ancestry/TGN, with all their resources, often makes quite a muddle of things. I've already untangled a few of their "one world tree" matches that worked their way into werelate. In fairness as well, I don't think roots web is really a reasonable comparison. I could be wrong, but my skim of their stuff suggested that they aren't really merging the uploaded GEDCOMs into a common data base. I think they are just saving them as discrete sets and offering tools that graze over them without actually creating a permanently merged result.
Maybe dallan can put in a restricted merge/upload that only merges those people that are correctly tagged and produces a report of the names seen but ignored? That might help you in performing a piecemeal update...
The other thing that should be stressed is that we all need to fundamentally change the way we've been doing genealogy. Your large data base is a major accomplishment, but how about a data set of 100K? 1M? More? As highly refined and capable a tool as TMG is, the model of individual research is fundamentally limited. There's just only so much of you to go around! Taking the hit of getting your data synced up with werelate isn't just a chance to transfer your data from system A to system B, but a chance to get lots of help and to help others. Your work also has a real chance to live on and be built upon. You can certainly protect your data from being lost to the world of research by tossing it over the wall to the LDS or world connect, but that's not nearly as "alive" as data that's under constant review and improvement as things are on a wiki.
Please, keep the faith!--Jrm03063 21:01, 18 March 2008 (EDT)

I thank you for your encouragement, Jrm03063, obviously you can read between the lines and see that I feel quite discouraged just now. In fact last night I actually sat in front of the boob tube and watched HGTV all night and could not even think about going to my computer after dinner. (I am one to easily work until midnight on my genealogy). Yes you are correct in your assumption that Rootsweb does no merging, but my reference was to the way they streamline their members' updating process, not the way they work in general. I agree that wikis and genealogy are a wonderful concept, that is why I had been devoting 3-4 hours a day to improving and merging my pages here on WeRelate. But then we don't know for sure WeRelate will be around for any length of time, as it is so difficult to learn and use. As long as I am here on a regular basis, it becomes easier and easier, but if I am away from it for a length of time, I have to go thru the tutorials again. But that is another topic altogether. For now I am just not willing to "Take the hit of getting my data synced with WeRelate". I am not interested in importing any Gedcom, not even my own from WeRelate, that will cause me so much work and/or data loss. In fact I rarely import GEDCOMs at all, I open them in a separate TMG database, shown on my second monitor and decide from there who to hand enter, using my own entry conventions.
For now I shall continue to upload gedcom segments, without thought of updating them, this will at least be "a chance to get lots of help and to help others". However, I have been unsuccessful in convincing any cousin to join WeRelate to collaborate, (which contributes to my discouragement). But I do invite them, usually one or two a day. --Msscarlet1957 09:56, 19 March 2008 (EDT)

I hadn't considered that people would upload only a portion of their GEDCOM, but in retrospect it makes sense. I really do want to make the upload+download process easy for people who want to continue to use their desktop genealogy program because I believe that probably half of our users will want to operate that way.

While writing this response I realized there is a flaw in my proposal. After subsequent uploads of a GEDCOM file, we can't assume that the person data in the uploaded GEDCOM is the same as the person data on the updated Person page, because the Person page might have information that you have not incorporated into your GEDCOM file. So to determine the updates you have made we'll compare your current GEDCOM against your previous GEDCOM. And to determine conflicts we'll compare the current version of the page also against your previous GEDCOM. If your current GEDCOM has a different value for a field than your previous GEDCOM, then you have updated that field since your last GEDCOM upload. If the current version of the page also has a different value for that field than your previous GEDCOM, then we'll say there is a conflict, and your updated value will be stored as an "alternate" name/event instead of updating the primary name/event, and you'll get an email telling you about it.
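The comparison described above is essentially a three-way merge with the previous GEDCOM as the common base. A minimal sketch, assuming each person is reduced to a flat dictionary of field values (the function names and the dictionary representation are illustrative assumptions, not the real upload code):

```python
def classify_field(previous, current, page):
    """Classify one field using the previous GEDCOM value as the base.

    previous: value in the previously uploaded GEDCOM
    current:  value in the GEDCOM being uploaded now
    page:     value on the current version of the wiki page
    """
    if current == previous:
        return "unchanged"  # the uploader has not touched this field
    if page != previous:
        return "conflict"   # both the uploader and the page changed it
    return "update"         # only the uploader changed it


def merge_person(previous, current, page):
    """Return (merged page fields, alternates kept because of conflicts)."""
    merged = dict(page)
    alternates = {}
    for field, value in current.items():
        status = classify_field(previous.get(field), value, page.get(field))
        if status == "update":
            merged[field] = value
        elif status == "conflict":
            # Keep the page's primary value; store the upload as an alternate.
            alternates[field] = value
    return merged, alternates


if __name__ == "__main__":
    previous = {"name": "John", "birth": "1800", "death": "1870"}
    current = {"name": "John", "birth": "1801", "death": "1871"}
    page = {"name": "Jon", "birth": "1800", "death": "1872"}
    print(merge_person(previous, current, page))
```

Note that this follows the description literally, so a field that the uploader and the page both changed to the same new value is still reported as a conflict; treating that case as "no conflict" would be a possible refinement.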

We still have to associate each person in your desktop genealogy program with the page title that was generated for that person. We can do this in one of three ways:

  • Ask people to download a modified GEDCOM with source citations added and then re-import that GEDCOM into their desktop genealogy program. I'd like to encourage this approach because if you share the citation-enhanced GEDCOM with someone else, they'll be able to upload a combined GEDCOM without having to make match decisions. It also allows you to see from within your desktop genealogy program the URL of the WeRelate page associated with each person.
  • If your GEDCOM has UID's, we'll store the UID to page title mapping at the server (we're doing this already). Then when you re-upload your GEDCOM we'll be able to figure out which page titles to update. The UID to page title mapping is used only if the same user is uploading the GEDCOM though, so if you share your GEDCOM with your cousin and they upload it, they'll have to go through the match process as if they were uploading a GEDCOM from scratch. To avoid this, you may want to download the citation-enhanced GEDCOM if you want to share something with your cousin. (Jrm03063, if we limit UID matching to just UID's that have been previously uploaded by the same user I think we'll be ok. The UID's are more or less 36-character random strings, so the possibility of accidental overlap is pretty low. We may even be able to consider matching UID's from different users someday, though I'm less certain about this.)
  • If neither of the above options works for some reason, you can manually add source citations with the WeRelate page URLs into your desktop genealogy program, although hopefully the cases where this is necessary are minimal.
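The second option in the list above, a UID to page title mapping that only applies to the user who uploaded it, could look something like the following sketch. The class and function names are hypothetical and an in-memory dictionary stands in for whatever the server really stores.

```python
class UidIndex:
    """Remember which page title each (user, UID) pair was uploaded to."""

    def __init__(self):
        self._titles = {}

    def record(self, user, uid, page_title):
        self._titles[(user, uid)] = page_title

    def lookup(self, user, uid):
        # Keying on (user, uid) means another user's upload never matches.
        return self._titles.get((user, uid))


def match_upload(index, user, uids):
    """Split UIDs from a re-upload into known pages and ones needing matching."""
    matched, unmatched = {}, []
    for uid in uids:
        title = index.lookup(user, uid)
        if title is None:
            unmatched.append(uid)
        else:
            matched[uid] = title
    return matched, unmatched


if __name__ == "__main__":
    index = UidIndex()
    index.record("msscarlet", "UID-0001", "Person:Jacob Heller (1)")
    print(match_upload(index, "msscarlet", ["UID-0001", "UID-0002"]))
    # The same GEDCOM uploaded by a cousin goes through matching from scratch.
    print(match_upload(index, "cousin", ["UID-0001"]))
```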

This is off-topic, but as you have suggestions for improving usability, or if you find out what keeps your cousins from joining, please send me an email or leave a message on my talk page.

I was using the incremental GEDCOM upload procedure too, but only because there were a few lines that I wanted to flesh out with census records while I still had an ancestry subscription. I've allowed that to expire for the time being, and expect in the future only to download GEDCOMs for backup and reporting purposes. I don't expect to actually record any research off-line.

In fact, if I thought that werelate was going away, without a wiki alternative, I would probably see about putting together my own server to run it.

The only real home-based local stuff that I might do, would be something to keep track of the living. Of course I understand why we need to keep the living off a public genealogy system, but it's a bit of an aggravation to have to think about things in two layers. I have thought about what might be involved in creating a hybrid environment so that I could record information about the living and have it just reside locally on my machine (some sort of pass through for the folks who've shuffled off to werelate), but I just don't have time to do anything with that right now....--Jrm03063 17:09, 1 April 2008 (EDT)

I think the ideal would be a desktop genealogy program with a "synchronize" button that synchronizes the non-living people with WeRelate (upload changes you've made and download changes from the website) but keeps living data private. Someday I plan to write this -- it will make uploading data to WeRelate a lot easier than uploading GEDCOM's -- but not this year.--Dallan 09:45, 2 April 2008 (EDT)
Oh my, weren't you there for the "thou shalt not duplicate data lecture"? Maybe you gave that lecture!  :) Anyway, I actually think that the desktop genealogy program ___is werelate___. I think it's the same code base that you have right now, but with a flag indicating whether it's running stand alone or in distributed (local & backing global server) mode. If you write a special purpose program, at best you duplicate what werelate is already doing at the expense of an additional code base. Ick. It seems like all that should be needed is a piece of code that maps inbound URLs through a table such that locally known PERSON pages are served up and saved locally (maybe you just look for "living" in the person page name). Any page not known locally gets passed on to "actual" werelate. Mind you, I don't know exactly how to do this sort of thing, but it seems like something that ought to be possible... --Jrm03063 Wed Apr 2 10:09:01 EDT 2008
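The lookup-table idea in the comment above might be sketched like this. It is purely illustrative: the function name, the "local"/"remote" labels, and the check for "living" in the page title (the shortcut suggested above) are assumptions about how such a hybrid setup could decide where a page lives.

```python
def route_page(title, local_titles):
    """Decide whether a page is served from local disk or the public wiki.

    title:        wiki page title being requested
    local_titles: set of page titles known to be stored locally
    """
    # Pages for living people stay local, as do any pages saved locally.
    if title in local_titles or "living" in title.lower():
        return "local"
    # Anything not known locally is passed through to the public server.
    return "remote"


if __name__ == "__main__":
    local = {"Person:Jane Doe (7)"}
    for t in ["Person:Living Smith (3)", "Person:Jane Doe (7)", "Person:Thomas Slafter (1)"]:
        print(t, "->", route_page(t, local))
```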
That could be done, but I think that there are also cases where someone will want to download their tree into a program that they can run disconnected from the Internet on their laptop, make changes in that program, and then upload the changes (and receive any changes made by others) when they re-connect. Not everyone will want to work that way, but many will. Eventually we'll need to make that process easier than downloading and uploading GEDCOM's.
Having said that, it would also be possible to modify the Family Tree Explorer to save living pages locally on your hard disk. If we do that, then it's just a question of whether the Family Tree Explorer always reads pages for non-living people from the server, or if it also had the ability to save non-living pages locally and synchronize them with the server, which would allow people to run disconnected occasionally.--Dallan 15:21, 3 April 2008 (EDT)
I don't think I disagree particularly. What I'm suggesting is a full, yet local, version of werelate. The local version would have some additional smarts to be able to sort out what is local, what is out on the net, what is shadowed on a temporary basis, what is maintained strictly locally (living people), etc. I know it's a tall order to try to figure out how to package all the necessary pieces up so they work nicely together, but it can't be worse than trying to rewrite a local version of werelate. Presumably, what you would want to do (with a local werelate) is to work in terms of wiki pages. That's going to need some level of wiki software support and I'll bet it becomes a slippery slope really quick. I quite agree that upload/download exchange based on GEDCOM is not apt to be very satisfying. Literally and figuratively, I think a lot would get lost in translation.
It's either write a desktop application with limited wiki support (i.e., the app probably wouldn't display tables), or modify the wiki software to make it possible for people to create pages for living people that weren't readable by others (or were readable by only certain people). That's also not a trivial undertaking.--Dallan 15:32, 10 April 2008 (EDT)
My thinking is that this should be able to be pulled together on Linux/Mac first. I've seen wiki servers for windows so there is hope...
The problem is every wiki software app is different -- they have different syntax, and the extensions I've written for MediaWiki would have to be rewritten. This is also not a trivial undertaking.--Dallan 15:32, 10 April 2008 (EDT)
BTW, have you considered dropping source archives on the digital library? --Jrm03063 Mon Apr 7 15:45:38 EDT 2008
Not sure what you mean by "dropping source archives"?--Dallan 15:32, 10 April 2008 (EDT)

GEDCOM download status [5 February 2009]

I know that there is a desire to see GEDCOM download as part of a download-upload-repeat process, but I'm not really interested in that cycle. I would like to be able to download my tree so that I can save and report the content using other software.

Where do we stand on GEDCOM download?

Thanks...--Jrm03063 16:38, 7 July 2008 (EDT)

The GEDCOM programmer has been taking a few months off while he finishes another project, but I think the other project will be done soon. If GEDCOM export were done about the end of the year, would that be too late?--Dallan 00:47, 15 July 2008 (EDT)

That's a little farther out than I was hoping for, but volunteer projects run at the rate they run. I was just sort of hoping there was something of a preliminary nature available, so that I could get a snapshot for backup and reporting purposes (I don't work anywhere but werelate).--Jrm03063 09:20, 15 July 2008 (EDT)

It is the end of the year. How is the gedcom download capability progressing? I also have been entering my work exclusively on WeRelate for my one-name study and now am concerned that this was a big mistake on my part and possibly I should reenter the data in my genie program and stop entering new information on WeRelate until there is a definite timetable established for the completion of this function. --Beth 19:17, 9 December 2008 (EST)

I'm sorry - due to the downturn in the economy I spent the last month working on another project. The other project is launched now, so I'm back to working on WeRelate. But everything is now a month behind. I've spent the last couple of days starting to integrate merging into the GEDCOM upload process as well as supporting GEDCOM re-upload, so that project is progressing. GEDCOM export should be ready by the end of January.--Dallan 17:56, 27 December 2008 (EST)

Dallan, Please note that I added my viewpoint to this GEDCOM issue as an advisory "Warning" below. BTW, thanks for your efforts -- I know you're getting it from all sides. --BobC 5 February 2009

Warning: For & Against Downloading [9 February 2009]

From the sound of the different notices and discussions lately, it sounds as if the download feature is almost ready for implementation. Not knowing what the final product will be, what limitations will be put in place, or when it will become available, I feel I need to have my say and share with you a couple of my experiences to let you know why I feel this way. I acknowledge that contributors to the discussion above have made good points on all ends of the spectrum, whether in support of total openness (i.e. full download capability by all users), limited availability (i.e. owner determined) or the status quo (i.e. no download access).

Basically, I am –
  • FOR download capability
  • AGAINST it being applied openly and liberally

I think the download capability here at WR will be an important function -- particularly to the "owner" (or should I say "primary developer") of any particular family site. Although in a true collaboration site, such as this proposes to be, idealistically everyone who shared in its content would share "ownership," and everyone would work in the collective for the betterment of the community. But as in any idealistic commune put into living practice, there will always be a mischievous malcontent in the group who will take advantage of the generosity and contributions of others for personal gain. Such is the case in the genealogy world -- be it privately or corporately.

Having been an avid family historian and genealogical researcher for over 30 years, I started this hobby in the early-1970s with pencil and notebook, voraciously copying everything I could find (either because photocopiers were not readily available in the late-1970s and early-1980s or I was too poor to use the somewhat primitive copiers that were available back then). These were trips to the National Archives, various libraries, churches, graveyards, courthouses, genealogical & historical societies, and relatives' houses to copy handwritten personal collections held by them. In ten years I collected what I thought was a massive amount of information by personal, firsthand, on-the-spot research. Notebooks became files, files became binders, and a few thin binders begat many thicker binders.

I got my first computer in 1985, and even before the LDS Personal Ancestry File (PAF) program became available for personal use at home, I spent many nights sitting in a local Family Research Center inputting data and saving it to my floppy disk. As soon as PAF was released I sent for it and spent many more months typing in much more data and saving it to my computer at home. Eager to share the results of my research, I contributed my collection of 5000 or so names to the LDS Ancestry File (or whatever they called it back then), hoping that I would benefit likewise from people also willing to share and collaborate on the same ancestral lines or collateral lines. When it first appeared on the LDS Ancestry File CDs (it was not made available on-line for a number of years) I could tell I was the only one who had researched and uncovered most of the data.

Unfortunately the LDS program was good at accepting basic vital statistical data; not too capable in collecting sources or references. Although I experimented with Dollarhide’s Everyone’s Family Tree (EFT) for a few years in the early 1990s for its creative organizational format and story-writing ability, I settled in on Buzbee’s Family Origins program primarily for its masterful ability to allow users to input sources. I spent years updating my sources on that program, page by page, name by name, fact by fact. When Buzbee was forced to abandon his creation, I followed him to his new program, RootsMagic, which retained and enhanced source recording and referencing capability. I still use RM today as my primary genealogical software program.

Going back to the early years, although I obtained a few bites of information from generous near and distant relatives in the first few years after my contribution to the LDS genealogical library, I witnessed many more unknown pirates taking the data as their own, not attributing the data to me or my research, and making it available on other databases as their own original work. Even today when I search for names at various websites, I see my unattributed work, copied repeatedly unsourced and unrecognizable as my original data (except by me).

As I said, this applies to private collections as well as the corporate collections. On the private side, the few inquiries I make are usually unanswered. When answered the replies are usually not helpful, primarily because most of these people are name collectors, either uninterested in the source of the information or because it was not identified where they found it. On the corporate side, it’s even more frustrating. The genealogical corporate giants (you know who I mean) collect this information from individual contributors and from other corporate collectors and then regurgitate and sell the information through annual subscription memberships or through sales of genealogical collections on compact disk.

My point is – I take immeasurable pride in my work, take immense care in researching, recording and qualifying my sources of information, and want to share the information with others – but not without limits. I don’t want to make it easy for an individual or corporate "tomb raider" to pilfer my collection and try to sell it back to me and to my curious but unsuspecting relatives.

If you allow unlimited downloading capability here at WR without any internal control, I can guarantee you that the work of many sincere, honest, hardworking WR collaborators will be collected, edited, repackaged, and marketed for sale elsewhere without attribution to those that put it together. I would rather have no download capability here than to see it given freely and openly without controls.

That’s my two pence worth…

--BobC 01:51, 5 February 2009 (EST)

This is a concern that others have noted as well. One idea for addressing this concern is to add a source citation or a note to every downloaded page containing the title of the WeRelate page that it came from. We have to do this in order to comply with the Creative-Commons license terms (i.e., attribution of the original authors is required). If people remove this source citation, they're violating the terms of the license. We would make this clear during the download process - that the material can be used in any form, both commercially and non-commercially, so long as the WeRelate source citations are kept, and that "derivative works" of the material also be licensed under the same open-content license. Those two points are what our Creative-Commons license requires.

This is currently the approach I'm thinking about. Another benefit of this approach is that if the source citation or note isn't removed (I believe most people won't remove it simply because it would take a lot of time to remove it from every person/family), and the downloader publishes the GEDCOM somewhere, recipients will see the link to WeRelate and be able to view the latest version of the page, not just the static outdated one in the GEDCOM. A third benefit of this approach is that if a recipient later uploaded this GEDCOM to WeRelate, we'd be able to link the people/families in their GEDCOM to the correct pages in WeRelate.

Alternatively, we could say that only the author of the tree could export it, but this still doesn't stop people from copying the data by hand into their desktop genealogy program (or adding the pages to their own tree and exporting it). And if they copied the data by hand, they'd be much less likely to include the WeRelate source citation giving credit to the authors.--Dallan 13:17, 9 February 2009 (EST)
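
As a rough illustration of the attribution idea discussed above (a note on every downloaded person/family naming the WeRelate page it came from), here is a minimal Python sketch. It is not WeRelate's export code: the function name, the wording of the note, and the mapping from GEDCOM record ids to page titles are all assumptions made for the example.

```python
def add_attribution(gedcom_lines, page_titles):
    """Append a level-1 NOTE naming the WeRelate page to each INDI/FAM record.

    page_titles maps a GEDCOM record id (e.g. "@I1@") to a WeRelate page
    title. Both the mapping and the note wording are hypothetical.
    """
    out = []
    pending = None  # note to emit once the current record ends

    for line in gedcom_lines:
        parts = line.split(" ", 2)
        if parts[0] == "0":
            # A new level-0 line closes the previous record, so flush its note.
            if pending:
                out.append(pending)
                pending = None
            # Record headers look like: 0 @I1@ INDI
            if len(parts) == 3 and parts[2] in ("INDI", "FAM"):
                title = page_titles.get(parts[1])
                if title:
                    pending = f"1 NOTE Source: WeRelate, {title}"
        out.append(line)

    if pending:
        out.append(pending)
    return out
```

Because the note is attached per record, removing it would mean editing every person and family in the file, which matches the point made above about why most people would leave it in place.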

Naming Conventions

Naming conventions for wikipedia people [1 December 2008]

The lack of ordinary given/surname forms in the medieval period is a bit of a problem for our medieval merge effort. I don't really have a great idea how to solve it completely, but I think we can pass the buck to wikipedia in many of these cases. Many of these folks have backing wikipedia pages, and/or are named in wikipedia pages for their immediate relatives. The names that appear in those places are apt to be the most common forms in use in genealogical research. I'm therefore adopting the convention, at least for medieval nobility, that the werelate page will share the same name as a shadowing wikipedia page (or at least, use a name that appears somewhere in wikipedia if possible). The primary name appearing on the person page itself will also be a form that is close, if not identical, to the name used for the page. Other name forms for the person would follow as needed. Since I've only just started doing this, there are hundreds of person pages out there that don't yet follow this convention. Feel free to comment or jump in and help.--Jrm03063 12:00, 13 November 2008 (EST)

I think this is a great idea.--Dallan 22:05, 29 November 2008 (EST)

It looks to me like this breaks the general naming convention by including titles (like "Duke") in the page title. 1) Is this right? and 2) How do we explain when to use this rule? A year cutoff? Certain kinds of nobility? It can't be that all pages should be the same as their Wikipedia pages, because that will cause all kinds of havoc with modern pages (maiden names and the disambiguations come to mind as some problems).

I think the convention of using the wikipedia names absolutely makes sense for genealogy before modern and reliable given/surname conventions arise. While I didn't really like allowing titles to creep into the names, my thinking is that the names we see on wikipedia represent what the most active researchers expect to use, so we should adopt that. In any case, since I'm expecting to source the information directly from wikipedia, I think the shared page name makes that point more explicitly.

I'm not completely sure about the idea of renaming the page when modern naming conventions are present. On the other hand, the same rationale as above applies - the page name in wikipedia is what folks expect to reference the page by. We should probably stay consistent. Also remember - this is the page name for wiki purposes. It doesn't mean that other names can't - or shouldn't - be represented as valid alternatives. I believe the wikipedia name should appear as an alternative there as well - the primary alternative if there's no reason to prefer something else - but as an alternative name at least.--Jrm03063 23:54, 29 November 2008 (EST)

Since my current project is First Ladies, I can say that there are no naming conventions at Wikipedia, and therefore I don't think we can say that they are the names WeRelate users will "expect." Some pages use the married name, some use First Maiden Last, some use First First Spouse Second Spouse. I'm not sure any use just the maiden name as genealogists expect. I suppose Wikipedia goes by how people are "known", which is fine if we don't have a better alternative, but where people actually do have a first and last name, I don't think we should be assuming Wikipedia knows better -- they have a different purpose.--Amelia 00:03, 30 November 2008 (EST)

Since this discussion, I've adopted the convention of limiting use of the wikipedia page name to people living before modern given name and surname conventions (essentially just medieval nobility). Still, maybe there's value in having a redirect, using the wikipedia page name, that goes to whatever our page name may be. --Jrm03063 14:32, 1 December 2008 (EST)

More for wikipedia people [1 February 2009]

I've been merging through our medieval people with a focus on attaching to wikipedia whenever there's a wikipedia page for the person in question. For each such page, I'm doing the following:

  • Rename the page to use the identical string as the page used in wikipedia. The name is about as good as any, certain to be relatively unique, and may very well be the name most commonly used by those studying medieval genealogy. It may also help in identifying direct werelate <-> wikipedia linkages, which may be important as we look at page management.
  • I add the wikipedia page as a source, if it has not already been done.
  • If the page has any sources worth salvaging, I toss them in unchanged as the title of an untyped source. See Person:William I of England (1) for an example.
  • Rarely do I preserve uploaded body content if such a page has any. The wikipedia content is probably better, and we really want research on such notables to be based on wikipedia and sourced over to werelate - not separately duplicated on werelate.
  • Assuming that the page has no body content, or none worth saving, I create a body consisting of two template references. The first is a template that will be updated from wikipedia whenever the refresh job gets run. The second is a template that indicates that the above content is from wikipedia. See Person:Alix de Montmorency (1) for an example. With respect to the Template page Template:Wp-Alix de Montmorency, I don't bother adding the text from wikipedia. I trust that it will be refreshed at some reasonable time and move on.
Let's simplify this by having you add just a single template to the page: {{source-wikipedia|title of the Wikipedia article}}. When the refresh job is run I'll have it look for pages with this particular template and replace this template with the other two template references that you're adding now. The refresh job will also create the template page.--Dallan 22:05, 29 November 2008 (EST)
So you're saying that all anyone has to do is add that "source-wikipedia" template to the page, and the refresh will do the rest of the work?--Amelia 23:15, 29 November 2008 (EST)
Right.--Dallan 17:56, 27 December 2008 (EST)
  • Assuming that the page has body content that I need to preserve, I still create a template to collect the contents of wikipedia. Instead of locating the template reference in the body of the person page though, I place it as the body of the corresponding wikipedia source. For example, Person:Henry II of England (1).--Jrm03063 18:10, 19 November 2008 (EST)
You'll have to continue doing this the way you're doing it now. The {{source-wikipedia|title of the Wikipedia article}} template won't work inside a source reference.--Dallan 22:05, 29 November 2008 (EST)
Would it be possible some way to see an example of what these are going to look like when the content is refreshed? Is it just the text? Or the sidebars? The links? The nav boxes? Pictures?--Amelia 23:16, 19 November 2008 (EST)
As I understand it, we'll get the first section of text from the wikipedia article. Links in the wikipedia article will be modified so that they still work from within werelate (internal links get turned into external links). There won't be a complete copy of the article (unless it's a single text section), and there also won't be pictures. I think this is the same mechanism that's used to get place information from wikipedia, so it has the same limitations. Dallan originally told me that he runs the refresh about once a year, but I've now done a bit over 500 of these, so I asked him (nicely I hope) if he would run the refresh soon. --Jrm03063 10:27, 20 November 2008 (EST)
This is a good question. For Place pages, the refresh copies only the introductory text of the wikipedia article up to the first heading. But on the few wikipedia pages that I looked at, there is very little content before the first heading. So alternatively, we could possibly copy the entire article, but we still wouldn't include wikipedia templates or pictures. I've edited Person:Alix de Montmorency (1) to show the first approach, and Person:Simon de Montfort, 5th Earl of Leicester (1) to show the second. It might be helpful to compare these pages to the corresponding Wikipedia pages. I'll run the refresh soon; I've gotten a bit sidetracked for the past few weeks.--Dallan 22:05, 29 November 2008 (EST)
I prefer the short version. Some wikipedia pages are really long. We don't need to duplicate that level of detail when it's just a link away. Having that much text that's not really supposed to be edited seems like it might be confusing. The intro paragraphs are usually good summaries, and then the page can be edited to add the highlights if that makes sense. Alix de Montmorency, for example, may be short, but the rest of the Wikipedia article is genealogical information that [should] already be in WeRelate.--Amelia 23:15, 29 November 2008 (EST)
Of the two approaches, can we add copy constructs (the thingies that fit in the template) that grab the first section specifically and a separate one that does the whole thing specifically? I don't think there's a one-size-fits-all approach to this, even though that's how I'm starting the effort.
By the way, I'm now up over 750 wikipedia source references. --Jrm03063 14:37, 1 December 2008 (EST)

I have now gone over 1000 wikipedia template inclusions. Overwhelmingly, these pages are related to medieval nobility. Such pages, prepared along the lines I previously described, have proved extremely helpful as a basis around which the extensive medieval duplication can be consolidated.

With respect to the question of the extent of wikipedia page inclusion, I think we need ways to explicitly select only parts of a wikipedia source article, or the whole. Neither situation will fit all circumstances, so we will need copy directives that support all or parts only of a wikipedia source page. I think the default however, should be the whole wikipedia article. Besides the situation of some very short wikipedia pages, and pages that lack convenient initial sections, we should encourage scholarship on such "wikipedia people" to continue to be based on wikipedia whenever possible. In any case, since this is the choice of a default behavior for a routine procedure, if inclusion of entire wikipedia pages proves to be overkill, the next refresh can be done more selectively.

I would very much appreciate it if the wikipedia refresh would go forward, so that the results can be reviewed.

Thanks...--Jrm03063 00:27, 14 December 2008 (EST)

I had to make some changes to the wikipedia refresh code and it's taking a while to debug, but it's just about ready. I'm going to be out for a few days, but I'll start it running when I return on Thursday.

You can use {{source-wikipedia|wikipedia title}} as a short-cut if you want. If you add this template to your page, the refresh will create a template for the wikipedia text, replace the source-wikipedia template with a reference to this newly-created template, and will add a wikipedia-notice template to the bottom of the page.--Dallan 17:56, 27 December 2008 (EST)
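
To make the short-cut described above concrete, here is a minimal Python sketch of the substitution. It is an illustration and not the actual refresh code: the function name is invented, the Template:Wp-... naming follows the Template:Wp-Alix de Montmorency example mentioned earlier, and passing the article title as an argument to wikipedia-notice is an assumption.

```python
import re

# Matches {{source-wikipedia|Some Article Title}}
SOURCE_WP = re.compile(r"\{\{source-wikipedia\|([^}|]+)\}\}")


def expand_source_wikipedia(page_text):
    """Return (new_page_text, template_pages_to_create).

    Each source-wikipedia template is replaced by a reference to a
    Wp-<title> template, and one wikipedia-notice template per title is
    appended to the bottom of the page. The notice's argument format is
    an assumption for this sketch.
    """
    titles = []

    def swap(match):
        title = match.group(1).strip()
        titles.append(title)
        return "{{Wp-" + title + "}}"

    new_text = SOURCE_WP.sub(swap, page_text)
    for title in titles:
        new_text += "\n{{wikipedia-notice|" + title + "}}"

    # Template pages the refresh would need to create, keyed by page name.
    created = {
        "Template:Wp-" + title: "{{copy-wikipedia|" + title + "}}"
        for title in titles
    }
    return new_text, created
```

A page with no source-wikipedia template passes through unchanged, so running the substitution again on an already refreshed page does nothing.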

I've seen the werelate agent kick on again today, and it seems to be chugging away at the wikipedia refresh. I think the results look pretty good, but it made me think a bit more about the previous discussion of including the initial section only (of a backing wikipedia person page) or the whole thing. It occurred to me that there was an underlying (and somewhat sensitive) question to discuss, that perhaps helps us with that. Put most simply:

  • How do we get the most bang for the buck, engaging pages (existing information AND active researchers, peer reviewers, editors, etc.) that already exist for wikipedia?

An even more sensitive question:

  • How do we discourage adding to werelate, information that properly belongs on an existing wikipedia page?

I offer the following suggestions.

  • We should generally include the complete wikipedia page - not just the opening section. This would serve both as an implicit discouragement against redundant information and provide encouragement to add information to wikipedia (instead of just werelate) when appropriate.
  • We should create some explicitly stated (yet general) guidelines about how to handle pages for people that exist in both wikipedia and werelate. Among these guidelines should be suggestions that any content should be considered for inclusion on the wikipedia page first. Only when information does not appear on the backing wikipedia page and is inappropriate for addition to the wikipedia page, should it be considered for addition to the werelate page.
  • The werelate include template, besides noticing that information is sourced from wikipedia, should point at the guidelines page for information on people represented in both wikipedia and werelate, warning against adding to werelate information that properly goes on the wikipedia page. werelate should be a way to engage and obtain peer review - not dodge it.
  • We should create a template, analogous to the various wikipedia page defect warnings, that allows specifically warning that a page appears not to follow the guidelines for a wikipedia/werelate common person page.

    --Jrm03063 13:24, 2 January 2009 (EST)
I don't have a holistic response to this as yet, but a few thoughts... first, the page I checked out to see the results of the refresh was Person:Franklin Roosevelt (1). Nice summary. But nothing on his personal life. And the backing Wikipedia article is immense. I would prefer to just see a summary with the link to Wikipedia if I want more -- particularly if this content is going to be static for six months between refreshes. (My minimalist side would be happy with Person:George Washington (6)).
Another thought: it's been my experience working with the Presidents and First Ladies that it's the First Lady's page that discusses their family life -- how they met, when they married, what happened to the children. The President's page sometimes includes this information, but typically not in the same detail, and whatever there is is buried in the long article. Which leads me to a couple observations:
  • What is it we think won't be proper on Wikipedia that would be here?
  • What about cases where the information is not just in one Wikipedia article (i.e. family information), and where the best information is not in the correlating article?
On another related note, in my experience Wikipedia does a poor job with genealogy questions. The sources are usually websites of some kind, often unsourced themselves and sometimes the type of individual person's website that we expressly discourage use of here. Dates are rarely sourced to anything remotely resembling primary source material. So it does seem that there's an opportunity here to help Wikipedia -- and conversely we need to avoid relying on it too heavily as something more than a placeholder source.--Amelia 13:58, 2 January 2009 (EST)

I don't have definite answers, but here are a few thoughts:

  • In addition to the introductory section of an article, you can include other sections of the article by creating additional Template pages, one for each section of the article you want to include. The Template pages should have a {{copy-wikipedia|article title#section title}} (note the section title following a pound sign) template at the top of the page. Then reference these template pages later in the text of the Person page. See Place:Aichi, Japan for an example. The section title must refer to a level-2 section header in the wikipedia page, with two ='s on either side of it.
  • You can include templates containing text from multiple wikipedia pages on the same WeRelate page. So you could copy a section on family history from the first lady's wiki page and reference it in the president's wiki page (as well as the first lady's wiki page). Just add multiple wikipedia-notice templates to the bottom of the page - one for each wikipedia page that you're copying information from.
  • The trick of using a "#section title" to refer to a particular section of a wikipedia page works in the copy-wikipedia template but not in the source-wikipedia template. If you want, I could add this.
  • Even if we didn't copy the entire wikipedia page into WeRelate but just copied certain sections, we may still want to encourage people to add information to the wikipedia page, say under a "History" section, rather than to the WeRelate page, and then to include the history section in the WeRelate page. We could tell people how to do this with a link to a "guidelines" help page in the wikipedia-notice template. The guidelines page could explain how to use the source-wikipedia, copy-wikipedia, and wikipedia-notice templates and could encourage people to edit either the introductory or History sections of the wikipedia page if what they wanted to add was appropriate for Wikipedia. I have to admit though that I'm not sure what kind of genealogical information would or would not be appropriate for Wikipedia. We would need to find this out eventually.

--Dallan 15:46, 3 January 2009 (EST)
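
For anyone trying to picture the "article title#section title" behavior described above, here is a small Python sketch of the selection step: no "#" means the introductory text up to the first level-2 heading, and a "#section title" means the body of that level-2 section. It only illustrates the rule as stated in this thread. The function name is made up, and how the real refresh fetches and cleans the wikipedia text is not covered.

```python
import re

# A level-2 heading has exactly two ='s on either side: == Title ==
HEADING = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def select_text(wikitext, target):
    """Pick the text named by 'article title' or 'article title#section title'.

    Returns None when the requested section is not a level-2 heading
    in the article.
    """
    _, _, section = target.partition("#")
    headings = list(HEADING.finditer(wikitext))

    if not section:
        # Introductory text: everything before the first level-2 heading.
        end = headings[0].start() if headings else len(wikitext)
        return wikitext[:end].strip()

    for i, heading in enumerate(headings):
        if heading.group(1).strip() == section.strip():
            # The section runs until the next level-2 heading (or the end).
            if i + 1 < len(headings):
                end = headings[i + 1].start()
            else:
                end = len(wikitext)
            return wikitext[heading.end():end].strip()
    return None
```

Note that a level-3 heading (three ='s) is not treated as a section boundary here, so sub-sections stay inside the level-2 section that contains them.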

BTW, the Wikipedia refresh finished over the weekend.--Dallan 15:23, 5 January 2009 (EST)

How does it do that? [3 February 2009]

I noticed after the latest refresh that Fear Brewster's wikipedia content has a link to the WeRelate page on her husband Isaac, but has a Wikipedia link to her son Isaac Allerton Jr., who is also on WeRelate here. The template shows no human intervention. How did it find the husband's page automatically, and is there a way to get it to recognize the son's in a way that does not require repeatedly editing the template after each refresh?--Amelia 18:25, 1 February 2009 (EST)

Cool, huh? I think the agent is able (at least sometimes) to look at the set of wikipedia page inclusions, associated with werelate pages, and figure out the association. You'll see the same behavior on place pages, but there are, apparently, some situations where it can get faked out.
I'm presently working on a wikipedia usage guideline, that specifically talks about creating such wikipedia to werelate page associations, so that the behavior you observe will happen systematically.--Jrm03063 18:57, 1 February 2009 (EST)

I know why the Isaac Jr isn't connected - there's no inclusion from Wikipedia that lets the Agent realize that one person page is associated with another. Also, the association of William Brewster probably didn't take because the wikipedia reference points at a wikipedia disambiguation page. I'll fix that on wikipedia so that it gets done right next time.--Jrm03063 19:01, 1 February 2009 (EST)

Here's what happens: The system uses the copy-wikipedia template in Template:Wp-... pages to create a correlation between the Template:Wp-... pages and wikipedia pages. It also uses the Wp-... template reference within a Person or Family (or Place, etc.) page to create a correlation between the WeRelate pages and the Template:Wp-... pages. These two correlations are then used together to convert links in wikipedia text into WeRelate links.

A source-wikipedia reference within a WeRelate page creates entries in both correlations at once, because it causes the Template:Wp-... page to be created.

The reason that Person:Isaac Allerton (2) isn't linked to from the wikipedia text for Person:Isaac Allerton (1) is because Person:Isaac Allerton (2) doesn't contain a Template:Wp-... reference. It would need one (or a source-wikipedia reference) in order to create the necessary correlation.

And if the Wikipedia article links to a disambiguation page, or if the Template:Wp-... links to a disambiguation page (which can happen if the wikipedia title used to be a real article but was later changed into a disambiguation page) the correlation is broken.--Dallan 22:51, 3 February 2009 (EST)
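
The two-correlation mechanism described above can be sketched in a few lines of Python. This is only a model of the explanation, not the system's code: the function name, the dictionary shapes, the template name in the usage example, and the external-link format for unmatched articles are all assumptions.

```python
import re

# Matches [[Article]] and [[Article|label]] in wikipedia text.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def convert_links(wiki_text, template_to_article, page_to_template):
    """Rewrite wikipedia links as WeRelate links where a correlation exists.

    template_to_article: Template:Wp-... page -> wikipedia article title
    page_to_template:    WeRelate page -> Template:Wp-... page it references
    """
    # Join the two correlations: wikipedia article -> WeRelate page.
    article_to_page = {}
    for page, template in page_to_template.items():
        article = template_to_article.get(template)
        if article:
            article_to_page[article] = page

    def swap(match):
        article = match.group(1).strip()
        label = match.group(2) or article
        page = article_to_page.get(article)
        if page:
            return f"[[{page}|{label}]]"
        # No correlation: fall back to an external link to wikipedia.
        url = "https://en.wikipedia.org/wiki/" + article.replace(" ", "_")
        return f"[{url} {label}]"

    return WIKI_LINK.sub(swap, wiki_text)
```

In this model, a WeRelate page with no Wp-... template reference never appears in the second dictionary, so links to its wikipedia article stay external, which is the situation described for Person:Isaac Allerton (2). A disambiguation page breaks the chain in the same way, because the article title in the link no longer matches the title recorded for the template.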

Wikipedia People Page Naming Conventions [7 January 2009]

Folks - do we agree or not agree that, unless there is a good reason to the contrary, a person with a wikipedia page should have their werelate page named identically with the wikipedia page? It's a convention I've been observing for over 1000 people in the medieval nobility space, and I've recently seen folks paving that over here and there. If there's a better idea out there great, but we ought to discuss it...--Jrm03063 16:36, 6 January 2009 (EST)

As a general rule, absolutely do not agree. It doesn't work for women or people with common names (who have disambiguation info after their names). If you want to state the rule as, a person with no last name or a person born before X or nobility, then that's fine. --Amelia 21:48, 6 January 2009 (EST)

Ok, I'm generally talking about folks that are medieval nobility, prior to modern naming conventions, where the werelate conventions as they are don't work anyway.--Jrm03063 22:25, 6 January 2009 (EST)

I'm fine for using it for people without surnames.--Dallan 14:46, 8 January 2009 (EST)