Merging
Merging trees? [12 March 2008]
Anybody else undertaken merging of trees? I've recently been doing manual work on the Slafter family of colonial New England, creating pages by hand. But I discovered that just a couple days ago, somebody did a GEDCOM upload that duplicates a portion of the tree I'd been working on. I looked at the Help topic for merging pages, and have been following that procedure. But I'm realizing that it's quite complicated if you're talking about whole dup branches of a tree, and not just a duplicate page. When you start redirecting pages that are in a tree, then you start getting disconnected bits of tree and "orphan" pages. I ended up making a manual inventory of all the pages with the same names, and which ones needed to get merged/redirected with which. Not for the faint of heart (nor for the non-methodical)!
Also, most of the GEDCOM uploads I've noticed are associated with User pages that don't exist. Does that mean they've un-registered from WeRelate? Or does that just mean they never bothered to create a "user home page"? Is it okay to create their User/talk page in order to leave a message there? Will they get notified? ---TomChatt 05:02, 29 September 2007 (EDT)
A follow-up to Amelia's comment, and perhaps by way of clarification, I think that bad/unreliable/flawed sources absolutely have a place on our pages - and an important one at that. They should be cited and noted as bad/unreliable/flawed and, ideally, the research that established them as such should be noted. Otherwise, folks just keep rediscovering previously discredited information. This is particularly true of folks like me, who don't have a huge background so we don't instantly know that some sources are not trustworthy. For example, I understand that the Mayflower passenger Peter Brown is the origin of a large number of discredited genealogical lines. At some point or other he was attributed as having had a son. This was later discredited, but chaos has remained in this area for something beyond 100 years. This site in particular, provides a good chance to document both the accepted and discredited research.--Jrm03063 14:48, 21 February 2008 (EST)
Asking Permission [23 December 2007]
While we are on the topic of merging trees, a new user asked me if we should ask permission from the other user we want to merge with before actually merging into their tree. What's everyone's opinion on this? All manners and politeness aside, you could wait weeks or months for a response or never get "permission" to merge with another tree. This particular topic goes along with the "Downloading GEDCOM" topic as well, in that I see it as understanding what happens to your data once it's put online at WeRelate. I realize this has the potential to be a touchy issue with some, so I'm curious as to everyone's thinking or understanding on this. --Ronni 10:50, 17 November 2007 (EST)
I vote "no" as well, but I do as a practice look to see if the User I'm about to do a "major" merge with is active on WR by looking at their contributions. --Ronni 03:49, 21 November 2007 (EST)
I too vote no. The whole point here is collaboration and there can't be collaboration as long as there are duplicate trees. --Trevorallred 14:40, 23 December 2007 (EST)
Merge Strategy [10 April 2008]
Here's another question along these same lines: merging two overlapping trees might involve merging hundreds (or even thousands) of individual person and family pages. With lots of pages to merge, most people aren't going to take the time to analyze each pair of pages to merge very carefully. So we need to have a pretty reasonable "default" merge strategy. What should that strategy be? For the text we can put the text from one page after the text from the other page. For the events we can list differing birth/marriage/death events from one page as "alternate birth/marriage/death" events on the merged page. Similarly for differing names -- one name can be listed as an "alternate" name. But which events/names should be the "main" events/names and which should be the "alternate" ones? I can think of two possible approaches; maybe there are more?
Any thoughts?--Dallan 22:29, 17 November 2007 (EST)
I think that any technique for merging John Doe (i) and John Doe (j), must preserve all the information from both "i" and "j", and must clearly indicate that it is an automatic merge (and the provenance of the contributions). What parts are more "believable" or better sourced is going to be easy for a human to understand but pretty tough to make a program understand. Better to be sure you lose nothing and hope a human will clean things up. Besides merging existing people, I was also wondering whether there is a way to perform a less than complete GEDCOM upload (thereby avoiding the need to merge common individuals). Can we imagine a reasonable UI that would break GEDCOM import into a two-step process? Instead of a one shot load the whole batch, a two step process that would build up a list of names (from the GEDCOM) that already appear to be present on werelate? The user would then be free to pick whether those names are uploaded as new individuals or whether the existing werelate individual is substituted for a particular person.--Jrm03063 18:44, 18 November 2007 (EST)
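As an illustration of the "lose nothing, record provenance" default discussed above, here is a minimal Python sketch. It is not WeRelate's implementation; the `Fact` record, the `merge_pages` function, and the rule that the redirect target's facts stay "main" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    kind: str                # e.g. "birth", "death", "name"
    value: str               # free-text value as entered on the page
    origin: str              # title of the page the fact came from (provenance)
    alternate: bool = False  # True if listed as an "alternate" fact


def merge_pages(target_title, target_facts, source_title, source_facts):
    """Merge (kind, value) pairs from two pages without dropping anything.

    Assumption: facts already on the target page stay "main"; a differing
    fact of the same kind from the source page becomes an "alternate".
    """
    merged = [Fact(kind, value, target_title) for kind, value in target_facts]
    seen = {(f.kind, f.value) for f in merged}
    main_kinds = {f.kind for f in merged}
    for kind, value in source_facts:
        if (kind, value) in seen:
            continue  # identical fact already present, nothing to add
        merged.append(Fact(kind, value, source_title, alternate=kind in main_kinds))
        seen.add((kind, value))
        main_kinds.add(kind)
    return merged


merged = merge_pages(
    "John Doe (1)", [("birth", "1789"), ("name", "John Doe")],
    "John Doe (2)", [("birth", "24 Mar 1789"), ("name", "John Doe"), ("death", "1850")],
)
for fact in merged:
    label = "alternate" if fact.alternate else "main"
    print(f"{label:9} {fact.kind:6} {fact.value:12} from {fact.origin}")
```

Because every fact carries its `origin`, a human cleaning up later can see which page each claim came from and decide which to keep as the main one.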
I think it's also important to remember that the problem of merging two overlapping trees actually is two problems: matching, then merging. It's much easier to first match the names, then queue them up for merging. At that point we "could" have the computer do the actual merging using some really good heuristics that Dallan has already alluded to. I think it's impossible for the computer to do the matching automatically. There are simply too many variables at this point to trust a machine to match.--Trevorallred 14:53, 23 December 2007 (EST)
I think that matching on a tree-by-tree basis probably isn't a great idea. It's not "work efficient" (for you CS folks out there). Matching shared genealogy is the heart and soul of what werelate is about, and I'm struck that it should be sort of fundamental to the way werelate works. What if we think about matching as more of a re-indexing process, where a set of match candidates is associated with any person or family. Whenever a person or family changes, it gets marked as needing to be recomputed for purposes of matching. When the match index for a person is recomputed, the previous match set becomes a starting point of families and people to rule in and out first. People who drop out of the match set or get added to the match are themselves considered changed so they are marked for being matched again. Of course this means that matching is a continuous and ongoing process, and that changes will have the effect of generating work for the matching engine (or robot, or whatever you call it) - but so what? That's what werelate is here to accomplish. It also means that matching work needs to be queued in a way that prevents any single cluster of related names from hanging up the match robot so that other areas of the werelate data base go begging.
Maybe some sort of oldest unindexed page first process...--Jrm03063 17:28, 4 January 2008 (EST)
As individual people and families change, we'll certainly match just that one person/family, and that will be a continuous process. When I talk about tree matching, I'm thinking about GEDCOM uploads. If you upload a GEDCOM containing say 2,000 people, and 200 of them match someone else's tree, I don't think you want to be presented with 200 different match questions one at a time. It seems that it would be a better experience to present it as a two-step decision: e.g., "200 people appear to match tree A, and 50 people appear to match tree B". In the first step you click on the link that takes you to the list of matching people in tree A, and in the second step you decide which of the 200 match candidates from tree A you want to merge with. As you're checking boxes to determine which pairs of people to merge, the system lets you know which of the remaining matching pairs are related to people you've already decided to merge and which are not. So for GEDCOM uploads, it seems that making the matching decisions up-front will be a better experience than making them one at a time.--Dallan 11:34, 7 January 2008 (EST)
As long as we're talking about GEDCOM upload in particular, I completely agree, a two step process on a per tree basis is essential. I can imagine, for example, if everyone who has a Stephen Hopkins mayflower line were to upload it, after someone had gone to the trouble of creating a really comprehensive and nicely done Stephen Hopkins page - ick! My remarks were in the context of looking for matches within existing data as it is changed and updated in ordinary use.--Jrm03063 12:28, 7 January 2008 (EST)
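The re-indexing idea from earlier in this exchange (mark changed pages, process the oldest mark first, and re-mark pages that enter or leave a match set) could be sketched roughly as follows. This is only an illustration in Python; `MatchQueue`, `reindex`, and the `find_matches` callback are hypothetical names, and the real matching logic is deliberately left out.

```python
from collections import deque


class MatchQueue:
    """FIFO queue of pages awaiting a match re-index (oldest mark first)."""

    def __init__(self):
        self._order = deque()
        self._pending = set()

    def mark_changed(self, page):
        # A page that is already waiting keeps its original position, so a
        # busy cluster of related names cannot push itself ahead of others.
        if page not in self._pending:
            self._pending.add(page)
            self._order.append(page)

    def next_page(self):
        if not self._order:
            return None
        page = self._order.popleft()
        self._pending.discard(page)
        return page


def reindex(queue, page, old_matches, find_matches):
    """Recompute matches for one page and queue every page affected.

    find_matches is a placeholder for whatever matcher is used; it takes a
    page title and returns an iterable of candidate page titles.
    """
    new_matches = set(find_matches(page))
    # Pages that dropped out of, or were added to, the match set are
    # themselves treated as changed.
    for other in sorted(set(old_matches) ^ new_matches):
        queue.mark_changed(other)
    return new_matches
```

A worker would repeatedly call `next_page()`, run `reindex` on the result, and store the returned set as that page's new match candidates.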
That's a good point. We ought to have a "disregard my information" merge option for both individual-matches and also for tree-matches.--Dallan 18:20, 8 January 2008 (EST) Regarding merging, at least in respect to an imported GEDCOM, I think the submittal process ought to require the person to identify one person in the GEDCOM as an existing person in werelate. Then follow the relationships stored in the GEDCOM to match up the others. You can't rely on name matches or birthdate matches. Only persons connected to the anchor person get merged. Dangling trees are ignored. What if someone's GEDCOM has no matches? They manually enter the anchor person, then import. Conflicting data could be scored, though how you score between two different sources is beyond me. A source that is an ancestral file or ancestry.com is hardly a source since half of those are people's opinions, not reflections of real sources as indicated by the number of people out there propagating known errors. Precision might be a valid criteria, for example 24 Mar 1789 might be allowed to replace 1789 though it is not as clear if they are inconsistent as in 24 Mar 1789 and 1792. So I believe extra weight goes to the first comers. If you want to change the data that is there, you have to do it manually. Not that first comers necessarily represent better data, but it ensures thoughtful overwriting. To err is human, but to really screw up requires a computer. It means more manual editing, but I think that is necessary to avoid the damage caused by somebody importing a GEDCOM they downloaded from who knows what website or similar scenarios. The goal is accuracy first and as all experienced genealogists know, accuracy is not easy nor straightforward. --Jrich 13:33, 4 April 2008 (EDT) Before we ask people to identify the anchor person manually, I'd like to try an idea I have on finding anchor points automatically. 
I'll hopefully have something ready to try by June/July.--Dallan 11:48, 7 April 2008 (EDT) I don't think I envy you your task. Within the scope of a single person, I have much difficulty thinking that a computer can do a reliable match. However you choose to weight the different facts when looking for a match, it will be wrong in some cases. And that's assuming the GEDCOM has enough facts in it to start with. How many times have I seen where many individuals are represented by no more than a name? Or the same person called Mary here, Polly there. Or a town with four grandchildren all named after the same honored grandparent and all born in a very short timespan. So there is not much hope unless you take into account earlier and later generations. Then it becomes far more likely to be accurate. However, remembering that many disagreements are exactly over who the parents were, or how many children, etc, will this work? What if the GEDCOM being merged has a string of 10 generations but right in the middle of it it has different parents from the person being merged with. So now does the computer create alternate parents and now reanchor the remaining subtree to the newly proposed parents? If the new parents are brand new to the database, then does that imply all their ancestors are new too? Maybe this GEDCOM is proposing a heretofore undocumented parent between matching grandparents and grandchildren. This argues for the previously discredited (by me) person-by-person matching. How do you decide what scope to use? I'm almost inclined to suggest that you "punt". The secret weapon of your website is time. Over time the data will become better and better quality. The potential damage of computer-generated mistakes will get worse and worse. Speeding up data entry is not necessarily the top priority. (Enabling collaboration to arrive at a higher quality of data than is achievable by oneself is, IMHO.) 
The facts we are entering are no longer changing so there is really no rush to enter them. Over time there will be less data entry and more comparison anyway as the database gets better populated. Computer-aided data entry probably means the user has not taken the time to see if their input is needed, nor have they discovered that, "Heck, look at this! Somebody has some information I wasn't aware of! Who would have thought that was possible?"
Addendum: example: I created Mary Wheeler-134 the other day. If I search for the given name Mary and surname Wheeler in Namespace People and Families, I get 1012 pages. If I add Person: Mary Wheeler to the keywords I get 662. If I put "Person: Mary Wheeler" (i.e., with quotes), I get 113, which is probably how many Mary Wheelers there currently are remaining in the system. On the list of the 113, the displayed blurb shows no useful information except for her name in about half the cases. Quite hard to tell if any of them are the one I want. 20 years from now, how many Mary Wheelers will there be? --Jrich 14:36, 7 April 2008 (EDT)
Perhaps a separate topic, but using the "Browse Pages" function and comparing the other 133 or so Mary Wheelers isn't going to work. It's been discussed in the past about using date ranges in the title to help make distinctions between Persons of the same name and since I use the "Browse Pages" feature quite a bit, I'm inclined to agree more and more that we need to come up with a better way of quickly identifying the Mary Wheelers we are really interested in. --Ronni 19:49, 7 April 2008 (EDT)
This probably does go somewhere else, but it builds on the previous comments. The Browse Pages does help. It is non-intuitive that browsing is more focused than searching, but I guess that's part of the learning curve. The titles are still a problem. It doesn't seem like it would be hard on the browsing page, assuming it is a most common problem there, to insert something to take the returned title, recognize certain namespaces in the title (i.e., Person:), dig into the page and build a more descriptive replacement. Although maybe digging into the page would be too costly? It would be nice if the internal link button on the edit page caused a popup version of the Browse Pages page so you could search for and select your link instead of typing it. Am I just in need of more learning here?
--Jrich 12:44, 10 April 2008 (EDT)
The new search functionality will have a "match" function that will return results in relevancy-ranked order, and the search result list will include data elements like birth date&place and death date&place. I'm working on this now. Matching is do-able, it just takes time to develop. Several years ago I worked on a matching algorithm that found 95% of the possible matches and picked the correct match 95% of the time. There will always be cases where the computer guesses wrong. Making the final match decision does not play to a computer's strengths. What a computer is good at is bringing the probable matches to a human's attention, which can significantly reduce the amount of time you have to spend searching for them yourself (unless you want to, which the new search functionality will allow). Once we get the new match functionality working, I'll list probable matches when people try to add new Person or Family pages so that they can choose to link to an existing page rather than create a new one.--Dallan 15:32, 10 April 2008 (EDT)
Trouble ahead? [25 February 2009]
I am all for merge but I can't help but wonder how all this will work in the best interest of WeRelate and keep users happy. I feel the idea of merge is to MERGE.. and not to be picky about who gets merged or not into "my tree". If there are two individuals that ARE the same person they would be merged, whether they are in fact related or not.. such as the parents of an in-law or the parents of that in-law's parents. Specifically because "we Relate". I don't have a problem with this however I can see where others might be offended. It is difficult to explain my point... On Wikipedia nobody "owns" any wiki page there.. and all members can contribute and edit and those pages are permanent. Here on WeRelate folks are worried about THEIR databases... getting cluttered with unwanted people via a merge.
So what happens after a merge and someone deletes their gedcom off of WeRelate??? what happens to pages for those folks that were in that Gedcom that is now gone? are they thereafter floating out there as orphans? Example: Sally Snodgrass uploads "Gedcom A"; Bill Smith uploads "Gedcom B" and sees many if not most of his people match with "Gedcom A" and he spends hours merging and making it all look pretty.. then Sally Snodgrass sees that Bill merged the parents of an in-law with one of her Aunt's husbands and doesn't want so many people in her tree.. and in fact decides to just remove her tree altogether because she is miffed about Bill's work. You can please some of the people some of the time but not all the people all the time. People are protective about their work. I don't think this would be an issue if folks did not want to download their gedcom with edits back onto their home machines, but I think there has been discussion about members hoping to be able to do that. Why would WikiPedia have three pages that tell about Napoleon Bonepart?? Why would we choose not to automatically merge John Smith born 1903 died 1940 in the same place and who has a high "score" thus being a match? Just because? so if Sally Snodgrass does have John Smith and chooses NOT to merge him with an exact match.. but along comes Bill Smith who sees the obvious and goes ahead and matches these two up.. and therefor links up all the ancestors to John Smith, people who are NOT of any interest to Sally Snodgrass what will happen? and even if we call in counselors to have Bill and Sally be nice, what about Mr. Newbie that comes along and sees the same match and starts merging on his own as well? I know some of this is available to happen now, as we can merge, but as it stands it is so daunting to merge that I am guessing few would bother. However once it is automated there could be conflicts. I myself am excited about the prospect of automated merging.. 
I feel this will help tremendously because my database has 56,000+ people in it. I have to break it down into SO many small gedcoms. I go to one of my immigrant ancestors and begin with him and include all his descendants, and repeat that process over and over, and thus I have all kinds of duplicates on WeRelate as a result, especially since I myself am in each of the gedcoms.. and so are my parents... and so are most of my grandparents and all their siblings, etc! Once automated I can just merge all the "mini gedcoms" into one big family. But will this BIG family then be too large to work at WeRelate? will a huge merged file cause the FTE to slow to a crawl? --Msscarlet1957 23:12, 24 January 2008 (EST) I get the impression that merging is going to come in two flavors. One is to simply avoid or suppress upload of portions of a GEDCOM that are already present on werelate. I believe it's been described as a two-step process, where the entire GEDCOM is uploaded to some temporary space, compared against the overall content of werelate, and then somehow the results will be presented to the user allowing him to pick and choose what is actually conveyed into the general werelate space as new person/family pages. The second form is after-the-fact of upload - recognizing the different copies of Napoleon Boneparte. I believe there's a vision for a tool that will allow a user to say that "Person:Napoleon Boneparte (27)" and "Person:Napoleon Boneparte (28)" should be automatically consolidated to "Person:Napolean Boneparte (27)", and "Person:Napoleon Boneparte (28)" becomes a redirection, but you can do that sort of thing right now manually. As for tree implications, I think that anyone with either Boney (27) or Boney (28) will still have those references, and they would just jump to Boney (27) when they go to view that part of their tree. I don't think deleting a tree has any real global significance. 
I think it just amounts to a page of references - the pages for person, family, source - don't know the difference. I've been doing lots of merges manually. I'm struck that this is really the point of it all. If someone wants to maintain their work in isolation, werelate just isn't a tool that they're going to like.--Jrm03063 07:37, 25 January 2008 (EST) These are all good points. This is why merging is actually much more difficult to get right than matching. Here are some thoughts. Merging is going to come in two flavors: a tree-based merge when you first upload your tree (we'll also have to do something like this for existing trees), and after-the-fact mergers for people that have been entered or edited on-line. I agree that people getting offended because someone merged or edited their tree is going to happen. I also agree that WeRelate isn't the place for people who want to work in isolation. There are other websites for that. Hopefully it won't happen too often, but if someone does delete their tree, the parts of their tree that have been merged into someone else's tree (and so someone else is also watching those pages) won't be deleted. I'm thinking that we'll also need an "unmerge" function. It would be pretty frustrating if merge were a one-way street. The reason that we have the concept of a "tree" is so that you can limit the number of people that you care about. People in your tree can link to people that are in someone else's tree but not in yours. So if someone else merges their tree into yours, chances are that some of the newly-merged people link to people in their tree that are not in yours. You should be able to add those people to your tree, but it's your choice who from their tree you want to add into your tree. It's ok to leave them outside of your tree and just have people in your tree link to them. I'm reluctant to automatically merge anyone, especially at the beginning. 
I'll keep a log of who people chose to merge and who people chose not to merge, and the score associated with each pair. If after awhile we see that if the score is above X people choose to merge 99% of the time, then we could consider doing an auto-merge in those cases. But I know of other genealogy databases that did auto-merging and people weren't too happy about it. I'm slowly making the FTE better able to handle large trees. It's much better than it was a few months ago, but I don't think it's ready for a 56,000 person tree yet :-). But it should be by the end of the Summer; certainly by the end of the year. As I get closer to implementing merge, I'll post more ideas here and ask for feedback.--Dallan 15:25, 28 January 2008 (EST) I've been assuming that trees are just a table of references, and that deleting a tree has no particular implications for the person, family, source, image, or other pages referenced. Is that incorrect? Is there some sort of implied delete of a person page performed if a particular page is referenced by no other tree??? What if the page is referenced by another person or family page?--Jrm03063 15:37, 28 January 2008 (EST)
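The logging idea Dallan describes above (record each pair's score and whether the user chose to merge, then look for a score X above which people merge 99% of the time) can be illustrated with a short Python sketch. The log format, the function name `auto_merge_threshold`, and the `min_samples` guard are assumptions for the example, not anything WeRelate is known to use.

```python
def auto_merge_threshold(log, min_rate=0.99, min_samples=100):
    """Return the lowest score X such that, among logged pairs scoring at
    least X, users chose to merge at least min_rate of the time.

    log is an iterable of (score, merged) pairs, where merged is True if
    the user accepted the match. Returns None if no score qualifies or
    there are fewer than min_samples decisions at or above every candidate.
    """
    ordered = sorted(log, key=lambda entry: entry[0], reverse=True)
    best = None
    merged_count = 0
    for count, (score, was_merged) in enumerate(ordered, start=1):
        merged_count += bool(was_merged)
        # Only evaluate once all pairs sharing this score have been counted.
        if count < len(ordered) and ordered[count][0] == score:
            continue
        if count >= min_samples and merged_count / count >= min_rate:
            best = score
    return best


decisions = [(0.9, True), (0.9, True), (0.9, True), (0.8, True), (0.5, False)]
print(auto_merge_threshold(decisions, min_samples=2))
```

The `min_samples` guard is there so that a handful of early decisions cannot justify auto-merging; where to set it, and whether auto-merge is wanted at all, remains the open question in this thread.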
I am probably going to be one of the users that causes trouble regarding automatic merges. While I like the concept of the idea in general and do wish to link to other families; some of the familes on WeRelate are not ones that I consider properly researched and sourced. WFT #233 or research of John Doe tells me nothing. There are at least six trees on RootsWeb that cite one source; which I have proven to be incorrect on the family that I am now entering. These people's trees are incorrect because they did not bother to do actual research. A simple check of census data, in this case, would have eliminated the problem. If I have no choice regarding the merging of my file with another one; I am not sure that I would continue to use WeRelate. Perhaps you can explain to me how you envision this concept of automatic merging to work? --Beth 18:50, 28 January 2008 (EST)
--Amelia.Gerlicher 20:05, 28 January 2008 (EST)
I am for Merging when it is proven to be the same person; if not sure, leave it be.....When I began work on My Family I only added the In-Law themselves, but then I began to think that this is a family tree not just my tree, so for the sake of my brothers' and sisters' children and my cousins I began to add anything I could find as long as they connected to someone in my family...I have several thousand names...on my hard drive I keep the trees divided as my mother's family, my father's family, my wife's mother's family, and her father's family, the reason for that is my cousins on my mother's side are not interested in my wife's family or my father's family etc....but on here I love the fact that they are all tied together....this to me is a proving ground for my work...a place it can be looked over by others and added to or corrected...I am more interested in facts than thinking I am beyond mistakes, of which I make more than my share...If everyone could realize what this site can bring about...Facts not just guess work on our family history... when I began I took everyone's work as being right; it did not take long to find that was wrong...I found one line on another site that connected to my line, it had the father listed, then for his father it had his son again and then it went back about 6 generations repeating the same father and son over and over...sometimes it takes others viewing our work to catch our mistakes.....I uploaded 6 trees and at the bottom of each page when I edit, it gives me the choice of checking which tree or trees I want this page added to...--Dlbradley1 14:16, 25 February 2009 (EST)
Merge Video will be needed... [7 February 2008]
Dallan, I just wanted to suggest that once you get the Merge thingy up and running.. we will definitely need a "how-to merge video" :-) --Msscarlet1957 08:38, 7 February 2008 (EST)
Hand-merging isn't so bad...and maybe it could be a whole lot better [25 March 2008]
I've spent the last few days doing a lot of merging, and the process isn't really that awful, especially once you develop a few practices that keep you from losing track of where you are. But it also seemed to me that it could be made a lot easier without anything in the way of UI changes. All that is required is being a bit smarter about what happens when a redirection page is checked in. Consider a situation of two family pages that nominally represent the same family. At present, before redirecting A to B, I copy record guts from B to A, attach all the children on B to A, then merge the parents. This leaves me with a family that often contains duplicate children, but I haven't lost anything and it's easier to just work on resolving such duplicates on the target family page. Could the check-in of a "#redirect" on a family page be jiggered such that:
That leaves you with a consolidated family page that presents the needed merge in a more obvious way. It also prevents the situation of inadvertently cutting off a line in the merge of a family. Finally, it is upwardly compatible with current practice (it would be very strange to use a family redirect as the way to cut away a particular incorrect line). There's probably a corollary procedure for merging people - taking the union of the parent and spouse relationships and making sure the unique set is preserved in the redirection target. Of course this leaves you to merge the page contents proper, but that's the easy bit anyway - just open both pages and keep them both alive until the redirection target has all the content of the source. If you stop in the middle of that, nothing is lost. Thoughts?--Jrm03063 17:28, 10 March 2008 (EDT)
That seems like a really good idea! I don't see any downsides to it, and it would be pretty easy to implement. So when you redirect a family, the husband, wife, and children would be added automatically to the target family, and when you redirect a person, the parent and spouse families would be added automatically to the target person. What do others think about this? If there are no concerns, I can implement it the end of this week or early next week.--Dallan 11:07, 12 March 2008 (EDT)
I like it! I had actually started doing it this past week on a couple of merges. No one got lost, everything was tidy and easy to keep track of and it created a "bookmark" of sorts that I could come back to to finish up. I've thought about it for a couple of days now and haven't come up with a downside yet. So, if I understand this correctly, the steps involved for doing a merge this way would be:
--Ronni 12:06, 12 March 2008 (EDT)
I'm glad to hear that this might be easy to do. It's essentially the mechanical practice that I follow in working through a merge so that I don't lose a connection. If the check-in of a "#redirect" had this additional behavior, I could move a lot faster and more safely.--Jrm03063 12:12, 12 March 2008 (EDT)
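The proposed "#redirect" behavior for family pages, as summarized by Dallan above (the husband, wife, and children of the redirected family are added to the target family), amounts to taking a union of relationships. The Python sketch below shows that idea only; the `Family` class and `redirect_family` function are hypothetical and say nothing about how the wiki actually stores pages.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Family:
    title: str
    husbands: List[str] = field(default_factory=list)  # first entry is the main husband
    wives: List[str] = field(default_factory=list)     # later entries are alternates
    children: List[str] = field(default_factory=list)
    redirect_to: Optional[str] = None


def redirect_family(source: Family, target: Family) -> None:
    """Turn source into a redirect and carry its links over to target.

    Every person linked from source is appended to the matching list on
    target unless already present, so no line is cut off by the redirect.
    Duplicate people (two pages for the same child) still need a human.
    """
    for attr in ("husbands", "wives", "children"):
        destination = getattr(target, attr)
        for person in getattr(source, attr):
            if person not in destination:
                destination.append(person)
        getattr(source, attr).clear()
    source.redirect_to = target.title


a = Family("John Doe and Mary Roe (1)", ["John Doe (1)"], ["Mary Roe (1)"], ["Ann Doe (1)"])
b = Family("John Doe and Mary Roe (2)", ["John Doe (2)"], ["Mary Roe (1)"], ["Ann Doe (2)", "Tom Doe (1)"])
redirect_family(a, b)
print(b.husbands, b.wives, b.children)
```

After the call, the target lists both husband pages and both "Ann Doe" pages, which makes the remaining person-level merges visible on one page, in line with the "bookmark" effect Ronni describes.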
Ok, I'll add it in the next couple of days and leave a message here when it's ready. Thanks for the suggestion!--Dallan 01:41, 17 March 2008 (EDT)
It's a week later (I've spent too much time fixing bugs in the digital library), but the #redirect suggestion is working now. If you edit a page and make it a #redirect to another page, the people/families and images that the page links to will be added automatically to the redirect target. I tested it and everything went well, but if you run into problems please let me know. It's a great suggestion!--Dallan 21:56, 24 March 2008 (EDT)
This works wonderfully Dallan! And I agree... great suggestion! --Ronni 23:31, 24 March 2008 (EDT)
I'm glad this is working out. It's working great for me too. It's only possible though because whoever thought about the data structure up front was careful enough to allow for this. Whoever had the wisdom to allow for alternate parent connections and alternate husband/wife connections deserves thanks. Now everyone, go forth and MERGE!--Jrm03063 13:02, 25 March 2008 (EDT)
Where do we stand on merging? [6 May 2008]
I'm very happy with the improvements in "redirect" behavior, as they make merging a much simpler business. I wanted to give werelate a shove in the direction of the large connected community tree that it's meant to support, so I've spent the last month or so merging through early New England. It's been generally smooth going, and very satisfying too - a lot of information bubbles up when you bring different user contributions together. Still, I'm afraid I'm somewhat alone in this endeavor. I'm struck that until werelate gets a reputation for having a really large and well connected community tree - not just a bunch of GEDCOMs that live in the same pool - werelate won't take off. So why hasn't merging become important for the masses? I think there are a few reasons:
Without a passion to merge the werelate space, I think we're losing the strongest feature of the site's design. How can we get there?--Jrm03063 12:21, 2 May 2008 (EDT)
I'm there with you, Jrm (as you may have noticed). One thing that strikes me is the almost complete lack of reaction I see when I merge pages -- perhaps there's no need because the merge isn't controversial, but am I really that good? ;-) I think you're right that usability is an issue. Merging isn't hard if you understand both the general wiki concept and how that applies to genealogy - but that's a small group of people. I think search is certainly a big problem -- the tricks I use to find duplicate people are almost all things I wouldn't want to explain to my grandmother. Luckily that's under construction. I think the current search not only makes merging difficult, but it makes using the site for regular research almost impossible, so hopefully more people will stick around in the future, which will lead them to be more interested in merging. There's something else that might be a problem, however -- notification. I just found this entry by looking at my Watchlist, something I do every so often to appease my curiosity. I didn't get an email, nor have I gotten one about the pages I'm watching that you changed in the last few days. They're not in my junk filter, either. (But I did get notifications for two other pages, so it's not totally broken). If this is happening on a widespread scale, it means that people are not only unaware of (new! exciting!) changes to their own tree, but they may not even be aware that a merge is possible, neither of which does much for the communal editing. (And, whether it's related or not, I have one page in particular that used to have at least 10 people watching it and now has two. So either notification worked too well and they decided they didn't like getting emails, or something is weird.)--Amelia 08:56, 4 May 2008 (EDT)
Well, I have been busy entering pages and have not checked recently for possible merges. I have too many pages to check on a page by page basis. I am waiting for Dallan to implement his merge feature. This morning I entered the surname Coker and location United States and searched. On result page 131-140 I actually found a duplicate page. However this page has no sources. It looks like the entire tree is sourced by so and so's gedcom. Well I really don't wish to merge my page with an unsourced tree. This tree seems to be on the maternal line of my page not the Coker line and I do not intend to research the maternal line. So how are y'all handling this? If the tree was sourced, I would consider it a wonderful opportunity to combine the pages but since it is not I am less than enthusiastic. I am not sure that I wish to have unsourced trees on WeRelate. These trees probably already exist on Ancestry and Rootsweb. --Beth 09:10, 4 May 2008 (EDT)
I don't believe many people are using this site because it does not seem ready. I have added some comments to Talk pages suggesting errors and giving sources and have gotten zero response. After two weeks, which seemed like a fair time, given vacations and all, I tried changing one as I had suggested on the Talk page and even that got no response. Assuming that the notification is working, I think a lot of the early people were just trying it out and got discouraged or were just curious, not serious. I personally have stopped entering data (see my previous comments on the difficulties of determining if there are duplications) until I think the effort will be worthwhile (either a reasonable automated merging, about which I have at best a wait and see attitude, but more likely no faith in it working, or better searching that makes identifying likely duplicates better.) Unsourced trees do not bother me. If I want to merge with an unsourced tree, I will add my sources and the important part will no longer be unsourced.
My concern is that someone will come along and merge over, ignorantly overlaying entries refined through years of discussion and collaboration, without providing sources or paying attention to past discussions. This is why I think changes should be "proposed" and then voted on by all people watching a page. A much more conservative, but still democratic approach. I know it is not a good selling point to suggest that making data entry hard to do is a feature, but I think it is. (Personally, I don't see tons of use in uploading gedcoms because I would have to clean mine up anyway to merge.) I would be happy to gradually enter my data and sources a little bit at a time over years, in exchange for participating in discussions with other *interested* persons. Likewise, if I can only propose a change and then must wait for it to get approved, or rejected, then I can wait. The time is not important, since the data is not going anywhere, and if I disagree, I will continue to keep my version of the truth locally, knowing I must find more evidence to convince the jury. --Jrich 09:58, 4 May 2008 (EDT)
--Ronni 12:35, 4 May 2008 (EDT)
Hey Ronni; I believe I pretty well have the concept of it is not my tree anymore; but I cannot wrap my head around merging with an unsourced tree. It may not be "my tree" but my work will then be associated with it. What I enter on WeRelate will stay; whether I do or not. You misunderstood one point that I attempted to make, I think. I did not intend for one to believe that I was disinterested in the pages. I am interested. I just do not have the time to edit or research this maternal line; a family that married into the Coker line fairly recently. So if I merge the page, and therefore the trees, I don't have time to source the unsourced pages, so unless someone else does they will remain unsourced.
About gedcoms, I initially thought that the gedcom capability was a must; but have since changed my mind. I am not uploading any gedcoms but I have decided to manually enter all of my data. If we disallow gedcoms then most of the problems with quality will disappear. I believe that quality over quantity is desirable. The more junk you allow, the less likely you are to attract serious researchers. When the merge feature is enhanced; I suppose one can attempt to merge my page and I will protest and the majority will rule. --Beth 12:54, 4 May 2008 (EDT)
I don't want to stifle enthusiasm for merging, but I don't think that WeRelate is ready yet for large-scale use. There are too many known issues: search, match, merge, etc. that make it difficult to use for most people. So we haven't been actively promoting it, nor have we been doing things like issuing newsletters or whatnot to encourage existing users to come back to the website. (Although the lack of notification does seem to be a bug.) What we have is in some ways right now an ideal situation: a group of dedicated people who care about genealogy and how to make collaboration work, who can help decide how WeRelate ought to function. I know that development progress is slow (believe me), but as I step back and look at where we are in comparison with other websites I think we're headed in the right direction - a much better direction than I or most others could have come up with on their own. With a couple of notable exceptions :-) we really haven't explored merge. Merge is something that will get more attention this summer. I agree that merge is at the heart of the promise of WeRelate, and it is frustrating to me that we're not seeing more collaboration. But I think that the website is too complex for most people. We've got to make it easier to use before calling most people back. While GEDCOM upload has some benefits, I agree that it also has some downsides.
I think we need to add a step to the upload process to help ensure that the upload doesn't just generate a bunch of duplicates - perhaps something that requires the submitter to look at the submitted GEDCOM and make merge decisions or else the GEDCOM gets removed. And I think we should allow people to request that an unsourced, abandoned tree that's getting in their way be deleted. I imagine as things progress this summer we'll have more ideas around how merge ought to work, and how to reduce abandoned GEDCOM's. For now I'm still working on search. It's obviously taking longer than I anticipated. The good news is that I have hired a student to work over the summer so progress will hopefully pick up.--Dallan 15:09, 6 May 2008 (EDT)
Colonial Merge Wrap-Up [6 May 2008]
I'm just about reaching the end of a personal merge campaign/vendetta involving early New England settlers. I found that I could find stuff to merge more-or-less mechanically, by recognizing that family names with sequence numbers above "1" typically flag a family that is duplicated. Over the course of my work, I would say that I only hit perhaps two or three families where no part of the name was "Unknown" and there were two or more real families present. Even in those cases, I suspect that the trees I was merging actually had errors, but no matter. Over the last six or so weeks I've redirected something over 1900 family pages, so there's something like a 99% chance that a duplicated page name (again, without "unknown" appearing somewhere) represents an actual duplicated family. Using the knowledge that duplicated family names are hallmarks of probably duplicated tree fragments, I was able to come up with some approaches to systematic detection of duplicates. Starting with my own family tree page, I looked for every example of a family page with a sequence number above "1".
I would go to the page in question, skim the contents, then directly change the URL entry to point at the page associated with sequence number "1" (or "2", or whatever). Assuming a duplicate was found, I would always merge the family page down to "1". Merging of family pages no longer cuts away family relationships that exist in the redirection source, so the result of merging one or more family pages down to page "1", is to create a superset of the family relationships - from the various duplicated pages - in the single target page. Working within the single consolidated family page, I would then merge duplicated father, mother, and child pages as appropriate. The result of this operation is the creation of superset person pages, that will often contain subsequent duplicate spouses or parents. I recommend avoiding the urge to move on to redirecting the next layer out, before completing the first layer - for a large merge, you can get pretty confused. After merging the various parents and children in a family, if there are more than a trivial number of duplicated family or spousal relationship pages pointed at by the person pages of the starting family, you may wish to write those down (or write them into a "to do" user page). From there, pick the next duplicated family and repeat the process. After resolving all the duplicate family pages on my family tree page, I moved on to my "watchlist". As you merge family and person pages, your watchlist will grow and additional pages to merge will become apparent. Finally, after clearing away all the potential merge candidates in my watch list, I moved on to an ad-hoc search for matches (still exploiting the property that duplicated family pages generally indicate actual duplication). I wanted to be able to find family pages that were created after the pages that I was looking at in my tree and watch list.
Initially, I did this by opening my watch list and selecting the text associated with all of the family pages found there. I built a table for this material where the first column is the family name, the second contains clickable links to the instances of the page that I know about. The third contains a hypothetical link to the page name that would be "next" after the pages that I know about. By watching the list of hypothetical links, to see if any of the named files are present, I can find possible duplicates across my entire set of family pages. Eventually, creating this table by hand became too much of an aggravation, so I started copying my watchlist of family pages into a local file on my system. Using a script that I had written for the purpose, I then automatically generate the wiki "table" content I need. My current version of this table can be seen at User:Jrm03063/Family Overview Table.--Jrm03063 16:37, 6 May 2008 (EDT) Wow, that's amazing! I'm really glad to hear that the family-centric match + merge approach is working well, because I've been planning to focus on matching family pages to find duplicates in the automated process as well. Your table is very cool as well!--Dallan 17:44, 6 May 2008 (EDT) [add comment] [edit] Generalizing Warnings, Including Duplication Detection [30 May 2008]I've developed this material since it first appeared here, and it can now be found at User:Jrm03063/A Functional Specification for Consistency Verification in WeRelate. I'm viewing warnings about possible duplication as essentially the same thing as warnings about more traditional sorts of genealogy database integrity issues (birth after death, etc...). --Jrm03063 29 Jun 2008 Seems like a good idea. We could update the warnings when the page was saved, or when it was indexed 10-60 minutes later. 
I'd probably want to make the warnings list non-editable, so not a regular "wiki" page, but a "special" page that just retrieves the warnings for the page from a database table and displays them. One issue with delaying warning re-generation is that you wouldn't get notified that the page had warnings when you saved the page. It might be better to re-generate the warnings after every page save in order to give people immediate feedback.--Dallan 19:36, 27 May 2008 (EDT) I'm convinced that anything that moves us toward a larger, more consistent, and more correct data base can only be to the good. Perhaps it doesn't matter how thin or flawed data is at the time it starts life in werelate, as long as the evolutionary path is toward better and more complete data over time. I think that something like a warning infrastructure, that just happens to also suggest merge candidates, would help give things a healthy shove.--Jrm03063 20:04, 27 May 2008 (EDT)
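A minimal sketch of the idea in this thread, treating a possible-duplicate flag as just another warning alongside a classic integrity check (birth after death). Everything named here (`Person`, `integrity_warnings`, `duplicate_warnings`) and the assumed page-title form `Name (N)` is a hypothetical illustration, not WeRelate code; the duplicate rule is the sequence-number heuristic from the Colonial Merge Wrap-Up thread above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    title: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


# Assumption: page titles end in a sequence number, e.g. "... (2)".
TITLE = re.compile(r"^(?P<name>.+) \((?P<seq>\d+)\)$")


def integrity_warnings(person: Person) -> List[str]:
    """Traditional consistency check: birth must not come after death."""
    warnings = []
    if (person.birth_year is not None and person.death_year is not None
            and person.birth_year > person.death_year):
        warnings.append(
            f"{person.title}: birth ({person.birth_year}) "
            f"after death ({person.death_year})"
        )
    return warnings


def duplicate_warnings(family_title: str) -> List[str]:
    """Flag a family title with sequence number above 1 as a merge candidate.

    Titles containing "unknown" are skipped, since those often name
    genuinely different families.
    """
    match = TITLE.match(family_title)
    if not match:
        return []
    name, seq = match.group("name"), int(match.group("seq"))
    if seq > 1 and "unknown" not in name.lower():
        return [f"{family_title}: possible duplicate of {name} (1)"]
    return []


if __name__ == "__main__":
    # Both kinds of warning end up in one list, regenerated on each save.
    page = Person("Person:John Doe (1)", birth_year=1700, death_year=1650)
    for w in integrity_warnings(page) + duplicate_warnings(
            "Family:John Smith and Mary Jones (2)"):
        print(w)
```

Running both checks at save time would give the immediate feedback Dallan mentions; running them at index time would trade that for less work per save.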
Merging pages [8 September 2008]
It would be nice to have the ability to merge the sources also when merging a page. The Help page on merging has not been updated to show the latest changes.--Beth 20:18, 25 August 2008 (EDT)
--Jrm03063 I think there are two situations:
After writing this I can see that your approach of one revision including everything from both pages, followed by a second revision omitting certain material would be pretty useful. The side-by-side merge screen would be a convenient way to exclude certain info and what some people have come to expect after using other programs. Perhaps the merge screen should create two revisions if some information from the merging page is excluded: the first adding all info from the merging page, the second removing excluded information.--Dallan 13:35, 26 August 2008 (EDT)
Solveig promises to update the help pages when the kids are back in school. I do want to try merging sources automatically, but I agree that we'll have to do something to drop junk and duplicative sources before merging.--Dallan 23:16, 25 August 2008 (EDT)
An issue is that in the case of merging in a new GEDCOM, where you might have hundreds of to-be-merged pages, I want the "default" settings to do the right thing most of the time, without requiring follow-on edits of the merged pages. So instead of putting the merged information in a block at the end of the narrative text, I'd rather add unique dates and places (if not already recorded on the page) as additional events by default. Same with sources -- if they're not duplicative and not pure junk, add them as well. Same with relationships. The problem is that under this inline approach, once they've been added it becomes more challenging to remember which names/events/relationships were added by the merge and which ones were there before if you want to edit the page to remove some of them. This is where I think a merge screen would be useful -- to show people what is going to be merged into the page. I'm all for keeping this merge screen simple though.
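The inline default described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Page`, `merge_into`, and the tuple layout for events are hypothetical names, and the "junk" test is a placeholder (blank sources only), since the thread does not define what counts as junk. The function returns the list of items it added, which is the information a simple merge screen would need to show what came from the merge.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical event shape: (event type, date, place).
Event = Tuple[str, str, str]


@dataclass
class Page:
    title: str
    events: List[Event] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)


def merge_into(target: Page, source: Page) -> List[str]:
    """Add events and sources from `source` that `target` lacks.

    Returns a description of every item added, so the caller can show
    (or later undo) exactly what the merge contributed.
    """
    added: List[str] = []
    for event in source.events:
        if event not in target.events:
            target.events.append(event)
            added.append(f"event: {event[0]} {event[1]} {event[2]}")
    for citation in source.sources:
        # Placeholder junk filter: skip blank citations only.
        if citation.strip() and citation not in target.sources:
            target.sources.append(citation)
            added.append(f"source: {citation}")
    return added


if __name__ == "__main__":
    mine = Page("Person:John Doe (1)",
                events=[("Birth", "1689", "Boston")],
                sources=["Town vital records"])
    theirs = Page("Person:John Doe (2)",
                  events=[("Birth", "1689", "Boston"),
                          ("Death", "1750", "Salem")],
                  sources=["", "Town vital records"])
    for line in merge_into(mine, theirs):
        print(line)
```

An empty return value means the source page had nothing new, in which case the target need not be saved and its watchers need not be notified.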
This seems like a good idea. Let me see what I can come up with. How often does the "duplicates in a single family" come up? If not too often, then I'll probably just handle that case with a "Person" merge screen.--Dallan 18:44, 2 September 2008 (EDT)
I have a couple related ideas to offer:
In reading Dallan's comments, I was struck that he was thinking about pages that are new/unworked/immature. Amelia, on the other hand, was concerned about pages that are highly mature. My own thinking, about adding merged information as a block at the end of a page, fell somewhere in the middle. So essentially, we're talking about doing the right sort of merge based on the maturity of a page. Here's the idea:
Given these levels of maturity in a merge target, an appropriate merge strategy could be used.
If anyone asked me, I would be totally happy with only two modes - essentially the forms I describe as "developed" and "mature". I think it would be simpler code to write and leave information in a state where a history inspection would yield understandable results. But if there's a determination to do an interleaved page merge, I don't think that creates problems if it's restricted to immature pages.--Jrm03063 11:14, 4 September 2008 (EDT) I'm currently thinking that an automated merge is not a good idea, so I'm leaning toward as simple a manual merge process as possible:
I think that multiple merge modes would make the system more difficult for newcomers; I'd rather come up with an approach that works well for all cases. To satisfy the approaches described by Jrm03063, the merge screen could also add all of the information from the merged pages:
My preference is option four, because I believe that a common case will be merging an immature page into a developed or mature page, where the immature page doesn't have anything different to add. In that case I'd like to not update the merge target at all so that watchers of that page don't get notified every time someone merges a page that doesn't have anything new to add to the page. I could add an "Edit (expert)" button to the "Merge" screen (in addition to the "Save" button) that would follow option 1 -- add all information from the merge sources into a block at the end of the merge target and open up an edit screen on the merge target so that you could edit it by hand. But I'm not sure how much this would buy you vs. being able to check boxes to say which data elements from the merge sources end up in the merge target. I may be missing something though; you have done much more merging than I have.--Dallan 11:28, 5 September 2008 (EDT) Are you still planning on supporting some sort of auto-merge (once associations have been established) on upload?--Jrm03063 11:59, 5 September 2008 (EDT) I've been thinking lately that as much as I try to have the system not add "junk" from GEDCOM pages in an automated merge, it's going to happen, and an automated merge will give GEDCOM uploaders an easy way to update possibly hundreds of pages at once, and most of the updates will be junk. Others will receive emails about these updates, and seeing that the updates are mostly junk they'll tend to stop reading their change notification emails. So I've been thinking about an alternative: make GEDCOM uploaders merge each family one at a time, but give them an option to exclude from the upload matching ancestors of people who have already been merged. People who are excluded from the upload won't have to be merged, which will save the uploader time. 
This should encourage people who have long lines of ancestors that they don't care that much about to choose to just go with whatever is already at WeRelate for those ancestors rather than taking the time to merge their information into those people.--Dallan 23:42, 6 September 2008 (EDT) I have a couple of questions/thoughts -
Can it really be true? [6 November 2008]
I don't believe it, but when I went to "My Relate" and selected "Show duplicates," I received this screen message: No possible duplicates found. I don't believe it. Jillaine 19:50, 2 November 2008 (EST)
You were right not to believe it :-). You must have visited during a short period of time when the duplicates list was being re-built. Normally this happens early in the morning, but over the past few days I've been re-building it sometimes during the day as I've discovered bugs. Anyway, you have a little over 100 duplicates. A quick note: you will notice that a few of the compare-duplicates screens list a "red link" page that doesn't exist, so you can't merge it. This is a result of a page that was either deleted or never got created, but which shows up as an alternate husband/wife to one of your own pages on a family page. You can just ignore these; I'll take care of them.--Dallan 08:15, 5 November 2008 (EST)
Merging blues [20 January 2009]
I don't know what happened but there seems to have been a minor flurry of GEDCOM uploading over the Christmas break. I have spent about an hour a day recently trying to merge away duplicates created by GEDCOM uploads having no sources, obscure references to personal databases, and one child per family. I hate to see the time that will be required when WeRelate starts getting heavy use, all just to tread water basically, since few of these uploads add anything useful. Merging is a dangerous activity and I am afraid all this activity will cause me to make a serious error, if it hasn't already. One doesn't see/read the Personal History section, and often ends up making snap decisions about whether to save this unsupported date, or that unsupported date, or both. One gets pulled into family members one may not be all that familiar with. Then the propagated changes to the associated family send out all sorts of notifications to other users, all caused by people thinking they are doing a useful service by uploading unsourced family trees. --Jrich 09:40, 5 January 2009 (EST)
Merge-on-upload and a merge-review screen with an "unmerge" button are my highest priorities right now. Both projects have been started and should be done by the end of the month. I'm not sure why we've had so many new users this past week. The Allen County Public Library held a class on WeRelate, but that's the only thing I'm aware of. And I did put in upload limits for newbies - it's just that the limit is 5,000 people. Even with that limit I still get people disappointed that they can't upload larger trees. If the merge-on-upload doesn't work out as expected, I'll reduce it further.--Dallan 15:23, 5 January 2009 (EST)
I plan to update the source-wikipedia templates on a weekly basis, but it has to wait until unmerge, merge-during-upload, and gedcom export are finished. I'm hoping they'll all be done by early February. Then I can modify the wikipedia refresh to do a source-wikipedia search each week.--Dallan 14:46, 8 January 2009 (EST)
Merges lost when GEDCOM updated? [18 March 2008]
Here are questions I have after working 4+ hours doing a HUGE merge (which is still in progress). There has been discussion about the ability to "re-upload" an existing GEDCOM to update it with new information obtained. And the way to do that would be to match the Reference Number created by the GEDCOM itself from past and present uploads.
What now happens with PersonA?
Maybe the "redirect" information should have a different place to be placed? But that would not totally solve the problem. I feel there is still potential of losing any new events John Doe may have added to PersonA at home in his database before an upload to update his GEDCOM --Msscarlet1957 11:45, 14 March 2008 (EDT)
In order to support GEDCOM re-upload, a new source citation will have to be added to each person & family in their tree. This source citation will contain a "permanent link" (URL) to the specific version of the page that they last uploaded. We'll add these sources directly to the uploaded GEDCOM and make this modified GEDCOM available as a download. People will have to import this modified GEDCOM back into their desktop genealogy software in order to be able to re-upload their GEDCOM later. Since we'll add the sources directly into the uploaded GEDCOM file, there shouldn't be the information loss that you usually get when going from one GEDCOM format to another.
So when the person re-uploads the GEDCOM, we'll know which pages they're updating, what each page looked like when they last uploaded their GEDCOM, and what each page looks like now by following any redirects to get the current version. Using this information we can determine
If the changes made by the uploader are to different fields than the changes made by others, we apply the changes made by the uploader to the current version of the page. Changes made by others don't get erased; changes made by the uploader show up as changes to just those specific fields. The uploader must now download the new GEDCOM with source citations containing permanent links to the now-current versions of each page. Suppose the uploader and others modify the same field. There are two ways we could go with this; I'm thinking about going with the second:
Another issue is what happens if the new GEDCOM doesn't contain all of the person & family pages that are in the tree. Rather than trying to delete the missing pages, I'm thinking we should send the uploader an email with links to all of the pages in their tree that weren't in the newly-uploaded GEDCOM, and let them decide if they should be removed or not. I think this covers all the bases. Does this answer all of your questions?--Dallan 01:41, 17 March 2008 (EDT)
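The field-by-field rule described in this post (compare the version last uploaded, the re-uploaded GEDCOM, and the page as it stands now) can be sketched as a small three-way merge. This is a hedged illustration, not the planned implementation: `reupload_merge` and the flat field dictionaries are hypothetical, fields absent from the upload are simply treated as untouched, and fields changed by both sides are only reported as conflicts, since how to resolve them is left open in the post.

```python
from typing import Dict, List, Tuple

Fields = Dict[str, str]


def reupload_merge(base: Fields, uploaded: Fields,
                   current: Fields) -> Tuple[Fields, List[str]]:
    """Apply the uploader's changes on top of the current page.

    base     -- the page version the uploader last uploaded
    uploaded -- the same page in the re-uploaded GEDCOM
    current  -- the page now, after following any redirects
    Returns (merged fields, names of fields changed by both sides).
    """
    merged = dict(current)          # changes made by others are kept
    conflicts: List[str] = []
    for name, new_value in uploaded.items():
        old_value = base.get(name)
        if new_value == old_value:
            continue                # uploader did not touch this field
        now_value = current.get(name)
        if now_value == old_value or now_value == new_value:
            merged[name] = new_value    # only the uploader changed it
        else:
            conflicts.append(name)      # both changed it; left unresolved
    return merged, conflicts


if __name__ == "__main__":
    base = {"birth": "1689", "death": "1750"}
    uploaded = {"birth": "Abt. 1689", "death": "1750"}
    current = {"birth": "1689", "death": "1751", "burial": "Boston"}
    print(reupload_merge(base, uploaded, current))
```

In this example the uploader's new birth date is applied, while the death date and burial place edited by others on the site are preserved.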
I think I understand where Dallan is going with this. Matching an arbitrary GEDCOM against the huge universe of werelate is really impractical. Trying to make a program smart enough to know what is a good enough match and what isn't is essentially unsolvable. What can be done though, is to attach a source reference that tells werelate specifically that a person somewhere in a gedcom absolutely is a certain person in the werelate universe. Generally speaking, the easy way to get your home system in sync w/werelate would be to obtain a fresh werelate GEDCOM download, which will have the appropriate tags in place for all the people - but you wouldn't have to. I presume that the "werelate" designator source/tag will have a format that allows you to directly enter it into your home genealogy program where appropriate. There is a place for the sort of guessing/probable matching in werelate - it's when we have a feature that allows the system to browse for potential matches in the werelate universe. That does not result in automatic merging though, but instead, in a set of candidate matches that the next researcher coming along can review. If the human is persuaded by the match, then the human can perform a merge or request a default merge procedure. But combining detection with actual merging logic seems to me extremely perilous (take a look at ancestry.com's "one world tree"). I appreciate that it's not totally "hands off", but it's going to yield a far better data base.--Jrm03063 13:19, 17 March 2008 (EDT)
Yes, you could enter the source citations into your desktop genealogy program yourself. They'll be a human-readable citation with a URL in the citation text field. But with the ability to download what is essentially the same GEDCOM that you uploaded except with source citations added, you shouldn't have to. As Jrm03063 points out, matching is problematic.
Even if we're 99.5% accurate on matching re-uploaded people to previously-uploaded people, it means we'll either incorrectly-match or not match 25 people in a re-upload of a 5,000-person GEDCOM. That's too many. There is another approach we could take. Some desktop genealogy programs store a unique identifier (UID) for every person. This identifier is included in the GEDCOM's they export, so that a person has the same UID in the GEDCOM file every time. If the GEDCOM includes UID's, then we could record the person's UID and the page version with which it is associated, so that the next time you upload a GEDCOM with the same UID's we could know what page versions to match. The problem is that only 42% of the people that have been uploaded to WeRelate to date have UID's. But when UID's exist, they could potentially be used in place of downloading a modified GEDCOM. One advantage of downloading a modified GEDCOM is that you could share your modified GEDCOM with a cousin, and if they incorporated your GEDCOM into their genealogy and then generated a combined GEDCOM to upload into WeRelate, the system could recognize that some of the people in their GEDCOM already exist at WeRelate and they wouldn't have to go through the match+merge process for those people. The system would just apply whatever changes they had made to those people, just as if you were re-uploading your GEDCOM. If we were instead relying upon UID's, when your cousin uploaded their GEDCOM they would probably have to go through the match+merge process for the people that were also in your GEDCOM, since I don't know if we could assume that the UID's would remain the same in your cousin's GEDCOM.--Dallan 12:07, 18 March 2008 (EDT)
It's great to be flexible Dallan, but I think you'll make yourself crazy trying to support weird ID/UID stuff.
Unless the value is from a reasonable third party (say an ancestral file number, or whatever the successor strategy may be) I don't think an id-based alternative (to the primary werelate url) is wise for trying to figure out who matches who. The url approach that you've mentioned is the sort of thing we want anyway, since a downstream consumer of the GEDCOM may very well be interested in the contents of the associated werelate page. Making the source do double-duty as a tag at re-import time is a really fortuitous coincidence that reinforces good practice. I don't think you want to clutter the story with identifiers that just can't work as well and (often) will not survive a merge. A url to a merged page will usually redirect to somewhere useful...--Jrm03063 13:40, 18 March 2008 (EDT) [add comment] [edit] Losing "Watcher" during a merge [26 April 2009]I am merging pages today, and after the merge, the person who was watching the page I merged is NOT now listed with me on the new merged page. I don't want to lose connections. I think this may be a new bug, as it used to always carry both folks onto the merged pages. Or did I miss it somewhere that this action was dropped? --Kristy 21:52, 15 February 2009 (EST) I'm pretty sure that the bug is that the list of watchers is correct but is not displayed correctly right away. The merged page still shows the pre-merge list of watchers for several hours after the merge. The problem is that the page gets cached at the server before the watcher is added. You can verify that the list of watchers is out of date by going to the URL line in your browser and adding "?action=purge" (without the quotes) to the end of the URL and pressing enter. This causes the page to be re-cached. Fixing this bug is on my ToDo list for next month. I'm sorry about the confusion it causes. 
If this does not fix the problem, would you please let me know?--Dallan 11:26, 23 February 2009 (EST)
The Same and Not the Same [26 April 2009]
During a spate of recent Merges, I noticed that items colored in Green are not always strictly equal. But the green boxes have no check marks and you cannot deselect the default choice to select one of the green choices. In dates, especially, there are borderline significant differences. If I am merging two unsourced pages, and one says 1689, and the other says Abt. 1689 (these are considered equal), I want to pick Abt. 1689 every time as probably more representative of what is known (given no sources). But if it is not the one in the right column I cannot. There are some other differences with names that occur. I know it is a common practice to capitalize the last name, but given that there is a separate field for surname in WeRelate, this shouldn't be necessary. So again, if the two names are John Smith and John SMITH, they compare equal, and I am stuck going with the one in the right column. --Jrich 21:32, 25 April 2009 (EDT)
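To make the distinction in this post concrete, here is a small sketch of a comparison that treats "1689" and "Abt. 1689", or "John Smith" and "John SMITH", as equal while still noticing that the two spellings differ, so that a merge screen could offer a choice. How WeRelate actually compares values is not stated in this thread; the functions below (`date_key`, `name_key`, `needs_choice`) and the rule of ignoring only an "Abt."/"About" prefix and letter case are assumptions for illustration.

```python
import re

# Assumption: only an "Abt."/"About" prefix is ignored when comparing dates.
_ABOUT = re.compile(r"^(abt|about)\.?\s+", re.IGNORECASE)


def date_key(date: str) -> str:
    """Reduce a date to the form used for the loose comparison."""
    return _ABOUT.sub("", date.strip()).lower()


def name_key(name: str) -> str:
    """Compare names without regard to case or extra spaces."""
    return " ".join(name.split()).lower()


def loosely_equal_dates(left: str, right: str) -> bool:
    return date_key(left) == date_key(right)


def loosely_equal_names(left: str, right: str) -> bool:
    return name_key(left) == name_key(right)


def needs_choice(left: str, right: str, key=date_key) -> bool:
    """True when two values compare equal but are not written the same,
    i.e. the case where the person merging may still want to pick one."""
    return key(left) == key(right) and left.strip() != right.strip()


if __name__ == "__main__":
    print(loosely_equal_dates("1689", "Abt. 1689"))
    print(needs_choice("1689", "Abt. 1689"))
    print(needs_choice("John Smith", "John SMITH", key=name_key))
```

Values for which `needs_choice` is true are the ones this post asks to be selectable rather than locked to the right-hand column.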
Downloading
Should WeRelate allow downloading GEDCOM's? [5 February 2009]
The question about allowing anyone to download the file needs some serious consideration. I'm concerned about 'harvesters' who gather lots of different charts and then post them as their own work without either checking for errors or giving any credit to the author. An advantage of the tree staying on WeRelate (as opposed to being downloaded by anyone) is that when corrections are needed they can be made on WeRelate where everyone can see them. But if someone else downloads the file and passes it around, if errors are in their downloaded version, they will be perpetuating the errors - they won't know of the corrections made later on WeRelate. I envision pros and cons on this myself so I recognize the need for serious debate and/or consideration of the subject of downloading while it is still in planning stage.--Janiejac 22:47, 13 September 2007 (EDT)
WorldConnect has come up with a very good compromise protocol on this issue of downloading. It gives the author the options of allowing all to be downloaded OR only a couple of generations, or something like that. You might check that out. Thanks for the serious consideration.--Dr. Bill 22:43, 15 September 2007 (EDT)
I hadn't considered Janiejac's point either -- I think it's a good one. Download isn't scheduled until around the end of the year, so we have time for more discussion.--Dallan 13:04, 18 September 2007 (EDT)
Could someone redirect a portion of this exchange to a new subject called 'downloading discussion'? This has sort of evolved from collaboration to downloading.
I want to keep the subject of downloading current and get others point of view on this while it is still in the planning stages. When I upload a file either to my site or to rootsweb or to WeRelate, I do upload all my notes and sources with it. I do believe in sharing and send anyone who requests it a register starting with the individual they are interested in and including notes and sources. But I don't give away my whole data base, notes, sources and all. I want interested folks to contact me with additions/corrections/suggestions and don't want to find all my data and notes posted on someone else's web site. If I upload to WeRelate and it gets edited by myself or anybody else, I want to be able to download the whole thing back to my computer to continue to work offline. And I do like Rootsweb/WorldConnect's ability to designate just how much of one's chart can be downloaded. But the ethical question comes to mind - if others can add to or edit the chart - should that entitle them to download my whole data base? I'd appreciate input from others on this issue.--Janiejac 12:43, 29 September 2007 (EDT) Allowing downloads of GEDCOMs is pretty essential, and an opportunity to boot. As has been observed, some folks like to be able to work on things off-line. Others perhaps want to take material to another system to generate different sorts of reports. I take the view that we need a symmetric capability - if you can upload a GEDCOM, you sure ought to be able to reverse the process. One of the reasons I've lost a lot of interest in ancestry.com isn't the expense, but the crappy GEDCOM they produce (and worse, they can't even fully re-import their own GEDCOM - how embarrassing). It seems that they've been intentionally inept in order to strand data under their proprietary control. The result...I'm looking for an alternative. 
Besides, if someone was really serious about massive harvesting of werelate data bases, they won't be doing it via GEDCOM, so they could probably do it right now. A GEDCOM download is an opportunity, because a reasonable GEDCOM will be scattered with note/source links back to the werelate site. Skim an ancestry.com GEDCOM and you'll find dozens of links back to ancestry if the GEDCOM has any sources attached. One of the first things I think I would do with a werelate GEDCOM is to replace my ancestry data with a werelate GEDCOM. Then, if people are sniffing around my open tree and source information, they'll find their way to werelate. The way that werelate gains credibility and preeminence isn't by taking a proprietary view of information, but by making it so totally accessible and free that there is no real advantage to getting it elsewhere. It's the wiki way. The information equivalent of "if you love it, set it free."--Jrm03063 14:38, 8 November 2007 (EST)
That's an interesting idea about providing links back to WeRelate in note fields embedded in the GEDCOM. We would have to do something like that anyway in order to satisfy the attribution requirement of our license. Please keep comments coming on this topic. We won't get to GEDCOM download until after match+merge, so we have some time to get comments from everyone.--Dallan 18:47, 8 November 2007 (EST)
I think downloading a GEDCOM is a very important feature, and should not be restricted. Even though I intend to do my work primarily in WeRelate going forward, I'd like to be able to download GEDCOMs for various reasons, including the ability to put it into other software to generate various pretty-printed reports I can't do here, and as a "back up" of the work I do here. While I appreciate the various degrees of control that ancestry.com gives you when you upload a GEDCOM, there's a significant difference between WeRelate and Ancestry (or most other places like it).
On Ancestry, when you upload a GEDCOM, it remains your tree. Here, when you upload a GEDCOM, it becomes your contribution to the ongoing wiki, which other people may add to, link to, correct, etc. From the moment you upload a GEDCOM here, it is no longer your tree, and it wouldn't make any sense for you to be able to dictate who could subsequently download it, especially after it has been enhanced by the work of others. I appreciate the concern about careless people who might download your work, pass it around, and you lose the opportunity for updates. But I'm not sure we can solve the problem of careless people. :-) I for one keep track of where I got valuable information, and always like to keep in touch with those I've collaborated with on common lines. I think the suggestions that the downloaded GEDCOMs have back-links to WeRelate where appropriate are good ones. That's my $.02. --TomChatt 01:56, 9 November 2007 (EST) I've gone back and forth on this issue (i.e., no restrictions vs some restrictions). JRM's comment about embedded links to WeRelate is a very good idea. Tom's comment about "my tree" now being "our tree" needs to be reiterated because it is essentially what WeRelate is all about. That idea alone is one that I think still isn't completely understood when someone starts putting their data online here. I have observed that "misunderstanding" several times in the last few months. If we understand the concept of what is mine is now ours in regards to WeRelate, then restrictions on GEDCOMs would be few if any at all. --Ronni 04:38, 9 November 2007 (EST) I agree with Ronni and TomChatt that the community aspect of the data on WeRelate demands that we have a Gedcom download. If the purpose of wikifying genealogy is to get the best information out there, we must have a way for it to get off of WeRelate into the "wild." But in order to keep supporting the mission of producing high-quality data, it is crucial that downloaded gedcoms be sourced properly. 
I imagine a download where the sources are all the source page "titles" on WeRelate. That would be bad. It would badly degrade the quality of source citation in any properly sourced database, and would create a tremendous amount of work to replace any links back to the WeRelate pages with the actual publication and date information that would allow me to locate the source. I don't object to links back to the source pages, which do contribute useful information, but the downloaded sources should be as complete as possible (using the fields filled out on the source page, I would imagine). On a separate but related issue, what do we do about the licensing requirements, particularly if someone chooses not to download (or import) sources? Perhaps some explicit statements and instructions during the process about the attribution requirements if people redistribute (I know they can do this now, but it's going to be a much bigger problem once downloading is permitted). And that reminds me of a technical issue we (uh, you, Dallan) need to be sure to solve -- embedded links in notes that go to other places on WeRelate need to be rendered as full links that are intelligible when imported into a genealogy program. --Amelia.Gerlicher 14:11, 9 November 2007 (EST)
I'm thinking that a downloaded GEDCOM would include information from the Source/MySource pages on WeRelate as source records in the GEDCOM. I agree that we'll have to include some explicit statements on the download page about needing to attribute. We could put the attribution links to WeRelate on notes attached to each person/family, or on a source record that is cited by every person/family -- any thoughts on which is best? Your comment about turning embedded wiki links to HTML links is a good reminder -- I'll make a note of that.--Dallan 11:29, 16 November 2007 (EST)
Hi, new contributor. Beginner level genealogist. Consider this a comment from the man on the street.... Yes you should allow downloads.
But people will need "help" to avoid pitfalls, whether it is an upload or a download. For example, I am one of those careless people who hasn't paid proper attention to how I entered information in our Family Tree Maker. My wife and I have bastardized our usage of the fields so that when I load it up into Werelate, data shows up where it should not. If someone were to download what I loaded they will have to sort through some strange stuff. I need to improve my discipline in managing info in the FTW. (sources, events, and notes fields) I also need to convince my wife that her approach to putting data where she wants is not going to work in the long run. (for example I can't get her to not put Rev. or DR. in the name field...) I plan on maintaining my own database (FTW) as my primary repository on my home computer and "contribute" to Werelate by publishing what I want to share. (Probably everything I have as I like to share) But, I will not use Werelate as my primary repository. A page on gedcom file format and pros and cons about how people have used genealogy programs incorrectly and the problems this causes as people get more invested into their data repositories would be good...if it doesn't already exist. (PS I take back any negative comments about my wife, she just handed me tea and home made cookies...) PPS is there a spell checker? Thxs --PeterP 18:48, 26 November 2007 (EST)
Hi Peter, one of our big challenges is going to be making the GEDCOM export good enough so that you can incorporate the new material that others have added to your tree into your home database on FTW, so that you don't lose what others have added. As you've seen with your GEDCOM, using the fields in FTW for purposes other than what they're for makes the GEDCOM output look funny. I'm not sure about the different oddities that typically occur, but feel free to add any of your observations to this page.
And no, there's no spell checker, but Firefox has one built in.--Dallan 17:13, 4 December 2007 (EST) My vote is a definite yes for allowing downloads of gedcoms; no restrictions. I suggest that you communicate this to new users when they register. Require new users to check a box that the user understands that gedcoms can be downloaded with no restrictions. There are plenty of sites with restrictions; not what I wish for this site. I would also like the ability to download images or is this already possible? --Beth 10:45, 14 December 2007 (EST) It sounds like the general consensus is that we should allow GEDCOM downloads. There's already a statement on the GEDCOM import page and on every edit page that "All contributions to WeRelate are released under the GNU Free Documentation License 1.2 (GFDL)." and that "Others can add to, edit, and redistribute your contributions." I just bolded the first part on the GEDCOM import page to highlight it. We could require people to check a box, but unless it becomes a problem it's not as high of a priority as other things. You can currently download images (one at a time -- right-click on the image to save it to your local disk). Some images are uploaded under fair-use though, so you may not be able to do certain things with those images (possibly not upload them to a commercial site).--Dallan 00:07, 16 December 2007 (EST) I think that is fantastic news Dallan. Glad to know that I can also download images. I hope every user understands the concept of WeRelate including the GNU Free Documentation License. Call me a pessimist but I envision some users getting upset about this or that and deciding to remove "their" tree as has happened on Ancestry and Rootsweb and probably other sites as well. I removed my tree from a site, but that was because I used the merge feature of their software and the file was so messed up that I gave up and removed it. Anyway just thinking that a statement in "plain English" may save some future woes. 
Thank you and all of your volunteers for your hard work and dedication to WeRelate. --Beth 18:22, 17 December 2007 (EST) I switched the bolding in the gedcom upload text to emphasize the phrase that describes what others can do with your contributions (add to, edit, redistribute) and added "download" as another specific possibility. Hopefully this will make things clearer.--Dallan 17:03, 18 December 2007 (EST) Thanks Dallan, I noticed an option to delete one's family tree in the FTE; can the user delete their family tree? --Beth 07:42, 19 December 2007 (EST) Yes, you can delete the pages in your tree so long as nobody else is "watching" them. If another user is watching one of your pages (which happens if they add the page to their own tree, or if they edit the page and leave the "watch this page" box checked, or if they click on the "Watch" link at the top of the page), then that page does not get deleted. A problem caused by this approach is what happens when you are watching one member of a family that someone else has uploaded, but have forgotten to watch the other family members, and the original uploader removes the tree. The page that you watched is still there, but the other family members have been deleted. I can restore them if this happens, but one of the things on the todo list for next quarter is a screen that will tell you where your "off-tree" links are -- pages in your tree that link to pages not in your tree -- and give you a chance to add those pages to your tree.--Dallan 12:08, 21 December 2007 (EST) I am happy to share. However I am concerned about harvesters who then may put the money on for profit sites. It might be a bit friendlier to have the person just contact the submitter. That way they can make contact, chat and then share information as they wish.--Sheri 20:06, 5 June 2008 (EDT)
Again coming late to the party after a long absence. I initially left WR because I could not easily update my uploaded data -- in part due to lack of gedcom download. But since then I've realized that I was expecting of WR something it is not. It is NOT a genealogy program. It's a wiki -- a place for community-edited content. As someone who works in a sector that is prone to mission-creep, I would encourage WR to get very clear about developing a clear and shared mission/purpose that then guides decision-making about such topics as downloadable gedcoms. Another way of saying this is: use the right tool for the right job. For me, anyway, WR is not the right tool to keep MY data updated, but it is the right tool for me to share and to work collaboratively with others on areas of shared interest in order to improve the quality of genealogical info available to all. From that perspective, downloadability of gedcoms has a different emphasis. --Jillaine 10:25, 2 November 2008 (EST)
Downloading - uploading to update a Gedcom [10 April 2008]
All genealogy software is not the same. I can only speak for "The Master Genealogist" (TMG), as I am a user. For me, having to "import" into TMG a GEDCOM that I had previously uploaded to WeRelate causes concerns.
Yes I could hand enter some code into each individual, but even that is unrealistic, as just ONE of my segments contains 2703 people, I have many segments I would like to place on WeRelate as I get them ready. At Rootsweb's WorldConnect, you just click on the name of the GEDCOM you want to update, and choose the gedcom to upload and it's done. Maybe you could create a small Gedcom to upload there, so you could see how the process works and maybe implement something similar at WeRelate. I believe they do use the UID's. I need something more user friendly. If I uploaded my whole Gedcom (once your system can accept such a large file) importation and then upload for updating would still be an issue, because of my filtered out tags upon Gedcom creation. --Msscarlet1957 15:31, 18 March 2008 (EDT)
I hadn't considered that people would upload only a portion of their GEDCOM, but in retrospect it makes sense. I really do want to make the upload+download process easy for people who want to continue to use their desktop genealogy program because I believe that probably half of our users will want to operate that way.
While writing this response I realized there is a flaw in my proposal. After subsequent uploads of a GEDCOM file, we can't assume that the person data in the uploaded GEDCOM is the same as the person data on the updated Person page, because the Person page might have information that you have not incorporated into your GEDCOM file. So to determine the updates you have made we'll compare your current GEDCOM against your previous GEDCOM. And to determine conflicts we'll also compare the current version of the page against your previous GEDCOM. If your current GEDCOM has a different value for a field than your previous GEDCOM, then you have updated that field since your last GEDCOM upload. If the current version of the page also has a different value for that field than your previous GEDCOM, then we'll say there is a conflict, and your updated value will be stored as an "alternate" name/event instead of updating the primary name/event, and you'll get an email telling you about it.
We still have to associate each person in your desktop genealogy program with the page title that was generated for that person. We can do this in one of three ways:
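Stepping back to the update-and-conflict rule described earlier in this reply (before the question of matching people to page titles), here is a rough Python sketch of how that three-way comparison might look. It is only an illustration of the rule as stated, not WeRelate's actual code: the function name `compare_fields`, the `FieldResult` type, and the idea of representing a person as a plain dictionary of field values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldResult:
    """Outcome for one field (hypothetical type, for illustration only)."""
    field: str
    action: str            # "unchanged", "update", or "conflict"
    value: Optional[str]   # value that would be kept or stored


def compare_fields(previous: Dict[str, str],
                   current: Dict[str, str],
                   page: Dict[str, str]) -> List[FieldResult]:
    """Apply the rule from the discussion to every field.

    previous: values from the last GEDCOM the user uploaded
    current:  values from the GEDCOM being uploaded now
    page:     values on the current version of the wiki page
    """
    results = []
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name)
        new = current.get(name)
        on_page = page.get(name)
        if new == old:
            # The user has not touched this field; leave the page alone.
            results.append(FieldResult(name, "unchanged", on_page))
        elif on_page != old:
            # Both the user and the page moved away from the previous
            # GEDCOM: store the user's value as an "alternate".
            results.append(FieldResult(name, "conflict", new))
        else:
            # Only the user changed it: update the primary value.
            results.append(FieldResult(name, "update", new))
    return results


if __name__ == "__main__":
    previous = {"birth": "1689", "death": "1750", "name": "John Smith"}
    current = {"birth": "Abt. 1689", "death": "1751", "name": "John Smith"}
    page = {"birth": "1689", "death": "1752", "name": "John SMITH"}
    for result in compare_fields(previous, current, page):
        print(result.field, result.action, result.value)
```

Taken literally, the rule flags a conflict even when the page and the new GEDCOM happen to agree on the new value; whether that case should instead count as "unchanged" is a design choice the discussion does not settle.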
This is off-topic, but as you have suggestions for improving usability, or if you find out what keeps your cousins from joining, please send me an email or leave a message on my talk page. I was using the incremental GEDCOM upload procedure too, but only because there were a few lines that I wanted to flesh out with census records while I still had an ancestry subscription. I've allowed that to expire for the time being, and expect in the future only to download GEDCOMs for backup and reporting purposes. I don't expect to actually record any research off-line. In fact, if I thought that werelate was going away, without a wiki alternative, I would probably see about putting together my own server to run it. The only real home-based local stuff that I might do, would be something to keep track of the living. Of course I understand why we need to keep the living off a public genealogy system, but it's a bit of an aggravation to have to think about things in two layers. I have thought about what might be involved in creating a hybrid environment so that I could record information about the living and have it just reside locally on my machine (some sort of pass through for the folks who've shuffled off to werelate), but I just don't have time to do anything with that right now....--Jrm03063 17:09, 1 April 2008 (EDT)
GEDCOM download status [5 February 2009]
I know that there is a desire to see GEDCOM download as part of a download-upload-repeat process, but I'm not really interested in that cycle. I would like to be able to download my tree so that I can save and report the content using other software. Where do we stand on GEDCOM download? Thanks...--Jrm03063 16:38, 7 July 2008 (EDT)
The GEDCOM programmer has been taking a few months off while he finishes another project, but I think the other project will be done soon. If GEDCOM export were done about the end of the year, would that be too late?--Dallan 00:47, 15 July 2008 (EDT)
That's a little farther out than I was hoping for, but volunteer projects run at the rate they run. I was just sort of hoping there was something of a preliminary nature available, so that I could get a snapshot for backup and reporting purposes (I don't work anywhere but werelate).--Jrm03063 09:20, 15 July 2008 (EDT)
It is the end of the year. How is the gedcom download capability progressing? I also have been entering my work exclusively on WeRelate for my one-name study and now am concerned that this was a big mistake on my part and possibly I should reenter the data in my genie program and stop entering new information on WeRelate until there is a definite timetable established for the completion of this function. --Beth 19:17, 9 December 2008 (EST)
I'm sorry - due to the downturn in the economy I spent the last month working on another project. The other project is launched now, so I'm back to working on WeRelate. But everything is now a month behind. I've spent the last couple of days starting to integrate merging into the GEDCOM upload process as well as supporting GEDCOM re-upload, so that project is progressing. GEDCOM export should be ready by the end of January.--Dallan 17:56, 27 December 2008 (EST)
Warning: For & Against Downloading [9 February 2009]
From the sound of the different notices and discussions lately, it sounds as if the download feature is almost ready for implementation. Not knowing what the final product will be, what limitations will be put in place, or when it will become available, I feel I need to have my say and share with you a couple of my experiences to let you know why I feel this way. I acknowledge that contributors to the discussion above have made good points on all ends of the spectrum, whether in support of total openness (i.e. full download capability by all users), limited availability (i.e. owner determined) or the status quo (i.e. no download access). Basically, I am – I think the download capability here at WR will be an important function -- particularly to the "owner" (or should I say "primary developer") of any particular family site. Although in a true collaboration site, such as this proposes to be, idealistically everyone who shared in its content would share "ownership," and everyone would work in the collective for the betterment of the community. But as in any idealistic commune put into living practice, there will always be a mischievous malcontent in the group who will take advantage of the generosity and contributions of others for personal gain. Such is the case in the genealogy world -- be it privately or corporately. Having been an avid family historian and genealogical researcher for over 30 years, I started this hobby in the early-1970s with pencil and notebook, voraciously copying everything I could find (either because photocopiers were not readily available in the late-1970s and early-1980s or I was too poor to use the somewhat primitive copiers that were available back then). These were trips to the National Archives, various libraries, churches, graveyards, courthouses, genealogical & historical societies, and relatives' houses to copy handwritten personal collections held by them.
In ten years I collected what I thought was a massive amount of information by personal, firsthand, on-the-spot research. Notebooks became files, files became binders, and a few thin binders begat many thicker binders. I got my first computer in 1985, and even before the LDS Personal Ancestry File (PAF) program became available for personal use at home, I spent many nights sitting in a local Family Research Center inputting data and saving it to my floppy disk. As soon as PAF was released I sent for it and spent many more months typing in much more data and saving it to my computer at home. Eager to share the results of my research, I contributed my collection of 5000 or so names to the LDS Ancestry File (or whatever they called it back then), hoping that I would benefit likewise from people also willing to share and collaborate on the same ancestral lines or collateral lines. When it first appeared on the LDS Ancestry File CDs (it was not made available on-line for a number of years) I could tell I was the only one who had researched and uncovered most of the data. Unfortunately the LDS program was good at accepting basic vital statistical data but not too capable in collecting sources or references. Although I experimented with Dollarhide’s Everyone’s Family Tree (EFT) for a few years in the early 1990s for its creative organizational format and story-writing ability, I settled on Buzbee’s Family Origins program primarily for its masterful ability to allow users to input sources. I spent years updating my sources on that program, page by page, name by name, fact by fact. When Buzbee was forced to abandon his creation, I followed him to his new program, RootsMagic, which retained and enhanced source recording and referencing capability. I still use RM today as my primary genealogical software program.
Going back to the early years, although I obtained a few bites of information from generous near and distant relatives in the first few years after my contribution to the LDS genealogical library, I witnessed many more unknown pirates taking the data as their own, not attributing the data to me or my research, and making it available on other databases as their own original work. Even today when I search for names at various website, I see my unattributed work, copied repeatedly unsourced and unrecognizable as my original data (except by me). As I said, this applies to private collections as well as the corporate collections. On the private side, the few inquiries I make are usually unanswered. When answered the replies are usually not helpful, primarily because most of these people are name collectors, either uninterested in the source of the information or because it was not identified where they found it. On the corporate side, it’s even more frustrating. The genealogical corporate giants (you know who I mean) collect this information from individual contributors and from other corporate collectors and then regurgitate and sell the information through annual subscription memberships or through sales of genealogical collections on compact disk. My point is – I take immeasurable pride in my work, take immense care in researching, recording and qualifying my sources of information, and want to share the information with others – but not without limits. I don’t want to make it easy for an individual or corporate "tomb raider" to pilfer my collection and try to sell it back to me and to my curious but unsuspecting relatives. If you allow unlimited downloading capability here at WR without any internal control, I can guarantee you that the work of many sincere, honest, hardworking WR collaborators will be collected, edited, repackaged, and marketed for sale elsewhere without attribution to those that put it together. 
I would rather have no download capability here than to see it given freely and openly without controls. That’s my two pence worth… This is a concern that others have noted as well. One idea for addressing this concern is to add a source citation or a note to every downloaded page containing the title of the WeRelate page that it came from. We have to do this in order to comply with the Creative-Commons license terms (i.e., attribution of the original authors is required). If people remove this source citation, they're violating the terms of the license. We would make this clear during the download process - that the material can be used in any form, both commercially and non-commercially, so long as the WeRelate source citations are kept, and that "derivative works" of the material also be licensed under the same open-content license. Those two points are what our Creative-Commons license requires. This is currently the approach I'm thinking about. Another benefit of this approach is that if the source citation or note isn't removed (I believe most people won't remove it simply because it would take a lot of time to remove it from every person/family), and the downloader publishes the GEDCOM somewhere, recipients will see the link to WeRelate and be able to view the latest version of the page, not just the static outdated one in the GEDCOM. A third benefit of this approach is that if a recipient later uploaded this GEDCOM to WeRelate, we'd be able to link the people/families in their GEDCOM to the correct pages in WeRelate. Alternatively, we could say that only the author of the tree could export it, but this still doesn't stop people from copying the data by hand into their desktop genealogy program (or adding the pages to their own tree and exporting it). 
And if they copied the data by hand, they'd be much less likely to include the WeRelate source citation giving credit to the authors.--Dallan 13:17, 9 February 2009 (EST)
Naming Conventions
Naming conventions for wikipedia people [1 December 2008]
The lack of ordinary given/surname forms in the medieval period is a bit of a problem for our medieval merge effort. I don't really have a great idea how to solve it completely, but I think we can pass the buck to wikipedia in many of these cases. Many of these folks have backing wikipedia pages, and/or are named in wikipedia pages for their immediate relatives. The names that appear in those places are apt to be the most common forms in use in genealogical research. I'm therefore adopting the convention, at least for medieval nobility, that the werelate page will share the same name as a shadowing wikipedia page (or at least, use a name that appears somewhere in wikipedia if possible). The primary name appearing on the person page itself will also be a form that is close, if not identical, to the name used for the page. Other name forms for the person would follow as needed. Since I've only just started doing this, there are hundreds of person pages out there that don't yet follow this convention. Feel free to comment or jump in and help.--Jrm03063 12:00, 13 November 2008 (EST)
I think this is a great idea.--Dallan 22:05, 29 November 2008 (EST)
It looks to me like this breaks the general naming convention by including titles (like "Duke") in the page title. 1) Is this right? and 2) How do we explain when to use this rule? A year cutoff? Certain kinds of nobility? It can't be that all pages should be the same as their Wikipedia pages, because that will cause all kinds of havoc with modern pages (maiden names and the disambiguations come to mind as some problems).
I think the convention of using the wikipedia names absolutely makes sense for genealogy before modern and reliable given/surname conventions arise. While I didn't really like allowing titles to creep into the names, my thinking is that the names we see on wikipedia represent what the most active researchers expect to use, so we should adopt that. In any case, since I'm expecting to source the information directly from wikipedia, I think the shared page name makes that point more explicitly. I'm not completely sure about the idea of renaming the page when modern naming conventions are present. On the other hand, the same rationale as above applies - the page name in wikipedia is what folks expect to reference the page by. We should probably stay consistent. Also remember - this is the page name for wiki purposes. It doesn't mean that other names can't - or shouldn't - be represented as valid alternatives. I believe the wikipedia name should appear as an alternative there as well - the primary alternative if there's no reason to prefer something else - but as an alternative name at least.--Jrm03063 23:54, 29 November 2008 (EST) Since my current project is First Ladies, I can say that there are no naming conventions at Wikipedia, and therefore I don't think we can say that they are the names WeRelate users will "expect." Some pages use the married name, some use First Maiden Last, some use First First Spouse Second Spouse. I'm not sure any use the just the maiden name as genealogists expect. 
I suppose Wikipedia goes by how people are "known", which is fine if we don't have a better alternative, but where people actually do have a first and last name, I don't think we should be assuming Wikipedia knows better -- they have a different purpose.--Amelia 00:03, 30 November 2008 (EST)
Since this discussion, I've adopted the convention of limiting use of the wikipedia page name to people living before modern given name and surname conventions (essentially just medieval nobility). Still, maybe there's value in having a redirect, using the wikipedia page name, that goes to whatever our page name may be. --Jrm03063 14:32, 1 December 2008 (EST)
More for wikipedia people [1 February 2009]
I've been merging through our medieval people with a focus on attaching to wikipedia whenever there's a wikipedia page for the person in question. For each such page, I'm doing the following:
I have now gone over 1000 wikipedia template inclusions. Overwhelmingly, these pages are related to medieval nobility. Such pages, prepared along the lines I previously described, have proved extremely helpful as a basis around which the extensive medieval duplication can be consolidated. With respect to the question of the extent of wikipedia page inclusion, I think we need ways to explicitly select either the whole of a wikipedia source article or only parts of it. Neither choice will fit all circumstances, so we will need copy directives that support both. I think the default, however, should be the whole wikipedia article. Besides the situation of some very short wikipedia pages, and pages that lack convenient initial sections, we should encourage scholarship on such "wikipedia people" to continue to be based on wikipedia whenever possible. In any case, since this is the choice of a default behavior for a routine procedure, if inclusion of entire wikipedia pages proves to be overkill, the next refresh can be done more selectively. I would very much appreciate it if the wikipedia refresh would go forward, so that the results can be reviewed. Thanks...--Jrm03063 00:27, 14 December 2008 (EST)
I had to make some changes to the wikipedia refresh code and it's taking a while to debug, but it's just about ready. I'm going to be out for a few days, but I'll start it running when I return on Thursday. You can use {{source-wikipedia|wikipedia title}} as a short-cut if you want. If you add this template to your page, the refresh will create a template for the wikipedia text, replace the source-wikipedia template with a reference to this newly-created template, and add a wikipedia-notice template to the bottom of the page.--Dallan 17:56, 27 December 2008 (EST)
I've seen the werelate agent kick on again today, and it seems to be chugging away at the wikipedia refresh.
I think the results look pretty good, but it made me think a bit more about the previous discussion of including the initial section only (of a backing wikipedia person page) or the whole thing. It occurred to me that there was an underlying (and somewhat sensitive) question to discuss, that perhaps helps us with that. Put most simply:
An even more sensitive question:
I offer the following suggestions.
I don't have definite answers, but here are a few thoughts:
--Dallan 15:46, 3 January 2009 (EST)
BTW, the Wikipedia refresh finished over the weekend.--Dallan 15:23, 5 January 2009 (EST)
How does it do that? [3 February 2009]
I noticed after the latest refresh that Fear Brewster's wikipedia content has a link to the WeRelate page on her husband Isaac, but has a Wikipedia link to her son Isaac Allerton Jr., who is also on WeRelate here. The template shows no human intervention. How did it find the husband's page automatically, and is there a way to get it to recognize the son's in a way that does not require repeatedly editing the template after each refresh?--Amelia 18:25, 1 February 2009 (EST)
Here's what happens: The system uses the copy-wikipedia template in Template:Wp-... pages to create a correlation between the Template:Wp-... pages and wikipedia pages. It also uses the Wp-... template reference within a Person or Family (or Place, etc.) page to create a correlation between the WeRelate pages and the Template:Wp-... pages. These two correlations are then used together to convert links in wikipedia text into WeRelate links. A source-wikipedia reference within a WeRelate page creates entries in both correlations at once, because it causes the Template:Wp-... page to be created. The reason that Person:Isaac Allerton (2) isn't linked to from the wikipedia text for Person:Isaac Allerton (1) is that Person:Isaac Allerton (2) doesn't contain a Template:Wp-... reference. It would need one (or a source-wikipedia reference) in order to create the necessary correlation. And if the Wikipedia article links to a disambiguation page, or if the Template:Wp-... links to a disambiguation page (which can happen if the wikipedia title used to be a real article but was later changed into a disambiguation page), the correlation is broken.--Dallan 22:51, 3 February 2009 (EST)
Wikipedia People Page Naming Conventions [7 January 2009]
Folks - do we agree or not agree that, unless there is a good reason to the contrary, a person with a wikipedia page should have their werelate page named identically with the wikipedia page? It's a convention I've been observing for over 1000 people in the medieval nobility space, and I've recently seen folks paving that over here and there. If there's a better idea out there, great, but we ought to discuss it...--Jrm03063 16:36, 6 January 2009 (EST)
Ok, I'm generally talking about folks that are medieval nobility, prior to modern naming conventions, where the werelate conventions as they are don't work anyway.--Jrm03063 22:25, 6 January 2009 (EST)
I'm fine for using it for people without surnames.--Dallan 14:46, 8 January 2009 (EST)
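For readers trying to picture the link conversion Dallan describes in the "How does it do that?" thread, here is a minimal sketch of how two such correlations could be joined. It is not the actual WeRelate refresh code, which isn't shown in this discussion. The dictionary layout, the function names, the placeholder template names, the wikipedia titles, and the "wikipedia:" link prefix are all assumptions made for illustration.

```python
import re

# Correlation 1 (hypothetical data): Template:Wp-... page -> wikipedia title,
# as would be established by the copy-wikipedia template.
template_to_wikipedia = {
    "Template:Wp-example-husband": "Isaac Allerton",  # placeholder template name
}

# Correlation 2 (hypothetical data): WeRelate page -> Template:Wp-... page
# that it references.
page_to_template = {
    "Person:Isaac Allerton (1)": "Template:Wp-example-husband",
}


def build_link_map(template_to_wikipedia, page_to_template):
    """Join the two correlations into wikipedia title -> WeRelate page."""
    wikipedia_to_page = {}
    for page, template in page_to_template.items():
        title = template_to_wikipedia.get(template)
        if title is not None:
            wikipedia_to_page[title] = page
    return wikipedia_to_page


# Matches [[Title]] and [[Title|label]] style wiki links.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")


def convert_links(text, wikipedia_to_page):
    """Point links at WeRelate pages where a correlation exists.

    Links with no correlation stay pointed at wikipedia (the "wikipedia:"
    prefix is an assumed convention for this sketch).
    """
    def swap(match):
        title = match.group(1).strip()
        label = match.group(2) or title
        page = wikipedia_to_page.get(title)
        if page is None:
            return "[[wikipedia:%s|%s]]" % (title, label)
        return "[[%s|%s]]" % (page, label)

    return WIKILINK.sub(swap, text)


link_map = build_link_map(template_to_wikipedia, page_to_template)
sample = "married [[Isaac Allerton]]; son [[Isaac Allerton Jr.|Isaac Jr.]]"
print(convert_links(sample, link_map))
```

In this sketch the husband's link resolves because his page appears in both correlations, while the son's link stays pointed at wikipedia until Person:Isaac Allerton (2) gains a template reference of its own, which matches the behavior Amelia observed.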