GEDCOM upload process [8 November 2010]
I have been a little frustrated lately with GEDCOM uploads from the point of view of maintaining data that is already in WeRelate. It seems like every new user that uploads a GEDCOM ends up adding duplicates, changing existing pages with bad data contradicted by sources that are already on the page, adding alternate dates that don't match simply because they are less precise, etc. There seems to be no reading of the pre-existing page by these users beforehand, nor, in most cases, any checking to see if the resulting page makes sense. And hardly any of the uploads by new users contain sources, much less sources that indicate any kind of quality genealogy (i.e., trying to locate the primary sources as opposed to believing some stranger's ancestry tree, etc). And of course they never realize they are doing anything wrong, or aren't aware how to respond to the error messages, or don't know how to fix the problems after they have created them. It's not a massive problem, but every hour spent removing some of these errors is an hour not spent doing something a little more constructive. More new users will only create more of this entropy, until there is no longer any forward movement.
To quote a Jlanoux comment: "Gedcom import isn't the first thing we want a newcomer to do." Yet it is invariably the first thing they want to do. This rush to upload, apparently without even looking to see if there is anything useful to learn from what is already in WeRelate, doesn't seem to indicate the attitude of a good collaborative researcher. It seems to indicate that the new user thinks they have stumbled onto another rootsweb or ancestry.com.
So can we slow these new users down a little? These are various thoughts I have had, one or more of which may be possible to implement so as to provide some more protection from "newbie" uploading:
- Perhaps start with a smaller number than 5000, even as low as 100-200 and get that successfully done without too much negative feedback before allowing someone to do 5000.
- Make them demonstrate some minimal degree of proficiency with WeRelate before giving them upload privileges. This could be done by requiring them to have performed certain key processes before giving them upload capability (adding a page manually, adding a source citation, posting to a Talk page, doing a merge, etc.). I don't know if it would be possible to develop some kind of computer-guided training where the user is led through a captured session and their responses are analyzed, and corrected with feedback if necessary. The ability to have them do a small GEDCOM upload in the sandbox in this manner would be terrific!
- Find some way to ensure they have read (and understand!) some minimal set of instruction pages before they are given permission to upload a GEDCOM.
- Especially the further back in time you go, and as the pool of affected descendants/researchers gets larger, it would be nice if there could be more restrictions on what an upload is allowed to change. For example, add a warning that people before 1800 require sources. Or even at a more recent date. This would help new users get feedback/guidance about what is desired before they change pages. --Jrich 18:37, 29 June 2010 (EDT)
I've been waiting to see others' opinions on this subject before I voiced my own, but here goes.
- We've had a couple of gedcoms uploaded during the past several weeks that negatively affected existing pages, so I think that a solution to the problem is needed.
- I don't think that requiring manual page adding before a gedcom upload would be very helpful, because the type of editing that you do during gedcom upload is pretty different than the type of editing you do during manual page adds.
- We could require a much smaller number to start, say 100-200, but the risk is that with such a small number the upload doesn't match any or many existing pages, so the uploader hasn't learned proper etiquette before uploading their large gedcom.
- I like the idea of computer-guided training or ensuring that the uploader has read and understood some instructions before being given permission to upload. We could use that opportunity to explain the proper etiquette and procedures for matching and updating existing pages.
Here's what I propose: Would anyone who is interested please edit: Help:Before you import your GEDCOM. This help page would be required reading for anyone to upload their GEDCOM. Ideally this page would teach people how to match and update existing pages. It would be great if this page were a community effort. If others contribute the content of this page I can add screenshots and require that people pass a quiz on the contents in order to upload their GEDCOMs.
What do people think?--Dallan 23:07, 4 July 2010 (EDT)
- I think this is going to get super long and complicated. I agree with Jrich. Even though manual edits may not be the same type of editing required in a gedcom, I think what we most want is users willing to stick around. If the first thing someone wants to do is upload 5000 people, and they aren't willing to learn the ropes first, then this is not the place for them. People who start out by improving pages manually are more likely to understand the basic concepts that will make for better uploads. As for how they learn to upload reasonably, I think having such a page as a resource is useful, but if we're going to do a tutorial, I think we have to figure out how to get the main points across with much less text.--Amelia 15:07, 7 July 2010 (EDT)
I've been watching what happens when gedcom files are uploaded and agree that the time has come to do something. New users are uploading files without having any idea what WeRelate is, that they are creating web pages, or that they are overwriting existing pages. We have to do something to make them stop and think about these things or we will be spending all of our time cleaning up behind them.
I see new users writing to complain about the 5000 people limit. Yet these are the very people likely to have created a tangled mess by importing several gedcoms and never actually looking at the data. It is simply not possible to do an effective job with a large gedcom. For one thing, we're not working in a static environment. It takes me about two weeks of steady work to clean up a gedcom with 1000 people. I don't do that any more.
I don't see it mentioned, but would like to add that we expect users to become part of the community and work with the other members to improve their pages.
I'll leave more specific comments on the talk page. --Judy (jlanoux) 14:02, 5 July 2010 (EDT)
I have been following this discussion with interest as I am another user with a large gedcom. I am conducting a one-name study of KEMP and would love to upload updates to WeRelate but it is too difficult.
Following my initial gedcom upload I now find if I wish to add content it is simpler to edit a page or create a new page. For a 50,000+ gedcom, this is not a method I will be using for updates.
Other collaborative forums such as Geni.com or MyHeritage offer the ability to 'match' entries and with the concept of managers they allow individuals to merge records together.
What I see as lacking here are the basic tools that allow users to find and merge records in a simple manner, both at gedcom upload time and subsequently upon login.
nastrond 18:24, 5 July 2010 (EDT)
- FWIW, gedcoms are matched against existing pages during upload. The problems that people are talking about occur mainly when someone matches an existing page and then merges the information from their gedcom into it in a negative way (removing existing sources, replacing good dates with bad, etc.).
- Also, you can select "Show duplicates" from the MyRelate menu to find probable matches with pages in your watchlist (and merge the duplicate pages) at any time.--Dallan 15:40, 6 July 2010 (EDT)
- Certainly the process, if used right, is not, in and of itself, the problem. Part of the problem is that people don't understand how WeRelate works, and they perhaps do these things inadvertently. As has been pointed out, uploading a GEDCOM is complicated, much like merging, and I believe, inherently so.
- Another issue is if the expectations that the user has, generated by the term "GEDCOM upload", is that you just press a button and magically all the data shows up in WeRelate. What happens when they find out it is a long, and potentially tedious process? Will they just look for some set of steps that makes all the barriers disappear the quickest, not having an understanding of what they are doing? Are they planning a long-term cooperative relationship with the WeRelate community or do they just think they have found a free way to archive their GEDCOM?
- In several of the GEDCOM uploads I have had to clean up after, the user apparently either did not believe what was shown in WeRelate, or did not know how to interpret WeRelate's presentation well enough to recognize that their people were already in WeRelate with partial or different data. They then end up creating duplicate pages because they don't merge when they should have, or to preserve their own data on a new page when it disagrees with the existing page. Even if they do merge, they leave all the information checked so their Abt 1690 birthdate shows up as an alternate to the existing birthdate of 8 Aug 1690. Or they keep their source of SMITH.GED when the page already has vital records cited supporting all the dates presented. Or they add children to a family page before the marriage date. All things that indicate they didn't read the page that was presented for their review during the upload process.
--Jrich 17:28, 6 July 2010 (EDT)
The problem is inherent in WeRelate. The purpose of WeRelate is collaboration; the act of working together to produce a piece of work. Uploading gedcoms collected from various family trees collected from internet sources is not working; it is collecting names for your family tree without source citations.
I propose that one is not entitled to upload a gedcom unless it is sourced with legitimate sources. Gedcoms with no sources or sourced by someone's family tree or the Broderbund family trees would not be allowed.
If a user is not allowed to upload their gedcom, they would still have an option of entering their direct line with sources with assistance from a volunteer to help educate the user. They could also be referred to several of the excellent online tutorials on genealogy. Just my 2 cents.--Beth 20:11, 6 July 2010 (EDT)
- I have an idea that if a user is denied the ability to upload because some sources are someone else's tree, I doubt if they will stay around to become a collaborative user. I see the problem, but how to get around it without being so restrictive that it discourages folks? My own data base is a mixture; original research, connecting info from well done roots web charts (not imported GEDCOMs) and material contributed by other family members with no sources except the personal knowledge of that descendant.
--Janiejac 23:46, 6 July 2010 (EDT)
---You are a well respected member of the WeRelate community. My suggestion was intended to restrict gedcom uploads with no legitimate sources. If you look at my pages you will find that some have no sources simply because I haven't found the time to enter the information. I am not suggesting that every page must be sourced. I am just saying the purpose of WeRelate is not to collect combined family trees from Ancestry and Rootsweb when the user has no intent of collaborating but rather is using WeRelate to deposit their collective trees on yet another site. --Beth 00:50, 7 July 2010 (EDT)
- I agree with Janiejac. If we make it so restrictive, it will discourage use. I disagree with the notion that GEDCOMs have to have "legitimate" sources. The point of WeRelate, as you said, is to collaborate. However, I can't collaborate with what's not there. I would *much* rather have someone upload their GEDCOM, even if there are no sources, than not upload it at all. Do some pages end up looking messy with ALT births and deaths? Sure. But the page is there and we can all contribute to it.
- I think that part of the problem with merging during a GEDCOM upload, aside from not being familiar with the process, is that some people aren't really comfortable changing what is already there. WR is so different than the other genealogy sites and they haven't yet wrapped their head around the fact that nobody "owns" a page. They haven't quite grasped the concept of "If you have a better source, add it!" We end up with multiple pages, but (1) we're always going to have multiple pages and (2) at least their data is there and can be merged in the future. -- Amy 07:44, 7 July 2010 (EDT)
- Lack of sources is not a big problem, but it is a symptom of not understanding, which is the big problem. And unfortunately because of the way other sites work, people want to upload GEDCOMs as their first act, when their lack of understanding is greatest. This is why Beth makes the distinction about being a well-respected member: janiejac clearly knows how things work so can be trusted to do things right. The same cannot be said of the new user who just got their Welcome message yesterday. Hence the idea of slowing people down by making them pass their OWLs before they get to use the GEDCOM spell (Harry Potter metaphor).
- Agreed we need data. I still work my way along various lines to immigrants who have respected books dedicated to their descendants, only to find nothing entered, or a merged-up mess that bears little resemblance to the truth. But that doesn't mean we need AFN-type volume! I have talked to various people about werelate.org, people I meet at family history centers, poring over films of deeds and parish records, the kind that understand what a good source is, and know what "exhaustive survey of sources" means, and are likely to have GEDCOMs I would love to see added to WeRelate, but none have joined. Perhaps they haven't thought about this type of collaboration and don't appreciate it, but what are the probabilities that the random pages they sample will impress them with their reliability and research? Quality data is at least as important as quantity of data. --Jrich 09:50, 7 July 2010 (EDT)
I would also like to add my 2 cents. I don’t know if it is possible to count the number of times an individual loads a successful gedcom but I would like to see the first 3 or 4 gedcoms severely limited in size to about 20 or 30 individuals. This would force learning the ropes without causing many problems. This comment is based on my personal experience. One of my first gedcoms was too large and now I restrict my size to about 40 or 50 individuals.
Before I create the gedcom I check each individual’s notes for presentation, completeness and because the information may be 2 to 5 years old I check all old and new sources for the latest information. Next I select my individuals, create the gedcom and immediately import it back into my software. I then again review all individuals, correcting if necessary and run a problem check. Meanwhile I will have checked WeRelate for duplicates, obviously sometimes missing some. If things look good I then make the final gedcom and send it to WeRelate, usually without any serious problems.
Once it is on WeRelate, I again try to review all individuals for presentation and, if possible, change MySources to Sources. I then have to start working on Images and Documents that pertain to the individuals. A 40 person gedcom can easily take three months from start to finish working an hour or two on 3 or 4 days a week.
As you can see, I am not likely to ever attempt even a 200 person gedcom. HLJ411
- And here is where I have mixed thoughts about the WeRelate GEDCOM upload process. I see both ends of the spectrum: On the one side, the need for simplicity and user friendliness to encourage users (especially new users) to add their family trees to WeRelate; and on the other side, the need for screening tools, quality checks, and process/data review edits to enhance the quality, reliability and soundness of data uploaded to WeRelate. Trying to determine the point where the degree of difficulty in adding increasing degrees of quality data turns off the general public, thereby resulting in a reduction of available data, is a dilemma.
- A year ago I would have had no problem adding a 500 person family tree through the GEDCOM upload process -- a process which we all agree permitted questionable results, and many agree encouraged quantity over quality. Today, I could hardly imagine loading a 500-person GEDCOM and doing it right with the current process -- a process which now encourages (some might say, demands) quality over quantity.
- The remarks from other users above show that some would like to restrict uploads even further, but are we willing to accept the resulting inverse proportion of new data coming in (and users joining) the WeRelate community because of those restrictions? How much data will never be added to our Pando tree because of an increasingly difficult, time-consuming and tedious upload process?
- Questions to consider... --BobC 10:34, 7 July 2010 (EDT)
As a new user (who hopes he hasn't messed things up too much learning how the system works), let me put out a few points.
1) I had started with the mistaken impression that, while I was going through the review process, all of my entries and additions were being held until I clicked the final Import button (after all, all the new data is held up). I figured the review process gave me an opportunity to experiment with options, and to undo or discard everything if things didn't look right.
- Good point. The system can't hold your edits to existing pages until you upload your gedcom, because others may edit the page in the meantime, and then the system wouldn't know how to apply your edits to the changed page. The fact that the system applies your edits to existing pages immediately needs to be explained better.
Note that the site instructions encourage you to do a bigger chunk initially, as they warn you that if you later resubmit a GEDCOM you are going to have to rematch everyone. Some people may not be comfortable with how to limit their GEDCOM to parts of their file to do it in chunks. What might help here is some tool to help prune a GEDCOM and remove subtrees quickly. I bet that if there were an easy escape method, so that a person who reaches a point where it becomes hard to figure out where things fit could abandon the tree past that point, fewer pages would be messed up.
Perhaps another suggestion would be to start people off in a beginner mode (which they can turn off once they have learned the system well enough) which, when on, limits what the user can do, especially during a GEDCOM import. For example, a beginner, when merging people from their GEDCOM, could only add information to an existing page, not delete or merge existing information. They can match a person from their file to a single existing person (or to no one to make it new), and perhaps only add alternate events if they include more information than the existing event has (this will require the software to understand more details about dates).
- I could make it so that new users couldn't remove existing information from existing pages; only add information to them. I'm not sure that's generally desirable though. I'd rather teach them (through this tutorial) how to improve the pages through their edits instead of degrading them.
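As a rough illustration of the "more information than the existing event" check suggested above, here is a minimal sketch in Python. The function names, the qualifier list, and the scoring scale are all assumptions made up for this example; none of this is WeRelate's actual code. It only ranks how precise a date string is, and does not check whether two dates agree with each other.

```python
import re

# Assumed three-letter month prefixes and GEDCOM-style qualifiers.
_MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
           "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}
_QUALIFIERS = {"ABT", "EST", "CAL", "BEF", "AFT", "BET", "AND", "FROM", "TO"}


def date_precision(date_text):
    """Hypothetical score: 0 = no year, 1 = year, 2 = month and year,
    3 = day, month and year; a qualifier such as Abt costs half a point."""
    tokens = date_text.upper().replace(".", "").split()
    qualified = any(t in _QUALIFIERS for t in tokens)
    parts = [t for t in tokens if t not in _QUALIFIERS]
    has_year = any(re.fullmatch(r"\d{3,4}", t) for t in parts)
    has_month = any(t.isalpha() and t[:3] in _MONTHS for t in parts)
    has_day = any(re.fullmatch(r"\d{1,2}", t) for t in parts)
    if not has_year:
        return 0
    score = 1
    if has_month:
        score = 3 if has_day else 2
    return score - 0.5 if qualified else score


def adds_information(new_date, existing_date):
    """True only when the uploaded date is strictly more precise."""
    return date_precision(new_date) > date_precision(existing_date)
```

Under these assumptions, adds_information("Abt 1690", "8 Aug 1690") is False, so the less precise GEDCOM date would not be offered as an alternate, while the reverse case is True.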
2) A second comment is on style and support for that. For example, in my personal file, locations for US counties ALWAYS have the word county in them (except in Louisiana, where they have Parish), and this data is then often suppressed for output to make the sentence cleaner. (The county name is there for the place cross index, but when displayed in the narrative, it normally reads better without.) When exporting for here, there is no easy way to adjust this. I can adjust the link to make it link to the right place page, but I haven't seen any easy way to make the entry read per the style guide except to try and remember to go back after the import. A related question, which I don't see clearly answered in the style guide, is how to deal with places that have changed names or had higher political divisions change over time. The standard I have been following is to name places as they were at the time of the event, with clarification as to the modern name if needed. The place pages here seem to be set up based on modern names and divisions; should places be recorded as they were or as they are now?
- The current policy is to display places as they are specified in the record and link to a place page whose title represents the place hierarchy as it was around the 20th century (we follow the place hierarchies used in the Family History Library Catalog). During a GEDCOM import, places are kept as-is, and the system tries to link them to place pages using the alternate-name and previously-located-in information in the place pages. I could have the system edit the displayed GEDCOM places for style to remove "type" words like county, expand abbreviations (like USA), and add missing levels (e.g., add the county name when only the city and state are mentioned), etc. That seems like a good idea long term.
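To make the style clean-up described above concrete, here is a small Python sketch, offered only as an illustration. The word list, the abbreviation table, and the function name tidy_place are assumptions for this example, not the importer's real behaviour. It drops a trailing "type" word from each level and expands a couple of abbreviations; adding missing levels (such as the county) would need the place database and is not attempted.

```python
# Assumed "type" words to drop from the end of a place level.
TYPE_WORDS = {"county", "parish", "township"}

# Assumed abbreviation table; a real one would be much longer.
ABBREVIATIONS = {"USA": "United States", "US": "United States"}


def tidy_place(place_text):
    """Return the place with trailing type words removed and
    known abbreviations expanded, one comma-separated level at a time."""
    cleaned = []
    for level in place_text.split(","):
        level = level.strip()
        # Expand a whole-level abbreviation such as "USA" or "U.S.A.".
        level = ABBREVIATIONS.get(level.upper().replace(".", ""), level)
        words = level.split()
        # Only strip a type word that trails a name, so that
        # "County Cork" or a bare "Parish" is left alone.
        if len(words) > 1 and words[-1].lower() in TYPE_WORDS:
            words = words[:-1]
        cleaned.append(" ".join(words))
    return ", ".join(cleaned)
```

With these assumed lists, tidy_place("Worcester County, Massachusetts, USA") gives "Worcester, Massachusetts, United States", while "County Cork, Ireland" comes back unchanged.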
3) In looking at an alternative to GEDCOM import, I tried entering a group of people, and found the interface somewhat slow and clunky for bulk entry.
- Yes, this will hopefully be greatly improved in the next few months.--Dallan 16:52, 30 July 2010 (EDT)
First, the GEDCOM based division of the interface into Person Pages and Family Pages feels like it greatly increased the amount of entry tasks. It should be possible to automatically create the family pages from the two Persons, replacing the Find Family page link to a find Spouse link.
Second, I find the entry of sources to be clumsy, especially for a source used often. You can try to trust yourself to type the name of the source exactly, or you can use the look-up function, which uses at least two page loads, including a pop-up window, and the search function seems less than ideal in finding some sources (even exactly typing the name of the source doesn't guarantee that the source appears first in the list). I really wish there was some way to quickly find a previously used source without leaving the page.--Richard_Damon 23:12, 13 July 2010 (EDT)
- Richard -- I don't have much to say of substance on your post (as I agree with you), but I just wanted to say that I've seen a number of your edits come through and thought more than once that you were doing an excellent and careful job. Thanks for taking the time to learn the process.--Amelia 00:05, 14 July 2010 (EDT)
First off, I'll admit that I have not read this entire discussion, but I read enough to understand (and agree with) the main issues. I agree with a lot of the suggestions, such as restricting new users to small GEDCOM files and encouraging use of the Sandbox. Even though I had already done some manual editing and merges, I started with a 20-person file in Sandbox before attempting a live upload (of the same 20 person file). I still managed to make an error in the live upload (a known issue with matching a source after matching families) - but that is easily fixed with only 20 people to worry about.
What it appears no one has suggested is a rating system for users. My husband suggested this, based on experience with other collaborative websites such as slashdot.org. I don't know how sophisticated (and complicated) this needs to be, but my husband pointed out that slashdot.org is open-source, so it might be possible to reuse some of that code (following, of course, open-source protocol). What I had in mind was people rating pages for such things as data quality (not having nonsensical data, nonsensical/useless alternates or nonsensical notes) and quality of sources. The problem, of course, would be in determining which user(s) to credit with a page's rating. The last editor should not get credit for good-quality sources if they were added by an earlier editor. Nor should an editor who makes a minor change be necessarily penalized for a poor-quality page that they are not in a position to clean up. But maybe GEDCOM updates at least could be rated. Start with a select community of known collaborators, and as a user achieves a high-quality rating, allow them also to rate other contributors. A new contributor with no rating is allowed to upload only small GEDCOM files (and maybe can ask for them to be rated), while a user with a high-quality rating is allowed to upload larger GEDCOM files. A user with a low rating is required to go through a tutorial before submitting additional (small) GEDCOM files. There might even be provision for personalized assistance/training for users with low ratings who are nevertheless interested in "doing it right". Those users who are clearly not interested in improving their collaboration could be banned from submitting GEDCOMs or even from editing pages (in extreme situations) - although we know that some of them will just create new profiles to get around restrictions. These users should, of course, be gently persuaded to use another website that suits their needs better.
Any thoughts on instituting at least a simplified rating system (based on GEDCOM submissions)?
One other comment - if we want to encourage the use of Sandbox, we have to make it easier. There has to be a link to the Sandbox from the Import GEDCOM main page (and from the WeRelate home page too, please). And (I hate to complain, but) Sandbox GEDCOMs should be processed in a timely manner (mine took several weeks - maybe someone was on vacation, which I certainly do not begrudge, as I was myself :) ). --DataAnalyst 11:36, 20 August 2010 (EDT)
- Good point. The sandbox doesn't get monitored and if the gedcom uploader crashes nothing alerts me to re-start it. I'll start monitoring the sandbox and add the links you suggest.--Dallan 19:13, 23 August 2010 (EDT)
- Actually the subject was looked at again at a Watercooler discussion recently at Watercooler: Rating System Re-Visited. While slightly different in focus (it looked more at subject pages produced within WeRelate), you may want to review the comments and consensus (or lack thereof) there.
- The idea of rating contributors based on GEDCOM uploads is a difficult one for me to fathom. Do you base it on quantity, quality, numbers of sources, evaluation of data, connection to existing pages, or some other criteria? Would it contribute to what may be seen and interpreted as an "elite class" here? I fear what you are talking about may be another heavily subjective and manually intensive initiative that would please no one in the end.
- And then to what end? Do you really want an eBay-type rating system assigned to each contributor here? Will someone with four stars behind their name be better than the person with only two stars behind their name? The focus should be on the DATA, not the person contributing the data. Saying it another way, since there is (or should be) no ownership issues here, WeRelate does not (and should not) focus on the individual contributor. This is a community of like-minded collaborators, not an elite society of genealogy stars with their own databases.
- If you've read the complete discussion here, you'll notice the subject of inputting GEDCOMs to WeRelate is almost considered a "necessary evil" here. You yourself admit your experience at uploading your test GEDCOM file was not an easy one -- that is in many ways purposeful. The organizers and majority of WeRelate users do not want it to be another GEDCOM dump.
- In response to some of your other points...
- There is already a lot of personalized assistance and training here for everyone with various skill levels, such as the many Help pages, Watercooler page, New User Support page, and Community Portal page with all its links to other primary portal pages, special interest portal pages, community projects, and various help topics.
- There is already a policing tool that is managed by administrators quite well throughout the process, ranging from direct email contact, to advice given on a user's talk page, to actual blocking of IP addresses for extreme violations.
- So, in case my viewpoint was not expressed clearly enough, my vote is NO to rating contributors. With respect, those are my thoughts. --BobC 17:52, 20 August 2010 (EDT)
- I'm really impressed with a website called StackOverflow and a facebook game called Farmville. StackOverflow went from no users to having roughly 1/3 of all software developers visit it at least monthly in about a year. I watched a video where the founders explained that users were rewarded for doing activities that they (the founders) wanted to encourage on the site. Users could earn points or badges for different things. They found that people really focused on the things that helped them increase their rating. I think the lesson there is to first figure out what we want people to focus on at WeRelate, then how to rate people for doing those activities in a way that most will feel is objective/fair.
- I'm impressed by Farmville because it's kind of a boring game, yet 75M people play it at least monthly. Farmville does this by strongly encouraging people to get their friends to participate. I hope we can learn from this at WeRelate someday as well.
- Both of these things: what do we want people to focus on and how to encourage it, and how to encourage people to invite their friends to participate, are things that I think fall into the "important but not urgent" category -- worth discussing for awhile until we can come up with something that everyone agrees on.--Dallan 19:13, 23 August 2010 (EDT)
how about a pre-upload first-time survey? [8 November 2010]
How about a survey for first-time GEDCOM uploaders that requires them to answer the following questions?:
1 When did you create your WeRelate account?
-- within the last 24 hours (1)
-- within the last month (2)
-- within the last year (3)
-- more than a year ago (4)
2 How would you rate your participation in WeRelate to date?
-- this is my first engagement with WR (1)
-- I've been watching pages for awhile, but have never edited or added information myself (2)
-- I periodically edit and/or add pages (3)
-- I frequently edit and/or add pages (4)
3 How large is the GEDCOM you're considering uploading?
-- 1000 or more people (1)
-- between 100-999 people (2)
-- between 51-99 people (3)
-- 50 or fewer people (4)
4 What one answer best describes your intentions after uploading your GEDCOM?
-- I just want a place to store it; I do not intend to do any further work on it (1)
-- I want to share it, but won't have much time to work on it (2)
-- I want to share it and actively work with others to improve it (4)
5 What answer best describes how you feel about the content you're considering uploading to WR?
-- I don't want anyone to change anything (1)
-- I want people to talk with me first before they change anything (2)
-- I look forward to people working on this content with me (4)
6 What answer best describes the content you're considering uploading to WR?
-- it's a collection of information I've gathered from other sources; I didn't do the research myself (1)
-- it's a combination of collected from others and my own research (3)
-- it's mostly or only my own research (4)
7 What answer best describes the documentation behind the content you're considering uploading to WR?
-- I don't have any source information for this content (1)
-- I have source information but I am not including it in the GEDCOM upload (1)
-- I have some source information of mixed quality (2)
-- I have mostly source information that quotes original documentation (4)
The online survey then totals the points for the selected answers.
Scores equal to or greater than 21 allow the user to move to the next stage of GEDCOM upload and mark the account as not requiring the survey before future GEDCOM uploads.
Scores between 14 and 20 disallow immediate GEDCOM upload and point the user to a set of recommendations to complete prior to uploading; such people would need to repeat the survey for a subsequent upload until their score hits 21 or higher.
Scores below 14 disallow immediate GEDCOM upload and respond with:
"You may want to reconsider uploading your GEDCOM to WeRelate. WeRelate is a collaborative community where all people share and work collaboratively on contributed content (including uploaded GEDCOMs). Your answers lead us to believe that you are seeking another type of service than WeRelate. For more information..."--Jillaine 12:22, 8 November 2010 (EST)
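The scoring rule proposed above could be sketched roughly as follows. This is only an illustration of the thresholds in the proposal, not anything WeRelate has implemented; the names `ALLOWED_POINTS`, `classify`, and the three outcome labels are hypothetical.

```python
# Hypothetical sketch of the proposed survey scoring; not WeRelate code.
# Point values each question can yield, taken from the draft survey above.
ALLOWED_POINTS = [
    {1, 2, 3, 4},  # 1: account age
    {1, 2, 3, 4},  # 2: participation to date
    {1, 2, 3, 4},  # 3: GEDCOM size
    {1, 2, 4},     # 4: intentions after upload
    {1, 2, 4},     # 5: attitude toward others' edits
    {1, 3, 4},     # 6: origin of the content
    {1, 2, 4},     # 7: documentation behind the content
]

PASS_THRESHOLD = 21    # "equal to or greater than 21"
REVIEW_THRESHOLD = 14  # "between 14 and 20"


def classify(points):
    """Return (total, outcome) for a list of seven answer point values."""
    if len(points) != len(ALLOWED_POINTS):
        raise ValueError("expected one answer per question")
    for number, (value, allowed) in enumerate(zip(points, ALLOWED_POINTS), start=1):
        if value not in allowed:
            raise ValueError(f"question {number}: {value} is not a valid point value")
    total = sum(points)
    if total >= PASS_THRESHOLD:
        return total, "allow"        # proceed to upload; no further surveys
    if total >= REVIEW_THRESHOLD:
        return total, "recommend"    # show recommendations; retake survey later
    return total, "reconsider"       # show the "reconsider" message
```

With these point values the total ranges from 7 to 28, so a user must average 3 points per question to be allowed to upload straight away.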
Recommendations for edit of initial draft of this document on 6 July 2010.
Privatizing is bad [6 July 2010]
We have found through experience that it is not good for the user to use the privatize function of their program. It does not prevent the person from being exported. What it does is strip the information from the person. Thus WeRelate has no way to recognize this person as living and he gets a page created.
It is better if the user does NOT use the privatize function. Then WeRelate can recognize that not only is this person living, but also that his siblings, children, parents, etc who may not have dates are also probably living. --Judy (jlanoux) 09:51, 6 July 2010 (EDT)
Source Conversion [9 July 2010]
Needs overhaul. I don't think we should be confusing people with what we think we might do some day. Keep to the point: what they need to do now.
- Understand that there are personal (MySource) sources and Community sources (Source).
- That all of their sources are preserved in MySources.
- As an optional step, they may choose to indicate a matching Community source which will be used instead of a MySource. (see Note)
- Pages can be edited in the future to update sources. MySources can later be redirected to Community Source pages.
Note: Until we are able to indicate the Page Title of a Source to match to, this does not work well enough for even me to use any longer. Playing hide and seek through thousands of matches isn't fun. It is far easier to find a source in a separate window with the real search than it is in the GedReview window.--Judy (jlanoux) 10:07, 6 July 2010 (EDT)
- I completely agree. The source matching isn't worth the trouble of reviewing in an import. -- Amy 08:18, 9 July 2010 (EDT)
- I agree that this is a massive pain right now and the number one time suck. I refuse not to do it, and I really hate to tell people officially not to do it, but the search is often simply unusable even for really common sources. Census in particular is actually impossible to match unless you are already watching the pages you need.--Amelia 13:00, 9 July 2010 (EDT)
Personal identifying information on living contacts [6 July 2010]
If absolutely needed, you can substitute the word "at" (spelled out, with a space before and after the word) for the "@" in the email address.
- I think this sentence should be deleted. I don't care if they think it is needed. I never gave anybody permission to paste my personal email across the internet when I sent them information. You can't rely on people uploading gedcoms to have good judgement. Many of them don't.--Judy (jlanoux) 10:12, 6 July 2010 (EDT)
Gedcoms as sources [9 July 2010]
Rename or remove such sources as "SMITH.GED" or "JONES.FTM".
- I continue to disagree strongly with this policy on WeRelate. If someone got their information from Jones.ged, I need to know that. It is what lets me determine how to assess the information. Trying to make the site look legit by removing the source labels on iffy data is virtually fraud. --Judy (jlanoux) 10:15, 6 July 2010 (EDT)
- Judy has finally convinced me that removing gedcom sources is a bad idea. I don't like them, but it's better to have them than to not have anything. The system won't automatically try to exclude gedcom sources from an upload, and I don't think we should tell people to remove them. Asking people to add the gedcom origination information, as already suggested, is a good idea.--Dallan 16:06, 6 July 2010 (EDT)
- But, if there are good sources, then the Jones.ged is redundant and should be removed. I just don't like removing them when there is no other source. --Judy (jlanoux) 20:58, 6 July 2010 (EDT)
- I don't understand how the fact that something comes from Jones.ftm tells you anything more than the information being unsourced. Both are equally untrustworthy, and the former is 1) ugly and 2) implies that it is an acceptable source, which it is not.--Amelia 15:00, 7 July 2010 (EDT)
- I disagree. All gedcoms are not created equal -- and should not be disregarded equally. Even the data and information within those GEDCOMs may have varying degrees of reliability. Those bits of information without supporting source references should probably be considered "unreliable" (that's your Quality Check when recording it as a source). A GEDCOM downloaded from the notorious World Family Tree or One World Tree may border on near-junk genealogy, while facts in a GEDCOM sent to me by my second cousin based upon first-hand knowledge and family records may be "questionable" or "secondary" in the quality check block. All of it is data, a place to start, a foundation upon which to build, document, substantiate, prove or disprove. Yes, some files may be ugly, some may be unsourced, and some may be untrustworthy, but it is still information. The problem for us here at WeRelate as we grow and draw a larger audience is not really the addition of new, unsourced, so-called ugly information contained in poorly sourced GEDCOMs, but is having novice, untrained genealogy buffs overwrite more well-researched information already in our database with lesser quality data (i.e. junk) out of ignorance or misplaced uneducated enthusiasm (or, God forbid, malicious intent). --BobC 17:13, 7 July 2010 (EDT)
- I agree with everything you say BobC (except for the use of the 'quality check' which I think is used differently by everyone and is thus a pretty useless field for us to have here). But I digress. My objection was to the objection to removing sources such as Jones.ged that are already on a page -- not to the data they are attached to. You, as the uploader, may know the difference between Jones.ged (from your sister) and Smith.ged (from WFT), but I don't. Unless it's noted, and then it might be an actual useful distinction that would be worth keeping. But with no more information, there's just no point to keeping the ugly, misleading labels to non-existent, non-traceable sources. I would never recommend removing data just because it's un- or poorly sourced (in the absence of properly sourced more reliable information), but saying that the information comes from someone's gedcom tells the community precisely nothing useful, and may tell novice users that it's okay to have such sources in their data.--Amelia 12:57, 9 July 2010 (EDT)
Instructions for creating a gedcom [6 July 2010]
These should be reviewed before we tell people to use them. Some of them were obsolete last year when I joined. Some are wrong in light of the gedcom review changes. Things like "don't privatize" need to be added where appropriate. --Judy (jlanoux) 10:19, 6 July 2010 (EDT)
What is gedcom [6 July 2010]
I think the definitions of gedcom can be set aside. Someone who wants to upload knows he has the file. But I think we should address the "Why do we make you do this stuff?" question.
- WeRelate is different because:
- We're not just storing your file in a gedcom dump
- We are creating a web page for each person and each family and each source in your file
- Your data is merged into a shared, public database; you become part of a community. It's not about "you" anymore. The other users rely on you to have reliable information that is in the standard format.--Judy (jlanoux) 10:27, 6 July 2010 (EDT)
Incorporated your ideas [6 July 2010]
Thanks much for the thoughtful and common sense recommendations above. I've tried to incorporate most of them into the draft. Please review and comment when you've had a chance to look at the page again.--BobC 13:29, 6 July 2010 (EDT)
- Thanks for doing that. I've been ill and didn't trust myself with the draft. I'll give it another look tomorrow when my head is clearer. --Judy (jlanoux) 21:16, 6 July 2010 (EDT)
Matching existing pages [6 July 2010]
I added a few notes to this section; I hope others will elaborate. I'm hoping that if we can give clear instructions here on how to update existing pages, we'll save ourselves a lot of re-work from people making bad updates.--Dallan 17:05, 6 July 2010 (EDT)
Source references rewrite [10 July 2010]
Just to explain a little, I rewrote the sources section because it's completely irrelevant to the upload process how Source pages are titled. There is no reason to freak people out with those complicated page title discussions, particularly when there is no benefit to renaming sources. The optional matching process is based on searching, and will find sources that use similar words regardless of whether the title is precisely correct or not. What is far more important is that people be able to tell what the source is trying to cite, and as long as that information is in there, I think it's low priority for us to care about format.--Amelia 17:20, 10 July 2010 (EDT)
Overall rewrite [17 July 2010]
I've been busy with a book project and have only just had a chance to read the page and the "Talk" comments. I think it's pretty good as it is now. I've gone through and rewritten lightly, for smoothness and to fix grammar and such (and because I'm presently in editorial mode and can't help it). There's also an extended chunk on the details of family matching down toward the end of the page which seems badly out of place, and which I've temporarily commented out behind the scenes. That block of text needs to be woven into the shorter section higher up the page on the same subject. (What kind of "test" is going to be involved here? Essay or multiple-choice?) --MikeTalk 19:29, 11 July 2010 (EDT)
- Excellent job of wordsmithing the help topic, Mike. Thanks. --BobC 01:07, 17 July 2010 (EDT)
Additional requirements [30 July 2010]
It looks like this page is just about ready! The only thing that I think remains is to give users direction about how to not degrade existing pages when they edit them -- don't remove existing good sources, etc. User:Jrich and others, would you like to take this on?
I'm thinking that I would add one or more multiple-choice questions after each section. The reader would have to answer all questions correctly in order to upload their gedcom.
In addition to passing the quiz, we could require one of the following. I don't think we can require both:
- The user must enter one of the families from their GEDCOM by hand. This would result in a match when they uploaded their GEDCOM, so we could encourage them to match this family first.
- We could require the user to upload a small GEDCOM (say less than 50-100 people) before uploading a large GEDCOM. Uploading the large GEDCOM later will require them to match everyone in their small GEDCOM.
Thoughts?--Dallan 16:52, 30 July 2010 (EDT)
Asking people to edit matched pages after import [19 August 2010]
Judy and I have a proposal to ask people to edit matched pages after the GEDCOM import is complete. This would represent a significant change to how people import gedcoms, so I'm posting it on the Werelate talk:Watercooler. I'd love to get feedback.--Dallan 18:13, 19 August 2010 (EDT)