GEDCOM Review Procedure [14 March 2010]
Help:Administrators' guide/Review GEDCOM has some how-to info. Since the process was new, I've had to feel my way so here are some guidelines based on my experience:
Thanks for any help you can offer. This class I'm teaching will have me tied up through the end of March. --Judy (jlanoux) 20:06, 14 March 2010 (EDT)
Sunday night update [22 March 2010]
Jennifer, I updated the log to add some new files and got the log arranged in the same order as the admin gedcom page. I sent a few messages. I'll try to check in tomorrow night when I get home. I notified Dallan and Solveig of the flood of files. Holler if you need more help. --Judy (jlanoux) 23:54, 21 March 2010 (EDT)
Major changes to the gedcom upload process [16 Nov 2010]
16 Nov: Major changes to the gedcom upload process:
Gedcoms currently in-process
Gedcoms currently in-process will have to be given some special consideration. All warnings generated by the old process are being counted as warnings (1 point), even if they are really alerts (0 points) or errors (2 points) when we calculate the warning rate. Also, people before 1750 are not marked as excluded and definite duplicates are not computed, because all of these things happen only in the new uploader. Finally, edits done by the user are not taken into account when counting warnings, so you may see a page where the warning has already been fixed by an edit. (For example, Crystalw already corrected their one error, but the system doesn't recognize that -- I've left them a message already.) Users won't be able to edit going forward, so this won't be a long-term problem. But it's something to be aware of for the next couple of weeks.
It's quite likely that there are some bugs with the new process. If you notice any, please let me know. Thank you!
Questions about gedcoms in the queue and warnings [17 December 2010]
Dallan, I have a few questions/comments so far based upon your recent changes:
When a person is excluded on the Person Tab, it seems to also exclude their Family as well. This does not work in the following scenario: There is a person in the gedcom who is Unknown (but with a page) and married to somebody who is known. When you exclude the Unknown person (so that you don't create an empty page), the Family he/she belongs to is also excluded. However, their spouse and other family details may be known. --Jennifer (JBS66) 14:59, 16 November 2010 (EST)
Sorry, another comment... you say "People can no longer edit pages during upload. It's easier for them to simply edit the information using their desktop genealogy program and re-upload.". In the Dutch gedcoms I've processed recently, making simple edits at gedcom upload stage is common. I will review their file and note small errors, like a patronymic name incorrectly in the surname field (which would produce an incorrect title). This can be quickly fixed here - rather than asking the user to make the edits and reupload. --Jennifer (JBS66) 15:03, 16 November 2010 (EST)
RIN numbers are now being ignored (it appears that _UID was already being ignored). If you see tags that are showing up in the text when they shouldn't, please let me know.
Excluding families has worked that way for a long time -- as soon as one spouse is excluded, the entire family is excluded. But I agree that it should work the other way -- a family is excluded only if both spouses are excluded. I just fixed it.
It's possible when both spouses are excluded that the family may still have included children. In this case the family page doesn't get created, so there's no way to know that the children are siblings. But the alternative, creating an "Unknown and Unknown" family page containing just children, seems odd to me. I could be talked into it though, and change the rule so that a family page gets excluded only if all family members are excluded.
I don't mind opening up edits for admins. But users have complained that editing here and then editing on their desktop was a pain. (Why they didn't figure out for themselves that editing on their desktop and re-uploading was easier I don't know.) Also, the warnings don't take edits into account - after you edit, the warning is still there. So I'd rather not open up editing for end-users. Let me know if you'd like me to open them back up solely for admins.--Dallan 16:33, 16 November 2010 (EST)
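For anyone following along, the two family-exclusion rules discussed above can be written out side by side. This is only a rough sketch: the `Family` class and both function names are made up for illustration and are not taken from the uploader's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Family:
    """Hypothetical stand-in for a family in a GEDCOM under review."""
    husband_excluded: bool
    wife_excluded: bool
    children_excluded: List[bool] = field(default_factory=list)


def excluded_when_both_spouses_excluded(fam: Family) -> bool:
    """Rule as fixed above: the family page is dropped only if both spouses are excluded."""
    return fam.husband_excluded and fam.wife_excluded


def excluded_when_all_members_excluded(fam: Family) -> bool:
    """Alternative rule floated above: drop the family page only if every member is excluded."""
    return (
        fam.husband_excluded
        and fam.wife_excluded
        and all(fam.children_excluded)
    )
```

Under the first rule, a family with one Unknown (excluded) spouse keeps its page. The two rules differ only when both spouses are excluded but at least one child is still included.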
I'd like to respond to the comment "People can no longer edit pages during upload. It's easier for them to simply edit the information using their desktop genealogy program and re-upload." The 2 times I have uploaded (or tried to upload) a significant GEDCOM (about 250 records each), it must have taken me about 2 hours to match sources. This is the first step I do, because in my experience with my first small test file, it didn't make sense to review individuals until they had been matched, which only occurs as part of family matching, and you have to do sources before family matches or you risk having a lot of MySource records created and used in updating existing families. Hence, I find the safest thing to do is first check warnings, and then match sources. Once I have invested 1-2 hours in that, I'm sure not going to start over again to fix one or two minor editing problems. If I can't edit the GEDCOM, I'll make a note and fix them later - maybe not a huge deal, but I wouldn't say that it is easier to go back to the desktop software. It all depends on how much time you have already invested in matching sources, places and families - it could be hours if you are being careful.
Please don't anyone respond by suggesting that I upload smaller files. 250 records is less than 5% of my database, and is hard enough to separate that from an entangled family tree. I was hoping to progress to files of 500 records at a time, not regress to smaller files.
On a related note, I look forward to the day when source matching is somewhat automated (maybe leveraging an improved source searching algorithm). At a minimum, if I matched a GEDCOM source description ABC in a previous load, it would be nice if it were automatically matched to the same WeRelate source in subsequent GEDCOM loads. --DataAnalyst 22:12, 4 December 2010 (EST)
Regarding warnings [16 November 2010]
Dallan, can you take a look at senn.ged? It has a warning level of 2.2%, which causes the user to not be able to match pages.
The 4 errors that account for 8 of his 9 error points all stem from a single mistake. In his file, one person has 4 OCT 970 as the death date instead of 4 OCT 1970. The errors generated as a result of this one typo are: Birth is after death (which makes no sense to me here), An event occurs before birth, An event occurs more than a year after death, and Marriage occurs after the death of husband. Is this how strict you intend this to be? --Jennifer (JBS66) 20:35, 16 November 2010 (EST)
Watching pages before gedcom is uploaded [22 November 2010]
Dallan, User:Ekjansen observed the following with the new uploader: a user is marked as a watcher of the pages that appear on the Family Matches tab before any matches are made, and before the file is uploaded. Prior to uploading his file, he shared 6 pages with a certain user. After uploading his test ged last night, that number jumped to 97. At that point, he did not match pages or upload the file.
He has since removed that file, uploaded another, and is matching his families. It would make more sense if Watching a page happened at this stage, otherwise anybody uploading a gedcom will watch pages they may not even accurately tie into. --Jennifer (JBS66) 07:30, 17 November 2010 (EST)
Problem with warnings-alt events [23 November 2010]
Dallan: I'm looking at RuthL's file. She's one that is caught in the transition. There are a lot of warnings (13%) and thus she's marked as unable to import. This puzzled me as I had noted it as a pretty good file on initial review. The problem is that there are a lot of alt birth and alt death events triggering the event before birth and event after death warnings. I don't think alt dates should do this.
She's done about half of the Family Matches already so I don't know what we should tell her to do. I'm not sure if she has continued to work on the file. Review no longer shows what was edited. I'm more inclined to import rather than reject at this point, but we would need to save the warnings so she can work on them and also tell her to work her duplicate list after import (if she's still around.) --Judy (jlanoux) 11:17, 17 November 2010 (EST)
Why does the font spontaneously change like this? I don't do anything different that I'm aware of.--Judy (jlanoux) 11:18, 17 November 2010 (EST)
Child births less than 9 months apart [26 December 2010]
Is the system producing this warning for twins/multiples as well? --Jennifer (JBS66) 11:57, 17 November 2010 (EST)
Dallan, can you tweak the formula for this? As well as ignoring same-day births, can you extend that to births one day apart? Some twins are born on consecutive days. I just ran across an instance of this, that produced the Child births less than 9 months apart error. --Jennifer (JBS66) 07:50, 20 December 2010 (EST)
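The requested tweak could look something like the sketch below. The 270-day threshold, the one-day twin tolerance, and the function name are all assumptions for illustration; the real check may be written quite differently.

```python
from datetime import date
from itertools import combinations
from typing import List, Tuple

MIN_GAP_DAYS = 270        # assumed stand-in for "nine months"
TWIN_TOLERANCE_DAYS = 1   # same-day and next-day births are treated as multiples


def births_too_close(births: List[date]) -> List[Tuple[date, date]]:
    """Return pairs of sibling birth dates that would trigger the warning."""
    flagged = []
    for earlier, later in combinations(sorted(births), 2):
        gap = (later - earlier).days
        # Gaps of 0 or 1 day are assumed to be twins/multiples and are ignored.
        if TWIN_TOLERANCE_DAYS < gap < MIN_GAP_DAYS:
            flagged.append((earlier, later))
    return flagged
```

With this version, twins born on 1 March and 2 March of the same year are not flagged, while siblings born six months apart still are.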
Cut-off Date 1750 [23 November 2010]
In my opinion this date is too general. How far back the data can go before quality really suffers depends on the experience and trustworthiness of the hobby genealogist. My Dutch genealogy goes back before this date on nearly all my pedigree lines. I am not at all willing to type that data into WeRelate once again because of other users' inadequate data.
Using Christening date to calculate warnings [24 November 2010]
It appears the christening date is being used to calculate a warning: "Husband died more than nine months before X was born". In a recent gedcom, husband died, but son was christened 2 years later (there is no birth date listed for the son). Should this error only be calculated on birth and not christening date? --Jennifer (JBS66) 17:22, 20 November 2010 (EST)
Pre-1750 Family Matches [23 November 2010]
Two people (spouses) in a recent gedcom were excluded due to being "early". However, they were matched to their family page already on WR and the user was allowed to perform the Family Match and update the pages. Is that what you intended? --Jennifer (JBS66) 18:52, 20 November 2010 (EST)
Trusted gedcom uploaders list [25 November 2010]
There is now a MediaWiki:Trustedgedcomuploaders list. Admins can add user names to this list. Once a user has been added to this list, for gedcoms uploaded by the user from that point on, the cutoff date for early births is changed from 1750 to 1550.
Judy/Jennifer, please let me know if you think 1550 is too late for trusted uploaders. I could set it earlier, say 1500. Because we don't break up families, anyone born before the cutoff who has a child born after the cutoff will be included. So the current cutoff date is effectively around 1520 for most people.--Dallan 20:39, 24 November 2010 (EST)
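To make the cutoff rule concrete, here is a small sketch of how it reads from the description above. The function names, the handling of a missing birth year, and the choice to treat a birth in the cutoff year itself as not early are all assumptions, not the uploader's actual logic.

```python
from typing import List, Optional, Set

DEFAULT_CUTOFF = 1750
TRUSTED_CUTOFF = 1550


def cutoff_year(user: str, trusted_uploaders: Set[str]) -> int:
    """Pick the early-birth cutoff depending on whether the user is on the trusted list."""
    return TRUSTED_CUTOFF if user in trusted_uploaders else DEFAULT_CUTOFF


def excluded_as_early(
    birth_year: Optional[int], child_birth_years: List[int], cutoff: int
) -> bool:
    """True if the person would be excluded for being born before the cutoff."""
    if birth_year is None or birth_year >= cutoff:
        return False  # assumption: people with no birth year are not treated as early
    # Families are not broken up: a parent with a child born on or after
    # the cutoff stays in, even though the parent is early.
    return not any(year >= cutoff for year in child_birth_years)
```

This is why the effective cutoff is a generation earlier than the nominal one: a person born in 1530 with a child born in 1560 is kept under the 1550 cutoff.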
1750 cutoff [17 December 2010]
copied from User talk:JBS66
How was a decision made for a 1750 cutoff without any discussion on Watercooler? There was lots of discussion about protecting quality of data, and various suggestions, but I could find no proposal to implement a 1750 cutoff date. Was the discussion there and then removed? Did I blink (ok, I wasn't watching carefully for a couple of months) and miss it?
I'm all for data quality, and I understand why there have been many suggestions about limiting uploads, but if we're going to have a limit like this, then we need a process to get people onto the trusted users list. As somewhat of an expert in data quality, I will argue strenuously against forcing manual data entry for new records - manual data entry is the best way to introduce errors - both small ones like typos on dates and larger ones like skipping generations. All you have to do is compare the trees of many compilers to the sources they say they were using to see how true this is.
I'm sure I'm not alone in having invested significant effort in my tree in a software package on my own computer, and not being prepared to spend an equivalent number of hours manually entering and rechecking data anywhere else. Not to mention that the software I use makes some tasks easier than WeRelate does. Frankly, when I hit the 1750 limit today, I was prepared to "take my toys and go home". Then I assumed it was an error, since the "to do" list mentions a 1450 cutoff date - one I could live with. So now I guess I just need to figure out how to get on the trusted list.
I've also seen the comment that many pre-1750 people have been entered with nice bios and we don't want the data messed up. Fair enough - I agree in cases where the data is good quality, and we also obviously don't want yet more duplicates of those records. But any suggestion that WeRelate has even scratched the surface with pre-1750 people is laughable. There were approx 600,000,000 alive in the world in 1700 (not to mention the millions who died before then) and WeRelate has fewer than 2,000,000 individuals. WeRelate is supposedly still in beta (as per the home page), and is nowhere near close to a saturation point. If we want to protect "good quality" records - whether they are pre- or post-1750, we need a way to explicitly protect them from being updated during a merge (like the semi-protect already in place). That is very different from preventing the upload of brand new records. How to prevent duplicates will be an ongoing problem. Unfortunately, I don't think WeRelate is well-protected from manual entry of duplicates, either - the search needs some tuning before it is reliable in finding existing records (sorry, Dallan, I don't mean to criticize, but I have noticed some deficiencies in the search that I assumed would be addressed over time).
I know that saying I might "take my toys and go home" seems very petty and I don't like being petty, but in reality, if I can't upload my GEDCOM of pre-1750 people, I won't be contributing - just like I decided about a different genealogy Wiki that did not plan to support GEDCOM at all. Over half of my (approx 6500) records are pre-1750, and from my experience so far, at least 2/3 of those are not yet in WeRelate. Not only am I not willing to put in the hours required, but I expect that forcing manual data entry will actually lower the data quality, so I see no incentive in investing in WeRelate under these terms.
That said, I can easily see (and have spent hours fixing) the garbage in WeRelate resulting from indiscriminate uploading of GEDCOMS - so, back to the idea of a trusted user list. Let's figure out how to "certify" trusted users, and get that going. I was hoping to get my relatively high quality data into WeRelate early (as I assume it would be much less effort than if I am a late-comer to a mix of good and bad data). Can we "certify" a bunch of trusted users so that we can get as much good quality data in soon? --DataAnalyst 17:26, 4 December 2010 (EST)--Jennifer (JBS66) 18:17, 4 December 2010 (EST)
Ability to manually match individuals [17 December 2010]
I believe that the only person/family matching that occurs during a GEDCOM upload is to match families and the children within them. Why not individuals as well? I can understand that it would be risky to try to automate this, but shouldn't you allow end-users to match individuals manually? For example, if I upload John and Jane Smith and their children and their children's spouses, the family itself might be new, but one of the children's spouses might already be in WeRelate (without a spouse, and therefore not picked up in the family match).
The last time I uploaded a GEDCOM file, I went through every individual after the upload was complete, and searched for duplicates. I found several situations where a record already existed, but there was no way of doing the match in the GEDCOM process. I did the appropriate merges after the GEDCOM had been fully processed. But I bet I'm a rarity that way. It would not occur to most people that they needed to do this, since they would assume that all possible merges had already been suggested to them during the GEDCOM review.
I think that there should be a way to find/match individuals (just like places and sources), and instructions to encourage users to do it. Or, if you are worried about inappropriate matching (which is, of course, a risk), then at least encourage users to search for duplicates after the merge.
I hope this makes sense. I'm not feeling at my most coherent. Ask if you want a more complete explanation. --DataAnalyst 22:27, 4 December 2010 (EST)
Question re: 0 error GEDCOMs [12 December 2010]
Trying to help Judy out. So took a look at the GEDCOMs needing review, saw one from Janiejac that has 0 errors. Question: why does a 0-error GEDCOM need review? (I went and approved it.) Jillaine 11:15, 12 December 2010 (EST)
Rules for stale files [14 December 2010]
In an effort to further automate the handling of gedcoms, Dallan has updated the response email to advise that abandoned files will be removed in two weeks. It already explained that users with a high warning rate should fix their data and send a new file. So we will no longer need to send manual notifications to users. This was an odious task for me with a high stress factor so I am very grateful for some relief.
The process: We will assign a delete date of two weeks from the upload date. At that point if we have not had any response from the uploader, the file will be removed. Use the button on the overview tab. When I remove a file, I usually copy the notes from the log to the bottom of the page. It helps in case there are questions or the user sends the same file back (this happens a lot).
I think we are caught up with the files that were involved in the transition. Many thanks to Jennifer and Jillaine for helping with this.
I updated the log for files in queue and added dates. Feel free to check my arithmetic. --Judy (jlanoux) 18:53, 13 December 2010 (EST)
Pre-marital births generating errors [17 December 2010]
I just uploaded a small gedcom (Jacob Link) to see how things work now from the user end. (If Dallan's watching: this is a sample from my HUGE 11k Schwenningen GEDCOM.) The only errors generated were three, and they all had to do with a child being born before the marriage date. Whatever you may feel about it, this happened a LOT in the 1700s and 1800s in Germany-- one explanation is that in some towns, couples were not allowed to marry until the man could demonstrate his ability to support a family; in most cases that I have seen in this particular town, couples who have an out-of-wedlock birth end up marrying, usually within a year. (Although I have an ancestress whose parents did not marry until seven years after her birth! But that's rare.)
How important is it to maintain the current rule for births before marriage? Not the rule itself but the time between pre-marital birth and marriage? How do people feel about this rule being adjusted to allow a year? Or perhaps a checkbox as in other menus for exclude/include?
I was also curious to see that these three errors seemed to be the cause for generating the message "your gedcom will likely not be approved."
Seems a bit harsh-- three errors. But perhaps I'm missing something?
-- Jillaine 08:06, 14 December 2010 (EST)
I agree Jillaine. I've brought this up as an issue for the Netherlands gedcoms as well. Births before marriage was quite common.
You aren't missing anything with the harsh error warning. You had 3 "warnings" and 118 people, warnings count as 1 point, 3/118 = 2.6%. Over 2% 'may' not be imported (as well as not allowing you to make family matches). Over 4% 'cannot' be imported. --Jennifer (JBS66) 08:13, 14 December 2010 (EST)
You are right Jillaine, three errors are not worth mentioning, and as to children before wedlock: I could flood w.r. with them and quite a bit longer than a year before marriage too. Leo. --Leo Bijl 09:28, 14 December 2010 (EST)
The 'early birth' warnings only count half. That was intended to give enough slack to not cause a problem. I understand the European issue. Allowing a year sounds reasonable if that will help. We have to work on percentages. With small files that may only be a few. The admins can use their discretion whether to import. You have no idea how many files I wade through where 8 children are born before marriage (because they don't belong to this family!). --Judy (jlanoux) 12:08, 14 December 2010 (EST)
I understand that admins can use their discretion regarding importing. Unfortunately, if inaccurate warnings are causing users to 1. see the "will probably be rejected" message and run, or 2. be unable to match their family matches, that is a problem. Is there anywhere on the Overview tab that gives users another option (like an appeal)? Instead, it instructs users to "Click on the Warnings tab, print the warnings, correct or remove the incorrect information in your desktop genealogy program, remove this GEDCOM, and upload a revised GEDCOM." From what I have seen so far, when the "errors" are in fact correct, users are removing questionable dates, and reuploading to bypass the warnings.
I don't see where early birth warnings are counting at 1/2 point. With Jillaine's file, they appear to be counted as 1. Errors count as 2, but these were 1 point warnings. --Jennifer (JBS66) 12:20, 14 December 2010 (EST)
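For reference, the scoring as described in this thread can be sketched as follows: alerts count 0 points, warnings 1, errors 2, and the rate is points divided by the number of people. The function names and status strings are made up for illustration; only the point values and the 2% / 4% thresholds come from the discussion.

```python
from typing import Iterable

# Point values as described for the new uploader.
POINTS = {"alert": 0, "warning": 1, "error": 2}


def warning_rate(issue_levels: Iterable[str], people_count: int) -> float:
    """Return the warning rate as a percentage of people in the file."""
    if people_count <= 0:
        raise ValueError("people_count must be positive")
    points = sum(POINTS[level] for level in issue_levels)
    return 100.0 * points / people_count


def import_status(rate_percent: float) -> str:
    """Map a warning rate onto the thresholds quoted above."""
    if rate_percent > 4.0:
        return "cannot be imported"
    if rate_percent > 2.0:
        return "may not be imported; family matching blocked"
    return "ok"
```

With Jillaine's file (three 1-point warnings, 118 people) the rate lands between the two thresholds. If the same three warnings were counted at half a point each, the file would come in under 2%.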
Dallan as contact? [17 December 2010]
Why are Dallan's talk page and email address listed under "if you have questions, contact..."? Shouldn't we be pointing them elsewhere? Jillaine 08:44, 14 December 2010 (EST)
_NEW gedcom tag [26 December 2010]
Dallan, is this tag necessary, or can it be excluded from gedcoms? It appears to put text like this onto pages:
_NEW: Type: 1 Date: 15 NOV 2010 Time: 14:13:17 --Jennifer (JBS66) 10:52, 21 December 2010 (EST)
Matching Places [24 December 2010]
Judy, in response to your question about matching places, I believe the help page still says this process is optional. However, this is what I am experiencing with Dutch gedcoms: the percentage of places in the gedcoms of new Dutch users that don't match is high (the wijmenga file in particular is 94% unmatched). What happens is that gedcoms for the U.S. usually have at least a state or an abbreviation, so the places match relatively well. With the Netherlands being a small country, many people put just the town name, because duplicate names are more rare. When there are duplicate town names, they will add the gemeente (like our county) or province to disambiguate, and at those times, the places match. It would be a helpful feature for international users (not just the Dutch) to have a drop-down box to indicate their gedcom's primary country. Then, if a place in a gedcom is just Leeuwarden (which wouldn't be matched) the software could try to match it with their primary country, and then Leeuwarden, Netherlands would match up. This, by the way, is not my original idea. It was suggested previously by a Dutch user.
This is becoming a problem on the Dutch "side" of WeRelate. If we want to search WeRelate for the keyword Friesland (a Dutch province) or the Netherlands, pages that contain red-links are not found. It also requires that volunteers fix these pages, which is unfair. In the case of wijmenga, approximately 2000 pages will be created from that file, with a very small percentage containing linked place pages. If I can have the user correct 200 places on his end, that saves somebody from having to fix 2000 pages in the future. Optimally, with an idea such as the drop down box, places wouldn't need to be fixed on either end. --Jennifer (JBS66) 07:32, 24 December 2010 (EST)
Thanks for explaining. Always something to learn about Dutch practices. It never occurred to me there would be only a town. Depending on what program they use, there should be a facility for fixing this. I spent a while with TMG cleaning up and standardizing my places before upload. It is good if you can get them to do that. --Judy (jlanoux) 09:02, 24 December 2010 (EST)
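The primary-country idea could work roughly like the sketch below. It assumes place matching is an exact lookup against a set of known place titles, which is a simplification; `match_place` and `known_places` are hypothetical names, not the real matcher.

```python
from typing import Optional, Set


def match_place(
    raw: str, known_places: Set[str], primary_country: Optional[str] = None
) -> Optional[str]:
    """Try the place as written, then retry with the gedcom's primary country appended."""
    name = raw.strip()
    if name in known_places:
        return name
    if primary_country:
        candidate = f"{name}, {primary_country}"
        if candidate in known_places:
            return candidate
    return None  # still unmatched; would end up as a red link
```

With a primary country of Netherlands, a bare Leeuwarden would be retried as Leeuwarden, Netherlands. Without one, it stays unmatched, as happens now.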
Another problem warning [27 January 2011]
I just came across a case where double-dating triggered an "event before birth" and a "Birth after death" warning. It appears to have been counted as two errors.
birth: 6 May 1699 death: 11 Feb 1699/1700
The other two warnings in this file were father 70 years old.--Judy (jlanoux) 14:17, 30 December 2010 (EST)
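One way to keep double-dated years from tripping these warnings is to compare on the second (new-style) year. The sketch below assumes dates arrive as "day month year" with an optional "/year" suffix. The parser and its names are illustrative only, and it simply treats any double date as the year after the first one written.

```python
import re
from datetime import date

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}
DATE_RE = re.compile(r"^(\d{1,2}) ([A-Za-z]{3}) (\d{4})(?:/(\d{1,4}))?$")


def parse_for_comparison(text: str) -> date:
    """Parse a date like '11 Feb 1699/1700', using the later year when double-dated."""
    match = DATE_RE.match(text.strip())
    if not match:
        raise ValueError(f"unrecognized date: {text!r}")
    day, month, year, dual = match.groups()
    # A double date names two consecutive years; the second is the one that
    # lines up with dates written in a single-year style.
    resolved_year = int(year) + 1 if dual else int(year)
    return date(resolved_year, MONTHS[month.upper()], int(day))
```

Parsed this way, the death on 11 Feb 1699/1700 falls after the birth on 6 May 1699, so neither warning would fire.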
UPD tag [27 January 2011]
I'm baaaack [27 January 2011]
Sorry. Between holidays, sickness and the need to generate some much needed income, I had to disappear for awhile. I'm back to help on gedcom review. As I'm reading things as of today, I'm not seeing any that need review. Let me know how I can help. Jillaine 12:52, 25 January 2011 (EST)
Upcoming changes [1 February 2011]
I'll try to make the following changes by the end of this week:
--Dallan 14:38, 27 January 2011 (EST)
Check for given name containing Wife or Husband [9 April 2011]
Dallan, is it possible to do a check in gedcoms for a given name of Wife or Husband (perhaps it already does this...) I'm thinking that at the very least, an alert may be helpful so we can avoid creating John Doe and Wife John families! --Jennifer (JBS66) 16:02, 1 April 2011 (EDT)
On a related note (pun intended <g>), does the GEDCOM importer now remove numerals that are in a person's name? I've been cleaning up a GEDCOM from 2007 that has numerals in parentheses at the end of the given name, like this one. The number doesn't show up in the page title, but it's still there in the name. If it doesn't remove those now, is there any way that it could? -- Amy (Ajcrow) 22:51, 7 April 2011 (EDT)
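Both checks asked about here are simple to express. This is only a sketch, assuming the given name is available as a plain string; the function names and the list of placeholder words are made up for illustration.

```python
import re

PLACEHOLDER_NAMES = {"wife", "husband"}
# Matches a parenthesized number at the end of a name, e.g. "John (2)".
TRAILING_NUMBER = re.compile(r"\s*\(\d+\)\s*$")


def has_placeholder_given_name(given: str) -> bool:
    """True if any word of the given name is a placeholder like Wife or Husband."""
    return any(word.lower() in PLACEHOLDER_NAMES for word in given.split())


def strip_trailing_number(given: str) -> str:
    """Remove a parenthesized numeral from the end of a given name."""
    return TRAILING_NUMBER.sub("", given)
```

The first check could raise an alert rather than an error, so a reviewer can decide whether the person should be excluded or renamed.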
Living checkbox is no longer check-able [20 April 2011]
The "Living" checkbox in the gedcom review program can no longer be checked or unchecked. The only way to have it unchecked is to edit the page and put something in the death/burial date/place.
The converse however is not true: If a user edits a person that is marked dead but doesn't have anything in the death/burial date/places (this can happen for a variety of reasons: they were born more than 110 years ago, their children were born more than 90 years ago, etc.) the Living checkbox isn't checked if they save the page. So in the unlikely event that someone edits a page, enters something in the death date, saves the page so that the living box is unchecked, then decides that the person wasn't dead after all, re-edits the page, and removes the death date, the person will still be marked dead.
There isn't an easy way around this unfortunately. An alternative, to allow people to check the living box whenever they want, but not be able to uncheck it unless something is in the death/burial date/place, seems counter-intuitive to me: I just checked this box but now I can't uncheck it -- why? The approach where the user can never check/uncheck the living box seems like the lesser of two evils because it seems like it will happen very rarely, and if the uploader gets into a state where they've marked someone dead but now they want to mark them living, they can always Exclude them, which is what they should be doing anyway.