User talk:Solveig/gedcom review


GEDCOM Review Procedure [14 March 2010]

Help:Administrators' guide/Review GEDCOM has some how-to info. Since the process was new, I've had to feel my way, so here are some guidelines based on my experience:

  • Add new files to the log immediately after upload.
  • If it is a new user or first-time gedcom, review the file as soon as possible. Send them the first message by email and add a footnote if they have any particular problem (e.g. lots of duplicates, bad name format, etc.) that will need attention. In the message, provide the link to the review for their file. Warn the user not to remove the exclude for living people if they have a lot of living people in the file (very common).
  • About five or six days after upload send everyone (new or experienced) the second message.
  • It is helpful to indicate the Follow-up date in the log behind the user name so we can coordinate.
  • Our policy is to delete files 10 days after upload if we have no response from the user. Sometimes, if the file needs no extensive editing, I will go ahead and upload it and notify the user. If we're late with the second warning, set the delete date for three days out.
  • The user may respond and ask for more time - usually when there are a lot of Family Matches. We will grant this, but put a FU date so we can check on progress - most never work on it and the file is eventually deleted anyway.
  • Experienced, regular uploaders (Delijim, BobC, HJL411...) do not need the canned reminders and usually handle their files on a timely basis. If one gets too stale, send a personal note asking about it.
  • Treat any file as an individual case and vary the process as you see fit. Leave notes in the log.

  • When the file is marked "Ready for Import", review it again and either Import it or Return to User Review and leave a note for the user about what needs fixing.

Thanks for any help you can offer. This class I'm teaching will have me tied up through the end of March. --Judy (jlanoux) 20:06, 14 March 2010 (EDT)

Sunday night update [22 March 2010]

Jennifer, I updated the log to add some new files and got the log arranged in the same order as the admin gedcom page. I sent a few messages. I'll try to check in tomorrow night when I get home. I notified Dallan and Solveig of the flood of files. Holler if you need more help. --Judy (jlanoux) 23:54, 21 March 2010 (EDT)

I added FU dates to the log entries to make it easier to deal with the volume. If we have heard from the user a progress check is indicated to see if they are actually working on it (most never do). To check progress I usually note the number of excludes or Family Matches or warnings edited - whatever the original problem was - and then look for changes on the FU date. If no progress, send 2nd message and schedule a delete date a few days out. --Judy (jlanoux) 11:47, 22 March 2010 (EDT)
I will try to come back later to help send messages. --Judy (jlanoux) 11:47, 22 March 2010 (EDT)

Major changes to the gedcom upload process [16 Nov 2010]

16 Nov: Major changes to the gedcom upload process:

  • People can no longer edit pages during upload. It's easier for them to simply edit the information using their desktop genealogy program and re-upload.
  • "Match Related" has been removed. It's better to have people match one family at a time.
  • Surnames with more than 3 words should now generate correct page titles.
  • All living people are excluded.
  • You can't uncheck "Exclude" if the person has been checked living. (You can uncheck the living checkbox and then uncheck exclude in case we guessed wrong on living; hopefully people won't abuse this too much.)
  • People born before 1750 are excluded, with the exception that we don't break up families, so if they're a member of a family with at least one person born after 1750, then they're not excluded. This means that people born after 1720 are generally included. Excluded people can be added later by hand. We can change this cut-off date if you find it's too restrictive.
  • Warnings are now separated into "alerts" (e.g., husband and wife have same surname), "warnings" (e.g., child birth before marriage), and "errors" (e.g., child birth after death of wife).
  • Users can print the warnings list.
  • We restrict which gedcom's can be marked by the user as "ready to import". Gedcom's with definite internal duplicates (husband name, wife name, and marriage dates match) cannot be imported. Gedcom's with a warning rate over 4% cannot be imported. Gedcom's with a warning rate over 2% can be imported, but we tell the user that they'll likely be rejected, so you have some leeway there. To calculate the warning rate we count 1 point for each warning and 2 points for each error and divide the sum by the number of people being imported.
  • If a gedcom has internal duplicates or a warning rate >= 2%, the user can match existing wiki pages but cannot update them with the information in their gedcom. They would have to update these pages online. That way in the likely event that you decide to reject the gedcom, we won't have the problem of already-updated wiki pages.
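The scoring rule in the last two bullets can be sketched roughly as follows. This is only an illustration of the arithmetic described above, not the uploader's actual code; the function names and the status labels ("blocked", "likely rejected", "ok") are hypothetical.

```python
WARNING_POINTS = 1  # each warning counts 1 point
ERROR_POINTS = 2    # each error counts 2 points; alerts count nothing


def warning_points(warnings: int, errors: int) -> int:
    """Total points before dividing by the number of people."""
    return warnings * WARNING_POINTS + errors * ERROR_POINTS


def warning_rate(warnings: int, errors: int, people_imported: int) -> float:
    """Points divided by the number of people being imported."""
    if people_imported <= 0:
        raise ValueError("people_imported must be positive")
    return warning_points(warnings, errors) / people_imported


def import_status(warnings: int, errors: int, people_imported: int,
                  has_internal_duplicates: bool = False) -> str:
    """Hypothetical labels for the 'ready to import' restrictions."""
    if people_imported <= 0:
        raise ValueError("people_imported must be positive")
    if has_internal_duplicates:
        return "blocked"
    points = warning_points(warnings, errors)
    # Integer comparison avoids float rounding at exactly 2% or 4%.
    if points * 100 > 4 * people_imported:
        return "blocked"
    if points * 100 > 2 * people_imported:
        return "likely rejected"
    return "ok"


def can_update_matched_pages(warnings: int, errors: int, people_imported: int,
                             has_internal_duplicates: bool = False) -> bool:
    """Updating matched wiki pages is off at a rate >= 2% or with duplicates."""
    points = warning_points(warnings, errors)
    return (not has_internal_duplicates
            and points * 100 < 2 * people_imported)
```

For example, a file importing 200 people with 5 warnings and no errors scores 2.5%, so under this sketch it could still be marked ready but would be flagged as likely to be rejected.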

Gedcom's currently in-process

Gedcom's currently in-process will have to be given some special consideration. All warnings generated by the old process are being counted as warnings (1 point), even if they are really alerts (0 points) or errors (2 points) when we calculate the warning rate. Also, people before 1750 are not marked as excluded and definite duplicates are not computed, because all of these things happen only in the new uploader. Finally, edits done by the user are not taken into account when counting warnings, so you may see a page where the warning has already been fixed by an edit. (For example, Crystalw already corrected their one error, but the system doesn't recognize that -- I've left them a message already.) Users won't be able to edit going forward, so this won't be a long-term problem. But it's something to be aware of for the next couple of weeks.


It's quite likely that there are some bugs with the new process. If you notice any, please let me know. Thank you!

Questions about gedcoms in the queue and warnings [17 December 2010]

Dallan, I have a few questions/comments so far based upon your recent changes

  1. Are UID and RIN numbers automatically removed now?
  2. Looking at a file like paulus.ged - child birth before marriage is very common with Dutch Gedcoms.
  3. I feel the cutoff date of 1750 may be late for Dutch users, though I'd have to consult with them a bit more on this. --Jennifer (JBS66) 14:24, 16 November 2010 (EST)
UID and RIN numbers aren't automatically removed - I forgot about that. I'll do that later this afternoon. If you and Judy want to turn childbirth before marriage into an alert, that's fine with me. We could say "childbirth more than N years before marriage" is a warning, where N=5 or 10 say. Just let me know. Let me know also on the cutoff date.--Dallan 14:29, 16 November 2010 (EST)
I expected the 4% to allow enough slack to take care of the cases where a baby really came early. Jennifer, is this not enough to take care of the Dutch families? I know the system is a bit different, but am not familiar enough to judge. Dallan: Is it one strike per child or per family? --Judy (jlanoux) 19:54, 16 November 2010 (EST)
It's one per child.--Dallan 20:27, 16 November 2010 (EST)

When a person is excluded on the Person Tab, it seems to also exclude their Family as well. This does not work in the following scenario: There is a person in the gedcom who is Unknown (but with a page) and married to somebody who is known. When you exclude the Unknown person (so that you don't create an empty page), the Family he/she belongs to is also excluded. However, their spouse and other family details may be known. --Jennifer (JBS66) 14:59, 16 November 2010 (EST)

Sorry, another comment... you say "People can no longer edit pages during upload. It's easier for them to simply edit the information using their desktop genealogy program and re-upload." In the Dutch gedcoms I've processed recently, making simple edits at the gedcom upload stage is common. I will review their file and note small errors, like a patronymic name incorrectly in the surname field (which would produce an incorrect title). This can be quickly fixed here, rather than asking the user to make the edits and re-upload. --Jennifer (JBS66) 15:03, 16 November 2010 (EST)

RIN numbers are now being ignored (it appears that _UID was already being ignored). If you see tags that are showing up in the text when they shouldn't, please let me know.

I created a sample gedcom in MyHeritage (the program that wouldn't allow you to exclude UID and RIN's). It looks like they were both ignored by the uploader - which is great! In case you want to see what it was doing before, Dallan, take a look at Dingley family tree - Ireland.GED that is in the queue. If you sort by name and choose one of the first 2 people who are excluded, they show how the UID's came in before. This user went through her file and edited them out for people who would be uploaded. --Jennifer (JBS66) 17:07, 16 November 2010 (EST)
I just checked the Dingley tree and it looks like RIN's and UID's attached to events are still being included in the event description (sigh). I'll fix this tomorrow.--Dallan 17:43, 16 November 2010 (EST)
RIN and _UID are being removed from events now.--Dallan 15:01, 18 November 2010 (EST)

Excluding families has worked that way for a long time -- as soon as one spouse is excluded, the entire family is excluded. But I agree that it should work the other way -- a family is excluded only if both spouses are excluded. I just fixed it.

It's possible when both spouses are excluded that the family may still have included children. In this case the family page doesn't get created, so there's no way to know that the children are siblings. But the alternative, creating an "Unknown and Unknown" family page containing just children, seems odd to me. I could be talked into it though, and change the rule so that a family page gets excluded only if all family members are excluded.

I would like the Family page where it is known people are siblings. When manually entering I use "Unknown Smith and Unknown" since at least the surname can be indicated. --Judy (jlanoux) 19:50, 16 November 2010 (EST)
Ok, how about if I exclude the family when there is only 1 non-excluded family member left (spouse or child). I'm thinking that a family page with just one member isn't very helpful, right?--Dallan 20:27, 16 November 2010 (EST)
That is perfect. I spend hours doing that manually when I review. We don't need one person families. I also exclude unnamed (place-holder) people. These are basically empty pages that their gedcom created. --Judy (jlanoux) 22:01, 16 November 2010 (EST)
I've added code to do that now -- single-person families without events/citations/notes will be excluded, as well as unnamed people without events/citations/notes.--Dallan 15:01, 18 November 2010 (EST)
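A minimal sketch of the exclusion rule as it ended up in this thread, assuming a very simple data model. The `Person` and `Family` classes, the `has_details` flag (standing in for "has events/citations/notes") and the treatment of the name "Unknown" as unnamed are all assumptions made for illustration, not the uploader's real structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str = ""
    has_details: bool = False  # any events, citations or notes
    excluded: bool = False


@dataclass
class Family:
    members: List[Person] = field(default_factory=list)  # spouses and children
    has_details: bool = False  # any events, citations or notes on the family


def is_unnamed(person: Person) -> bool:
    # Assumption: an empty name or a bare "Unknown" is a placeholder name.
    return person.name.strip().lower() in ("", "unknown")


def exclude_placeholders(family: Family) -> None:
    """Exclude unnamed people that carry no events, citations or notes."""
    for person in family.members:
        if is_unnamed(person) and not person.has_details:
            person.excluded = True


def family_is_excluded(family: Family) -> bool:
    """A family left with at most one included member, and no details of
    its own, is not worth a page."""
    exclude_placeholders(family)
    remaining = [p for p in family.members if not p.excluded]
    return len(remaining) <= 1 and not family.has_details
```

Under this sketch a known husband married to an empty "Unknown" wife, with no children and no marriage event, produces no family page, while the husband's own page is still created.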

I don't mind opening up edits for admins. But users have complained that editing here and then editing on their desktop was a pain. (Why they didn't figure out for themselves that editing on their desktop and re-uploading was easier I don't know.) Also, the warnings don't take edits into account - after you edit, the warning is still there. So I'd rather not open up editing for end-users. Let me know if you'd like me to open them back up solely for admins.--Dallan 16:33, 16 November 2010 (EST)

That would be a nice compromise - let the admins edit pages. VERY FEW regular users have ever edited a page that I can detect. --Judy (jlanoux) 19:50, 16 November 2010 (EST)
I'll do that. It will be back in place tomorrow.--Dallan 20:27, 16 November 2010 (EST)
Admins can now edit and unexclude people if they want.--Dallan 15:01, 18 November 2010 (EST)

I'd like to respond to the comment "People can no longer edit pages during upload. It's easier for them to simply edit the information using their desktop genealogy program and re-upload." The 2 times I have uploaded (or tried to upload) a significant GEDCOM (about 250 records each), it must have taken me about 2 hours to match sources. This is the first step I do, because in my experience with my first small test file, it didn't make sense to review individuals until they had been matched, which only occurs as part of family matching, and you have to do sources before family matches or you risk having a lot of MySource records created and used in updating existing families. Hence, I find the safest thing to do is first check warnings, and then match sources. Once I have invested 1-2 hours in that, I'm sure not going to start over again to fix one or two minor editing problems. If I can't edit the GEDCOM, I'll make a note and fix them later - maybe not a huge deal, but I wouldn't say that it is easier to go back to the desktop software. It all depends on how much time you have already invested in matching sources, places and families - it could be hours if you are being careful.

Please don't anyone respond by suggesting that I upload smaller files. 250 records is less than 5% of my database, and is hard enough to separate that from an entangled family tree. I was hoping to progress to files of 500 records at a time, not regress to smaller files.

On a related note, I look forward to the day when source matching is somewhat automated (maybe leveraging an improved source searching algorithm). At a minimum, if I matched a GEDCOM source description ABC in a previous load, it would be nice if it were automatically matched to the same WeRelate source in subsequent GEDCOM loads. --DataAnalyst 22:12, 4 December 2010 (EST)

How about if you edit first if you need to and then re-upload? Once you're pretty sure you don't need to edit anymore, then start matching sources and families. Or if you just need to make one or two edits, make them after the import is complete.
And yes, I agree that easier source matching would be very nice.--Dallan 20:13, 17 December 2010 (EST)

Regarding warnings [16 November 2010]

Dallan, can you take a look at senn.ged? It has a warning level of 2.2%, which causes the user to not be able to match pages.

The 4 errors, which account for 8 of his 9 points, all stem from a single mistake. In his file, one person has 4 OCT 970 as the death date instead of 4 OCT 1970. The errors generated as a result of this one typo are: Birth is after death (which makes no sense to me here), An event occurs before birth, An event occurs more than a year after death, and Marriage occurs after the death of husband. Is this how strict you intend this to be? --Jennifer (JBS66) 20:35, 16 November 2010 (EST)

Birth is after death is a valid warning in this case, since according to the gedcom he died in 970 and was born in 1889. We can raise the threshold - it's easy to change. However, instead of making this decision after looking at one or two gedcom's, I think we ought to wait for a few days and look at maybe 8 or 10. I think the goal is: when a gedcom is above the threshold (whatever it is), it's one that you would probably reject; when it's under the threshold, it's one that you would probably accept. I'm trying to get out of the business of my having to send messages to people telling them that we're rejecting their gedcom. I'd prefer to have the system tell them that.
In this case, it wouldn't take much for the uploader to correct the single date and re-upload. Or once we allow admins to edit (tomorrow) you could leave a message on the user's talk page and tell them to go ahead and import, and that you'll correct the date.--Dallan 20:56, 16 November 2010 (EST)
I must have misread the birth is after death warning - sorry - that makes sense now. Regarding editing that one bit of data, if I edited it, wouldn't the user still not have the opportunity to match their families because the warning % was calculated upon upload? Thanks for being patient with all my questions :-) --Jennifer (JBS66) 21:01, 16 November 2010 (EST)
That's true. Not allowing a user to edit existing pages when their gedcom has a lot of errors helps us avoid the case where someone has already edited pages but then we reject the gedcom, so it's a good idea in general I think. I'm not sure of an easy way to get around this, other than to ask the user to correct it in their desktop genealogy program and re-upload. Alternatively, we could count the number of distinct people and families appearing in the warnings list for the warning level, instead of the individual warnings. So a person with 4 warnings would count 1 point (instead of 4), and a family with 3 warnings and 2 errors would count 2 points (instead of 7).--Dallan 21:14, 16 November 2010 (EST)
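To make the two counting schemes concrete, here is a small sketch comparing per-issue points with the per-person/per-family alternative suggested above. The tuple layout and the names are hypothetical; the point values (alert 0, warning 1, error 2) are the ones given earlier on this page.

```python
from typing import Dict, Iterable, Tuple

SEVERITY_POINTS = {"alert": 0, "warning": 1, "error": 2}

# Each issue is (subject_id, severity); the subject is a person or a family.
Issue = Tuple[str, str]


def points_per_issue(issues: Iterable[Issue]) -> int:
    """Current scheme: every individual warning or error adds its points."""
    return sum(SEVERITY_POINTS[severity] for _, severity in issues)


def points_per_subject(issues: Iterable[Issue]) -> int:
    """Alternative scheme: each person or family counts once, at the
    level of its most severe issue."""
    worst: Dict[str, int] = {}
    for subject, severity in issues:
        worst[subject] = max(worst.get(subject, 0), SEVERITY_POINTS[severity])
    return sum(worst.values())


person = [("P1", "warning")] * 4
family = [("F1", "warning")] * 3 + [("F1", "error")] * 2
print(points_per_issue(person), points_per_subject(person))  # 4 1
print(points_per_issue(family), points_per_subject(family))  # 7 2
```

With the per-subject scheme, a single mistyped date that triggers several errors on one person would cost 2 points rather than 8.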

Watching pages before gedcom is uploaded [22 November 2010]

Dallan, User:Ekjansen observed the following with the new uploader: a user is marked as a watcher of the pages that appear on the Family Matches tab before any matches are made, and before the file is uploaded. Prior to uploading his file, he shared 6 pages with a certain user. After uploading his test ged last night, that number jumped to 97. At that point, he did not match pages or upload the file.

He has since removed that file, uploaded another, and is matching his families. It would make more sense if Watching a page happened at this stage, otherwise anybody uploading a gedcom will watch pages they may not even accurately tie into. --Jennifer (JBS66) 07:30, 17 November 2010 (EST)

Are you sure? I believe the user is marked as a watcher once they click the "Match" button, even if they never click the "Update" button. I just tested this on the sandbox and that's the way it works there.--Dallan 15:01, 18 November 2010 (EST)
This is correct: I was able to match, but not to merge. Then I removed my gedcom (error rate too high) and was a bit surprised to see that my number of watched pages had grown! There is no need to finish the upload to become a 'watcher'.--Klaas (Ekjansen) 16:48, 18 November 2010 (EST)
This isn't an easy change. I'll put it on my todo list. In the meantime I suggest not matching pages when the warning level in your gedcom is very high.--Dallan 12:52, 23 November 2010 (EST)

Problem with warnings-alt events [23 November 2010]

Dallan: I'm looking at RuthL's file. She's one that is caught in the transition. There are a lot of warnings (13%) and thus she's marked as unable to import. This puzzled me as I had noted it as a pretty good file on initial review. The problem is that there are a lot of alt birth and alt death events triggering the event before birth and event after death warnings. I don't think alt dates should do this.

She's done about half of the Family Matches already so I don't know what we should tell her to do. I'm not sure if she has continued to work on the file. Review no longer shows what was edited. I'm more inclined to import rather than reject at this point, but we would need to save the warnings so she can work on them and also tell her to work her duplicate list after import (if she's still around.) --Judy (jlanoux) 11:17, 17 November 2010 (EST)

I had a similar situation with another user caught in the transition. He'd already made a bunch of edits and matches. I emailed him and said that if he promised to print the warnings and review them post-import, I would go ahead and import it. That's what I'd suggest in this situation too. Would you like to leave her a message or should I?--Dallan 15:01, 18 November 2010 (EST)
I'll try to get hold of her.
But what about the warning rule? Can both birth and alt birth be looked at (and likewise death and alt death) before creating an "event before birth" error? --Judy (jlanoux) 16:31, 18 November 2010 (EST)
I'll ignore alt-birth and alt-death events in the before-birth and after-death warnings going forward.--Dallan 12:52, 23 November 2010 (EST)

Why does the font spontaneously change like this? I don't do anything different that I'm aware of.--Judy (jlanoux) 11:18, 17 November 2010 (EST)
The font changes now when there is a space at the beginning of the line.--Dallan 15:01, 18 November 2010 (EST)

Child births less than 9 months apart [26 December 2010]

Is the system producing this warning for twins/multiples as well? --Jennifer (JBS66) 11:57, 17 November 2010 (EST)

It shouldn't be.--Dallan 15:01, 18 November 2010 (EST)
I wonder if this is only happening on gedcoms that were in place before your upgrade? If I look at Polley.ged, one of those warnings is based on two children born 26 MAY 1892.
I just uploaded a sample ged with twins, and no warning was produced. --Jennifer (JBS66) 15:22, 18 November 2010 (EST)
Actually, the problem is between the last two children, born April 1899 and November 1899.--Dallan 15:24, 18 November 2010 (EST)
Urg! Sorry about that - I didn't even see that one... --Jennifer (JBS66) 15:30, 18 November 2010 (EST)

Dallan, can you tweak the formula for this? As well as ignoring same-day births, can you extend that to births one day apart? Some twins are born on consecutive days. I just ran across an instance of this, that produced the Child births less than 9 months apart error. --Jennifer (JBS66) 07:50, 20 December 2010 (EST)

Is it really worth the time it would take to do this? How often do you run across this?--Dallan 11:01, 20 December 2010 (EST)
Well, I wasn't analyzing how often it happens, but more that in this instance, it was an unwarranted error. I have another one that I came across today, Abt dates causing this same error. One child born Abt 1820, another born, say, 1 Jun 1820. This one did happen a lot in the gedcom I looked at. --Jennifer (JBS66) 11:21, 21 December 2010 (EST)
The about dates should be treated as someone born +/- 1 year. So if you had someone born About 1820 and someone else born 1 Jun 1820, that shouldn't be a problem. If you can give me an example I'll look into it.--Dallan 18:24, 26 December 2010 (EST)
There are examples in Govegus' gedcom - such as de Roock Wijnand and de Kleijn, Jenneke (ABT 1820 & 10 SEP 1820) or Koetsier, Hendrik and Van Baaren, Helena (ABT 1820 & 15 MAY 1820). --Jennifer (JBS66) 18:39, 26 December 2010 (EST)
Thanks. I'll fix that next week.--Dallan 19:17, 26 December 2010 (EST)
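One way the behaviour discussed in this thread could be expressed is to treat every birth as a date range and to warn only when the two births cannot be twins and also cannot be nine months apart. This is a sketch under several assumptions: the helper names are made up, nine months is approximated as 270 days, an "about" year is taken as the surrounding three calendar years, and the one-day twin tolerance is the change requested above rather than confirmed behaviour.

```python
from datetime import date
from typing import Tuple

DateRange = Tuple[date, date]  # (earliest possible, latest possible)

NINE_MONTHS_DAYS = 270  # assumption: rough length of nine months


def exact(d: date) -> DateRange:
    return (d, d)


def about_year(year: int) -> DateRange:
    # Assumption: "ABT 1820" means some time in 1819 through 1821.
    return (date(year - 1, 1, 1), date(year + 1, 12, 31))


def gap_bounds(a: DateRange, b: DateRange) -> Tuple[int, int]:
    """Smallest and largest possible gap in days between two births."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    max_gap = max((a_hi - b_lo).days, (b_hi - a_lo).days)
    min_gap = max(0, (a_lo - b_hi).days, (b_lo - a_hi).days)
    return min_gap, max_gap


def births_too_close(a: DateRange, b: DateRange,
                     twin_tolerance_days: int = 1) -> bool:
    """Warn only if the births cannot be twins and cannot be 9 months apart."""
    min_gap, max_gap = gap_bounds(a, b)
    return min_gap > twin_tolerance_days and max_gap < NINE_MONTHS_DAYS
```

Under these assumptions, `births_too_close(about_year(1820), exact(date(1820, 9, 10)))` is false, since the "about" birth could lie well over nine months away, while two children recorded only as April 1899 and November 1899 still trigger the warning.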

Cut-off Date 1750 [23 November 2010]

In my opinion this date is too general. How far back a cut-off date needs to be to protect data quality depends on the experience and trustworthiness of the hobby genealogist. My Dutch genealogy goes back before this date on nearly all my pedigree lines. I am not at all willing to type those data into WeRelate once again because of the inadequate data of other users.
I think this could be handled more individually. The admins dealing with a certain group of users should be able to determine the cut-off date individually. When the data in a gedcom look suspicious at first or second sight, the admin can block the upload and change the cut-off date for this user.
I can understand, and would also support, setting a cut-off date of 1750 for all newbies, or even one more restrictive than 1750. Once they have done some editing and worked with the system (more than just 1-10 entries), the date can be changed if wanted.--Klaas (Ekjansen) 09:19, 20 November 2010 (EST)

The cutoff date is in response to too many gedcom's containing bad information before this date. The admins and I are talking about possibly relaxing the cutoff date for selected users in the future.--Dallan 12:52, 23 November 2010 (EST)

Using Christening date to calculate warnings [24 November 2010]

It appears the christening date is being used to calculate a warning: "Husband died more than nine months before X was born". In a recent gedcom, husband died, but son was christened 2 years later (there is no birth date listed for the son). Should this error only be calculated on birth and not christening date? --Jennifer (JBS66) 17:22, 20 November 2010 (EST)

I'll see if I can ignore christening for that warning.--Dallan 12:52, 23 November 2010 (EST)
Going forward, if someone has a christening but no birth, I assume that they were born somewhere in the 5 years before their christening and calculate warnings conservatively based upon that range. For example, if the husband died more than 5 years and 9 months before the christening, you'll get an error. And if the christening date (not the five-year-earlier date but the actual christening date) was before the mother was 12, you'll get an error.--Dallan 20:58, 24 November 2010 (EST)
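The conservative date arithmetic described here can be sketched as below. It is an illustration only; the function names are hypothetical, and `shift_months` is a small helper written for this example so that it needs nothing beyond the standard library.

```python
import calendar
from datetime import date

BIRTH_WINDOW_YEARS = 5   # birth assumed within 5 years before christening
GESTATION_MONTHS = 9
MIN_MOTHER_AGE_YEARS = 12


def shift_months(d: date, months: int) -> date:
    """Move a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def husband_died_too_early(husband_death: date, christening: date) -> bool:
    """Error only if the death is more than 5 years 9 months before the
    christening, i.e. before the earliest plausible conception."""
    earliest = shift_months(christening,
                            -(BIRTH_WINDOW_YEARS * 12 + GESTATION_MONTHS))
    return husband_death < earliest


def mother_too_young(mother_birth: date, christening: date) -> bool:
    """Error if the christening itself falls before the mother turned 12."""
    twelfth_birthday = shift_months(mother_birth, MIN_MOTHER_AGE_YEARS * 12)
    return christening < twelfth_birthday
```

For the case raised above, a son christened two years after the husband's death, `husband_died_too_early` returns false, so no error would be raised under this sketch.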

Pre-1750 Family Matches [23 November 2010]

Two people (spouses) in a recent gedcom were excluded due to being "early". However, they were matched to their family page already on WR and the user was allowed to perform the Family Match and update the pages. Is that what you intended? --Jennifer (JBS66) 18:52, 20 November 2010 (EST)

The data in the gedcom are complete with birth, christening and death, but these details didn't show up on the gedcom-side, only on the side of the existing page. --Klaas (Ekjansen) 01:48, 21 November 2010 (EST)
That should be fixed now. Family matches should no longer be calculated for people who are excluded.--Dallan 12:52, 23 November 2010 (EST)

Trusted gedcom uploaders list [25 November 2010]

There is now a MediaWiki:Trustedgedcomuploaders list. Admins can add user names to this list. Once a user has been added to this list, for gedcom's uploaded by the user from that point on, the cutoff date for early births is changed from 1750 to 1550.

Judy/Jennifer, please let me know if you think 1550 is too late for trusted uploaders. I could set it earlier, say 1500. Because we don't break up families, anyone born before the cutoff who has a child born after the cutoff will be included. So the current cutoff date is effectively around 1520 for most people.--Dallan 20:39, 24 November 2010 (EST)
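The cutoff rule, including the "don't break up families" exception and the trusted-uploader list, might look roughly like this. The names are hypothetical, and two details are assumptions: a birth in the cutoff year itself is treated as not early, and `family_birth_years` is taken to hold the birth years of everyone in the person's families.

```python
from typing import Iterable, Optional, Set

DEFAULT_CUTOFF = 1750
TRUSTED_CUTOFF = 1550


def cutoff_year(user: str, trusted_uploaders: Set[str]) -> int:
    """Trusted uploaders get the earlier cutoff for their new gedcoms."""
    return TRUSTED_CUTOFF if user in trusted_uploaders else DEFAULT_CUTOFF


def excluded_as_early(birth_year: Optional[int],
                      family_birth_years: Iterable[Optional[int]],
                      cutoff: int) -> bool:
    """A person born before the cutoff is excluded unless someone in their
    family was born at or after it (families are not broken up)."""
    if birth_year is None or birth_year >= cutoff:
        return False
    return not any(year is not None and year >= cutoff
                   for year in family_birth_years)
```

For example, a father born in 1725 with a child born in 1752 stays included under the 1750 cutoff, while a couple born in 1700 and 1702 whose only recorded child was born in 1725 is excluded unless the uploader is on the trusted list.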

Do you think I should make all admins trusted uploaders automatically? It's not that way currently.--Dallan 20:41, 24 November 2010 (EST)
Quick work - thanks. Why don't we just wait and see, since we're in the experimental phase. --Judy (jlanoux) 21:00, 25 November 2010 (EST)

1750 cutoff [17 December 2010]

copied from User talk:JBS66

How was a decision made for a 1750 cutoff without any discussion on Watercooler? There was lots of discussion about protecting quality of data, and various suggestions, but I could find no proposal to implement a 1750 cutoff date. Was the discussion there and then removed? Did I blink (ok, I wasn't watching carefully for a couple of months) and miss it?

I'm all for data quality, and I understand why there have been many suggestions about limiting uploads, but if we're going to have a limit like this, then we need a process to get people onto the trusted users list. As somewhat of an expert in data quality, I will argue strenuously against forcing manual data entry for new records - manual data entry is the best way to introduce errors - both small ones like typos on dates and larger ones like skipping generations. All you have to do is compare the trees of many compilers to the sources they say they were using to see how true this is.

I'm sure I'm not alone in having invested significant effort in my tree in a software package on my own computer, and not being prepared to spend an equivalent number of hours manually entering and rechecking data anywhere else. Not to mention that the software I use makes some tasks easier than WeRelate does. Frankly, when I hit the 1750 limit today, I was prepared to "take my toys and go home". Then I assumed it was an error, since the "to do" list mentions a 1450 cutoff date - one I could live with. So now I guess I just need to figure out how to get on the trusted list.

I've also seen the comment that many pre-1750 people have been entered with nice bios and we don't want the data messed up. Fair enough - I agree in cases where the data is good quality, and we also obviously don't want yet more duplicates of those records. But any suggestion that WeRelate has even scratched the surface with pre-1750 people is laughable. There were approx 600,000,000 alive in the world in 1700 (not to mention the millions who died before then) and WeRelate has fewer than 2,000,000 individuals. WeRelate is supposedly still in beta (as per the home page), and is nowhere near close to a saturation point. If we want to protect "good quality" records - whether they are pre- or post-1750, we need a way to explicitly protect them from being updated during a merge (like the semi-protect already in place). That is very different from preventing the upload of brand new records. How to prevent duplicates will be an ongoing problem. Unfortunately, I don't think WeRelate is well-protected from manual entry of duplicates, either - the search needs some tuning before it is reliable in finding existing records (sorry, Dallan, I don't mean to criticize, but I have noticed some deficiencies in the search that I assumed would be addressed over time).

I know that saying I might "take my toys and go home" seems very petty and I don't like being petty, but in reality, if I can't upload my GEDCOM of pre-1750 people, I won't be contributing - just like I decided about a different genealogy Wiki that did not plan to support GEDCOM at all. Over half of my (approx 6500) records are pre-1750, and from my experience so far, at least 2/3 of those are not yet in WeRelate. Not only am I not willing to put in the hours required, but I expect that forcing manual data entry will actually lower the data quality, so I see no incentive in investing in WeRelate under these terms.

That said, I can easily see (and have spent hours fixing) the garbage in WeRelate resulting from indiscriminate uploading of GEDCOMS - so, back to the idea of a trusted user list. Let's figure out how to "certify" trusted users, and get that going. I was hoping to get my relatively high quality data into WeRelate early (as I assume it would be much less effort than if I am a late-comer to a mix of good and bad data). Can we "certify" a bunch of trusted users so that we can get as much good quality data in soon? --DataAnalyst 17:26, 4 December 2010 (EST)--Jennifer (JBS66) 18:17, 4 December 2010 (EST)

The decision was made based upon comments from the volunteers who spend a lot of time monitoring incoming gedcom's. WeRelate is not a commercial venture and has no ability to hire full-time support staff. So we need to be careful about how we use limited volunteer resources, which means setting some limits on gedcom imports in order to reduce the monitoring workload.--Dallan 20:20, 17 December 2010 (EST)

Ability to manually match individuals [17 December 2010]

I believe that the only person/family matching that occurs during a GEDCOM upload is to match families and the children within them. Why not individuals as well? I can understand that it would be risky to try to automate this, but shouldn't you allow end-users to match individuals manually? For example, if I upload John and Jane Smith and their children and their children's spouses, the family itself might be new, but one of the children's spouses might already be in WeRelate (without a spouse, and therefore not picked up in the family match).

The last time I uploaded a GEDCOM file, I went through every individual after the upload was complete, and searched for duplicates. I found several situations where a record already existed, but there was no way of doing the match in the GEDCOM process. I did the appropriate merges after the GEDCOM had been fully processed. But I bet I'm a rarity that way. It would not occur to most people that they needed to do this, since they would assume that all possible merges had already been suggested to them during the GEDCOM review.

I think that there should be a way to find/match individuals (just like places and sources), and instructions to encourage users to do it. Or, if you are worried about inappropriate matching (which is, of course, a risk), then at least encourage users to search for duplicates after the merge.

I hope this makes sense. I'm not feeling at my most coherent. Ask if you want a more complete explanation. --DataAnalyst 22:27, 4 December 2010 (EST)

It's due to a combination of (1) extra coding that I haven't done, (2) I believe that it increases the difficulty for the user to match people that aren't in the context of families, and (3) I think that matching families catches most, though not all, matches.--Dallan 20:51, 17 December 2010 (EST)

Question re: 0 error GEDCOMs [12 December 2010]

Trying to help Judy out. So took a look at the GEDCOMs needing review, saw one from Janiejac that has 0 errors. Question: why does a 0-error GEDCOM need review? (I went and approved it.) Jillaine 11:15, 12 December 2010 (EST)

Errors are things like 12 year old fathers or 65 year old mothers. A file without errors still needs review for other problems. We exclude no-name and placeholder people, living (this also should be automatic now), look for bad name patterns, etc. Families are checked and we exclude one-person families (these are common after living are excluded). Pages should be spot-checked for inappropriate material such as personal emails and phone numbers (common in Sources). FamilyMatch should be reviewed to make sure they didn't just "no match" everything. And I give places a glance to make sure the system matches didn't run amok with mismatches. The uploader should do all these things. They never do. --Judy (jlanoux) 11:51, 12 December 2010 (EST)

Rules for stale files [14 December 2010]

In an effort to further automate the handling of gedcoms, Dallan has updated the response email to advise that abandoned files will be removed in two weeks. It already explained that users with a high warning rate should fix their data and send a new file. So we will no longer need to send manual notifications to users. This was an odious task for me with a high stress factor so I am very grateful for some relief.

The process: We will assign a delete date of two weeks from the upload date. At that point if we have not had any response from the uploader, the file will be removed. Use the button on the overview tab. When I remove a file, I usually copy the notes from the log to the bottom of the page. It helps in case there are questions or the user sends the same file back (this happens a lot).
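
The two-week rule above can be sketched as a small date calculation. This is only an illustration, not the actual WeRelate code; the function names `delete_date` and `is_stale` are made up for the example.

```python
from datetime import date, timedelta

# Retention period described above: two weeks from the upload date.
RETENTION = timedelta(weeks=2)


def delete_date(upload_date: date) -> date:
    """Return the date on which an unanswered file becomes removable."""
    return upload_date + RETENTION


def is_stale(upload_date: date, today: date, user_responded: bool) -> bool:
    """A file is stale once the delete date is reached with no response."""
    return not user_responded and today >= delete_date(upload_date)
```

For a file uploaded on 1 December 2010, `delete_date` gives 15 December 2010, which is the date that would go in the log.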

I think we are caught up with the files that were involved in the transition. Many thanks to Jennifer and Jillaine for helping with this.

I updated the log for files in queue and added dates. Feel free to check my arithmetic. --Judy (jlanoux) 18:53, 13 December 2010 (EST)

I like this and support it. Jillaine 08:07, 14 December 2010 (EST)

Pre-marital births generating errors [17 December 2010]

I just uploaded a small gedcom (Jacob Link) to see how things work now from the user end. (If Dallan's watching: this is a sample from my HUGE 11k Schwenningen GEDCOM.) The only errors generated were three, and they all had to do with a child being born before the marriage date. Whatever you may feel about it, this happened a LOT in the 1700s and 1800s in Germany-- one explanation is that in some towns, couples were not allowed to marry until the man could demonstrate his ability to support a family; in most cases that I have seen in this particular town, couples who have an out-of-wedlock birth end up marrying, usually within a year. (Although I have an ancestress whose parents did not marry until seven years after her birth! But that's rare.)

How important is it to maintain the current rule for births before marriage? Not the rule itself but the time between pre-marital birth and marriage? How do people feel about this rule being adjusted to allow a year? Or perhaps a checkbox as in other menus for exclude/include?

I was also curious to see that these three errors seemed to be the cause for generating the message "your gedcom will likely not be approved."

Seems a bit harsh-- three errors. But perhaps I'm missing something?


-- Jillaine 08:06, 14 December 2010 (EST)

I agree Jillaine. I've brought this up as an issue for the Netherlands gedcoms as well. Births before marriage was quite common.

You aren't missing anything with the harsh error warning. You had 3 "warnings" and 118 people; warnings count as 1 point each, so 3/118 = 2.5%. Over 2% 'may' not be imported (as well as not allowing you to make family matches). Over 4% 'cannot' be imported. --Jennifer (JBS66) 08:13, 14 December 2010 (EST)

You are right Jillaine, three errors are not worth mentioning, and as to children before wedlock: I could flood w.r. with them and quite a bit longer than a year before marriage too. Leo. --Leo Bijl 09:28, 14 December 2010 (EST)

The 'early birth' warnings only count half. That was intended to give enough slack to not cause a problem. I understand the European issue. Allowing a year sounds reasonable if that will help. We have to work on percentages. With small files that may only be a few. The admins can use their discretion whether to import. You have no idea how many files I wade through where 8 children are born before marriage (because they don't belong to this family!). --Judy (jlanoux) 12:08, 14 December 2010 (EST)

I understand that admins can use their discretion regarding importing. Unfortunately, if inaccurate warnings are causing users to (1) see the "will probably be rejected" message and run, or (2) be unable to match their family matches, that is a problem. Is there anywhere on the Overview tab that gives users another option (like an appeal)? Instead, it instructs users to "Click on the Warnings tab, print the warnings, correct or remove the incorrect information in your desktop genealogy program, remove this GEDCOM, and upload a revised GEDCOM." From what I have seen so far, when the "errors" are in fact correct, users are removing questionable dates, and reuploading to bypass the warnings.

I don't see where early birth warnings are counting at 1/2 point. With Jillaine's file, they appear to be counted as 1. Errors count as 2, but these were 1 point warnings. --Jennifer (JBS66) 12:20, 14 December 2010 (EST)

Warnings count half as much as errors. We can turn birth before marriage warnings into alerts, which count 0, or leave them as-is. I think we need to treat warnings as possible problems, not definite problems. So if you get a warning on something, we're not saying it's wrong. We're simply saying that you may want to review it.--Dallan 20:51, 17 December 2010 (EST)
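
Putting the numbers from this thread together (errors 2 points, warnings 1 point, alerts 0; over 2% "may" not be imported, over 4% "cannot"), the scoring could be sketched roughly as follows. This is an assumption-laden illustration, not the uploader's real code; `WEIGHTS`, `warning_points` and `import_outlook` are names invented here.

```python
# Point weights as described in the thread: warnings count half as much
# as errors, and alerts count nothing.
WEIGHTS = {"error": 2, "warning": 1, "alert": 0}


def warning_points(issues):
    """Sum the points for a list of issue kinds ('error', 'warning', 'alert')."""
    return sum(WEIGHTS[kind] for kind in issues)


def import_outlook(points, people):
    """Classify a file by its points-per-person rate (thresholds from the thread)."""
    if people <= 0:
        raise ValueError("a gedcom must contain at least one person")
    rate = points / people
    if rate > 0.04:
        return "cannot be imported"
    if rate > 0.02:
        return "may not be imported"
    return "ok"
```

With three birth-before-marriage warnings in a file of 118 people, the rate falls between the 2% and 4% thresholds. If the same three items were alerts, they would contribute no points at all.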

Dallan as contact? [17 December 2010]

Why are Dallan's talk page and email address listed in the "if you have questions, contact..." message? Shouldn't we be pointing them elsewhere? Jillaine 08:44, 14 December 2010 (EST)

I have suggested that these be pointed to the Support page. Isn't that what it is for? Users can get much faster help there. --Judy (jlanoux) 12:02, 14 December 2010 (EST)
Thanks for pointing this out. If you notice places where my email appears, please feel free to redirect them to the support page, or tell me if it appears on places that aren't editable and I'll redirect them. These references generally date from before we had a support page.--Dallan 20:51, 17 December 2010 (EST)

_NEW gedcom tag [26 December 2010]

Dallan, is this tag necessary, or can it be excluded from gedcoms? It appears to put text like this onto pages:

_NEW: Type: 1 Date: 15 NOV 2010 Time: 14:13:17 --Jennifer (JBS66) 10:52, 21 December 2010 (EST)

I'll add _NEW to the list of tags to ignore.--Dallan 18:24, 26 December 2010 (EST)
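
For anyone curious what ignoring a tag involves, here is a minimal sketch. It assumes the _NEW data arrives as a GEDCOM record with subordinate lines (the layout in the test data is a guess based on the text that appeared on the pages), and `strip_ignored_tags` is a hypothetical helper, not the importer's actual function.

```python
def strip_ignored_tags(lines, ignored=frozenset({"_NEW"})):
    """Drop lines whose tag is ignored, together with their subordinate lines."""
    kept = []
    skip_below = None  # level of the ignored line whose children are being skipped
    for line in lines:
        parts = line.split(maxsplit=3)
        if not parts or not parts[0].isdigit():
            # Not a well-formed GEDCOM line; pass it through untouched.
            kept.append(line)
            continue
        level = int(parts[0])
        # The tag is the second token, unless the second token is an
        # xref id such as @I1@, in which case it is the third.
        if len(parts) > 2 and parts[1].startswith("@"):
            tag = parts[2]
        else:
            tag = parts[1] if len(parts) > 1 else ""
        if skip_below is not None:
            if level > skip_below:
                continue  # still inside the ignored structure
            skip_below = None
        if tag in ignored:
            skip_below = level
            continue
        kept.append(line)
    return kept
```

Note that this only catches tags that appear as real GEDCOM lines. Tag names written out as plain text inside a note would pass straight through.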

Matching Places [24 December 2010]

Judy, in response to your question about matching places, I believe the help page still says this process is optional. However, this is what I am experiencing with Dutch gedcoms: the percentage of places in the gedcoms of new Dutch users that don't match are high (the wijmenga file in particular is 94% unmatched). What happens is that gedcoms for the U.S. usually have at least a state or an abbreviation, so the places match relatively well. With the Netherlands being a small country, many people put just the town name, because duplicate names are more rare. When there are duplicate town names, they will add the gemeente (like our county) or province to disambiguate, and at those times, the places match. It would be a helpful feature for international users (not just the Dutch) to have a drop-down box to indicate their gedcom's primary country. Then, if a place in a gedcom is just Leeuwarden (which wouldn't be matched) the software could try to match it with their primary country, and then Leeuwarden, Netherlands would match up. This, by the way, is not my original idea. It was suggested previously by a Dutch user.

This is becoming a problem on the Dutch "side" of WeRelate. If we want to search WeRelate for the keyword Friesland (a Dutch province) or the Netherlands, pages that contain red-links are not found. It also requires that volunteers fix these pages, which is unfair. In the case of wijmenga, approximately 2000 pages will be created from that file, with a very small percentage containing linked place pages. If I can have the user correct 200 places on his end, that saves somebody from having to fix 2000 pages in the future. Optimally, with an idea such as the drop down box, places wouldn't need to be fixed on either end. --Jennifer (JBS66) 07:32, 24 December 2010 (EST)
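
The primary-country idea could work something like the sketch below: try the place as written, and only if that fails, try it again with the uploader's chosen country appended. Everything here is hypothetical (the function `match_place`, the list of known place titles, and the exact title format); the real place matcher is certainly more involved.

```python
def match_place(raw, known_places, primary_country=None):
    """Return the matching place title, or None if nothing matches.

    known_places is a collection of existing place page titles.
    primary_country is the optional fallback the uploader would pick
    from a drop-down box.
    """
    index = {title.lower(): title for title in known_places}
    key = raw.strip().lower()
    if key in index:
        return index[key]
    if primary_country:
        candidate = f"{key}, {primary_country.strip().lower()}"
        return index.get(candidate)
    return None
```

With "Leeuwarden, Netherlands" among the known places, a bare "Leeuwarden" stays unmatched without a primary country and matches once "Netherlands" is supplied.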

Thanks for explaining. Always something to learn about Dutch practices. It never occurred to me there would be only a town. Depending on what program they use, there should be a facility for fixing this. I spent a while with TMG cleaning up and standardizing my places before upload. It is good if you can get them to do that. --Judy (jlanoux) 09:02, 24 December 2010 (EST)

Another problem warning [27 January 2011]

I just came across a case where double-dating triggered an "event before birth" and a "Birth after death" warning. It appears to have been counted as two errors.

birth:  6 May 1699 
death: 11 Feb 1699/1700

The other two warnings in this file were father 70 years old.--Judy (jlanoux) 14:17, 30 December 2010 (EST)
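
To illustrate why these two dates are actually consistent: in a double-dated year like 1699/1700, the second year is the one that lines up with the modern calendar, so the death falls in February 1700, after the May 1699 birth. A minimal sketch of that interpretation follows. It is an assumption about how split dates might be handled, not the uploader's code, and `parse_date` is a made-up name.

```python
import re
from datetime import date

MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

# day, month name, year, and an optional "/year" double-dating suffix
DATE_RE = re.compile(
    r"^\s*(\d{1,2})\s+([A-Za-z]{3})[A-Za-z]*\s+(\d{4})(?:/(\d{1,4}))?\s*$"
)


def parse_date(text):
    """Parse 'D Mon YYYY' or 'D Mon YYYY/YYYY', using the later year if split."""
    match = DATE_RE.match(text)
    if not match:
        raise ValueError(f"unrecognised date: {text!r}")
    day, month_name, year, second_year = match.groups()
    month = MONTHS.get(month_name.lower())
    if month is None:
        raise ValueError(f"unrecognised month in: {text!r}")
    resolved_year = int(year)
    if second_year is not None:
        # Double-dated: the second year is always the first year plus one.
        resolved_year += 1
    return date(resolved_year, month, int(day))
```

Comparing `parse_date("11 Feb 1699/1700")` with `parse_date("6 May 1699")` then puts the death after the birth, so neither warning would fire.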

I'll take a look at this during the upcoming set of changes.--Dallan 14:38, 27 January 2011 (EST)

_UPD tag [27 January 2011]

I think _UPD may be another tag we can ignore. It came through in a recent gedcom from User:Redzji. --Jennifer (JBS66) 13:48, 24 January 2011 (EST)

I'll remove this tag during the upcoming set of changes.--Dallan 14:38, 27 January 2011 (EST)

I'm baaaack [27 January 2011]

Sorry. Between holidays, sickness and the need to generate some much needed income, I had to disappear for a while. I'm back to help on gedcom review. As I'm reading things as of today, I'm not seeing any that need review. Let me know how I can help. Jillaine 12:52, 25 January 2011 (EST)

Nice to have you back!--Dallan 14:38, 27 January 2011 (EST)

Upcoming changes [1 February 2011]

I'll try to make the following changes by the end of this week:

  • try to handle split dates better -- done
  • ignore _UPD tag -- _UPD tag is already ignored; the gedcom in question had a bunch of _UPD (and _UID and other) tags written out as text inside notes. Those notes would need to be edited by the submitter prior to upload.
  • fathers 65-80 are an alert; fathers over 80 are a warning -- done
  • mothers 45-55 are an alert; mothers over 55 are a warning -- done
  • gedcoms with <= 5 error points are not locked out, even if they're small -- done
  • uploaders can edit pages again so long as the gedcom is not locked out -- done
  • make sure that the error percentages are being recalculated when pages with errors are excluded -- this appears to be true
  • birth <= 5 years before marriage is an alert; more than 5 years before marriage is a warning -- done

--Dallan 14:38, 27 January 2011 (EST)

I've made the above changes. I haven't tested them, but they were pretty straightforward so I'm hoping everything is fine. Would you please let me know if you notice any problems? Thanks.--Dallan 21:34, 1 February 2011 (EST)
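
For reviewers who want to check a case by hand, the age and birth-before-marriage rules in the list above can be written out as a sketch. The function names are invented for this example, the handling of the exact boundary ages is my reading of the list, and too-young parents (which are errors) are deliberately left out.

```python
def classify_parent_age(age, is_father):
    """Return 'warning', 'alert', or None for a parent's age at a child's birth."""
    # Fathers: 65-80 alert, over 80 warning. Mothers: 45-55 alert, over 55 warning.
    alert_from, warn_above = (65, 80) if is_father else (45, 55)
    if age > warn_above:
        return "warning"
    if age >= alert_from:
        return "alert"
    return None


def classify_birth_before_marriage(years_before):
    """Return 'warning', 'alert', or None for a birth this many years before marriage."""
    if years_before <= 0:
        return None  # born on or after the marriage date
    return "alert" if years_before <= 5 else "warning"
```

Under these rules a child born a year before the marriage is only an alert, while the seven-year case mentioned earlier on this page would still be a warning.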

Check for given name containing Wife or Husband [9 April 2011]

Dallan, is it possible to do a check in gedcoms for a given name of Wife or Husband (perhaps it already does this...) I'm thinking that at the very least, an alert may be helpful so we can avoid creating John Doe and Wife John families! --Jennifer (JBS66) 16:02, 1 April 2011 (EDT)

A few more:Wife/Husband in surname field; First Second Third & Mother in name fields --Jennifer (JBS66) 16:55, 1 April 2011 (EDT)
I'll add these to my list of "noise" words next week when I work on the gedcom uploader.--Dallan 22:47, 7 April 2011 (EDT)
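
A check along these lines might look like the sketch below. The word list is just the one suggested in this thread, and `noise_words_in_name` is a hypothetical name. Since some of these words could turn up in legitimate names, this is meant as an alert for a human to look at, not an automatic rejection.

```python
import re

# Placeholder words suggested in this thread.
NOISE_WORDS = frozenset({"wife", "husband", "first", "second", "third", "mother"})


def noise_words_in_name(given, surname):
    """Return the sorted noise words found in either name field."""
    tokens = re.findall(r"[A-Za-z]+", f"{given} {surname}")
    return sorted({token.lower() for token in tokens} & NOISE_WORDS)
```

A person entered with the given name "Wife" would be flagged, while "John Doe" comes back with an empty list.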

On a related note (pun intended <g>), does the GEDCOM importer now remove numerals that are in a person's name? I've been cleaning up a GEDCOM from 2007 that has numerals in parentheses at the end of the given name, like this one. The number doesn't show up in the page title, but it's still there in the name. If it doesn't remove those now, is there any way that it could? -- Amy (Ajcrow) 22:51, 7 April 2011 (EDT)

I'm reluctant to change user-entered data. Could you bring this question up on the watercooler to see what others think? I'd feel better about moving it from the given-name field to the title-suffix, but I'd be ok with deleting it altogether if others are ok with that.--Dallan 22:56, 9 April 2011 (EDT)

Living checkbox is no longer check-able [20 April 2011]

The "Living" checkbox in the gedcom review program can no longer be checked or unchecked. The only way to have it unchecked is to edit the page and put something in the death/burial date/place.

The converse however is not true: If a user edits a person that is marked dead but doesn't have anything in the death/burial date/places (this can happen for a variety of reasons: they were born more than 110 years ago, their children were born more than 90 years ago, etc.) the Living checkbox isn't checked if they save the page. So in the unlikely event that someone edits a page, enters something in the death date, saves the page so that the living box is unchecked, then decides that the person wasn't dead after all, re-edits the page, and removes the death date, the person will still be marked dead.

There isn't an easy way around this unfortunately. An alternative, to allow people to check the living box whenever they want, but not be able to uncheck it unless something is in the death/burial date/place, seems counter-intuitive to me: I just checked this box but now I can't uncheck it -- why? The approach where the user can never check/uncheck the living box seems like the lesser of two evils because it seems like it will happen very rarely, and if the uploader gets into a state where they've marked someone dead but now they want to mark them living, they can always Exclude them, which is what they should be doing anyway.

I haven't forgotten about the promised changes to the gedcom uploader. They'll be made later this week.--Dallan 14:07, 20 April 2011 (EDT)
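
As a rough illustration of the reasons listed above for treating someone as dead without a recorded death, here is a sketch using only the two rules actually named (the "etc." covers others that are not shown). The function `presumed_dead` and its parameters are invented for the example.

```python
def presumed_dead(birth_year, child_birth_years, has_death_info, current_year):
    """Return True if the person should not be treated as living.

    Covers only the rules named in the discussion: recorded death or
    burial information, birth more than 110 years ago, or a child born
    more than 90 years ago.
    """
    if has_death_info:
        return True
    if birth_year is not None and current_year - birth_year > 110:
        return True
    return any(current_year - year > 90 for year in child_birth_years)
```

So in 2011 a person with no dates at all but a child born in 1915 would be presumed dead, while someone born in 1950 with no death information would be treated as living.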
Users still can't edit their pages during gedcom review, right? So they'd need to edit their home software to add something to the death/burial date/place and reupload? --Jennifer (JBS66) 14:11, 20 April 2011 (EDT)
Users can edit their pages if the gedcom's warning level is below the "this file will likely be rejected" threshold (5 warning points or 2%). So yes, if they have a lot of warnings, they won't be able to edit anyone to mark them as dead. I realize this is a bit of a problem, but I'm not sure how to get around it. It seems that either we have to allow people to edit pages even if the warning level is high (say above 2% but below 4%), or we have to allow people to mark people as dead by unchecking the living checkbox even if they don't have a death date, or we have to accept that if someone uploads a gedcom with more than 2% warnings, they're not going to be able to mark people as dead.--Dallan 18:34, 20 April 2011 (EDT)
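
One reading of the lock-out threshold, combining "5 warning points or 2%" with the earlier note that gedcoms with <= 5 error points are not locked out, is that a file is locked only when it has more than 5 points and those points exceed 2% of the people. The sketch below encodes that reading; it is an interpretation, not confirmed behaviour, and `can_edit_pages` is a made-up name.

```python
def can_edit_pages(points, people):
    """Return True if the uploader may still edit pages during review.

    Assumed rule: locked out only when the file has more than 5 warning
    points AND the points-per-person rate is above 2%.
    """
    if people <= 0:
        raise ValueError("a gedcom must contain at least one person")
    locked_out = points > 5 and points / people > 0.02
    return not locked_out
```

On this reading a 118-person file with 3 points stays editable, a 118-person file with 6 points is locked, and a 1000-person file with 6 points stays editable because it is well under 2%.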