WeRelate talk:Functional Specification for Data Consistency Verification


Family Pages [5 October 2012]

I am assisting a new user in cleaning up her data entry. This user created multiple person pages for the same person, one for each surname (maiden and married). There are also duplicate family pages: one with the maiden surname and a second with the married surname. Some person pages also include a prefix and/or suffix. I don't know exactly what criteria WeRelate would use to find pages with these types of errors, but we definitely need some type of system to identify new user problems. --Beth 10:14, 2 July 2008 (EDT)


I believe problems of that sort are the focus of the work Dallan is doing to suppress duplication at upload time and to reject outright any GEDCOM with too many problems.

The focus of this effort is the identification of defects, including duplication, once data has arrived in WeRelate.--Jrm03063 12:17, 2 July 2008 (EDT)

JRM, these trees were not uploaded via GEDCOM, so I thought these problems might be relevant to this discussion. --Beth 12:27, 2 July 2008 (EDT)


Oh dear...--Jrm03063 12:48, 2 July 2008 (EDT)


I think we need to run these checks at two points: once whenever a page is saved, and again periodically over the entire database.

Rather than showing you a list of all warnings for a particular tree, how about a list of warnings on all of your watched pages?--Dallan 15:39, 5 October 2012 (EDT)

I can imagine three different check times.
  • Check-in time
  • Incremental - checks on a page that hasn't been incrementally checked since its last check-in
  • Periodic - checks over the entire set of pages a few times a year
Probably more or less comparable to backup strategies.
I don't have a strong opinion on how resulting data should be presented. --jrm03063 16:57, 5 October 2012 (EDT)
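
To make the three check times above concrete, here is a minimal sketch of how a scheduler might decide which checks are due for a page. Everything in it is an assumption for illustration: the `PageState` fields, the `checks_due` function, and the 120-day interval standing in for "a few times a year" are hypothetical and do not describe anything WeRelate actually implements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: "a few times a year" is approximated as every 120 days.
FULL_CHECK_INTERVAL = timedelta(days=120)


@dataclass
class PageState:
    """Hypothetical bookkeeping record for one wiki page."""
    title: str
    last_saved: datetime
    last_incremental_check: Optional[datetime] = None
    last_full_check: Optional[datetime] = None


def checks_due(page: PageState, now: datetime, just_saved: bool = False) -> List[str]:
    """Return the names of the check passes that should run for this page."""
    due = []
    # Check-in time: runs as part of the save itself.
    if just_saved:
        due.append("check-in")
    # Incremental: the page has been saved since its last incremental check.
    if page.last_incremental_check is None or page.last_incremental_check < page.last_saved:
        due.append("incremental")
    # Periodic: the page has not been covered by a full pass recently.
    if page.last_full_check is None or now - page.last_full_check >= FULL_CHECK_INTERVAL:
        due.append("periodic")
    return due


page = PageState(
    title="Person:John Smith (1)",
    last_saved=datetime(2012, 10, 5),
    last_incremental_check=datetime(2012, 10, 1),
    last_full_check=datetime(2012, 9, 1),
)
print(checks_due(page, now=datetime(2012, 10, 6)))
```

As with incremental and full backups, the incremental pass only touches pages that changed since they were last examined, while the periodic pass eventually revisits everything.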

Fact Assertions? [22 January 2013]

The fact assertions presently being proposed and experimented with are specifically intended to create additional opportunities for consistency verification. This spec should be revisited if and when they are made a community standard. --jrm03063 18:45, 22 January 2013 (EST)


Surname consistency checking - request [30 November 2013]

I am all in favour of introducing data quality checks. I do, however, have a note about this spec:

Child surnames consistent with father if present, mother if unknown.

Could we ensure that this check is sophisticated enough to recognize patronymics and not flag them as errors or warnings? I realize that we are talking about a template that would allow the user to mark issues as "not a problem" (as my software calls it), but it would be pretty annoying to have to review several hundred people, not to mention the chance of inappropriately accepting an erroneous situation buried among a bunch of acceptable ones.

Also, I assume that "related names" on surname pages would be used to accept spelling variations. (A side benefit might be that asking people to review inconsistencies might be a good way to channel traffic to surname pages to add variant spellings.)

Thanks for starting this spec.--DataAnalyst 22:59, 30 November 2013 (UTC)
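
As a rough illustration of the request above, the sketch below checks a child's surname against the father's (or the mother's when no father surname is known), and suppresses the warning for patronymics and for variants listed as related names. It is only a sketch under stated assumptions: the function names, the shape of the `related_names` mapping, and the suffix list (a few Scandinavian-style endings chosen for illustration, far from complete) are all hypothetical and not part of the spec or of WeRelate.

```python
from typing import Dict, Optional, Set

# Assumption: a small, illustrative set of patronymic endings; a real check
# would need per-culture rules and would be much more extensive.
PATRONYMIC_SUFFIXES = {
    "son", "sson", "sen", "ssen",
    "dotter", "sdotter", "datter", "sdatter", "dottir", "sdottir",
}


def is_patronymic(child_surname: str, father_given: Optional[str]) -> bool:
    """True if the surname looks like father's given name plus a known suffix."""
    if not father_given:
        return False
    surname = child_surname.casefold()
    given = father_given.casefold()
    return surname.startswith(given) and surname[len(given):] in PATRONYMIC_SUFFIXES


def surname_warning(
    child_surname: str,
    father_surname: Optional[str],
    father_given: Optional[str],
    mother_surname: Optional[str],
    related_names: Dict[str, Set[str]],
) -> Optional[str]:
    """Return a warning string, or None if the surname looks consistent.

    related_names is a hypothetical mapping from a lowercase surname to the
    lowercase variants listed as "related names" on its surname page.
    """
    # Spec rule: compare with the father if present, otherwise the mother.
    expected = father_surname or mother_surname
    if not expected or not child_surname:
        return None  # nothing to compare against
    child, exp = child_surname.casefold(), expected.casefold()
    if child == exp:
        return None
    # Accept spelling variations recorded in either direction.
    if child in related_names.get(exp, set()) or exp in related_names.get(child, set()):
        return None
    if is_patronymic(child_surname, father_given):
        return None
    return f"Surname '{child_surname}' is not consistent with '{expected}'"


variants = {"smith": {"smyth", "smythe"}}
print(surname_warning("Andersson", "Larsson", "Anders", None, variants))  # patronymic
print(surname_warning("Jones", "Smith", "John", None, variants))          # flagged
```

Anything this check still flags would remain a candidate for the "not a problem" marking, so the heuristics only need to reduce the review load, not eliminate it.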