Genealogy Data Correctness
Genealogy systems cannot do much to fundamentally validate the data they manage. Taken strictly, about the only thing that can be checked in isolation is the syntactic correctness of a date string.
Genealogy systems typically provide a lot more than that, however, because they do not try to find explicit individual errors. Instead, data is checked for consistency, to see whether some combination of two or more facts is mutually exclusive: for example, a date of death that falls before the date of birth. Knowing which fact is correct is impossible, but knowing that something is wrong is easy to determine.
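As a minimal sketch of such a consistency test, the function below compares two facts about one person and reports a conflict without trying to decide which fact is wrong. The function name and the use of fully known `date` values are assumptions made for illustration; real genealogy dates are often partial or approximate and would need extra handling.

```python
from datetime import date
from typing import List, Optional


def check_birth_death(birth: Optional[date], death: Optional[date]) -> List[str]:
    """Return consistency warnings for one person's birth and death dates.

    Hypothetical helper: a missing date (None) is never treated as a conflict.
    """
    warnings = []
    if birth is not None and death is not None and death < birth:
        # We cannot tell which of the two dates is wrong, only that both
        # cannot be right at the same time.
        warnings.append("death date precedes birth date")
    return warnings


print(check_birth_death(date(1850, 3, 1), date(1849, 7, 4)))
```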
This document is an attempt to keep track of possible consistency tests that could be applied within the framework of werelate, as well as ideas for how to make that information available for display.
Ordinary Consistency Checks
For a single user page
For a family page with associated parent and child pages
Separate Warning and Error Threshold Levels
A simple example of this situation is the case of a birth to a mother aged 48. It is not impossible, but it is unlikely enough to warrant a warning. A birth to a mother aged 70, on the other hand, is plainly an error.
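The two-level idea could be sketched as follows. The cut-off values 45 and 60 are hypothetical placeholders chosen only so that the two examples above land on different sides; they are not proposed settings.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


# Hypothetical thresholds, for illustration only.
WARN_MOTHER_AGE_ABOVE = 45
ERROR_MOTHER_AGE_ABOVE = 60


def classify_mother_age(age_at_birth: int) -> Severity:
    """Classify a mother's age at the birth of a child."""
    if age_at_birth > ERROR_MOTHER_AGE_ABOVE:
        return Severity.ERROR
    if age_at_birth > WARN_MOTHER_AGE_ABOVE:
        return Severity.WARNING
    return Severity.OK


print(classify_mother_age(48), classify_mother_age(70))
```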
Threshold Levels Adjusted for Era
Marriage and birth to a mother aged 16 would have been more typical in the 1600s than in the 1900s. It may be useful if warning and error threshold levels allow for a refinement based on the century in question.
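One way to express era-dependent thresholds is a small lookup table keyed by year range. Everything in the table below is an assumption made up for illustration, not historical data; the point is only the shape of the lookup.

```python
# Each row: (first_year, last_year, warn_below_age, error_below_age).
# All numbers are hypothetical placeholders.
ERA_MIN_MOTHER_AGE = [
    (1600, 1799, 15, 12),
    (1800, 1899, 16, 12),
    (1900, 9999, 18, 12),
]
DEFAULT_MIN_MOTHER_AGE = (17, 12)  # used when no era row matches


def classify_young_mother(age_at_birth: int, birth_year: int) -> str:
    """Return 'ok', 'warning' or 'error' for a mother's age, given the era."""
    warn_below, error_below = DEFAULT_MIN_MOTHER_AGE
    for first_year, last_year, warn, error in ERA_MIN_MOTHER_AGE:
        if first_year <= birth_year <= last_year:
            warn_below, error_below = warn, error
            break
    if age_at_birth < error_below:
        return "error"
    if age_at_birth < warn_below:
        return "warning"
    return "ok"


print(classify_young_mother(16, 1650), classify_young_mother(16, 1950))
```

With these placeholder values, the same age of 16 passes silently for a birth in 1650 but raises a warning for a birth in 1950.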
Detection of Cycles
A child cannot be their own parent or grandparent, but confusion in early records and incomplete information in the hands of particular researchers can easily create this type of defect. In principle, such errors could be searched out over the entire space of werelate genealogy, but it would be more realistic, and probably just as useful, if cycle detection were only implemented across the members of a given family tree.
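Cycle detection within one tree can be sketched as a depth-first search over child-to-parent links. The input format (a dictionary mapping each person id to a list of parent ids) is an assumption for illustration and does not reflect how werelate actually stores pages.

```python
from typing import Dict, List, Optional


def find_ancestor_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one ancestry cycle as a list of person ids, or None if there is none.

    `parents` is a hypothetical mapping: person id -> ids of that person's parents.
    Uses recursion, which is adequate for the generation depth of a single tree.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(person: str) -> Optional[List[str]]:
        state[person] = IN_PROGRESS
        path.append(person)
        for parent in parents.get(person, []):
            parent_state = state.get(parent, UNSEEN)
            if parent_state == IN_PROGRESS:
                # The parent is already on the current ancestor path: a cycle.
                return path[path.index(parent):] + [parent]
            if parent_state == UNSEEN:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[person] = DONE
        return None

    for person in list(parents):
        if state.get(person, UNSEEN) == UNSEEN:
            cycle = visit(person)
            if cycle:
                return cycle
    return None


print(find_ancestor_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
```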
Orphaned Tree Fragments
Calling it a werelate "tree" is a bit of a misnomer. It is really only a page that references a collection of other pages of various types. Under normal circumstances, the person and family pages of such a tree will represent a completely connected tree graph. Sometimes, however, individual person or family pages can become detached from the tree at large, even though their names are retained in the werelate "tree". Detection of such discrepancies can be useful.
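Detached pages could be found with a breadth-first walk from some starting page, reporting every listed page that the walk never reaches. The page titles, the `links` mapping, and the choice of a single root page are all assumptions for this sketch; links are assumed to be recorded in both directions.

```python
from collections import deque
from typing import Dict, Set


def find_detached_pages(members: Set[str], links: Dict[str, Set[str]], root: str) -> Set[str]:
    """Return the tree members that cannot be reached from `root`.

    `members` is the set of pages the "tree" page lists; `links` is a
    hypothetical page -> linked pages mapping, listed in both directions.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for neighbour in links.get(page, set()):
            # Only follow links that stay inside this tree's member list.
            if neighbour in members and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return members - seen


members = {"Person:A", "Family:F1", "Person:B", "Person:X"}
links = {
    "Person:A": {"Family:F1"},
    "Family:F1": {"Person:A", "Person:B"},
    "Person:B": {"Family:F1"},
}
print(find_detached_pages(members, links, "Person:A"))
```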
Presentation of warning information
In thinking about these areas, I realized that all represent warning conditions, and that a common warning detection and reporting strategy should serve them. Some ideas for this follow.
Warnings Page Companion for each Person and Family Page
Create a companion "warnings" page for every person or family page. When a person or family page is seen to be newer than its associated warnings page, the warning/duplicate logic could be triggered. Since it is common for a page to be edited several times before being left alone, and associated pages that would affect a warning might be similarly in transition, actual warnings page regeneration would need to be delayed for a period after the last edit of a person or family page. When the warnings page is regenerated, the logic would check whether the warning content had changed. If so, a new version would be checked in. If not, the old version would be quietly updated to be newer than its associated person/family page. In the event of a warnings page change, the system would also add a trivial entry to the associated "talk" page. In so doing, the user community for the page would receive notice that there are new warnings for that page.
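The regeneration rule described here might look like the sketch below. The one-hour quiet period, the function names, and the action labels are hypothetical; how timestamps and page text would actually be read from the wiki is left out.

```python
from datetime import datetime, timedelta

# Hypothetical delay after the last edit before warnings are regenerated.
QUIET_PERIOD = timedelta(hours=1)


def regeneration_due(page_edited: datetime, warnings_updated: datetime,
                     now: datetime, quiet_period: timedelta = QUIET_PERIOD) -> bool:
    """True when the warnings page is out of date and the page has settled."""
    stale = page_edited > warnings_updated
    settled = now - page_edited >= quiet_period
    return stale and settled


def regeneration_action(old_warnings: str, new_warnings: str) -> str:
    """Decide what to do once fresh warning text has been generated."""
    if new_warnings != old_warnings:
        # Content changed: check in a new version and note it on the talk page.
        return "save_new_version_and_notify_talk_page"
    # Content unchanged: only refresh the timestamp, with no notification.
    return "touch_timestamp_only"


edited = datetime(2020, 1, 1, 12, 0)
print(regeneration_due(edited, edited - timedelta(days=1), edited + timedelta(hours=2)))
```

Keeping the "is it due" decision separate from the "what changed" decision means the delay can be tuned without touching the notification behaviour.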
Warnings Report for a Tree
While it may be useful to have a copy of warnings for any given person or family page that a user is working on, a user will probably want to see warnings on a higher-level basis. A list of all the warnings associated with a particular tree seems the most likely candidate. Working from such a list, the user should be able to gradually and systematically improve the quality of their tree's data until very few warnings remain. Expecting users to manually walk the pages of their tree to see whether individual pages have warnings (or have acquired them, due to other uploads and research) is utterly unrealistic.
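A tree-level report could simply gather the per-page warnings and sort the most serious ones to the top. The data shapes used here (a list of page titles and a mapping from page title to severity/message pairs) are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical ordering: errors are listed before warnings.
SEVERITY_RANK = {"error": 0, "warning": 1}


def tree_warning_report(tree_pages: List[str],
                        warnings_by_page: Dict[str, List[Tuple[str, str]]]
                        ) -> List[Tuple[str, str, str]]:
    """Collect (severity, page, message) rows for every page in a tree."""
    rows = []
    for page in tree_pages:
        for severity, message in warnings_by_page.get(page, []):
            rows.append((severity, page, message))
    # Most serious first, then alphabetically by page and message.
    rows.sort(key=lambda row: (SEVERITY_RANK.get(row[0], 2), row[1], row[2]))
    return rows


pages = ["Person:A", "Person:B", "Family:F1"]
found = {
    "Person:B": [("warning", "mother aged 48 at birth of child")],
    "Person:A": [("error", "death date precedes birth date")],
}
for row in tree_warning_report(pages, found):
    print(row)
```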