WeRelate talk:Suggestions/Format Date Field

Refinement of date edits [6 October 2020]

Question to the WeRelate community:

The following dates are not well-formed (according to GEDCOM and WeRelate standards):

  • Bet 10 Oct and 15 Nov 1810
  • From Oct to Nov 1810
  • Bet 5 and 18 Oct 1810

I would like to automatically change them to (respectively):

  • Bet 10 Oct 1810 and 15 Nov 1810
  • From Oct 1810 to Nov 1810
  • Bet 5 Oct 1810 and 18 Oct 1810

That is, for bet/and and from/to, if the first year is missing, use the second year and if the first month is missing, use the second month. Are these reasonable assumptions to make? Does anyone object?--DataAnalyst 19:36, 6 October 2020 (UTC)

To me, that seems like the intuitive interpretation, so would seem like a good idea. If you wanted to play it conservatively, could you put the answer you think is right and leave the original string stored in parentheses? --Jrich 20:51, 6 October 2020 (UTC)
Sure could. Makes for a very long string, but if someone objects, they can review and remove the parenthetical portion. I like human review rather than too much automation - this sounds like a good compromise.--DataAnalyst 21:30, 6 October 2020 (UTC)
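
For anyone who wants to try the proposed rule on their own data, here is a minimal sketch of it in Python. It is not the code running on WeRelate; the names `expand_range` and `split_parts` and the `keep_original` flag are made up for illustration. It assumes English month names, a one- or two-digit day, and a longer number for the year, and it only fills in a missing month when the first part has a day, so that something like "From 1809 to Nov 1810" is left alone.

```python
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Hypothetical pattern: "Bet <part> and <part>" or "From <part> to <part>".
RANGE = re.compile(r"^(Bet|From)\s+(.+?)\s+(and|to)\s+(.+)$", re.IGNORECASE)


def split_parts(text):
    """Return (day, month, year) as strings; missing pieces are None."""
    day = month = year = None
    for token in text.split():
        if token[:3].title() in MONTHS:
            month = token[:3].title()      # side effect: abbreviates the month
        elif token.isdigit() and len(token) <= 2:
            day = token
        elif token.isdigit():
            year = token
    return day, month, year


def expand_range(date_text, keep_original=False):
    """Copy a missing year (and month) from the second date to the first."""
    match = RANGE.match(date_text.strip())
    if not match:
        return date_text
    first_kw, first, second_kw, second = match.groups()
    if (first_kw.lower(), second_kw.lower()) not in (("bet", "and"), ("from", "to")):
        return date_text
    d1, m1, y1 = split_parts(first)
    d2, m2, y2 = split_parts(second)
    if y2 is None:
        return date_text                   # nothing to borrow from
    if y1 is None:
        y1 = y2
    if m1 is None and d1 is not None:
        if m2 is None:
            return date_text               # a day with no month anywhere: leave it
        m1 = m2
    left = " ".join(p for p in (d1, m1, y1) if p)
    right = " ".join(p for p in (d2, m2, y2) if p)
    fixed = f"{first_kw} {left} {second_kw} {right}"
    if keep_original and fixed != date_text:
        return f"{fixed} ({date_text})"    # keep the original for human review
    return fixed
```

With `keep_original=True`, a changed date carries the original string in parentheses, as suggested above; dates that are already well-formed come back unchanged.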

Date Qualifiers [7 October 2020]

The date qualifiers are often misused and perhaps under-defined in the GEDCOM specification.

CAL is for calculated dates. To me this means a calculation that yields a single value: for example, an age at death of 67 y. 3 m. 10 d. lets you calculate a specific birth date.
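
As a rough illustration of that kind of calculation, here is a small Python sketch. The function name `birth_from_exact_age` and the sample death date are hypothetical, and the convention used (subtract years and months first, then days) is only one of several in use, so results can differ by a day or two from other calculators. Python's `date` also assumes the Gregorian calendar throughout, which matters for older records.

```python
import calendar
from datetime import date, timedelta


def birth_from_exact_age(death, years, months, days):
    """Subtract an age in years, months and days from a death date."""
    # Step back whole months first (years counted as 12 months each).
    month_index = death.year * 12 + (death.month - 1) - (years * 12 + months)
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    # Clamp the day if the target month is shorter (e.g. 31 May -> 29 Feb).
    day = min(death.day, calendar.monthrange(year, month)[1])
    # Then step back the remaining days.
    return date(year, month, day) - timedelta(days=days)


# Made-up example: died 15 Mar 1880, aged 67 y. 3 m. 10 d.
print(birth_from_exact_age(date(1880, 3, 15), 67, 3, 10))
```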

EST is an algorithm using some other event date. To me this is age 67 at death, which actually could allow for two different birth years, depending on what point in the year they were born and died (perhaps more given certain ambiguity in age at death, i.e., age 67 versus in 67th year). But is placing a birth 2 years from a sibling an algorithm? (N.B. use of EST should ideally include specification of the algorithm used by the poster so it is clear to the reader.)
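
To make the two-birth-year point concrete, here is a tiny Python sketch. The name `possible_birth_years` is made up, and it assumes the age is given in completed years and is accurate.

```python
def possible_birth_years(event_year, age):
    """Return the (earliest, latest) birth year for an age in completed years."""
    # If the birthday had not yet come round in the event year, the person
    # was born one year earlier than the simple subtraction suggests.
    return event_year - age - 1, event_year - age


# Made-up example: someone who died in 1880 aged 67
print(possible_birth_years(1880, 67))   # (1812, 1813)
```

The "in 67th year" reading would shift both years one later, which is the extra ambiguity mentioned above.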

ABT means the date is not exact. This is a bad definition because it is ambiguous. Does it mean the date is not specified exactly (i.e., ABT Oct 1813), or that the date is not known exactly (something that is also true of CAL and EST)? Based on documents I find on the Internet, I assume the latter, the implication being that the previous qualifiers do not apply, so it is more of a guess and not based on any calculation or algorithm.

So having expressed some of my frustration with the poor definition of these qualifiers and the resulting common misuse, which I think makes it hard to design a simple solution for these issues, but is largely a side issue...

I think if you are going to replace SAY, it should be changed to ABT and not EST. It is basically a guess with no real basis, much closer in meaning to ABT.

I like SAY for various reasons, but appreciate it is not in GEDCOM, even if commonly seen in literature. --Jrich 16:02, 7 October 2020 (UTC)

Hi. I agree that the GEDCOM specification does not define the qualifiers very well - leaving them open to interpretation. And I would use ABT and EST the opposite of the way you interpret them. If I have an age from a census record (and I assume it is accurate), then I know the person was born in year X or year X+1. That means they were born "ABT year X" - an imprecise year. Even though I know that ages on census (and other records) are unreliable, I would still use ABT, as it is based on the information available and all I did was a calculation (similar to CAL, but less precise), not an algorithm.
I found this interpretation of "an algorithm using some other event date" on the web:
Estimated 1850 - Use estimated when you are basing your guess on some parameters. For example, if I estimate someone's marriage date based on the age of their oldest known child or I am estimating it based on the groom being about 21 and the bride being about 18, it is still a guess but I have considered some external data.
This matches how I interpret EST - more of a guess based on other information. I would say, based on where I see EST used, that others interpret it the same way. If you read WeRelate's Help page on date conventions, I think you'll find that it matches this interpretation of EST. This is why I think that "say" should be mapped to EST - it is a guess, often used to distinguish one person from another, or to support the listing of children in a family in an orderly way.

But I think that a method based on other events is inherently an algorithm, not a "guess". So I don't see EST as a guess at all. Personally I see no difference between the EST and ABT tags as defined (I don't use EST). I think an estimate is an educated guess, and any attempt to give a date not known exactly will apply presumed patterns (the pattern being the algorithm) to other known data, i.e., an estimate. However the definition of ABT is less specific, indicating nothing about how it was arrived at, so if either corresponded to what I think of as a flat-out guess, it would be ABT.
To me, the characteristic of SAY is that it is very imprecise. I am SAYing this but it could be wrong. In a recent case where I used it, for example, a known child had no age at death and could fit into any of three available gaps between siblings over 15 years. Or take a childless couple whose marriage is first evidenced at age 50: who knows, it could have been anytime over the previous 30 years.
Neither EST nor ABT has any implied precision in its definition. There is no distinguishing between a pretty tight estimate and a loose estimate. It would help to be able to identify the loose case so collaborators could better understand a posting, but as noted, SAY is not part of GEDCOM, and there appears to be no way to do this.
The loss of SAY is not huge because its use can often be avoided. Many times those situations are best handled with BEF and AFT qualifiers, and that is typically what you would see in an academic article: give the provable, not-wrong assertion rather than a loose guess that can be misleading. For example, the marriage above would be best represented as BEF xxxx. Your marriage at age 21 example would be more accurately represented as birth BEF xxx (because generally 21 was when he could first marry, but certainly not all men married at age 21, sometimes not until their 30's or later). Or a man is said to have died BEF xxxx when his heirs sold his land. The last previous event may have been 20 years prior, but the fact that is provable is that land was sold and he was called deceased. --Jrich 21:00, 7 October 2020 (UTC)

Given the looseness of the definitions and the lack of consistency in usage, we'll probably just have to agree to disagree. In practice, what I see is that ABT is used pretty consistently when an age is given, and EST is more likely to be used when there is less information (think of WFT EST year ranges, which probably influence how many people think of EST). The same web page I cited earlier gives this guidance for using "about":

About 1850 - Use About when you are fairly certain you are within a year or two

Not to say that the web page is an authority, but it does indicate how others view "about". What I think has happened is that people don't use EST, but use ABT for all approximations and estimates, so it is hard to know how to interpret it.

I tried to find something in a book or journal article and the first thing I found was the Bulkeley genealogy (p. 13) in which the birth year of Rev. Edward Bulkeley is given as "abt 1540" (page 14 says "born not far from 1540", apparently based on dates of his education) and the birth date of his wife (Olive Irby) is given as "say 1547" with no explanation, so presumably based on the approximate marriage year. The author (Jacobus) chooses to use "abt" and "say" with apparently different intentions, and I would say that he estimated Olive's birth year. He also uses (p.13) "say" to interpolate birth years between generations - again, I would call that an estimate.

I'm going to go ahead with converting "say" to "est" because I do think it reflects someone's estimate. I kind of like "say" myself because of its connotation but as my husband pointed out, it probably confuses the heck out of people who don't understand English well.--DataAnalyst 22:54, 7 October 2020 (UTC)

I can see what you are saying, but I don't see anything in the definition of ABT in GEDCOM that says it is limited to being based on a relationship to another event or a calculation based on partial knowledge (i.e., age at death or age in census). The word "estimated" to me carries a connotation of forecasting and predicting, implies some analysis went into it, making it more than simply a guess, while "about" simply means in the neighborhood with no definition of required precision or amount of basis necessary. Of course, usage at WeRelate isn't consistent and I think you will find that most people simply use ABT for all cases, not being cognizant of the GEDCOM spec. --Jrich 00:08, 8 October 2020 (UTC)

Hiccups [8 October 2020]

Currently looking at Person:Sarah English (10)

Birthdate 5 Feb 1703/04 says "suggested: 5 Feb 1703/4". Project page says "ensure split-year dates are formatted as yyyy/yy". GEDCOM says [ <NUMBER> | <NUMBER>/<DIGIT><DIGIT> ] which says both digits are required, no optionality indicated.
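
For reference, here is a minimal Python sketch of the yyyy/yy rule as I read it from the GEDCOM syntax quoted above. It is not the site's actual code; `normalize_split_year` is a made-up name. It assumes a four-digit first year and leaves the text alone when the second year is not the first year plus one.

```python
import re

# A four-digit year, a slash, then one to four digits for the second year.
SPLIT_YEAR = re.compile(r"\b(\d{4})/(\d{1,4})\b")


def normalize_split_year(date_text):
    """Rewrite split years such as 1703/4 or 1703/1704 as 1703/04."""
    def fix(match):
        year = int(match.group(1))
        suffix = int(match.group(2)) % 100
        if suffix != (year + 1) % 100:
            return match.group(0)          # not consecutive years: leave as is
        return f"{match.group(1)}/{suffix:02d}"
    return SPLIT_YEAR.sub(fix, date_text)


print(normalize_split_year("5 Feb 1703/4"))   # 5 Feb 1703/04
```

A date already written as 5 Feb 1703/04 passes through unchanged, so it would not trigger a suggestion.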

Also, the burial, which has no date, just a location, is generating "Incomplete date". --Jrich 18:48, 8 October 2020 (UTC)

Thanks. Defects in the code. I've fixed these in the sandbox and will get Dallan to deploy the fixes in the next day or two. Please keep reporting issues, as it is very hard to think of all cases to test.--DataAnalyst 19:24, 8 October 2020 (UTC)
Corrections have been deployed to production.--DataAnalyst 02:07, 9 October 2020 (UTC)
Thank you. Your communication is excellent. --Jrich 02:47, 9 October 2020 (UTC)