Are aliases needed for the qualifiers? [20 February 2009]
It may be desirable to add aliases to the given list of qualifiers, i.e., bef. or by, abt. or calc., est. or ca. --Jrich 12:35, 20 February 2009 (EST)
From Watercooler/Suggestions [4 January 2011]
- I attempted to write a guideline for nicer person and family events, but I didn't get far so I never submitted the work for review. Part of that was spent thinking about date formats. The most useful things I came up with were:
- Wikipedia has an extensive manual of style, with specific guidance on representing dates of birth and death. Included there is guidance on how to represent approximate dates, date ranges, and more. WeRelate adopts this guide with caveats:
- If there is a page that already provides WeRelate guidelines, please point me to it. I could not find one using Search on the Help namespace and the WeRelate namespace. The only real comment I found on dates was that they should preferably be in the "Day Month Year" format (for example: "9 Aug 1812"), which seems a little lacking. If Wikipedia's guidelines were instated (where is this documented?), my comment is that Wikipedia's guidelines seem to have some of the same shortcomings that prompted the proposal in the first place.
- The motivating problem is frustration over seeing a date given as an unadorned year, which may have one of several meanings:
- that is all that primary sources recorded (and hence all that is likely to ever be known)
- it is within the year somewhere, but too lazy to provide the full precision
- it is calculated to this year (some evidence exists)
- it is estimated (warning: assumptions)
- One problem with the wikipedia guidelines is that there is no way to differentiate a calculated date from potentially groundless estimates. While there appears to be one example using "before", I saw no explicit guidelines about using before and after, which is very important in genealogy, as opposed to the obscure "fl." which I don't think I have ever seen in genealogical literature. Wikipedia says to convert dates to the Gregorian calendar which I believe is a mistake because it makes it difficult to compare to sources and it introduces a lot of possible conversion errors. Not to mention those exceptions that you pointed out. So, as stated elsewhere by me, since their purpose is different, it is quite possible different guidelines are needed for WeRelate. --Jrich 17:18, 20 February 2009 (EST)
- absolute genealogical preference for day-before-month forms.
- always abbreviate month names to their three-letter form. Never use a numeric.
- years before 1000 should include a leading 0.
- days of the month before 10 should not include a leading zero.--Jrm03063 13:28, 20 February 2009 (EST)
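The four formatting rules listed above can be sketched in a few lines. This is only an illustration: the function name `format_date` is hypothetical, and padding the year to four digits is an assumption about what "a leading 0" means for years before 1000.

```python
# Three-letter month abbreviations, mixed case, as used in "9 Aug 1812".
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_date(day, month, year):
    """Render a date day-first, e.g. '9 Aug 1812'.

    - day before month
    - month as a three-letter abbreviation, never a number
    - year zero-padded to four digits (assumption: 987 -> '0987')
    - day without a leading zero
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{day} {MONTHS[month - 1]} {year:04d}"
```

For example, `format_date(9, 8, 1812)` gives `9 Aug 1812`, and `format_date(5, 3, 987)` gives `5 Mar 0987`.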
It's a good summary, but I need two points cleared up: why do we need a period after such terms as bef, aft, est and all the rest? And I've been using btw for between; should it be bet? The other point concerns Quaker dating. I thought it was more than just the year being different. Isn't there a 3 month difference? As when the Quaker records say he was born 12d 2m 1730, I have been using 12d 2m (Apr) 1730. I've always heard we should leave the date as we found it but can indicate a more modern conversion. Maybe I've misunderstood.--Janiejac 13:43, 20 February 2009 (EST)
- My primary goal is to push people to maximum preciseness, and to differentiate between calculated and estimated dates. I think there are several formats possible for the qualifiers, maybe multiple forms for each, as consensus (or Dallan) finds most suitable, but it is more important to me that the meaning of each be clearly understood.
- In pondering this since I wrote it, I actually think abt. and ca. have an inherent ambiguity, so at this moment I would have written the page using "Calc" and "Say". "Say" is commonly used in literature to indicate an educated guess and seems to indicate what I mean by Est., and Calc obviously because it clearly indicates there is some reference point from which a date may be calculated, probably indicating a somewhat higher reliability. I tend to like sentence case and periods because I type reasonably well and it is more like regular English (I hate reading Savage) but have no religious feelings about those attributes. (There is always the issue that Dallan would have to discuss the feasibility and desirability of trying to enforce incoming data, or clean up existing data, to some format, which may well be the make or break point for any convention to work.)
- I thought I did write that the date should be given in its original format in the source citation. I proposed a format using square brackets for indicating conversion, i.e., "12d 2m 1730 [12 Apr 1730]", because I think brackets are generally understood to signal editing of the original. For example, your form "12d 2m (Apr) 1730" happens to be exactly the format used in newer editions of the Encyclopedia of American Quaker Genealogy, and it would not be obvious without more information whether you or the author interpreted 2m to be April. (Probably digressing, I tend to cheat: if the source uses [], I often change them to () to preserve the use of [] for my comments. If I think that is too dangerous or confusing, then I have to use some more bulky method because it is very important to differentiate between what the source provides and what I interpret.)
- Regarding Quaker dating, Apr being the 4th month in modern numbering, isn't that only a 2 month difference to interpret "2m" as April? Before 1752, their month numbering was just the same as everybody else's. What I am not sure about is whether they continued to use the old style month numbering after 1752, or did different things area by area, or just went along with the civil authorities. [P.S., a quick check seems to indicate that they followed civil authorities, so not really different from anyone else, except that they tended to use month numbers more than is seen in civil records, though it is seen there also, so this is not just a Quaker issue.]
- --Jrich 14:48, 20 February 2009 (EST)
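The month-number question above comes down to a two-month offset, which a short sketch can make concrete. It assumes the old-style year began in March, so "1m" is March and "2m" is April; the function names are hypothetical, and the sketch deliberately leaves the year alone, so double dating for the 11th and 12th months is not handled.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def numbered_month_name(month_number, old_style=True):
    """Translate a numbered month ('2m') into a month abbreviation.

    old_style=True assumes March is the 1st month (English usage
    before 1752); old_style=False uses modern numbering.
    """
    if not 1 <= month_number <= 12:
        raise ValueError("month number must be between 1 and 12")
    offset = 2 if old_style else 0
    return MONTHS[(month_number - 1 + offset) % 12]


def bracketed_conversion(original, day, month_number, year, old_style=True):
    """Keep the source's wording and append the interpretation in brackets."""
    name = numbered_month_name(month_number, old_style)
    return f"{original} [{day} {name} {year}]"
```

With these assumptions, `bracketed_conversion("12d 2m 1730", 12, 2, 1730)` produces `12d 2m 1730 [12 Apr 1730]`, the bracketed form proposed above, while the same month number read in modern style would be Feb.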
Is there a reason to not use the ISO standard for dates?
I'd like to see a discussion of why we don't use it. Or how we decide which standards are chosen?--Jsadler 01:55, 21 February 2009 (EST)
Two quick reasons: 1) this standard's scope is for communicating Gregorian dates, and genealogy must also communicate dates when the Julian calendar was still in use (and before?), and 2) numeric dates are ambiguous (partly because the month numbering changed when the beginning of the year was shifted and partly because the ordering of the numbers must be understood by both the writer and reader of data which is problematical). Also, a minor nit, the form 6 Jan 1898 requires no punctuation to be clear (not even the spaces if you want to push it) and so is difficult to mess up. --Jrich 09:39, 21 February 2009 (EST)
Always more questions, sorry. How would the month names be handled for foreign sources? Keep in native language or translate to current language? Also how about Chinese or Japanese dating systems? (official documents would use the era dating system) Should I/we use dating system in place at the time of the record? This is similar to the Quaker discussion but I believe different in that they are independent systems.
--Jsadler 19:34, 21 February 2009 (EST)
I am most definitely unqualified to talk about dates in foreign countries. If you have experience with these perhaps you, or somebody else, would like to add a section. --Jrich 19:47, 21 February 2009 (EST)
Thanks, JRich, for putting this together. I think your proposal makes a lot of sense. I was not aware of the subtle distinction between "abt" and "est" (I had tended to use "abt" for both cases, or just put the year only, which I take as an implied "est"), but will try to follow these good suggestions going forward.
I just had a couple of questions.
One was already asked but not answered: are the periods really necessary for "bef", "aft", "abt", etc.? I have been using those, but without the period. Part of that is just personal aesthetic, but by way of more objective justification, I believe that those tags are used without periods in the GEDCOM standard grammar. Unless Dallan has gone to the trouble of adding periods, I think that anything automatically uploaded from GEDCOM will not have periods after those tags. (They're likely to be in all caps, too.)
On the question of alternate tags ("ca" instead of "abt", "btw" instead of "bet"), since we'd eventually like to be able to generate GEDCOM from WeRelate, I think it's best to stick to the standard GEDCOM tags. Thus, I'd vote for discouraging alternative tags.
The other question is for Dallan. Is there anything being recommended here that interferes with the software interpreting the date for purposes of sorting, matching, etc.? That has been my only hesitation in putting anything other than DD MMM YYYY format in the birth/death date fields. I assumed that "abt", "bef" and others may be safe. The one I particularly worried about was the pre-1752 double year business, you know, "22 Feb 1733/34" for George Washington's birthday. Does the software correctly understand that as "22 Feb 1734" for standard comparison purposes?--TomChatt 04:07, 22 February 2009 (EST)
The date is stored as text so the only issue with WeRelate software is when doing matching and searching where it must interpret it as a date, I believe. I think Dallan mentioned once that he ignored any qualifiers before the date, and uses the first year of double dating. So things like before, after and between get treated the same as the base date. --Jrich 09:49, 22 February 2009 (EST)
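To make that behavior concrete, here is a rough sketch of a comparison key that ignores leading qualifiers and keeps only the first year of a double date. It is an illustration of the description above, not WeRelate's actual code; the name `comparison_key` and the use of 0 for a missing day or month are assumptions.

```python
import re

_MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec"]

# Optional "day month", then a 3-4 digit year, then an optional "/34"
# style second year that is matched but not captured.
_DATE = re.compile(
    r"(?:(?:(\d{1,2})\s+)?(" + "|".join(_MONTHS) + r")[a-z]*\.?\s+)?"
    r"(\d{3,4})(?:/\d{1,4})?\b",
    re.IGNORECASE,
)


def comparison_key(text):
    """Return (year, month, day) for the first date found, or None.

    Anything before the date (bef., aft., abt, ...) is skipped, and
    missing parts are reported as 0.
    """
    match = _DATE.search(text)
    if match is None:
        return None
    day, month, year = match.groups()
    month_number = _MONTHS.index(month.lower()) + 1 if month else 0
    return (int(year), month_number, int(day) if day else 0)
```

Under this sketch "22 Feb 1733/34" compares as (1733, 2, 22), and "bef. 4 Jul 1776" and "aft. 4 Jul 1776" get the same key, which is the limitation described above.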
Your comment about GEDCOM tickled a nagging concern. GEDCOM is a genealogical standard and is a good place to look for advice, keeping in mind it is designed for computers to exchange information, not necessarily humans. I had inspected the GEDCOM specification before, and expected the date issue to be more prominent, so overlooked some of the detailed specification. Your comment prompted a more thorough inspection. In general, it is not greatly different, but suggests some alteration in my proposal may be good.
The GEDCOM specification does provide ways to give dates in many formats if you use different tags, including things like foreign month names, etc. Unless WeRelate were to add a field where one can identify a date format (like the type of Source), I suspect the date formats must be limited in WeRelate to the default form, which follows the basic double dating form (one of their examples: 15 APR 1699/00).
GEDCOM does support BEF, AFT, BET (though it uses the word AND instead of a dash with BET), FROM and TO (which seems superfluous to me), and INT. (I do not think INT would belong in a WeRelate date field, as interpretation probably should be shown/discussed in a source citation and the date field can simply give the final result. A GEDCOM example is not given, but I suspect Quaker dating might be one place it gets used, e.g., "INT 6 AUG 1666 (6 (6) 1666)", where a GEDCOM is communicating an interpreted date, followed by the non-standard text defining that date.)
- A note about "from/to". The proper usage of "from/to" would be to describe a fact that was true for an extended period of time. For example, "lived in Boston from 1673 to 1690". "Bet/and", on the other hand is for a discrete event that took place between two dates, e.g., "born between 1670 and 1680". I did not take the liberty of updating the Help, where it advises against using "from/to" - but if you see my point, you may want to do so.--DataAnalyst 20:31, 4 January 2011 (EST)
- I see your point, though I think it is largely a grammatical difference in a data representation environment. However, a valid point, and I will try to adjust the comments. Feel free to do more if I don't do it justice. --Jrich 22:25, 4 January 2011 (EST)
The GEDCOM specification supports three qualifiers, ABT, CAL and EST. The first is for an inexact date. This would presumably be when the primary source is incomplete, worn or torn? CAL is for a calculated date, and EST is for a date approximated based on another event. Being unaware of CAL, my proposal was using ABT for CAL, since I thought use of an incomplete date implied ABT, but it is probably better to adopt the GEDCOM system here.
Regarding periods and capitalization, GEDCOM does not seem to mandate upper case within data though all examples shown are upper case, and it does not use periods after these tags. For the reader of WeRelate, I suspect there is not much difference seeing After, AFT, aft, Aft., etc., on a WeRelate page. This issue would be more of a specification provided by Dallan based on how much flexibility he thinks is reasonable without losing the ability to convert WeRelate data into a GEDCOM. --Jrich 10:58, 22 February 2009 (EST)
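As a rough sketch of how the qualifiers discussed above could be recognized regardless of case or trailing period: the tag list and the short meanings are taken from this discussion rather than quoted from the GEDCOM specification, and `split_qualifier` is a hypothetical helper name.

```python
# Tags mentioned in this discussion, with the meanings given above.
QUALIFIERS = {
    "ABT": "inexact date",
    "CAL": "calculated date",
    "EST": "date approximated from another event",
    "BEF": "before the given date",
    "AFT": "after the given date",
    "BET": "between two dates (BET ... AND ...)",
    "FROM": "start of an extended period",
    "TO": "end of an extended period",
    "INT": "interpreted date",
}


def split_qualifier(text):
    """Split 'Aft. 1690' into ('AFT', '1690').

    Case and a trailing period on the tag are ignored. If the first
    word is not a known tag, the tag is None and the text is returned
    unchanged apart from surrounding whitespace.
    """
    parts = text.strip().split(None, 1)
    if not parts:
        return None, ""
    tag = parts[0].rstrip(".").upper()
    if tag in QUALIFIERS:
        return tag, parts[1] if len(parts) > 1 else ""
    return None, text.strip()
```

So After-style variants such as "AFT 1690", "aft 1690" and "Aft. 1690" all reduce to the same tag, while "9 Aug 1812" comes back with no tag at all.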
We should kick-start this again... [28 October 2009]
We should get this actively moving again. We're eventually going to want to have data sanity routines work through the WR data base, and those are going to rely heavily on the ability to recognize dates. Indeed, the first such one I would suggest would simply walk through looking for bad dates. A couple of on-point notes:
- I skimmed the document and the suggested/recommended prefixes for estimated, before, between, etc., appear in all upper case, though a case preference is disavowed. For GEDCOM purposes, perhaps case doesn't matter, and - for purposes of import and export such strings should just be left alone. However, when they appear displayed on a page, all capitals is a screen real estate hog, besides distracting from the date proper. For that reason, in my work, the only thing (in a date) that I capitalize, is the first letter of the month abbreviation. So I would turn "BEF 20 FEB 1690" into "bef. 20 Feb 1690". Likewise "BET 11 JAN 1490 AND 6 JUN 1491" would be "bet. 11 Jan 1490 and 6 Jun 1491".
- Maybe I didn't look closely enough, but we probably also need to deal with "Old System" dates. I worked through a bunch of Imperial Russian stuff and there are a lot of dates noted "O.S." and similar. If the source records a date in OS, then it's probably improper to compute the NS date and write that as documented by the source (A cute behavior: I'll bet that WR page display code could be made smart enough to recognize "OS" dates and then automatically provide an NS corrected value for purposes of display).
- Also from the "getting cute" department. It's often the case that we have a DOD and "ae" description. If we could standardize the "ae" portion somehow/somewhere (presumably in the DOD description string), then the page display code could automatically roll up an estimated DOB (less hazardous and more faithful to sources, since it really isn't a permanent part of the record for a person).
--Jrm03063 09:06, 28 October 2009 (EDT)
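For the O.S. idea above, the display-time conversion could look something like the following sketch. It uses the standard Julian Day Number integer arithmetic and is not existing WeRelate code; the function names are hypothetical, and it assumes the O.S. year is already expressed with a 1 January start (so a double-dated "1731/32" would be passed as 1732).

```python
def julian_to_jdn(year, month, day):
    """Julian Day Number for a date in the Julian (O.S.) calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """Gregorian (N.S.) calendar date for a Julian Day Number."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


def os_to_ns(year, month, day):
    """Convert an O.S. (Julian) date to its N.S. (Gregorian) equivalent."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))
```

For instance, `os_to_ns(1732, 2, 11)` returns `(1732, 2, 22)`, and `os_to_ns(1917, 10, 25)` returns `(1917, 11, 7)`. Because the stored date would stay in O.S. as the source gives it, a conversion like this would only affect display.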
- I am fairly sympathetic with your comments. However, WeRelate does very little processing of the dates, which is probably not unreasonable, allowing almost anything to be input, leaving it to the users to clean it up. Building knowledge of dates into WeRelate raises the possibility that it will cause problems with unanticipated input, so implies continual vigilance and the possibility of frustration by some future user. Dallan has explained his date processing somewhere, and it is fairly rudimentary, seemingly in keeping with this minimalist philosophy. For example, children born "aft. 4 Jul 1776" will sort before "bef. 4 Jul 1776". So these suggestions (understanding O.S., ae/Aet, etc.) represent quite a couple of quantum leaps in capability. --Jrich 11:11, 28 October 2009 (EDT)
Well, I'm not sure if it's all that much of a quantum jump. Still, I think it's a good data base sanity step to start walking through the data base, looking for dates that can't be understood in a compute sense. To begin with, all that we would probably do with that is to issue a per-page warning about an unrecognizable date string attached to some fact. Doing that, though, means that we decide two things: what date syntax is recognizable and then, what date syntax is preferred (presumably the latter is a subset of the former).
I think this document is a good start on the latter. I know how to write parsers that can do the former, though I've never had a reason to work in PHP, or write code that lives on a web site or as an agent, etc. If Dallan could specify a framework for operation, maybe I could write some code that would do this. Hmmm... --Jrm03063 11:47, 28 October 2009 (EDT)
Oh, BTW, my thinking here isn't that this precludes users from entering anything - either via GEDCOM or normal page editing. I see this as a detached process that walks around and looks at person/family page fact date strings. When it finds a page that has unrecognized date strings, it either adds the name to a list of "date troubled" pages, or adds a warning to the associated talk page. Probably the former at present. Maybe the correct model is actually like a spell checker - build up a unique list of date string forms, attached to the pages that contain that string. Then walk the list and build up a second list of date-troubled pages accordingly. Or something like that... --Jrm03063 11:54, 28 October 2009 (EDT)
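The spell-checker model described above might be sketched as two passes: first collect each unique date string with the pages that use it, then collect the pages attached to any string that fails a recognizer. Everything here is hypothetical, including the function names, the page titles in the usage note, and the deliberately narrow recognizer pattern, which stands in for whatever syntax is eventually agreed on.

```python
import re
from collections import defaultdict

# Placeholder recognizer: optional qualifier, optional "day month",
# a 3-4 digit year, and an optional two-digit double-dating suffix.
_RECOGNIZED = re.compile(
    r"^(?:(?:abt|cal|est|bef|aft)\.? )?"
    r"(?:(?:\d{1,2} )?(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec) )?"
    r"\d{3,4}(?:/\d{2})?$",
    re.IGNORECASE,
)


def is_recognized(date_string):
    """True if the string matches the placeholder date syntax."""
    return _RECOGNIZED.match(date_string.strip()) is not None


def index_date_strings(pages):
    """Map each unique date string to the set of page titles using it.

    `pages` is a dict of page title -> list of date strings.
    """
    index = defaultdict(set)
    for title, date_strings in pages.items():
        for date_string in date_strings:
            index[date_string].add(title)
    return index


def troubled_pages(index, recognizer=is_recognized):
    """Sorted titles of pages holding at least one unrecognized date."""
    troubled = set()
    for date_string, titles in index.items():
        if not recognizer(date_string):
            troubled |= titles
    return sorted(troubled)
```

Given `{"Person:A (1)": ["9 Aug 1812", "8/9/1812"], "Person:B (1)": ["bef. 20 Feb 1690"]}`, only `Person:A (1)` would land on the date-troubled list. Indexing by unique string first means each distinct form is checked once, however many pages share it.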
Date Parser? [5 January 2011]
It's been a while since I was a compiler writer, but I'm sure I could roll up some code to actually implement a date string parser/validator. Are we about there?
--Jrm03063 11:18, 5 January 2011 (EST)
Standardizing Dates? [12 November 2012]
We are approaching a time when we'll be able to have bots that work their way around and perform minor edits such as normalization of date string forms. There was recently some discussion on this on the WaterCooler. I think it makes sense to have a WR preferred form that's a little more narrow than GEDCOM. I really like mixed case for a month specifier, but all lower case for any other non-numeric. I can go either way on whether some of the abbreviations should have a trailing "." - which does not appear to be GEDCOM standard - but we probably should articulate a preference. --jrm03063 11:48, 12 November 2012 (EST)
- My personal preference is to like mixed case in the qualifier too, because the fact on a page looks like sort of a sentence, and it starts with the date qualifier, and so it looks odd to see it start with a lower case letter. My second choice would be all caps because the presence of the qualifier tends to get ignored by careless researchers that don't realize what a big difference in meaning it makes. But those are personal preferences, and fundamentally, all cases communicate the date. I would like to see this suggestion implemented where Dallan forces all dates into whatever form he likes and this issue stops wasting time when it really makes no difference to the meaning of the data. --Jrich 13:56, 12 November 2012 (EST)
- I just went to Help page for Date Conventions to read what was there. I do not understand why all the qualifiers are in ALL CAPS and the resulting date(s) are in mixed case. Even that mixed case does not look good to me as I would prefer the words to be all small case. Example:
- "The qualifier "BET ... AND" may be used to indicate a range of dates, where known dates define a window within which the event occurred. For example, if a person's will was dated 10 Nov 1798, and proven 20 Nov 1798, the death date would be specified Bet 10 Nov 1798 And 20 Nov 1798." Now why in the world should Bet & And be capitalized? --janiejac 14:40, 12 November 2012 (EST)
- As the author of that page (long ago), I can answer that. The page was written looking at the GEDCOM specification. The GEDCOM specification doesn't care, in fact, states that all the various case arrangements should be accepted and treated as the same value. That basically means all the cases being talked about in this discussion are legal, and none are wrong. In the text of the Help page, the qualifiers were capitalized to make them stand out as keywords, meaning those exact three characters were required. This is, I believe, a common convention in computer manuals. I believe, though it is a very old Help page and I haven't read it in a while, that it does state explicitly, that any case is legal. In the example they were mixed case because that is my preference (see explanation in previous post), and as that was how I normally entered them on pages, I entered the examples like that. As it is not wrong, it is by definition, legal. It is purely a matter of personal preferences and likes that are being discussed here. Meaning there is no right answer. Meaning the answer to "why in the world" is because somebody likes it and it is not illegal.
- Many software programs (like Family Tree Maker) do make the months and qualifiers upper case, which means anybody loading data from one of those programs is going to have their data end up in that form regardless of what any conventions say. Personally I think it is a waste of time for people to be busy changing these dates when the next GEDCOM touching that page is very likely to make it upper case again. This is a losing battle. So if a particular form is desirable, then I think the computer should enforce it. That would make this whole issue moot. Personally I think Dallan should choose one that strikes him as functional and aesthetically pleasing, that simplifies the import and export of GEDCOMs, and that allows for him to sort and compare dates efficiently. People could import, or enter, dates in whatever form they like and the computer should force them all into the selected form, or turn them red if it can't figure out what they mean. After all, whether Dallan chooses ABT, Abt, abt, or even circa, I think we'll all understand what it is saying. --Jrich 16:40, 12 November 2012 (EST)
- I would think that we'll have no trouble creating a date parser that will read everything that GEDCOM allows and probably a good deal that it doesn't. What this group can best add is what subset of the GEDCOM format looks best when it appears on a resulting page. Also - whether that exact form should be used in exported GEDCOMs - or something slightly different. Let's just take it as a given that any date form normalization would be done by software - so no one will be going back and changing "ABT" to "Abt." or "JANUARY" to "Jan" or whatever. --jrm03063 19:01, 12 November 2012 (EST)
- if you're exporting via GEDCOM, you won't be doing something "slightly different". That means ABT, in some case, but no period, for example. If Dallan wants to build the software to convert from, say "Circa" to ABT, that's fine, but it will go out as ABT (in some case combination) if it is GEDCOM. Otherwise, it's not GEDCOM. The only leeway here is how to display it while it's in WeRelate. --Jrich 22:23, 12 November 2012 (EST)
- By slightly different, I mean different within the set of forms that are legal GEDCOM. In other words, for display we might prefer that some strings are mixed case or lower case - while folks might prefer that things are upper case for export - but I thought that was obvious. --jrm03063 22:29, 12 November 2012 (EST)
- If you are going to have the software normalize the capitalization, I won't say much about it here since I am indifferent what convention you choose. But I do believe that the punctuation should be left out. It appears odd to me to see “Abt. Nov 2012” where there is mixed punctuation. I would prefer to just see it as “Abt Nov 2012”. —Moverton 00:27, 13 November 2012 (EST)
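Since the thread above separates how a date is displayed from how it is exported, here is a minimal sketch of one stored string rendered both ways. The function name `render`, the `style` and `period` parameters, and the choice to leave unknown tokens (such as full month names) untouched are all assumptions for illustration; which display form is preferred is exactly what is still being discussed.

```python
ABBREVIATIONS = {"ABT", "CAL", "EST", "BEF", "AFT", "BET", "INT"}
CONNECTIVES = {"AND", "FROM", "TO"}  # whole words, so never given a period
MONTH_WORDS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
               "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def render(date_text, style="display", period=True):
    """Render a date string for page display or for GEDCOM export.

    style="gedcom": keywords and months upper case, no periods.
    style="display": keywords lower case (with a period after
    abbreviations if period=True), months in mixed case.
    """
    out = []
    for token in date_text.split():
        word = token.rstrip(".").upper()
        if word in MONTH_WORDS:
            out.append(word if style == "gedcom" else word.capitalize())
        elif word in ABBREVIATIONS or word in CONNECTIVES:
            if style == "gedcom":
                out.append(word)
            else:
                dot = "." if period and word in ABBREVIATIONS else ""
                out.append(word.lower() + dot)
        else:
            out.append(token)  # numbers and anything unrecognized
    return " ".join(out)
```

With the default settings, `render("BEF 20 FEB 1690")` gives `bef. 20 Feb 1690`; with `period=False`, `render("Abt. Nov 2012", period=False)` gives `abt Nov 2012`; and `render("Abt. Nov 2012", style="gedcom")` gives `ABT NOV 2012`, so the period question only affects display.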