WeRelate:Suggestions/Format Date Field

Implementation Plan

I plan to implement a date editor that not only formats dates consistently (using mixed case) but also requires them to be in a GEDCOM-acceptable format. GEDCOM allows any text (such as "winter of 1883") as long as it is enclosed in parentheses, which allows considerable flexibility. The plan is to store all dates in English (except for the parenthetical portion) and translate them into the user's preferred language (a User Profile setting) on display.
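A minimal sketch of the "store in English, translate on display" idea follows. It is not WeRelate's code: `display_date` and the French month table are hypothetical, and languages without a table simply fall back to English. Qualifiers such as Abt and Bef stay in English, and anything in parentheses passes through untouched.

```python
import re

MONTHS_EN = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Hypothetical display tables; only French is filled in for this sketch.
MONTHS_BY_LANG = {
    "en": MONTHS_EN,
    "fr": ["janv", "févr", "mars", "avr", "mai", "juin",
           "juil", "août", "sept", "oct", "nov", "déc"],
}


def display_date(stored: str, lang: str) -> str:
    """Render a date stored in English in the user's preferred language."""
    target = MONTHS_BY_LANG.get(lang, MONTHS_EN)  # unknown language: English
    out = []
    # The capturing group keeps each "(...)" chunk as its own list item.
    for chunk in re.split(r"(\([^)]*\))", stored):
        if chunk.startswith("("):
            out.append(chunk)  # GEDCOM free text is never translated
            continue
        words = chunk.split(" ")
        # Only month abbreviations are swapped; qualifiers and numbers stay.
        out.append(" ".join(
            target[MONTHS_EN.index(w)] if w in MONTHS_EN else w for w in words))
    return "".join(out)
```

For example, `display_date("Bef 10 Apr 1823", "fr")` gives `Bef 10 avr 1823`, while the stored value itself stays in English.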

I implemented this in stages:

  • The first implementation did the following:
    • automatically change the date to mixed-case if that is the only change required
    • automatically replace 'UNKNOWN' with a blank field
    • suggest a standard version of the date if more change is required than just a change to mixed-case; edits include
      • correct an ambiguous date (dd/mm/yyyy or mm/dd/yyyy format) where one of the first two numbers is greater than 12 and therefore must be the day
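      • change "from/to" to "bet/and" for events that can only occur on a single day (such as birth, death, etc.). In GEDCOM, "from/to" describes a period over which something was ongoing, while "bet/and" describes a single event that happened at an unknown point within a range; "from/to" is sometimes misused by people who don't understand the distinction.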
      • change yyyy-yyyy to "From yyyy to yyyy" or "Bet yyyy and yyyy" depending on the type of event
      • change leading c, ca, c., or ca. to Est
      • ensure split-year dates are formatted as yyyy/yy
      • abbreviate months entered in full
      • handle a few identified variants of qualifiers (e.g., Calcd becomes Cal)
      • convert "say" to "Est"
      • place day before month if input is mmm dd yyyy or mmm dd, yyyy
      • standardize on "(in infancy)" and "(young)" (both in parentheses to meet GEDCOM standards) for "infant" and "died young"
      • accept month names and abbreviations in a handful of other languages (Dutch, French, German and Spanish)
      • display month abbreviations in other languages (with qualifiers such as Abt and Bef remaining in English, e.g., "Bef 10 avr 1823")
    • show a message for each date that does not meet GEDCOM standards and for which I have not coded a suggestion
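The suggestion rules in the first implementation can be approximated with a small normalizer. This is a hypothetical Python sketch, not WeRelate's code: `suggest_date`, `normalize_simple`, the `single_day_event` flag and the qualifier table are illustrative assumptions, and only English month names are handled. It returns the suggested date, an empty string for "UNKNOWN", or `None` when no suggestion can be made (the case where a message would be shown instead).

```python
import re
from typing import Optional

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# Lower-case spellings (abbreviated or full) mapped to the standard abbreviation.
MONTH_LOOKUP = {m.lower(): m for m in MONTHS}
MONTH_LOOKUP.update({
    "january": "Jan", "february": "Feb", "march": "Mar", "april": "Apr",
    "june": "Jun", "july": "Jul", "august": "Aug", "september": "Sep",
    "sept": "Sep", "october": "Oct", "november": "Nov", "december": "Dec",
})
# Hypothetical qualifier table: c/ca/say -> Est and Calcd -> Cal follow the
# list above; the remaining entries only fix the letter case.
QUALIFIERS = {"c": "Est", "ca": "Est", "say": "Est", "est": "Est",
              "calcd": "Cal", "cal": "Cal", "abt": "Abt",
              "bef": "Bef", "aft": "Aft"}
FREE_TEXT = {"infant": "(in infancy)", "died young": "(young)"}


def normalize_simple(text: str) -> Optional[str]:
    """Normalize a single (non-range) date, or return None if not understood."""
    tokens = text.replace(",", " ").split()
    out = []
    if tokens and tokens[0].rstrip(".").lower() in QUALIFIERS:
        out.append(QUALIFIERS[tokens.pop(0).rstrip(".").lower()])
    # dd/mm/yyyy or mm/dd/yyyy: only resolvable when one number exceeds 12.
    if len(tokens) == 1:
        m = re.fullmatch(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})", tokens[0])
        if m:
            a, b, year = int(m[1]), int(m[2]), m[3]
            if 1 <= b <= 12 < a <= 31:
                tokens = [str(a), MONTHS[b - 1], year]
            elif 1 <= a <= 12 < b <= 31:
                tokens = [str(b), MONTHS[a - 1], year]
            else:
                return None  # ambiguous or impossible: no suggestion
    day = month = year = None
    for tok in tokens:
        key = tok.rstrip(".").lower()
        if key in MONTH_LOOKUP and month is None:
            month = MONTH_LOOKUP[key]
        elif re.fullmatch(r"\d{1,2}", tok) and day is None and year is None:
            day = str(int(tok))  # also drops a leading zero
        elif (m := re.fullmatch(r"(\d{3,4})(?:/(\d{2}|\d{4}))?", tok)) and year is None:
            # Split years such as 1723/1724 become 1723/24.
            year = m[1] + (f"/{m[2][-2:]}" if m[2] else "")
        else:
            return None
    if year is None or (day is not None and month is None):
        return None
    # Day before month, whatever order the input used.
    return " ".join(out + [p for p in (day, month, year) if p])


def suggest_date(raw: str, single_day_event: bool = True) -> Optional[str]:
    """Suggest a standard date; '' means blank, None means no suggestion."""
    text = raw.strip()
    if text.upper() == "UNKNOWN":
        return ""
    if text.lower() in FREE_TEXT:
        return FREE_TEXT[text.lower()]
    if text.startswith("(") and text.endswith(")"):
        return text  # GEDCOM free-text phrase: leave as entered
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", text)
    if m:
        kind, first, second = "period", m[1], m[2]
    else:
        m = re.fullmatch(r"(from|bet|between)\s+(.+?)\s+(?:to|and)\s+(.+)",
                         text, re.IGNORECASE)
        if not m:
            return normalize_simple(text)
        kind = "period" if m[1].lower() == "from" else "range"
        first, second = m[2], m[3]
    a, b = normalize_simple(first), normalize_simple(second)
    if a is None or b is None:
        return None
    # Births, deaths, etc. happen on a single day, so a period becomes a range.
    if kind == "period" and not single_day_event:
        return f"From {a} to {b}"
    return f"Bet {a} and {b}"
```

Returning `None` for a date like `05/06/1850`, where both numbers are 12 or less, mirrors the rule above: only a number greater than 12 settles which one is the day.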

Note that this first implementation shows suggestions and messages only when:

  • the user first brings up an existing page in Edit mode
  • the user selects "Show preview" - the suggestions and messages are not in the preview itself, but in the edit portion of the screen

The first implementation didn't require anything to be fixed, which gave an opportunity for feedback before existing dates were automatically replaced with suggested dates.

  • The second implementation did the following (only when a page is previewed or edited):
    • automatically replace dates in the edit portion of the screen (not the preview) with suggested dates as listed above
    • show the previous version of the date below the new version

The second implementation didn't require anything to be fixed, but allowed users to accept or change the formatted version. This still gave an opportunity for feedback before I started automatically replacing existing dates when saving the page.

  • The third implementation was a tweak to the second one, with an extra bonus (see last point). It did the following:
    • automatically replace dates with suggested dates as listed above (including on the preview screen)
    • show the previous version below (unless the only change is a change of case)
    • prompt the user to review the changes to ensure no loss of meaning - this will look like the page has errors that need to be fixed, but in fact, if the user is satisfied with the changes, they just need to save the page. Hopefully the new message I added makes that clear enough.
    • if the only change to a date is the case, no message will appear and the change will be saved (it wasn't saved before unless you did a preview or were editing an existing page) - that is, once this change is implemented, you will be able to add new dates without capitalization and WeRelate will automatically capitalize when saving the page
  • The fourth implementation refined the edits as follows:
    • if a date includes number suffixes (e.g., 1st, 2nd, 3rd, 4th), they are ignored rather than causing an error message
    • show an error message for a date range with the first date later than the second (e.g., from 1823 to 1800)
    • show an error message for a day with value 0 (e.g., 0 Feb 2021)
  • The fifth implementation made the following changes:
    • recognize and standardize a few more date variations, such as "&" for "and" and some modifiers in other languages (as found in the data)
    • no longer request review of minor automated changes, such as removing leading zeroes or moving the day from after the month to before the month
    • on removing a fact, make sure that error messages and highlighting move with dates that "move up"
  • The sixth implementation made the following change:
    • reject dates in the future
  • The seventh implementation made the following changes:
    • edit dates in uploaded GEDCOMs and:
      • issue an error message for each date that cannot be interpreted
      • automatically format each date that can be interpreted
      • issue an alert (that the user has to click on) for each date that required "significant" interpretation
  • The eighth (and hopefully last) implementation made the following changes in the wiki (manual data entry):
    • require users to correct dates that cannot be parsed or that are invalid (e.g., day number too high for the month or invalid date range)
    • warn when events are out of order (e.g., death before birth)
      • the warning appears when you edit an existing page with this problem or when you choose "show preview" on a page
      • WeRelate doesn't require you to correct the problem because sometimes sources provide conflicting information (such as baptism date before birth date) - however, you are encouraged to correct the problem if it is the result of a typo, misunderstanding of a source, or conflation of two individuals
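The checks added in the later stages (invalid day numbers, day 0, reversed ranges, future dates and out-of-order events) can be sketched as below. This assumes dates have already been formatted as `yyyy`, `Mmm yyyy` or `d Mmm yyyy`, optionally with a qualifier or a `Bet ... and ...` / `From ... to ...` range; the function names and the `EXPECTED_ORDER` list are hypothetical. Errors come from `validate()` and must be fixed, while `order_warnings()` only warns, matching the point that sources sometimes conflict.

```python
import calendar
import datetime
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
SIMPLE = re.compile(r"(?:(\d{1,2}) )?(?:([A-Z][a-z]{2}) )?(\d{4})")
QUALIFIER = re.compile(r"^(?:Abt|Bef|Aft|Est|Cal) ")
RANGE = re.compile(r"Bet (.+) and (.+)|From (.+) to (.+)")
# Hypothetical expected order of events; not WeRelate's actual rule set.
EXPECTED_ORDER = ["Birth", "Baptism", "Death", "Burial"]


def parse_simple(text):
    """Return (year, month, day) for 'yyyy', 'Mmm yyyy' or 'd Mmm yyyy'."""
    m = SIMPLE.fullmatch(QUALIFIER.sub("", text))
    if not m or (m[1] and not m[2]) or (m[2] and m[2] not in MONTHS):
        raise ValueError(f"cannot parse date: {text!r}")
    month = MONTHS.index(m[2]) + 1 if m[2] else None
    return int(m[3]), month, (int(m[1]) if m[1] else None)


def split_range(text):
    m = RANGE.fullmatch(text)
    return [p for p in m.groups() if p] if m else [text]


def bounds(text):
    """Earliest and latest (year, month, day) keys a date can mean.

    Abt/Est/Cal are treated as exact, which is a simplification.
    """
    parts = split_range(text)
    (y1, m1, d1), (y2, m2, d2) = parse_simple(parts[0]), parse_simple(parts[-1])
    lower, upper = (y1, m1 or 0, d1 or 0), (y2, m2 or 13, d2 or 32)
    if text.startswith("Bef "):
        lower = (0, 0, 0)
    if text.startswith("Aft "):
        upper = (10000, 0, 0)
    return lower, upper


def validate(text, today=None):
    """Errors that must be corrected before the page can be saved."""
    if text.startswith("(") and text.endswith(")"):
        return []  # GEDCOM free-text phrase
    today = today or datetime.date.today()
    parts = split_range(text)
    errors = []
    for part in parts:
        try:
            year, month, day = parse_simple(part)
        except ValueError as exc:
            return [str(exc)]
        # Covers day 0 and days past the end of the month (Gregorian rules).
        if day is not None and not 1 <= day <= calendar.monthrange(year, month)[1]:
            errors.append(f"invalid day in {part!r}")
        if (year, month or 1, day or 1) > (today.year, today.month, today.day):
            errors.append(f"date in the future: {part!r}")
    if len(parts) == 2 and not errors:
        lower, upper = bounds(text)
        if lower > upper:
            errors.append("range starts after it ends")
    return errors


def order_warnings(events):
    """Warnings only: conflicting sources must never block saving."""
    known = []
    for name in EXPECTED_ORDER:
        if name in events:
            try:
                known.append((name, bounds(events[name])))
            except ValueError:
                continue  # unparseable dates are reported by validate()
    warnings = []
    for i, (earlier, (earliest, _)) in enumerate(known):
        for later, (_, latest) in known[i + 1:]:
            if earliest > latest:  # the "later" event definitely came first
                warnings.append(f"{later} is before {earlier}")
    return warnings
```

Python's `calendar` module uses the proleptic Gregorian calendar, so a Julian-only leap day such as 29 Feb 1700 would be rejected by this sketch; a real genealogy tool would need to account for that.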

Status

Completed as of 6 Nov 2021.--DataAnalyst 02:17, 7 November 2021 (UTC)

Original Request and Discussion

Any time a page is saved, whether during GEDCOM upload or manual editing, it would be nice if WeRelate could try to parse the input given in date fields, force it into a preferred format (presumably 9 Feb 1963, or similar), and convert any qualifiers into a preferred format. Thus all dates would have a single, consistent format, and it would obviate the need for edits merely to clean up the date format. If the date cannot be parsed, and so the computer cannot tell what date is intended (say "winter of 1883"), then the user input string should be displayed verbatim in red. --Jrich 09:53, 10 November 2012 (EST)
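The behaviour requested here (format whatever can be parsed, show anything else verbatim in red) can be sketched in a few lines. `render_date` and `toy_format` are hypothetical names; the toy formatter accepts only a handful of English layouts via `datetime.strptime`, whose month names depend on the default C locale, and stands in for a real date parser.

```python
import datetime
import html
from typing import Callable, Optional

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# A few English layouts; %b/%B match month names in the default C locale.
FORMATS = ["%d %b %Y", "%d %B %Y", "%b %d, %Y", "%B %d, %Y", "%Y-%m-%d"]


def toy_format(raw: str) -> Optional[str]:
    """Hypothetical stand-in parser: return 'd Mmm yyyy' or None."""
    for fmt in FORMATS:
        try:
            d = datetime.datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # try the next layout
        return f"{d.day} {MONTHS[d.month - 1]} {d.year}"
    return None


def render_date(raw: str,
                formatter: Callable[[str], Optional[str]] = toy_format) -> str:
    """HTML for a date field: formatted if parseable, else verbatim in red."""
    formatted = formatter(raw)
    if formatted is None:
        # Escape the user's text so it is shown exactly as typed.
        return f'<span style="color:red">{html.escape(raw)}</span>'
    return html.escape(formatted)
```

Passing the parser in as `formatter` keeps the display rule separate from the parsing rules, so a fuller parser could replace `toy_format` without changing how unparseable input is shown.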

I agree. In particular, just because the GEDCOM standard uses all caps for dates doesn't mean WeRelate has to upload them that way. I hate all caps so much (too much a vestige of 1970s computing, and the industry has come a long way since then) that I take the time to edit my GEDCOMs before uploading them, and it is a step I would prefer to skip.--DataAnalyst 09:28, 11 November 2012 (EST)

All Caps is also a pet peeve for me. Seems I'm always changing to mixed case, including the "all lower case", e.e. cummings style. A standard would be extremely helpful.--SkippyG 16:40, 29 June 2017 (UTC)

While I agree that the basic format (DD MMM YYYY) should be standardized and non-conforming entries converted before they are accepted into the system, the issue of mixed letter case has some things to take into consideration. We currently allow the month abbreviation to be entered in any language, and many languages do not capitalize the first letter of month names or abbreviations at all.[1]
Since allowing all languages means that it is possible to have roughly 6,500 different abbreviations for each month entered, we had been discussing a proposal to change the abbreviations allowed to be those of the three official languages of the International Organization for Standardization, namely English, French and Russian. But, guess what ... of those three, only English uses a mixed letter case for months. French and Russian do not capitalize the first letter.
So, since we plan to continue to allow other languages and letter case does not have any effect on the searchability or functionality of the date value, it would seem that it doesn't make sense to restrict the use of letter case to a format that may not be correct in other allowable languages. --cos1776 19:31, 29 June 2017 (UTC)