WeRelate talk:Watercooler


WeRelate Watercooler 2017

This page is for discussing anything you want to discuss, unless it relates only to a specific page. If it does, then post your comment on the Talk page associated with that specific page or on the WeRelate Support page.

To learn about using this Watercooler page or to ask questions about using it, go to Help:Watercooler.

If you don't want to leave comments on this page, you can email them to dallan@WeRelate.org.

Are you a new user? Have a question about how to use WeRelate? Post it to WeRelate talk:Support.

Old topics have been archived: 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016



WeRelate Page Growth Statistics [14 February 2017]

For anyone who is interested, WeRelate has grown from 2,750,000 Person pages in early July 2016 to over 2,794,000 Person pages by mid-February 2017. This is at the same time that a few of us have been improving data quality by deleting pages for several hundred living persons and merging duplicates.

Image:WeRelate Stats.png

Due to the efforts of people working on improving the quality of data, the percentage of Person pages with first name Unknown has been reduced from 2.02% to 1.88% over the same time period.

Image:WeRelate First Name Unknown.png

As well, in less than 2 months, the percentage of Person pages with no birth country has been reduced from 36.99% to 36.77%.

Image:WeRelate No Birth Country.png

--DataAnalyst 02:54, 21 December 2016 (UTC)

Based on your statistic above, I'm guessing you and cos1776 are working in opposite directions regarding the use of "Unknown" as a first name, e.g., Person:Unknown Triggs (1), and Person talk:Unknown Seaver (18). --Jrich 16:38, 12 February 2017 (UTC)
I don't think so. As far as I am aware, the reduction in the name Unknown is due to finding names for pages. At least, that is what I have been doing.--DataAnalyst 22:09, 12 February 2017 (UTC)
He is converting many pages to have given name Unknown, so increasing the number. Your choice of it as a statistic to measure and report on suggests you don't believe that is a good thing, that it is a measure of bad quality. The pages in the two cases cited, and many others, involve names that were never given, so not precisely unknown. To truly reflect pages where the given name is unknown, meaning under-researched, as opposed to never-knowable, the latter case should be given a different value or else they will always remain "Unknown", inflating the count. There are various practices that have been employed on this website for this special case; I believe Unknown is not appropriate. --Jrich 22:51, 12 February 2017 (UTC)
Metrics need not be perfect to be useful - particularly when used to show change over time or some other variable.
I would certainly like to know if there's a different custom for a first name that is unknown at the moment versus something that is asserted to never be knowable. I treat both as "Unknown" at present, and suspect that's what folks have done for years. I suppose the key question would be whether there's a generally accepted best practice for such given names in GEDCOM files - if so - we would presumably want to follow that.
More generally - I looked for some other common forms for an unknown first name (typically in the case of stillbirth or infant death). Searching in the first name field: "Child" - 673, "Female" - 641, "Male" - 580, "Infant" - 1302, "Baby" - 537, "Boy" - 266, "Girl" - 224, "Stillborn" - 155, and "Unknown" - 30,628. So there may be another 1/10th of a percent that neither measure accounted for - but I assume the defect was approximately the same in both numbers so the delta is still meaningful. --jrm03063 18:28, 13 February 2017 (UTC)
Daughter - 846; Son - 837. "so the delta is still meaningful": some of the alternate forms are being changed to unknown. So while one set of cleaners is reducing unknowns, another cleaner is adding unknowns, so the delta understates how much cleanup was done. Besides the question this raises about the validity of the statistics, the use of alternate forms has been intentional on many pages specifically to distinguish from unknown, and changing it to unknown loses that information. --Jrich 19:00, 13 February 2017 (UTC)
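As a rough check on the "1/10th of a percent" estimate, the counts quoted above can be summed and compared with the total number of Person pages. This is only a sketch: the denominator is an assumption, taken from the "over 2,794,000" figure given at the top of this topic, and the variable names are made up for illustration.

```python
# Counts quoted in the comment above (searches of the first name field).
alternate_forms = {
    "Child": 673, "Female": 641, "Male": 580, "Infant": 1302,
    "Baby": 537, "Boy": 266, "Girl": 224, "Stillborn": 155,
}

# Assumption: total Person pages, from the "over 2,794,000" figure in this topic.
total_person_pages = 2_794_000

# Sum the alternate forms and express them as a share of all Person pages.
alternate_total = sum(alternate_forms.values())
share_percent = 100 * alternate_total / total_person_pages

print(alternate_total, f"{share_percent:.2f}%")
```

Under that assumption the eight forms add up to 4,378 pages, or roughly 0.16% of Person pages, which is in line with the estimate.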
I wasn't aware that other ambiguous forms were being substituted in preference to "Unknown". My memory is that we never distinguished between unknown at present and unknowable - and that the proper form in either case was simply "Unknown". But no matter - the question remains whether there's an accepted GEDCOM best practice on this. Also, does historical guidance here at WR suggest different variants of unknown? --jrm03063 19:27, 13 February 2017 (UTC)
Keep going... we also still have roughly
  • Dochter = 36
  • Zoon = 39
  • Levenloos = 152
  • Levenloze = 60
  • Doodgeboren kind = 14
  • Wife = now 61, but was abt 150 that have either been sourced with a name or changed to Unknown
  • Fru = 5
  • Frau = 5
  • Hustru = 2
  • Vrouw = 2
  • Miss = now abt 380, but was 432 that have either been sourced with a name or changed to Unknown
  • Husband = 62
  • Man = 5
  • Mann = abt 20 where it means "husband"
  • Første Mann = 2 (first husband)
  • Unk = now 0, but was abt 200 that have either been sourced with a name or changed to Unknown
  • FNU = now 0, but was 124 that have either been sourced with a name or changed to Unknown
  • GNU = 3
  • Unnamed or (Unnamed) = abt 225
  • Naamloos = 4
  • Not known = abt 50
  • Not used = 2
  • Not named = 9
  • NN or N.N. = abt 1249 (Nomen Nescio)
  • Anonyme = 19
  • Inconnu = 1
  • Unbekannt = 6
  • Onbekend = 7
  • Don't know = 2
  • Young = abt 25 that are suspect
  • and on it goes ...
There is no standard convention for this. Entry is based on user preference. Yes, I have cleaned a lot of this up and been mostly standardizing page titles to "Unknown xxx" when the given name is unknown for any reason as opposed to one of the words in the lists above. That is what our program defaults to when the field is left blank and the logical place to look for Unknowns in our wiki page lists. Most of these pages came in with old GEDCOMs, have zero sources, and haven't been touched since 2007-2009. For our site, the "quality" of a page is not only measured by the amount and type of genealogical proof attached to the data, but also by the ability of that data to function with our program and with other programs (if it leaves). Improving both of these is important.
Entering the simple word "Unknown" in the Given name field, as was pointed out, has been acceptable here since the beginning. Now, if you want to talk about that convention, I'd be glad to, because I would also like to see it changed. However, not in the way that is being discussed. In a nutshell, name fields are for names only! We should not be entering any placeholder words or made-up names into any Given name or Surname fields for which we don't have a sourced name - not "Unknown" or "Unnamed" or "Stillborn Daughter" or "Levenloos kind" or anything else. When we do not know a particular name part, the corresponding field should remain blank. So, following this convention, the correct way to enter children with no given name, for any reason, is to enter just the complete surname they inherited from their parents in the Surname field and to leave the Given name field blank. Our software handles that just fine, as do other programs, and any confusion about what is being implied or in a different language, etc. is eliminated. There are other data fields on the page better suited to clarification when it is needed. --cos1776 01:26, 14 February 2017 (UTC)
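To make the "leave the field blank" proposal concrete, here is a minimal sketch of what such a cleanup could look like. It is not WeRelate's software or an agreed convention: the function name `normalize_given_name` and the `PLACEHOLDERS` set are hypothetical, and the set only contains a subset of the words listed earlier in this topic.

```python
# Hypothetical sketch: not WeRelate's actual code or an agreed policy.
# Placeholder words are a subset of those listed earlier in this topic.
PLACEHOLDERS = {
    "unknown", "unk", "fnu", "gnu", "unnamed", "(unnamed)", "not known",
    "not named", "nn", "n.n.", "inconnu", "unbekannt", "onbekend", "naamloos",
}


def normalize_given_name(given: str) -> str:
    """Return a blank string for placeholder words, else the trimmed name."""
    cleaned = " ".join(given.split())  # collapse stray whitespace
    if cleaned.casefold() in PLACEHOLDERS:
        return ""  # proposal: leave the Given name field blank
    return cleaned


for raw in ["Unknown", "N.N.", " Mary  Ann ", "Onbekend"]:
    print(repr(raw), "->", repr(normalize_given_name(raw)))
```

A lookup like this treats every placeholder the same way, so it would also erase any distinction a contributor intended between "not yet researched" and "never named"; that information would have to be kept in some other field.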
Things like wife and husband should translate to unknown. Presumably if they lived long enough to marry, they had a name, and the poster didn't know it. This would seem to be exactly the definition of unknown. But that is different than the situation where a name was not believed to have been given, as in Unknown Seaver, female, formerly (female) Seaver who was born in 1865 with no name listed in the birth record and not listed with her family in the immediately following census in 1870 and had no gravestone in the cemetery where other family members are found on Find A Grave. Maybe there is no desire to distinguish between the two cases that result in having no name to post, but I am pretty sure distinguishing exactly that was the intent of the poster in using "(female)" as opposed to the unknown he uses on other pages.
Of course, there is no convention, and as will always be the case when there is no convention, that means everybody does something different. So the first step is to decide if this is significant, then, if it is, how to distinguish it from simply unknown. The fact that DataAnalyst was using Unknown as negatively correlated to data quality suggests it is significant. Maybe because users that are likely to be unaware of this rule can upload GEDCOMs, or because language variants provide too much variability, it is believed pointless to try. That is for the group to decide. Distinguishing data values don't have to be in the name field, but that of course has the advantage of getting propagated to the infoboxes and overwritten automatically by the name if it ever gets discovered that a name was given. But there are other ways, one of them being the various title fields, or adding a new fact type, or adding some template to the Narrative (a la nomerge).
This is a relatively trivial matter, but for that reason, probably a good chance for the Overview Committee to start developing a process for developing such conventions and notifying the user community (is there one page everybody should be watching?). Date conventions are fairly detailed, but there are lots of other conventions that could use some serious discussion, like Married Name, like handling Intentions, like what data is best posted on the Person page and what data is best posted to the Family page. Wikipedia, as an example of a more mature wiki, has its usability greatly enhanced by the fact that most town pages have similar outlines, most kings are presented in a similar style, etc. I believe similar attempts at standardization are needed here. --Jrich 02:35, 14 February 2017 (UTC)
Well, this is a case where a GEDCOM best practices document would help (for example, I'm pretty sure the GEDCOM spec is silent on the question of whether dates before 1000 should have a leading zero - and no one would want to write a GEDCOM interpreter that required a leading zero there - but a best practices document would give guidance on which form was preferred). I don't think GEDCOM proper dictates what Unknown ought to look like (unless their idea is that it should be empty), but we should first make sure that guidance isn't lurking somewhere in that spec. If GEDCOM leaves specification of unknown up to the application/user - then we should probably look at conventions used by the more common genealogy systems out there, both web sites and desktop applications. This seems like a common enough situation that some practice will be found to be a little more prevalent than the others - and we should therefore follow whatever practice seems most common. Do GEDCOMs in the care of LDS follow a practice? If so, then, etc.... --jrm03063 18:54, 14 February 2017 (UTC)

I applaud the efforts of people working in the Data Quality Improvement project. This is the single best chance for WeRelate to make a difference in my opinion. Every time I investigate other sites, spoken highly of by various people, I continue to find inferior quality, and decide to stay right here at WeRelate. This particular project is the one that I think most adds value to WeRelate, and hope that at some time, the percentage of quality data might make this site the one primary place where people will go to share new discoveries for review, to correct long-standing myths by providing evidence, and where we can build a culture that says what you know is not as important as how you know it. --Jrich 03:25, 21 December 2016 (UTC)
Thanks. I agree that quality has become somewhat of a focus of the collaboration efforts at WeRelate, and I'd like to see us fit that niche of where you go to find trustworthy data. We're still a long way from there for some of the old GEDCOM data, but where there is a high degree of interest in data (e.g., early New England and New Netherlands), I think we've got some pretty good quality. And where we're fixing up the old GEDCOMs, I know that we are ending up with parts of trees that are better researched than anywhere else available online.--DataAnalyst 13:11, 21 December 2016 (UTC)

Excellent ! Thx, Ron woepwoep 08:45, 21 December 2016 (UTC)


how to find template [24 December 2016]

I found a GEDCOM with what appears to be no sources, but I didn't know where/how to find a template to mark it 'needs sources'. Even if you tell me where to find it, I may not remember it. (Memory NOT improving with age!) So is there a way to easily locate this kind of thing? Where is it hiding? Maybe we need a link at the top of pages to an index of sorts. That might help lots of folks. --janiejac 02:20, 22 December 2016 (UTC)

Hi, Janie. I think the template you want is Template:Sources needed. You can find any template by searching the Template Namespace (Select Search from the top menu, then All, then Template). I entered "Sources" in the title and found the template, although I did have to look a bit because there are a few that are similar. Take your pick.
I agree about making it easier to find these things. I'm hoping that this is one of the things that will come out of Overview Committee work over the next year.--DataAnalyst 02:43, 22 December 2016 (UTC)
I believe {{Sources needed 1}} is better as it includes a message that encourages referencing contemporary documents and not just copying from other unsourced items (i.e., passing the buck). --Jrich 03:22, 22 December 2016 (UTC)

If there's one thing I have used more than any other this year on WeRelate, it's #redirect. It's absolutely vital in working on Places, but it would probably be just as useful for bringing together Templates and Sources that really ought to be together. Merry Christmas! --Goldenoldie 16:48, 24 December 2016 (UTC)


Seasons greetings from the Overview Committee [24 December 2016]

On behalf of the Overview Committee:

Season's Greetings! The Overview Committee is back in action after a few bumps this year. Getting ourselves organized is taking a while, but in the meantime, 2 members (Cos1776 and DataAnalyst) have initiated a refresh of WeRelate Help.

We've put a considerable amount of thought into how to organize Help, but we need to do some more work before it is ready to share (hopefully by the end of January). In the meantime, I have requested further community discussion on a couple of date conventions so that they can be included in Help pages (and hopefully even automated). Bear with us. While this is a slow time for one of us (off work for 2 weeks), it is a busy time for the other (lots of Christmas activity and travel).

Best wishes for 2017 from the Overview Committee!--DataAnalyst 14:33, 24 December 2016 (UTC)