WeRelate talk:Junk Genealogy

Archives 2008/2009

Topics

1 A Bit of a Rant Regarding Junk [7 April 2009]
2 Quality Scale [8 April 2009]
- 2.1 Articles vs. People/Family Pages [8 April 2009]
3 Proposed Rating / Scoring System [8 April 2009]
4 Alternative Approach [13 April 2009]
5 An additional suggestion [11 April 2009]
6 NGS Standards for Sound Genealogical Research [14 April 2009]
7 Transfer to Mission Statement [14 April 2009]
8 Template for insufficiently sources pages [14 July 2009]
9 User Community Focus [26 April 2009]
10 Overall [2 May 2009]
11 How about a "death row" status? [15 July 2009]
12 Mary Jones [16 July 2009]

A Bit of a Rant Regarding Junk [7 April 2009]

I'm new at WeRelate, but I'm an old hand at genealogy -- researcher, librarian, editor, and workshop teacher, over the years. I use TMG (I was a member of Bob Velke's original design team fifteen years ago) and that's where I keep everything. I have also had a website for about a decade that includes modified reverse register reports of my eight gr-grandparents. That's a preface to saying that I've recently begun uploading to a couple of websites as well. To Ancestry, simply because it's so large, and I can pick up all those census listings I hadn't gotten around to for various collateral lines. To Geni, because the social networking aspect of it has helped me get a large number of relatives interested and involved, and that's always a good thing.

Now I'm poking around in WeRelate, figuring out how it works and what it might be good for. (Like most librarians, I do a lot of work at Wikipedia, so I'm generally familiar with the mechanics.) I had been wishing for years that a well-designed wiki site would appear for genealogy, since it would enable free-form cooperation with other researchers, both within my family and outside it. Especially, my hope was that at such a site junk GEDCOM uploads could somehow be minimized.

Thoughtless, useless uploads are the bane of serious work at Ancestry, for instance, because people will download some mish-mash of a GEDCOM from World Family Tree, merge it with another GEDCOM from an equally specious source, and then re-upload the new GEDCOM to Ancestry -- which only compounds the chaos. As a result, if you look at the "One World Tree" section at Ancestry, it's easy to find trees where all the children from three wives are lumped under a single wife, with the last six kids being born long after her stated death date. And so on. How is this supposed to be useful to anyone?

There are certain instances in my own lineage where an assumption of descent from an entirely unconnected person of the "right" name -- pure wishful thinking -- is claimed by practically every beginner who happens upon a connection to that family -- even though the error of making such an assumption has been demonstrated over and over again, and in detail, by serious researchers in the family. And these people upload the specious linkage in yet another junk GEDCOM, leading others to assume (again) that the information must be correct.

My apologies for ranting on this subject. It's a hobby horse with me, I admit. Every time I teach a class for novices, I emphasize repeatedly the necessity of doing your own research and citing your sources, or at least checking the research of others before you adopt it wholesale. I would like to see WeRelate -- or some such website -- take a principled stand on this. YES, there IS "good" research and "bad" research. YES, there IS such a thing as "junk" genealogy -- which should never see the light of day on a website that wishes to carve out a niche for itself. Not all research is equal, folks. You can encourage good research practice, you can teach family researchers how to do good, useful work. Or you can cater to the lowest common denominator and accept anything someone wants to dump on your server. Accepting everything uncritically actually damages the work of other researchers. Do you want WeRelate to be different from those other sites listed in the table farther up in this thread? "Different" in a meaningful way? Please think about it. --mksmith 17:15, 3 April 2009 (EDT)

We started WeRelate with the idea that, as a wiki, junk could be removed by better researchers. We have a number of users doing this - fixing up and merging information. There are two areas where further improvement is needed:

We've got to stop uploading GEDCOMs that create Person and Family pages that are duplicates of what we already have. The new and improved GEDCOM import function currently in final testing at the sandbox will hopefully solve this problem. It makes GEDCOM upload much more labor-intensive because you're required to review the families in your GEDCOM that have potential matches to existing families and link them to those existing families rather than creating duplicate pages, but raising the bar for GEDCOM uploads may not be a bad thing overall. The new GEDCOM import function also requires people to review suspect dates in their GEDCOM and gives them a chance to correct them before import.
We need to encourage better sourcing. Until recently I didn't realize how much genealogy was unsourced or where the source was simply another GEDCOM file. I have a lot of theories as to why this is the case, and I believe that at least part of the problem lies with existing tools making sourcing difficult. We've been cleaning up the source database for the past year in preparation for a big effort in the latter half of this year to make sourcing easier.
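The kind of suspect-date review mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the actual import code: the `Person` record and the `suspect_children` function are hypothetical names, and the single rule checked (a child born after the mother's stated death) is just one example of what might count as a suspect date.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical minimal record; years only, None when unknown.
    name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def suspect_children(mother: Person, children: List[Person]) -> List[Person]:
    """Return children whose birth year falls after the mother's death year."""
    if mother.death_year is None:
        return []  # nothing to compare against
    return [
        c for c in children
        if c.birth_year is not None and c.birth_year > mother.death_year
    ]
```

A real check would presumably work on full dates and allow for other cases (fathers, marriages, implausible ages), but even a rule this simple would flag the "six kids born after her death" pattern for the uploader to review.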

The amount of sourcing going on in WeRelate articles is about the same as on Ancestry. In the case of Ancestry about 95% (literally, I've got data that shows this) of the lineages lack ANY source information. Of the remaining 5% who do source, most are simply citing GEDCOMs or someone else's lineage that they drew from. Only about 1% of the lineages actually provide the underlying sources, either pointing to original sources (in the BCG sense of the word) or pointing to something that makes use of original sources. The numbers might be a little better here on WeRelate (I've only looked casually), but not by much. NOT sourcing is the near universal truth among genealogists on the web---unfortunate, but that's the way it is.

I believe one of the reasons for that is that people don't really understand what's needed. The 5% I mentioned ARE trying to do the right thing---they ARE sourcing their information---in the sense that they are saying where they got the data from---"Why, I got it from Billy Joe's GEDCOM!". What they don't understand is that this actually is not what's intended when professional genealogists say that you need to cite your sources. They really mean you need to point to the underlying sources of information---the original sources, contemporary with the events, that show that "Mary Platt born 1675, was the daughter of Epenetus Platt and wife Jane Wood". And usually, you can't do that with a single source---you usually need multiple interlocking sources to show the truth of a statement like the above.

And that's really too much work for most people. They'd much rather take someone's word for it---"Just tell me what the right answer is so I can get it down in my GEDCOM." And despite the best intent, that is not likely to change much. You can encourage folks to do the right thing, but the right thing is much more than explaining where they got their information. Q 14:49, 6 April 2009 (EDT)

I think Quolla6's analysis is probably pretty close to the truth. There are stages in appreciating sources, as in most other areas of growth. Keeping sources is only the first step. Finding a source that is wrong, perhaps stupidly wrong, helps you appreciate the quality of sources. You have to get burned to learn to avoid fire. You may know it's dangerous, but until you are burned, you don't know what priority to give its avoidance, or how much effort to spend avoiding it.

There's already a reason why genealogy attracts mostly older people, and it's not just because you have to experience life to appreciate what your ancestors did to get to you, but also because it takes time. If you have a job, and young kids waiting when you get off from work, it is hard to find the time to do exhaustive searching of sources. WeRelate, books.google.com, and various websites will help distribute the sources making this easier. WeRelate's great value will be to provide the connections that identify a person more accurately than names which may be common, misspelled, aliased, etc.

Hopefully accumulation of good sources on WeRelate is like a ratchet wrench, only going one way, towards better sources. The great fear of GEDCOM uploads is, of course, the danger to this progressive march --Jrich 16:06, 6 April 2009 (EDT)

Ratcheting. Good point. Perhaps the advantage of WeRelate is that those who are further along on the learning curve can insert the underlying sources as they come to them. Hopefully, bad genealogy does not drive out good, and gradually a corpus of well sourced, well documented family relations will be built. And perhaps, as good stuff builds up, others will see what's needed and move along on the learning curve themselves. There's really no such thing as Junk genealogy---just a lot of sincere, but incomplete work that sincerely needs some TLC. Something that I notice one of the other wikis starting to do is emulating Wikipedia's "Barnstar" approach of flagging well-done articles. Perhaps that's something we should do here. That way people will be able to spot well done articles, and hopefully, see what's involved in an article that others think is well done. Q 16:24, 6 April 2009 (EDT)

Q, We already have a variation of Wikipedia's "Barnstar" approach-- in addition to the "Nominations" option (which we could certainly all use more actively), the recent portals also serve as a way of promoting both good examples as well as "featured pages". I know that there are plans to seriously improve the help pages -- including examples of "model pages" as part of the help system would also go a long way to encouraging good practice. jillaine 08:28, 7 April 2009 (EDT)

Wikipedia uses both "Barnstars" and Featured articles. Presumably a featured article is thought to be worthy of a barnstar, but not necessarily. I know in Genealogy Wiki they also do both, recently adding the Barnstars to articles. I think they started with those that were featured, but I suspect not every featured article there got a Barnstar, and probably some with Barnstars have not been featured. They also are trying to implement a "Quality Scale", rating articles 1-5, with 5's being candidates for Barnstars. They are trying to use specific criteria for rating articles based on whether they meet certain requirements. I think that is a good approach, as it makes the evaluations objective rather than subjective. I'm not sure that describing someone's work as "garbage" does much to encourage them to improve their articles---but telling them that it's a "1" on a scale of 1 to 5, because it lacks "sources", a "family register", "problems with English", etc. makes it the fault of the article, not the person, and gives them specific objective guidance on how to make it better. Perhaps the point is that since there are no criteria for making an article "featured", there's no way to tell why it was featured. Seeing that a specific article is "featured" doesn't tell you it was genealogy well done--or at least why it is thought to be well done. Q 08:57, 7 April 2009 (EDT)

There's also a line of thinking that says that unsourced material shouldn't be allowed on WeRelate. I don't agree with this; it's like saying that articles without good references shouldn't be allowed on Wikipedia - I think it raises the bar too high for a lot of people who would eventually turn into good contributors. I think it's better to allow unsourced articles to be added and then encourage others to make them better. However, semi-protecting certain pages - say pages for famous or medieval people (people with a link to an article at Wikipedia) or people with more than 5 people watching them - so that they cannot be edited during GEDCOM upload but instead must be edited manually, makes sense. This is also implemented in the new GEDCOM import function.

Finally, the new GEDCOM import function could give administrators a chance to review every GEDCOM, or every GEDCOM above a certain size, before it was uploaded, to make sure that it didn't contain a lot of junk. We haven't decided whether this should happen because it could slow down the turn-around for uploading GEDCOMs, but if we had enough people willing to review incoming GEDCOMs, I'd be in favor of giving it a try.--Dallan 11:38, 6 April 2009 (EDT)
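For illustration, the semi-protection rule described above (pages linked to a Wikipedia article, or with more than 5 watchers, cannot be changed by a GEDCOM upload) might be expressed along these lines. This is a sketch under assumptions, not the import function's real code: the function name `gedcom_may_overwrite` and its parameters are hypothetical, and the threshold is left adjustable.

```python
def gedcom_may_overwrite(has_wikipedia_link: bool, watcher_count: int,
                         watcher_threshold: int = 5) -> bool:
    """True if a GEDCOM upload may edit the page; False means manual edits only."""
    if has_wikipedia_link:
        # Famous or medieval people: page is linked to a Wikipedia article.
        return False
    if watcher_count > watcher_threshold:
        # "More than 5 people watching" is read here as strictly greater than.
        return False
    return True
```

For example, `gedcom_may_overwrite(False, 6)` is `False`, so a page with six watchers would have to be edited by hand.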


Quality Scale [8 April 2009]

Above, Q mentioned that the Genealogy wiki is considering something like a quality scale of 1-5, with criteria identified for each level. It appears to be under discussion here. They also appear to be basing it on Wikipedia's own article assessment process. Seems like a good idea, and WeRelate should come up with its own variation of the charts on the latter page. jillaine 09:18, 7 April 2009 (EDT)

I like the idea, but what about basing it on something like the genealogical proof standard? Perhaps something like: no sources vs. some sources vs. every fact is sourced vs. facts are sourced and conflicting evidence is presented and analyzed? I don't know the standard well enough to say what the levels ought to be. I do believe that one of our long-term goals ought to be to encourage and make it easy for people to cite sources.--Dallan 09:51, 7 April 2009 (EDT)

To start, how about a barnstar for pages with a non-gedcom source? Something that simple might even be automatable, so that the system could add the barnstar automatically?--Dallan 09:56, 7 April 2009 (EDT)
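A rough sketch of how such an automatic check might look, assuming a page's citations are available as plain strings. The function names and the test for "is this just a GEDCOM" are hypothetical; a real check would depend on how sources are actually stored on the page.

```python
from typing import Iterable


def looks_like_gedcom(citation: str) -> bool:
    # Crude assumption: a citation is "just a GEDCOM" if it mentions one
    # or names a .ged file.
    text = citation.lower().strip()
    return "gedcom" in text or text.endswith(".ged")


def earns_barnstar(citations: Iterable[str]) -> bool:
    """True if at least one non-empty citation is not a GEDCOM file."""
    return any(bool(c.strip()) and not looks_like_gedcom(c) for c in citations)
```

Note that this version requires at least one non-GEDCOM source, so a page with no sources at all does not qualify.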

Dallan and others, the GPS standards are not a spectrum from good to better, but are five "elements":

a reasonably exhaustive search;
complete and accurate source citations;
analysis and correlation of the collected information;
resolution of any conflicting evidence; and
a soundly reasoned, coherently written conclusion.

We could do some sort of "star" system, in which earning five stars would require meeting all five of the above.

And yes, most of what we've got would have zero stars at this point.

jillaine 10:53, 7 April 2009 (EDT)

I probably missed something. The scale talked about appears to communicate an award system more than quality. The ratings were not a scale. They reminded me more of the student of the month awards in elementary school, electronic certificates that this article was once a featured page.

I do not really like rating anything, unless it can be computerized or done entirely by following a flowchart based on simple yes and no answers. In other words, the hard part is writing really good criteria for assigning the rating.

I don't think rating articles is a good idea. I think the quality of articles speaks for itself, and people read genealogy pages because of the topic, not because of the quality. Some people are not well-known, and any page on them will be poor, but the reader may be happy to get anything they can. Setting up some arbitrary criteria for passing out atta-boys could potentially distract people into chasing form instead of substance. I prefer to let the natural diversity of different people's varied interests cover all the bases, rather than artificially favoring one aspect of genealogy over another by setting up a rating system that values one thing, but not another. And I suspect WeRelate pages are more volatile than Wikipedia pages, so how does a rating get maintained in the face of changes to the page?

There may be some value to rating sources based on contemporaneousness and how far removed from original, but any criteria must be very simple and mechanical or it probably is too subjective. As a side effect, a page could be scored based on having sources for each fact and their relative quality. But, in any event, research must consider all the available sources as a body and sometimes what looks like the highest quality source is the one that appears to be in error. So trying to come up with a rating system of sources carries the risk of short-circuiting a thorough analysis because it is so easy to just take the highest rated source.

I believe rating sources has been discussed relative to GEDCOM updates, obviously with no commitment to doing it. The difficulty is figuring out a way to automate it, which is the best way to get consistency and thoroughness. --Jrich 10:22, 7 April 2009 (EDT)

Jrich,

We're not speaking of an arbitrary award system, but a method for identifying solidly done research (for which there is a set of existing criteria). This would highlight strong models of such research and encourage others to improve their pages to meet said criteria.

jillaine 10:46, 7 April 2009 (EDT)

Dallan, That would be a good starting point, though I'm not sure how easy it would be to automate this. Perhaps making the criteria "inclusive" rather than "exclusive" would be appropriate. That is, tell people what they should have, rather than what they shouldn't. But that would be harder to automate: Easy to exclude a page that uses a GedCom as the source, hard to identify a page that uses something else as appropriate. As an example of that, giving a barnstar to a page that had no GedCom identified as a source would give a barnstar to pages that had NO sources---at least citing a GedCom gives lip service to the idea of sourcing, so is an improvement over no sources at all.

I suspect that at least at first, you need to keep this as a "people decision" until criteria can be worked out. Perhaps the way to go about this is to focus initially on the featured articles. Looking at them systematically might give us a good idea of what people like in articles. Since the point of departure for this was "encouraging sourcing", the use of "original sources" in the BCG sense of the term would be one criterion for giving a Barnstar---but "original sources" is probably too stringent to start off with, as not many pages would meet that criterion. Perhaps "effective use of sources" might be a less restrictive criterion that would still encourage good sourcing practice. Others might be effective use of the narrative section of the article, and the effective use of graphics. But limit the field initially to the featured articles. Then you can explore what works as a criterion and what doesn't using a suite of articles already identified as articles people (or at least someone) liked.

In agreement with Jrich, I'm not sure you really want to get into the business of rating every article (computerized or not). Genealogy Wiki can do that because they have relatively few articles to work with. 2M plus here militates against evaluating every article, unless there's a computerized way to do it. If the idea is to give people examples of what's considered a well done article, then Jillaine's point about "featured articles" is very well taken. The problem there is there are no criteria as to why a particular article was featured. Using a barnstar to denote on the page that it is thought to be a good example of effective sourcing (referencing BCG standards) might help make it clear. It could also be used to point people to good examples to emulate for various purposes. Q 10:52, 7 April 2009 (EDT)


Articles vs. People/Family Pages [8 April 2009]

Mm... Perhaps we are having two separate conversations here. Where *I* am coming from is in response to the topic of this page-- i.e., "junk genealogy" -- the lack of evidence provided, the lack of cited sources. If we focus on that, we can come up with some pretty solid non-arbitrary criteria.

But if we're talking about rating other types of articles, that seems to me to be a different topic (and discussion page) altogether.

-- jillaine 11:00, 7 April 2009 (EDT)

Reading the last few comments, it's my impression, too, that we're talking about two different subjects. I don't think you can "rate" Person pages based on how much they present. I have any number of folks in my database on which there simply isn't much to find -- and I've been looking for several decades on some of them. If after thorough searching, you have only a marriage date, say, and one land transaction -- and you've added some thoughtful speculation based on that -- you've pretty much fulfilled the Genealogical Proof Standard. It's unfortunate that there isn't much to find, and you may never learn much about them, but you've done the work in a proper manner.

Now, in the matter of someone uploading a skeletal GEDCOM that includes no "real" sources and zero interpretation, it comes down to only two ratings: Acceptable vs. Unacceptable. And the great majority will be the latter. Which, I guess, is why I would have to question the point of even allowing junk GEDCOMs on the site. With no sources, they're worthless. Worse even than that, they're often misleading to the inexperienced. (I'll stop there before I get started again. . . .) --mksmith 15:37, 7 April 2009 (EDT)

Where is that list of stages of a genealogist again? I would like to see that given a prominent place somewhere as I think all of us can find ourselves somewhere on that list. As someone who has recently experienced being raked over the coals for posting what I thought was 'reasonable conjecture' I can empathize with those who aren't as experienced and are told their work is junk. That will certainly not encourage them to learn how to make it better. They will just decide to post their info on another site which is more welcoming. And which won't help them learn any better either.

Some of my work has very good sources; some of it is from other folks' research and I work to give them credit for it - though I don't know their sources. But I want to be comfortable posting here on this site where someone else with better sources can fix what I don't know. I would hope that would be one of the benefits of a collaborative site. --Janiejac 16:29, 7 April 2009 (EDT)

If I misunderstood above, I apologize, but the link you provided was about rating articles, and the scale included two ratings designed to flag featured articles or almost-featured articles. I have seen plenty of websites where users rush to collect post of the day awards, and don't think that is a good idea. I then shifted in my remarks above to source quality because it seems like the only reasonable way to rate the quality of pages.

Like mksmith, I think the controls need to be put onto GEDCOM updates. Why not put screens on the window, instead of running around the house with a fly swatter? In some ways, the new controls of 5 people watching is so overwhelmingly simple and easy to understand (while it may not be my first choice) that I think/hope it will work, and think we should wait and see. It should mean I only have to fix a page 4 times, and after that, then there will be myself and 4 other people watching it. :-) Maybe I could even get all my family members to sign up so I control five user ids myself. :-)

Scoring pages (a phraseology I like better than rating) would not be too obnoxious if it only measured completeness and quality of sources. It would indicate pages where more help would be beneficial (in case there are people with spare time on their hands), and would provide guidance to new users of what is desired. Any kind of scoring system for sources will depend entirely on the actual criteria, which haven't been discussed at all. If rules are developed around the scoring (e.g., you can only update data by providing a higher quality source), I suspect there will always be situations where the scoring system gets in the way of doing valid updates. --Jrich 16:52, 7 April 2009 (EDT)


Proposed Rating / Scoring System [8 April 2009]

Desired outcome: high quality, sufficiently researched/sourced data on people and family pages.

0 = no source information at all

1 = Up to 25% of the data is sourced or otherwise a strong case is made

2 = 26-50% of the data is sourced or otherwise a strong case is made

3 = 51-75% of the data is sourced or otherwise a strong case is made

4 = 76-99% of the data is sourced or otherwise a strong case is made

5 = Fully meets GPR (?) standards

-- jillaine 18:14, 7 April 2009 (EDT)
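As a rough illustration only, the scale above could be written down as a small function. Everything here is an assumption layered on the proposal: the name `source_score` is hypothetical, the count of "sourced" facts is taken to include facts for which a strong case is made (a human judgment the code cannot make), band boundaries are read as upper-inclusive (exactly 75% scores 3), and a page that is 100% sourced but has not been judged to meet the proof standard is given a 4.

```python
def source_score(sourced_facts: int, total_facts: int,
                 meets_proof_standard: bool = False) -> int:
    """Map the share of sourced facts on a page to the proposed 0-5 scale."""
    if meets_proof_standard:
        return 5  # level 5 is a human judgment, passed in as a flag
    if total_facts <= 0 or sourced_facts <= 0:
        return 0  # no source information at all
    pct = 100.0 * sourced_facts / total_facts
    if pct <= 25:
        return 1
    if pct <= 50:
        return 2
    if pct <= 75:
        return 3
    return 4
```

So a page with 5 of 10 facts sourced would score 2, and one with 9 of 10 would score 4.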

Jillaine, out of the 2M articles on WeRelate, how many do you think fall in each of these categories, as defined above? Q 19:32, 7 April 2009 (EDT)

Out of the 2M articles on WeRelate, how many are you going to read to figure out if "a strong case is made"? I tried a couple of times to come up with a scoring system that would involve mostly computer calculation but believe it would be inherently useless. Even the best quality sources have errors and no formula fits every case. Each page needs to be analyzed in-depth by the interested people who have knowledge of the subject. That would be the normal course of things, if the only style of input was field-by-field typing data into the page. I would like to believe that even the greenest of genealogists, if forced to at least look at what was on the page, and who knows, maybe even read it, would bow to high-quality cited sources. That does not happen because GEDCOM update enables an unthinking, unreading, one-way, non-collaborative mass update and so empowers people to change data faster than we can respond with nasty messages on their Talk pages.--Jrich 19:48, 7 April 2009 (EDT) P.S. To be fair to Dallan, the new GEDCOM update really slows down this process and does cause some thinking. However, the display of sources is so cramped on the update/merge screen, that I am not sure people will do a good job. I think immunizing pages from GEDCOM updates by accumulating 5 watchers will be what really ensures that changes are towards higher quality.

A viewpoint I also agree with, though I don't know about that "nasty message" part. Personally, I don't think scoring every page on WeRelate would a) be do-able, b) achieve the desired goal. I presume that the desired goal is to get better pages. I think that purpose is better served by highlighting articles that meet some standard. Sourcing is part of that standard, but not the only thing involved in a high quality page. Denoting a page as a "featured article" is probably part of the solution, but by itself it's incomplete. Also, a high quality page on this site is NOT necessarily a "person page". Q 20:00, 7 April 2009 (EDT)

As I merge duplicate pages, it would not at all be difficult -- assuming we had a set criteria in place -- to take an extra minute and score a page as to how well sourced it is. So it could be done, by humans, while de-duping the already existing pages. In fact, this is where it comes up most for me-- while I'm merging.

And while I concur that high quality pages also include pages that are not person or family pages, the overall point of THIS talk page is about dealing with "junk genealogy" -- i.e., the crap that people upload (i.e., what becomes person and family pages) -- not with other types of articles. The rating / scoring system I'm proposing concerns the person/family pages, not other types of articles. If you want a scoring/rating system for those, feel free to propose one in the appropriate discussion forum.

-- jillaine 21:58, 7 April 2009 (EDT)

Jillaine, think about it. There are 2M articles on this site, most of which are person articles. NO, it's not hard to do one, or ten, or 100... 2M is an entirely different matter. But more to the point, roughly how many of those two million do you think meet your various criteria? Q 22:03, 7 April 2009 (EDT)

Probably 95% do not meet the criteria. But so what? The point is to put something in place that encourages people to have something to shoot for. A bar. A standard of excellence. A standard that says: this is what we're shooting for here at WeRelate. Help us make that happen. jillaine 22:14, 7 April 2009 (EDT)

I think you are being overly optimistic. After you cast out the cards for people who show NO sources, plus those whose source is simply a GedCom file, or another unverifiable source, you'll probably have less than 1% of the cards getting into your second category. Scoring each card is a) not possible by hand, b) unlikely by automated means. The approach suggested by Dallan is plausible, but as Jrich pointed out, it's probably a more complicated problem than can be done by machine. Might use the machine to winnow it down. But there are easier ways to show the way than grading each and every card. Q 22:32, 7 April 2009 (EDT)

please propose something, Q. jillaine 22:42, 7 April 2009 (EDT)

I believe I did, though I didn't elaborate. There are two elements that are needed.

First, a presentation of what a "good" person article would entail---i.e., criteria, and the criteria definitely go well beyond sourcing. The BCG standards of proof, for example, do include sourcing, but they include other elements as well.
Second, a set of examples selected to illustrate "good" articles--articles that meet the criteria to varying degrees.

One might also include examples of articles that have problems to be resolved in terms of their quality, but I think you'd get further by not going the critical route. Positive examples could be featured articles, but it's not usually clear why they are featured, or what makes them good. If the issue is "good genealogy done here", then you need something to say "this is what doing good genealogy requires, and here's an example of that." It would probably be useful to mark such articles so they would be easily recognizable when people came across them. That's why the Barnstar approach works well on Wikipedia. I think Wikipedia gets carried away with Barnstars a bit---seems like they have Barnstars for just about everything, following the motto that everybody likes to be praised, so everybody gets an award, even if we have to make one up. (I believe that was what Jrich found annoyingly gradeschoolish about this approach.) But a limited set of Barnstars to highlight certain features well done might be appropriate. Then again, maybe a single barnstar to mark "good genealogy done here" would keep it more focused. Q 23:08, 7 April 2009 (EDT)

Mmm... Perhaps we're not so far apart as it was feeling yesterday. (And apologies if I got a bit snippy yesterday; a couple of you pushed a few old buttons of mine related to how feedback is provided and I let it get to me.)

Just out of curiosity, what buttons got pushed? I try to keep things objective, and have no interest in pushing folks' buttons. So it helps to know when there's an inadvertent button pushing. Q 09:04, 8 April 2009 (EDT)

Sigh. I don't want to get too sidetracked on this, but basically, my buttons get pushed when someone (anyone) makes a proposal and then others come along and only point out what's wrong with it. MY part in this is that sometimes that's all I see-- the criticism-- and not the good points. I'm working on it. -- jillaine 09:26, 8 April 2009 (EDT)

I'm not proposing that we go through and rate/score 2M "cards" (interesting alternative to the term "pages"),

Traditionally, genealogy programs have referred to the pages that contain basic family information as "cards". I believe the term arose out of Apple's original "HyperCard", which eventually was superseded by the web. (Not sure how much HyperCard contributed to the web development process, but it's awfully similar in style to what's characteristic of the web.) "Reunion" is a program that fairly clearly shows its HyperCard roots. Q 09:04, 8 April 2009 (EDT)

but I am suggesting that we have some sort of criteria that we can use so that when we do find a really great person or family page that is well sourced that we can rate/score it. And that doing so would then somehow "lift it up" -- either as a featured page or in some other way -- so that people can see examples of good research. And if they want their pages to be so rated/scored, they know what they have to do to get there.

We could similarly see a potentially great page and give it a lesser score/rating such that people watching it could see what is still needed to be done to bring it up to excellence. So perhaps the rating might be more like this:

5-star - Meets BCG standards of excellence
3-star - On its way to excellence; needs one or more of the following (and list the BCG requirements for excellence)
red-flag (or something) - Could possibly be automatically added to all un-sourced pages as a notice/warning that said page has no sourced information at all and should be relied upon with caution. I realize this is a more negative than positive approach, but it would certainly get people's attention.

as in

? Q 09:04, 8 April 2009 (EDT)

Dallan, on this last, I wonder if there's a way to do some sort of scan during upload, that comes back to the user with a warning message: "Our scan of your file indicates that your data has no (or insufficient) source information included; currently WeRelate is not accepting such files. For more information, please visit... etc."

I think something like that was what Dallan originally suggested. Certainly seems do-able, though it could only be the jumping off place. The problem with this is that virtually every dang page on this site is going to have a red flag! Q 09:04, 8 April 2009 (EDT)

Well, I'm not sure I concur that virtually every page would have a red flag, but even if that was true, then so be it. It's incentive to encourage people to do what's needed to bring data up to standards.

I've looked fairly closely at the quality of work produced on Ancestry. Ancestry is well suited for looking at this because you can use their search engine to pull up the number of cards that include notes, sources, and both notes and sources. I used that as an index to whether folks were documenting their data, and attempting to present an analysis of it. There are problems with this approach, but as a first approximation, it gives a fairly decent picture of how things are being done by the users of Ancestry. This is more difficult to do on WeRelate, but I think the quality on WeRelate is probably comparable.

On Ancestry a study of a 0.1% (I didn't do the statistics on this but given the size of the sample, over 600,000 cards, I think it's reasonably accurate) sampling of the ten most common surnames showed that 20% of the cards included "notes", and about 25% showed sources. Cards that showed both notes and sources (which I took to be evidence of looking at things in more depth) amounted to about 5% of the sample. When I manually looked more closely at that 5%, I saw that probably only about 1% of the total were showing what I would consider legitimate sources (most were GedComs and other unverifiable sources.)

The significance is that if you applied your red flag to the cards on this site, virtually all (99%) would probably get a flag. I could be wrong, but I don't think that's going to encourage people to do better work---more likely it's going to turn them off, and they'll go away altogether. Q 10:12, 8 April 2009 (EDT)

And how many pages can you point to that you think meet all 5 BCG proof standards? I've seen some nicely done pages here, but I don't know that I've seen any that meet the BCG criteria. Brag on yourself if you think you've got some of your own that meet those standards, and point them out. I personally have none that I think measure up to that. Q 09:04, 8 April 2009 (EDT)

There may be NO pages that currently meet all 5 BCG proof standards, but it would be great to set goals towards which to strive. I could imagine seeing a scorecard on the home page-- a chart-- that shows the increase in the number of 5-stars. Research indicates that when you give a team of people data about how well (or not) they're doing-- just hard data-- they are more likely to engage in behavior that improves practice. Okay, now let me go find the citation for that research. ;-) jillaine 09:26, 8 April 2009 (EDT)

-- jillaine 08:33, 8 April 2009 (EDT)

If the point is education, just having 100's of five-star pages will be too unfocused. If somebody does not know what is good, they will probably not be able to cull what is good from one example. To figure out what is good, they would need to find what is common among many examples, and probably will not take the time because the people covered don't interest them. What is needed is one, or a very small number of example pages, with annotations explaining explicitly what features of that page are good, and why.

If the point is let people know their work is appreciated, how about adding a button that says "This helped me". When somebody pushes that button it sends an email to all the people that contributed to that page saying "Thank you. Your work has helped me." But please, give us the option to turn off receiving these emails.

I have no problem with some monthly set of featured pages. I think this should be done for general interest rather than quality, the way NEHGS does their stories of interest in their newsletter. I have no connection with Abraham Lincoln but was interested in a story detailing how he no longer has any living descendants. In general, I find that reading about people with no connection to me isn't all that interesting. So the featured article needs to have broad relevance due to history, or some interesting genealogical issue.

Anything else that isn't applied equally to all 2M pages (someday to be 2B pages) strikes me as arbitrary, an accident of what page is read by whomever is empowered to award 5 stars. --Jrich 10:15, 8 April 2009 (EDT)

Featured pages aren't necessarily examples of genealogical thoroughness. They're generally chosen because someone happened to notice the article and add it to the nomination page, and they have a picture and an interesting story. I hope eventually we'll develop a more formalized process for choosing featured articles, but it's pretty ad-hoc at present.

Thinking more about this, if we tried to rate a reasonable number of articles we would have to use an automated approach rather than manual. There are too many articles to rate manually. If we were to use an automated approach, it would need to be pretty simple. I don't think we could automatically rate articles based upon the genealogical proof standard. It's probably even asking too much to automatically rate articles on a 1-5 scale. A yes-no scale would be do-able, like displaying an icon if the article contained at least one source, possibly requiring the source to not be a gedcom.

Alternatively or in addition to an automated approach that attempted to give a simple yes/no rating to person/family pages, people could be encouraged to give a human rating to pages. Not many pages would get rated this way, but the criteria could be more involved. Having said that, adding a barnstar to a page (another yes/no rating) might be more encouraging for people than getting a 1-5 rating. I'm not wild about putting red flags on 99% of the pages here. I like the "this helped me" button. We could keep track of the number of times that button was pressed (by different users) for each page and display that. We could even look at pages with a high "this helped me" count as featured page candidates.

On a related thought, I've been thinking that maybe one way to encourage people to cite sources is to let people see a list of all of the source citations they have added to their pages, and show the total number next to their user name wherever their signature appears. (I'd omit the number if it was zero.) This wouldn't happen right away of course. But maybe there's a better approach.--Dallan 10:16, 8 April 2009 (EDT)
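
A minimal sketch of the yes/no check described above (an icon if the article has at least one source that is not a GEDCOM), assuming a page's citations are available as plain strings. The function name `has_real_source` and the marker list are illustrative assumptions, not anything in WeRelate's software, and matching on keywords is only a crude first approximation.

```python
# Hypothetical markers for citations that only point at an upload or another tree.
GEDCOM_MARKERS = ("gedcom", "world family tree", "wft", "oneworldtree", "one world tree")


def has_real_source(citations):
    """Return True if at least one citation is not a GEDCOM-style pointer."""
    for citation in citations:
        text = citation.strip().lower()
        # Skip blank entries and anything that mentions a GEDCOM-style marker.
        if text and not any(marker in text for marker in GEDCOM_MARKERS):
            return True
    return False
```

A page whose only citations read "GEDCOM upload" or "WFT Vol. 3" would get no icon under this rule; a page citing a census record would.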

Alternative Approach [13 April 2009]

In response to Dallan's comment (and I think I'm repeating myself, but just to be clear)

yes, it's not possible to manually rate all of those pages

See above, I am not suggesting we rate ALL pages. jillaine 10:45, 8 April 2009 (EDT)

yes, the only way that could be done is automating the process.

I still don't agree with this; especially if we are highlighting well-done pages that illustrate the kind of quality we're seeking, this could absolutely be done manually as Dallan suggests above. jillaine 10:45, 8 April 2009 (EDT)

Since we're in agreement on this, I probably should not comment. However, the point you were disagreeing with was that it was not possible to rate ALL pages manually. I think we are actually in agreement that rating ALL pages MANUALLY is not possible. Identifying (and perhaps rating, if people see a need) well done pages would be doable and very helpful to show what the site is striving for.---(You just couldn't do that and be sure that every worthy page was so evaluated). The Barnstar approach seems like a reasonable way to do this, though there are other ways this could be done. Lead by example. Celebrate the good, not convict the evil....(or in this case, the mis-tutored.) Q 08:57, 11 April 2009 (EDT)

The problem is, the tools the machines would likely need to evaluate the quality of the pages, using anything like the complete BCG standards, just aren't there. Eventually, maybe we'll have that kind of capability...but not now. So, rather than try to climb the whole mountain all at once, perhaps we should be satisfied with smaller steps.

And I'm proposing smaller steps; see above. jillaine 10:45, 8 April 2009 (EDT)

Going back to Dallan's initial proposal to identify pages that lack any sources whatsoever could be the way to go. Perhaps rather than slapping a red flag on them, a note could be posted to the (talk page perhaps?)---something along the lines of "WeRelate is attempting to help its users improve the quality of their pages. This page would be improved by indicating the original sources on which the information (such as dates of birth) is based. Can you help us improve this article by adding sources for such information?"--Then the message could point to appropriate pages for guidance in what is needed in the way of sources. Q 10:33, 8 April 2009 (EDT)

I LIKE this last idea. Nice. jillaine 10:45, 8 April 2009 (EDT)

This could be done pretty easily the same way Wikipedia does it with pages that are stubs, or which lack citations -- by having a template that says "The page has *NO SOURCES*! Please add sources! Otherwise, this page will be considered for deletion!" . . . or something (possibly less shrill) along those lines. Since it's a template, in addition to a bot adding it to source-less pages, anyone cruising the site and examining pages could add it as needed, either to the top of the main page (more highly visible) or to the talk page (possibly more polite). Those pages would be automatically added to the list in a category: Pages Without Sources. Users interested in those individuals would then be warned to hunt for sources themselves if they want them to survive, and possibly to merge them to the user's own pages. (I'm already starting to do some of this with the Hatfield pages I'm creating, and I've seen several other family groups that I'm familiar with and which need to be merged, or sourced, or both.)

Certainly, that would be one way to do it, though 99% of the pages would end up getting tagged. It would be less intrusive, less in-your-face than a "red Flag" appearing on the page. There's no distinction to be made when virtually everything is tagged. Perhaps this could be done with a selected set, or as people come across pages they'd like to see better documented. A common template, perhaps to be placed on the article's talk page, could be effective. But to do something like this, we still need the underpinnings---what are the standards that should be met? If we can't point to a set of standards, how can we tell people what they should be doing? Q 10:08, 11 April 2009 (EDT)

Perhaps we flag pages of people born within a certain timeframe to begin with, say 1600 to 1700 in the US. There are lots of sources for that time period, and those people tend to be the ones with the most watchers, thereby the ones that might most benefit from a flag. And I would bet that the percentage of pages with multiple watchers without sources from that time period is actually much lower than generally, so it wouldn't be quite so overwhelming. If it has the intended effect, we can expand the project.--Amelia 12:00, 11 April 2009 (EDT)

Speaking of stubs: Would y'all consider a page that has only the name and nothing else -- no dates, places, or relationships -- a "stub"? I've seen some like that. They're just floating out there, isolated. Should they be similarly marked? (Well, they have no real sources anyway, only the GEDCOM or "OneWorldTree".) --mksmith 09:42, 11 April 2009 (EDT)

If the pages are truly isolated, and linked to nothing, contain nothing, and if they've been floating around in e-space for some time, they probably are just flotsam and jetsam, worthy of having their electrons freed to be put to more useful purposes. On the other hand, that's a function a machine could perform more efficiently. Perhaps the criteria for deletion would be A) No data other than title, B) No link to any other page, C) Been in existence for X months, without activity. Q 10:08, 11 April 2009 (EDT)
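
The three criteria above could be sketched roughly as follows. The `Page` record, its field names, and `is_deletion_candidate` are hypothetical names invented for illustration, and criterion C is read here as "no activity for at least X whole months"; the value of X is left to the caller.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    title: str
    has_data: bool        # any dates, places, or relationships beyond the title
    link_count: int       # links to or from other pages
    last_activity: date   # date of the most recent edit


def is_deletion_candidate(page, today, idle_months):
    """Apply criteria A (no data), B (no links), and C (idle for X months)."""
    months_idle = (today.year - page.last_activity.year) * 12 + (
        today.month - page.last_activity.month
    )
    # The current month only counts once its matching day of month has been reached.
    if today.day < page.last_activity.day:
        months_idle -= 1
    return (not page.has_data) and page.link_count == 0 and months_idle >= idle_months
```

All three tests must pass, so a name-only page that is still linked into a family, or that someone edited last month, is left alone.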

And Dallan, what about my question above about some sort of scanning upon GEDCOM upload that displays a message and does not accept the unsourced GEDCOM? jillaine 10:45, 8 April 2009 (EDT)

I want to set a minimum standard for uploaded GEDCOM's, especially large ones, but I don't want to set the bar too high that it discourages people from participating in the community and thereby learning how they can improve over time. I think about this like learning to play chess -- you wouldn't want to tell people that they had to play at a certain level before they could join an on-line chess playing group. What I need to do is analyze how many of our current GEDCOM's didn't contain sources, or didn't contain either source or notes.--Dallan 11:14, 8 April 2009 (EDT)

Well, . . . to carry out your chess club analogy, Dallan: If your club meetings usually attracted fifty regular players who knew the game (never mind that some players were better than others), and the meetings were suddenly flooded by a thousand new people who didn't even know what a chessboard looked like but insisted on having a place at the table, . . . how long would your club survive?

I'm definitely not saying beginners or "name-collectors" shouldn't be allowed in the playground. My wife and I teach classes regularly in which we try to teach those folks how to do "good" genealogy. And she's the Examining Genealogist for the First Families of Louisiana Program, which is also about encouraging better research standards and reporting methods. But a lot of people who simply accumulate GEDCOMs have no interest whatever in doing actual genealogy. They'll upload here (and at every other site they come across) and walk away. They're not teach-able. So the question becomes, does WeRelate simply allow anyone who passes by to dump their garbage on its front lawn and drive away? How many family researchers who discover this site look around, note the very high proportion of OneFamilyTree clones, shake their heads, and write WeRelate off as yet another dumping ground? --mksmith 12:32, 11 April 2009 (EDT)

It's a good point, but as I look down the corridor into the future, I think this is going to become less troublesome. Despite my concerns about the willy-nilly merging going on, there's real merit in having only a single card per person. Eventually, that will tie everything into one large integrated tree. What that means is that folks will no longer be able to simply add a new branch or build an entire tree here, as the persons they are interested in will already be embedded in the tree. At some point, people are going to have to start working on the data that's associated with each card---finding supporting evidence, building narrative descriptions, etc. Good genealogy is going to start to drive out the less good. That means better sourcing, better articles, and a better educated user community that understands the value of sources etc. So if some folks today are dumping poor quality work onto the site, I'm confident that such work will eventually be scrubbed away. In the meantime, I'm not worrying about it too much.

That's not to say I don't think we should encourage people to do good genealogy here. We Absolutely Should. Finding the right way to make that happen is what we're really talking about. That may be with a rating system, or perhaps with a barnstar approach. Perhaps both at once, carrot and stick, perhaps something else. Whatever it is, I like Amelia's suggestion above, that we might start with a certain time period---say 1600 to 1700. Perhaps Dallan can tell us what fraction of the total number of cards have DOB's between those two dates?

But some groundwork is needed first---there needs to be a statement concerning what's expected in terms of standards that each person page should meet. We need to identify pages that we think meet those standards, so we've got something we can point people to as models to emulate. Perhaps rate or flag the pages that don't meet the standards, perhaps barnstar or feature especially good articles that do meet those standards. Then either rate the pages in the target period, or otherwise identify pages that need improvement and send the appropriate users an appropriate message asking them for their help.

Would someone like to take a shot at identifying the standards that should be met? Strictly BCG, or do we need something broader, less intimidating, that can be achieved by many people? Q 17:33, 11 April 2009 (EDT)

Excellent points, well made. In fact, most of this should be tucked away for a Statement of Purpose page. --Mike (mksmith) 20:37, 11 April 2009 (EDT)

The fastest way to teach people is, surprise, lecture. In other words, tell them what you want. So yes, the first step is putting together the criteria. Flagging pages is feedback to the author, but is not very effective for teaching to the wider audience, or for educating new users. Once you have criteria written, then find a small number of clear examples of each point, and maybe another small collection of counter-examples to show what not to do, pointed to by links where the criteria is explained. It adds to the authenticity to use real pages for examples, instead of making them up, at a slight risk of diluting the clarity of the example. --Jrich 21:18, 11 April 2009 (EDT)

The concept of "Mentoring" comes to mind. "Better genealogy one person at a time." Q 21:35, 11 April 2009 (EDT)

I just want to mention one thing to note, when I uploaded my GEDCOM, I purposefully omitted my sources and notes. I wanted to avoid creating a MySource mess. I wanted to put my "shell" of information online, then figured I would take the time to add my sources correctly afterward. I would hate to "judge" GEDCOMs based upon that criteria.--Jennifer (JBS66) 11:25, 8 April 2009 (EDT)

Some stats: of the roughly 2400 GEDCOM's uploaded, just over 500 of them (22%) don't list any sources, and just over 300 (12%) don't list any sources or notes.

As an alternative to disallowing GEDCOMs without sources or notes entirely, we could flag them for administrator review and ask admins to contact the uploaders to ask if they plan to add sources/notes once the GEDCOM is uploaded. This is more work for the admins, but maybe not too much effort since relatively few GEDCOM's don't have any sources or notes.

As an aside: Once we get source matching working later this year, I'm thinking about not creating MySources anymore for uploaded GEDCOM's. Instead, the information for unmatched GEDCOM sources would be added to an expanded source-citation section directly on the person and family pages.--Dallan 12:06, 8 April 2009 (EDT)

Since you're talking about "sources" that are just the date of the GEDCOM upload, etc, you could almost just put that info in the page's text box, rather than creating source statements for what are actually non-sources. --mksmith 12:32, 11 April 2009 (EDT)

To clarify, if there were 300 GEDCOM's that didn't include sources or notes---that means a substantial number did---but those stats are for an entire submission, not for individual pages. The distinction is significant. If a submission contained just one source and one note, then the entire GEDCOM would get scored as having sources and notes---To get numbers comparable to those I cited for Ancestry, you'd need to examine specific cards, not the entire GedCom. Q 12:24, 8 April 2009 (EDT)
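
To make the per-submission versus per-page distinction concrete, here is a small sketch. It assumes each upload is represented as a list of booleans, one per page, saying whether that page cites a source; the function names and that representation are assumptions for illustration, not how WeRelate stores anything.

```python
def sourced_fraction_by_gedcom(gedcoms):
    """Fraction of uploads in which at least one page cites a source."""
    if not gedcoms:
        return 0.0
    sourced = sum(1 for pages in gedcoms if any(pages))
    return sourced / len(gedcoms)


def sourced_fraction_by_page(gedcoms):
    """Fraction of individual pages, across all uploads, that cite a source."""
    pages = [has_source for upload in gedcoms for has_source in upload]
    if not pages:
        return 0.0
    return sum(pages) / len(pages)
```

With two invented uploads of 100 pages each, one containing a single sourced page and the other none, the first measure reports half the uploads as "sourced" while the second reports 1 page in 200. The same data can look very different depending on which unit is counted.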

That's right -- my guess is that about 1% of the pages here have real (non-gedcom) sources. This discouraging number is a big reason why I don't want to be too hard on people that don't source their work. I'd rather have them become part of the community where peer-review will help encourage some percentage of them to start sourcing than raise the bar so high that most people won't want to participate.--Dallan 22:14, 10 April 2009 (EDT)

Which is another reason why wholesale rating of pages might not be such a good idea. Since the vast majority of folks are not citing useful sources, most pages would get red flagged, or whatever. Highlighting that probably sends the wrong message, and would make the site less welcoming. Would it be helpful to create a set of standards that are to be strived for? That could include, of course, sourcing, but it could also include other aspects of genealogy as well. Perhaps a list of example pages that embody the site's goals? Of course, first we have to establish what those goals are. Q 08:57, 11 April 2009 (EDT)

In terms of trying to alert new users to good practices, the idea of putting a message on their Talk page about (lack of) sources would be good. Make for a pretty long message if 5000 people are uploaded with zero sources. Maybe if the GEDCOM has 10 or fewer pages without sources, you list the pages, otherwise you just say that 50% or whatever percent of the GEDCOM lacked sources, could you please review it and add any sources you can, and point to a help page that gives a brief explanation of what kind of sources are most valued.
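
The rule in the paragraph above (list the pages when there are 10 or fewer, otherwise quote a percentage) might look something like this. The function name `source_reminder`, the `list_limit` parameter, and the message wording are all assumptions made up for this sketch.

```python
def source_reminder(unsourced_titles, total_pages, list_limit=10):
    """Build the talk-page reminder text for one uploaded GEDCOM."""
    count = len(unsourced_titles)
    if count == 0 or total_pages <= 0:
        return ""  # nothing to say when every page is sourced
    if count <= list_limit:
        # Few enough pages to name each one.
        listing = ", ".join(unsourced_titles)
        body = f"These pages from your GEDCOM have no sources: {listing}."
    else:
        # Too many to list, so summarize as a percentage of the upload.
        percent = round(100 * count / total_pages)
        body = f"About {percent}% of the pages in your GEDCOM have no sources."
    return body + " Could you please review them and add any sources you can?"
```

A link to a help page on which kinds of sources are most valued could be appended to the returned text.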

GEDCOMs without sources are not without value. If forced to vote, I personally might say block them in the hopes that eventually somebody else will come along with sourced data, and in the long run, we'll be better off. But that takes a lot of faith and patience, and I have to admit that as long as they are only adding new people, the information and connections they provide, may give a clue to some other researcher who then adds the sources. And certainly, there is catch-22 for WeRelate, to attract people you have to have people, so at this stage, scaring off people has its drawbacks.

The biggest worry about Junk Genealogy is protecting data that is already there. The hard part of Junk Genealogy is that most of it has sources, they are just outdated by newer research. So people can input Junk Genealogy with sources, and it is hard to think of a way to catch them. Again, if there is no data there previously, even outdated data has value, as it starts the process. But if the data has already been refuted in a detailed explanation, then we don't want the GEDCOM update overwriting things. It would be nice if there was something like the nomerge template that caused updating of pages by GEDCOM updates to be blocked, but I think the 5 watcher rule is a start, and probably less prone to misuse.

I am not so much worried about manual updating, though as mentioned in a different place, I feel it would be nice if the presence of a discussion on the Talk page was more prominent, and even if you could add a flag T1 to facts that says be sure to review Topic 1 on the Talk page before changing this birth date or something along those lines. --Jrich 13:25, 8 April 2009 (EDT)

I too believe that the more important concern is that unsourced GEDCOM's don't degrade existing pages; adding unsourced pages for new people that we don't already have in the system isn't as bad because those pages will hopefully be improved later, especially if we can make adding sources easy. I'll make sure that the nomerge template works for GEDCOM uploads as well as regular merges.

Yeah, this is a major concern, I think. I would hate to see properly-done, well-sourced pages being auto-merged with "junk" pages where a mere statement of GEDCOM upload date is given equal weight as a "source." --mksmith 12:32, 11 April 2009 (EDT)

There has been a lot said on this topic that I have not read and I have flipflopped on my opinions about allowing gedcoms with no sources or sourced by another tree. The new merge capability with the gedcom upload should help with the duplicate pages. I believe that after a gedcom is uploaded that meets the criteria of no sources or only sources with another tree, such as WFT # such and such, should be flagged for deletion after a period of 6 months. We could notify the user and allow the user to request an arbitration on the decision if they so desire. We could also state this on the gedcom upload page. I don't support a rating system as such; but don't oppose giving a gold star to super pages as a feature to encourage users to adhere to certain standards. Some of my research is only on WeRelate so please don't decide to delete my pages; it is a work in progress. There are sources but some are indexes from Ancestry. The proof is in the sources but have not the time now to write the proofs for the article. If you do decide to delete pages I would like you to consider whether the user only has the data entered on WeRelate or also entered in another database. When the gedcom download is activated that will not be a problem, but now it is a major problem. --Beth 20:11, 12 April 2009 (EDT)

This is a tough problem. We could try to filter bad GEDCOMs, but really, the problem isn't so much bad GEDCOMs as much as it is bad GEDCOMs that are then abandoned. Anyone, and maybe everyone, starts somewhere, and often that place is sort of feeble. If someone uploads a "junk" GEDCOM, but demonstrates a commitment to improving the data, it's really not junk - just a work in progress. On the other hand, a weak GEDCOM, that isn't purely junk, could be abandoned here and create just about as much hassle and chaos as pure junk. We're often looking at data of the latter sort, wondering what the better choice might be. Semi-ok data may actually be the harder problem, since it's less clear what to do with it.

For these reasons, my proposal has long been a small GEDCOM "newbie" limit. Until someone generates a track record of individual hand-edits, demonstrating a commitment to really using werelate, they really shouldn't be able to load more than a few dozen - or perhaps a few hundred - total pages via GEDCOM.

Dallan has heard this suggestion from me a bunch of times, and those of you who have been working the site for a while probably have too. I'm not sure to what extent Dallan may have adopted elements of the idea or not, but I toss it out again in hopes that it may be useful...--Jrm03063 09:34, 13 April 2009 (EDT)

And I've suggested including "before you upload" text that stresses what kind of place WeRelate is and what kinds of GEDCOMs we're seeking. He's incorporated some of this into the new text for uploading, but I'd almost rather they see something before they even get to see the "upload" button. Not quite a user agreement, but a "hey folks, don't upload junk here that you don't plan on maintaining..."

Dallan, I'll volunteer to be on a GEDCOM review committee.

-- jillaine 11:32, 13 April 2009 (EDT)

I think a "read before you upload" text might be talking to the wrong people. It's not a bad idea to include such an admonishment, but I don't imagine it will have much effect on the people who upload and abandon junk GEDCOMs in the first place. I mean, we would pay attention to it, but they wouldn't. On the other hand, the idea of limiting the size of permitted GEDCOM uploads until a new user has established his bona fides might actually work -- although it's a little late now. :)

I still haven't uploaded a GEDCOM. I've done about a hundred pages by hand, in those family groups where my own research is most active, and where I can add useful text in addition to mere vital statistics. (And I'm looking at the occasional duplicates I'm coming across and merging those in as I go along, so my network is also growing quite a bit). I'm not sure uploading a large GEDCOM at this point serves any useful purpose -- for me. I'm actively pursuing perhaps 15%-20% of the people in my database. The rest are grandchildren of siblings, and in-laws, and cousins of cousins, and whatnot -- people whose identity I'm interested in for contextual reasons, but not "real" members of my direct family. Which means I don't have much information about them, so I don't see a point in creating yet another nearly empty page. Not yet, anyway. I'm aware that my attitude toward GEDCOMs is out of step with most people who use them, but I guess I believe there's such a thing as making a worthwhile endeavor too easy. --Mike (mksmith) 12:48, 13 April 2009 (EDT)

But Mike, that 'nearly empty page' you don't upload may be the one thing someone else needs to find to be able to connect other family members. Because I'm a Jackson researcher, I've accumulated a lot of info about folks that are not even my immigrant ancestor's descendants; no relation to me at all. I want to eventually put it all on werelate where someone else can see it and benefit from it, even though I may have nothing but census records to put on their pages.--Janiejac 14:51, 13 April 2009 (EDT)

Mike makes a good point. If the warning is pointed to people submitting GEDCOMs that are not sourced, they won't understand that it's directed toward them. After all "they know their work IS sourced (and very very good too), because they took it from someplace on the web".

If you want to get people to do better genealogy, you probably need to go one on one with them. But that means they have to be in communication, and that means they can't be excluded because the gate was set so high they couldn't get in.

But, if the objective is to have only well sourced person articles, then just go back to the original idea and bleep out every article that doesn't contain at least one non-Gedcom Source. Dallan can do that almost automatically. And there's lots of benefits to that, too. Among other things, it would certainly make the merging process easier, as you'll have eliminated 99% of the person articles per Dallan's statistics. Q 15:42, 13 April 2009 (EDT)

An additional suggestion [11 April 2009]

What would help me as a researcher as well as help WeRelate in its goal to quality, sourced data, is an automatically generated list for each of my trees. Perhaps it's under "My Relate" -- but basically, it's a selection that says something along the lines of "View unsourced info in your tree(s)". It's basically an automated "to do" list. I could use that RIGHT NOW. ;-)

-- jillaine 07:33, 11 April 2009 (EDT)

NGS Standards for Sound Genealogical Research [14 April 2009]

Copied from Standards for Sound Genealogical Research (note the copyright notification below) (I've bolded those items that seem particularly relevant to the topic at hand).

Standards for Sound Genealogical Research As Recommended by the National Genealogical Society

From the National Genealogical Society, for About.com

Remembering always that they are engaged in a quest for truth, family history researchers consistently —

record the source for each item of information they collect.
test every hypothesis or theory against credible evidence, and reject those that are not supported by the evidence.
seek original records, or reproduced images of them when there is reasonable assurance they have not been altered, as the basis for their research conclusions.
use compilations, communications and published works, whether paper or electronic, primarily for their value as guides to locating the original records.
state something as a fact only when it is supported by convincing evidence, and identify the evidence when communicating the fact to others.
limit with words like "probable" or "possible" any statement that is based on less than convincing evidence, and state the reasons for concluding that it is probable or possible.
avoid misleading other researchers by either intentionally or carelessly distributing or publishing inaccurate information.
state carefully and honestly the results of their own research, and acknowledge all use of other researchers’ work.
recognize the collegial nature of genealogical research by making their work available to others through publication, or by placing copies in appropriate libraries or repositories, and by welcoming critical comment.
consider with open minds new evidence or the comments of others on their work and the conclusions they have reached.

Permission is granted to copy or publish this material provided it is reproduced in its entirety, including this notice.

-- jillaine 16:10, 13 April 2009 (EDT)

That's a good set. Similar to the BCG, though with a bit more elaboration. This and the BCG proof standards would be a good starting point for a set of goals for this site. Believe Mike referred to it as a "Mission Statement".

If the goal is to have a single card for each discrete individual, and to create a card for every person who's lived (not really, but perhaps in theory) then that could/should be part of the goal statement. Perhaps we should coin a new term to describe this----"Integrated Family Tree". Is that analogous to Ancestry's "One World Tree"?

Perhaps what should be stated as part of a site Mission Statement is that when people upload their GedCom, it should be done with the realization that this is a wiki, and to anticipate that "Their" tree will eventually be integrated into the "Integrated Family Tree". Q 16:33, 13 April 2009 (EDT)

The above might lead us to this type of criteria. Yes, I've chosen a different word than Barnstar. -- jillaine 16:25, 13 April 2009 (EDT)

Pages that achieve “Source-Star” status at WeRelate are those in which 100% of the information provided is well cited per the NGS standards. This means that all information meets the following criteria:

* each piece of information cites a reliable source document

* where specific citation is unavailable, the data is supported by convincing reasons for conclusions reached

I'm not overly enamored with "Barnstar". It's just a word that people are familiar with if they've been around wikis, and if that's to be done, then another term might be a good idea. Barnstar IS kind of clunky. On the other hand, "Source-Star" is probably not the most memorable choice.

Can you point to A page that you'd currently give a star to? Q 16:36, 13 April 2009 (EDT)

Transfer to Mission Statement [14 April 2009]

Comments Transferred to Talk:Mission Statement

User:Jrich developed what amounts to a Mission Statement for WeRelate, embodying many of the comments above. User:Jillaine made several modifications to that Mission Statement. I've transferred this to a separate article called Mission Statement. That includes a talk page, (of course) where further comments can be made. In the meantime, we can modify the actual Mission Statement, without disrupting the article itself with commentary. Q 15:15, 14 April 2009 (EDT)

Template for insufficiently sourced pages [14 July 2009]

Consider using something like the following at the top of particularly annoying pages you come across while merging or otherwise working on the site:

Excuse me. We don't mean to hurt your feelings, but um, this page? Well, it's really lousy. Without having any decent source info, it's just a bunch of 1s and 0s on the aether, and frankly, we don't really want it here. But don't feel bad, it's not your fault; you don't know what you're doing. And we don't make it very easy for you to figure that out anyway. And you're not alone: See Category:Source Warning for a list of other crummy pages.

-- jillaine 17:17, 21 April 2009 (EDT)

Eventually, this will be a very desirable feature---when poorly done pages are not the rule. I'm afraid that if that was used systematically now, it would be on virtually every person and family page on this site. You would not be able to go through a lineage and not see this. And if the argument is to only use it on the very very very annoying pages, then the counter argument is that 99% of them lack original sources, and are very very very annoying---though for me personally, I don't find this so much annoying as simply genealogy not well done. Q 19:20, 21 April 2009 (EDT)

I'm thinking of pages where NO source information is provided at all. Or worse yet where no DATES are provided. My favorite one earlier today was related to a merge. Both targets were something like

husband: John Smith
wife: Mary Jones

no parents listed

no dates or places born/died/married.

More annoying than source-less dates. I now understand Beth's desire to dump such doo-doo.

jillaine 22:13, 21 April 2009 (EDT)

How about marking pages that include only tertiary sources, such as GedCom's and personal communications that can't be verified long term? Q 08:38, 22 April 2009 (EDT)

Here is perhaps the worst page I have ever seen [[1]]. It is so filled with crap, I'm sure the submitter has never even looked at it.--Scot 14:36, 14 July 2009 (EDT)

Breathtaking! Especially the inclusion of a SSN! If there was ever a justification for calling something "junk", it was probably met on this card. However, apart from the seemingly endless stream of useless references, there's actually some useful information there. I've trimmed that out. (Made a discovery in the process---if you hold down "delete", the system apparently just enters a LOT of "deletes" into its queue, and that endless string of useless references was toast in about a minute.) I'd recommend not deleting the card now that it's somewhat cleaned up. Maybe Dallan should go back in and free the electrons of the backup copies that include the SSN. Q 14:58, 14 July 2009 (EDT)

What there do you consider useful information? She was born in 1690 in Virginia to parents from North Carolina, alt name Joshua, christened in 1732 in England, died in 1843 in Minnesota, then again in 1864, was living in Kentucky in 1910, and is believed to be living aged 319 yrs.--Scot 15:14, 14 July 2009 (EDT)

Scot pointed out to me that the actual content is, ahem, flawed. It's probably a merger of two persons widely separated in time. I suppose one could sort them out, but the truth is, he's right. Worst junk ever. Good for delete. Q 15:02, 14 July 2009 (EDT)

I understand the desire, but I think this terminology and template is a little harsh from the perspective of the newbie apt to encounter it. What about using the wikipedia terminology and calling them "stub quality" on account of limited or inadequate sourcing...?--Jrm03063 08:51, 22 April 2009 (EDT)

Updated the template, softening the language SOME. Also softened the color a bit. jillaine 10:14, 22 April 2009 (EDT)

Overall, I agree with Jrm. I think this is counterproductive. But if you are going to apply this, you need some fairly precise standards as to what gets this treatment. Please point to a page that you think deserves this. Q 10:34, 22 April 2009 (EDT)

I would rather prefer not to see this 1.9 million times on the site. There's already {{cn}} which serves a similar purpose. The solution to Mary Jones and John Smith is to delete the page (although I know some people think that might be rude, it's no ruder than slapping a big "caution" warning on it.)--Amelia 10:36, 22 April 2009 (EDT)

Agree with that too. Also, thank you for the pointer to {{cn}}. That's handy. Might there be another such template that's more specific to this problem---i.e., "source needed"?

Personally, I wouldn't spend the time inserting this kind of thing on those 1.9 million pages. But I suppose if one really felt the need for pointing out a problem on a specific article, then {{cn}} or something similar might do the trick. Relatively innocuous (and non-antagonistic). Possibly something that could be added to that template would be a link to a page explaining in general what's needed. For example, if someone sees this flag, then goes forward and provides a citation or source where the information is coming from, that would be good. But if the source they insert is "Smith Family Gedcom", then not much has been gained. So what might be needed is some guidance as to what's needed. Q 10:54, 22 April 2009 (EDT)

I like the {{cn}} template as well, but it's used on a per-fact basis, not a per-page basis.

That said, you all make good points. I just get so darned frustrated when I'm spending hours of volunteer time cleaning up so much horse manure. I guess it's times like those that I need to take a break and go work in the garden. -- jillaine 12:24, 22 April 2009 (EDT)

If we are keeping all of these pages, I don't see any reason not to use a template as Jillaine suggested. We could rephrase the template to be a little more user friendly. Something along the lines of Please adopt me; I am in desperate need of evidence and documentation.--Beth 16:47, 22 April 2009 (EDT)

You really want to see that red smear on virtually every page on this site? Q 16:52, 22 April 2009 (EDT)

Hey Bill, it doesn't have to be red, looks orange in my browser, <g>. Now I really don't want to see the pages on this site, but you already know that. What is acceptable to you? Can't we put something on the pages? --Beth 17:48, 22 April 2009 (EDT)

I don't think any of us have a veto---except Dallan. Using the CN template where needed seems reasonable. If someone felt they HAD to mark an entire page, then something less intrusive than this. In general, with the ORANGE template:

A) This is wholly un-needed. If the page bugs you so much, fix it. Just be sure you cite proper sources, and I don't mean secondary and tertiary sources.

B) It is counter productive. Unless you're willing to show them what's needed, then there will be no effective result. Except probably to drive some folks away.

C) In due course these problems are going to get fixed anyway, so why stir up ill feelings?

Frankly, this is a suggestion that is simply not in the best interest of the site. But hey, if Dallan wants to do it, then by all means. Q 18:46, 22 April 2009 (EDT)

To be fair, Beth, I keep editing the template color, so it very well may have been RED when Bill first looked at it. I also keep editing the text to make it less offensive, but I'm not sure the readers here are hitting the refresh button. And Bill, NO, as I've repeatedly said elsewhere, I'm not talking about marking 1.9 million pages with it, however much 1.9 million might deserve it. We've discussed elsewhere on this page about starting with a particular set -- such as Colonial New England pages that have little if any source information.

But since this is clearly making even a few people angry and cranky, we can just drop the whole thing. Consider it a rant of a frustrated volunteer. Back to my garden.

-- jillaine 19:42, 22 April 2009 (EDT)

I don't know of anyone who is angry. One of the wonderful things about this site is that we can have open discussions and we agree to disagree. We usually sort it all out and come up with a good decision that hopefully is best for WeRelate. There is not some executive board that exists, not naming names, who says you cannot do this here. Now that is fantastic. Now I have been looking into other alternatives and found this message on this site. Look at this page; at the bottom of the page see the "Improve genealogy by editing this page" notice. Here is the page link. [2] Maybe this would be acceptable to all. We could change it to say add documentation, dates etc. Whatever needs to be added to improve a page.--Beth 20:25, 22 April 2009 (EDT)

No, this person, at least, is not cranky with Jillaine. Also I noticed the color of the template is now a nice shade of pink; definitely an improvement. (One might say that the current wording is probably closer to Jillaine's intent, though probably meant with tongue in cheek. (G)). Yes, I understood that Jillaine only wants to do this selectively. The truth of the matter is, something like this is either policy or not. Doing it selectively also creates a problem. Specifically, it sends a very wrong message. What it says is "Your card is crap. Everyone else's card has exactly the same problem, but they are OK. It's YOU that's the problem". That's NOT what Jillaine intends with this template, but that's the message its occasional use would send, intended or not. Using the cn template as Amelia pointed to is much more precise (though more time consuming), and is definitely not hostile; it is non-threatening and help oriented. If something could be crafted along those lines that would fit the needs of an overall page template, that would be good.

Personally, if the objective is to help improve the quality of work placed on WeRelate, I think the mission statement is where the focus should be initially. Define what the site is about, that it's a collaborative effort, and that people should expect that their work will be incorporated into a single monolithic "WeRelate Tree", and that we are all working to improve the quality of that tree...lets them know what they are getting into.

But you have to set the stage first through the Mission Statement. Then when a page is changed by someone other than the original creator, that change can be seen as something not directed toward them personally, but part of a process of improving the "WeRelate Tree"---which we are all contributing our efforts to.

One way to foster this might be through a standard template that appears near the top of every page---perhaps that could state what the common goal is (once it's been defined in the mission statement). Another approach would be to state the current site focus. Presently, the focus is on merger. Maybe later it could shift to something like improving documentation, sources and citations. Then it could move on to say improving.... Using such a template would make the point that you should expect others to be working on "your" family tree, and would not single anyone out for beating about the head and shoulders.

An example might be:

The WeRelate user community is currently focusing on the elimination of duplicate person and family pages.

The purpose for this focus is to allow us to bring together the best and most reliable data about a given individual or family, on a single card.

Also, Jillaine has been working on the "Mission Statement", starting with the original input on it from Jrich. I think it's coming along nicely, though it's still in its early throes of development. So far Jillaine is the only one really working on it, but I intend to start doing more lifting and toting, rather than commenting. It's a task, however, that is better done with the help of the many than with the few. Anyone getting to this current page, concerned with the subject, might want to consider working on the Mission Statement.

Q 09:39, 23 April 2009 (EDT)

User Community Focus [26 April 2009]

How about we take a page from Wikipedia, and tag person/family pages as stubs of various levels? "This page is a stub lacking in source documentation. Please help improve WeRelate by adding sources if you can." Or "This page uses no primary sources. Please help improve WeRelate by adding primary sources or explaining why they are unavailable for this person." or "This page has no descriptive information about this person's life. Please help improve WeRelate by adding additional information." Very basic and straightforward, no colors, and expected on a wiki.--Amelia 14:01, 25 April 2009 (EDT)

Sounds good to me Amelia. Similar to the Improve Genealogy by editing this page that Familypedia is using that I mentioned earlier in this topic.--Beth 19:06, 25 April 2009 (EDT)

Yes, ultimately, that's where I think we need to go. The key for something like this is that it can't be threatening or condescending---otherwise you accomplish little more than driving someone off. The real problem though, is that in an environment where most pages lack sources, let alone primary sources, virtually everything gets tagged with a score of "0 out of 5".

The "focus banner" I was pointing to would be one way of letting people know what needed to be worked on across the board. I was suggesting one which emphasized the current focus on merging, mostly because it stressed where WeRelate is going. The follow up banner I was thinking of was something like what you are suggesting...say something like...

"The WeRelate user community is currently focusing on improving source documentation for person and family articles. Please help us improve WeRelate by adding sources to persons or families you are familiar with."

Because such a banner would appear on EVERY Person or family page, but not refer to specific quality issues with that page itself, we'd not be in the condescending mode of telling people their work is not good. Instead, it's a generalized problem that we are working together as a group of users to overcome. Q 19:16, 25 April 2009 (EDT)

Q 19:14, 25 April 2009 (EDT)

What about pages that already have different types of banners? I am not sure I am keen on the idea of having multiple banners on a page.--Beth 19:56, 25 April 2009 (EDT)

I would hope any banner going on every page would go at the bottom, otherwise I vehemently protest its existence. Stub notifications also go at the bottom.--Amelia 20:00, 25 April 2009 (EDT)

Yes, I think we can probably all agree that a banner such as on the SWVP would not look good splashed across every page. Jillaine's template would have similar drawbacks.

However, what I was thinking of would be something fairly discreet up by the page menu bar--perhaps immediately below or above it. Could also appear at the top of the left hand side bar. Or as Amelia suggests at the bottom of the page, though there its lack of prominence would probably defeat its intent. Q 20:07, 25 April 2009 (EDT)

Here are two ways something like this could be done.

And

Newbie's perspective:

He/she isn't going to know what you mean by 'card' but can probably understand person 'page'.
If he/she doesn't have sources, telling him to add them probably isn't going to change his ways.
You could say please add sources to 'this' page, not 'your' page. Maybe another viewer knows a source that the author doesn't know. I expect all of us have some unsourced info that we gathered when we first started out and didn't understand the importance of keeping track of the source. But does that make the info so bad that it should be tossed out?

I think the second example (on the left) is more likely to be seen. The top one is easily overlooked.

Since I haven't merged pages other than my own, maybe I'm not seeing the problem correctly. But I don't see the need for a banner or notice or anything of that nature to be placed on unsourced pages. Won't those pages be ignored or eventually improved if folks find connections to them? I think when WeRelate is out of beta that you'll be having a lot of this kind of thing and it may not be worth spending this time on. Perhaps an email to the author inquiring about their sources may be more effective and could open up good communication and encourage more participation on their part for merging their own duplicates. Sort of mentoring. A lot of this (helping folks with sources and merging their dups) could be ironed out on forum pages if that is ever implemented. To me personally, the forum has a much higher priority than banners on unsourced pages. --Janiejac 10:22, 26 April 2009 (EDT)

I think the idea is that yes, the pages will be improved as people work with the data on WeRelate. But first you have to get people to source their information. The idea of the Focus Point is that this acknowledges the intent and desire of the user community to improve the source documentation for people articles. There are limitations to the approach, but it's a whole lot better than saying "Your work stinks". Or alternatively, you can just start improving articles on your own. Since the idea now is that this is supposed to develop into a group, monolithic tree, with no duplicates, you've every bit as much right as the next person to start improving specific pages. EVERYBODY here has poorly sourced pages. That includes me, Jillaine, Beth, you, JRM, et alia ad nauseam---no one that I've looked at is immune to this problem, including myself. So start with our own cards---at least you're familiar with them. Find one that needs better sources, and start working at it. Q 10:52, 26 April 2009 (EDT)

I'm not especially invested in the idea of marking pages one way or the other, but if we're going to do it, it ought to not just create noise on the page that's easily ignored. I didn't see either notice on either sample until I looked very carefully, and there's absolutely no way I would notice a change in the text from one focus to another. I agree that card is not a good term, and that we should avoid "your" page - there's no such thing, and we ought not to cultivate that idea. I don't give much of a fig what the "community" is doing, and I'm actively involved in it. But then, I don't have a problem being judgmental and labeling pages based on their problems. I don't have a problem with something showing up on 95% of pages as long as it's not huge and colorful, it's accurate, and it can be removed when the problem is fixed. If we're trying to focus effort on pages that need it, a universal label doesn't help. We have stub notifications on a vast majority of the places and sources already; it would not be weird or rude to have them on person pages - it doesn't say your work stinks, but it does point out that this page is worse than some others.--Amelia 12:29, 26 April 2009 (EDT)

Our focus should be to encourage the public to improve the existing pages. I want them to understand that all of the pages on WeRelate are community pages and anyone should feel free to contribute. Perhaps placing some type of general notice on pages would encourage improvement. I suggest that we begin with the pages that have no dates, places or sources or perhaps not. Exactly how would one identify these people to begin to improve on the page? Then we have pages with dates and no sources; we have pages which source a tree or Ancestral File. The majority of my pages have no narrative or proof. We need to encourage people to use the Wiki as it was designed to be used. --Beth 14:39, 26 April 2009 (EDT)

I think the time to address this is when people sign up. The sign up process should require them to read through a small set of materials that constitute what they have to know to use the system (Some of my recent posts are good examples of someone who should have read more of the instructions). Other than that, as stated in similar issues elsewhere, all I would be in favor of, would be an automated system. For example, in the More menu, a choice like "How to improve this page" which does a scan and points out that there are no sources or deficient sources. However, quality of sources strikes me as very hard to assess, so I really only favor identifying lack of sources. Anything done by manual effort will be inconsistent and selective in its application, and I think would be ill-advised in an open-to-the-public system. --Jrich 15:54, 26 April 2009 (EDT)

Overall [2 May 2009]

Having thought about this a bit over the last couple of days, I'll make a few observations.

For the most part, those who have participated in this discussion do so because they'd like to see better work done on WeRelate. I suspect that in many cases they are offended by genealogy ill-done. What they individually think of as "ill done" probably varies from person to person....but they'd all like to see some things done better. Things that probably draw people's ire include:

The offending page gives information that the offended party KNOWS is wrong
inconsistent data (Daughter born three years after the death of the mother; someone born in an area that wasn't settled until 50 years later, etc)
lack of citations pointing to the source of information
lack of quality citations (e.g., pointing to GedCom's and thinking that meets the need)

I'm sure there are many other things that get people in the mood of describing things as "junk genealogy". (Feel free to add to the list.)
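
Of the items in that list, the "inconsistent data" case is the one that lends itself most readily to a mechanical check. A minimal sketch, assuming only bare years are available (the function name and inputs are hypothetical, not anything WeRelate provides):

```python
def children_born_after_mother_died(mother_death_year, child_birth_years):
    """Return the birth years that fall after the mother's stated death year.

    A birth in the same year as the death is allowed, since a mother can
    die in or shortly after childbirth.
    """
    return [year for year in child_birth_years if year > mother_death_year]
```

Any year returned points to a page worth a second look: either a date is wrong or children of different wives have been lumped together.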

So, whatever else that can be said, there's a perceived need to improve the genealogy being done on this site. Great idea! How do we make that happen?

The main suggestion has been to insert something on the page "flagging it" as "needing improvement". This is not a bad idea, and I was originally attracted to it myself. But after looking at what has been said, I came to see drawbacks and limitations to this approach. The main problem is that the majority of genealogy done on the web (and on this site, unfortunately) has many of the problems flagged above---especially those related to citations and sources.

I see two main problems with doing that: first, there are just too many cards that have these kinds of problems. Flagging virtually everything serves no useful purpose, I think. Second, the people whose work you want to see get better are probably not paying attention to the site. Most users of this site are drive by shooters so to speak. They come, they upload, they go. Very few stick around to actually work on the site. (There are reasons for that, and it's not at all unique to WeRelate. I see this on other wikis as well.) If you change one of the pages they've inputted, you MAY get a response from them, but probably not. Most have simply gone away, and have no intention of working the site itself. There are probably fewer than 100 persons who have continued to work the site at least on a periodic basis.

So, if the object of flagging cards is to get people to do better work, the reality is, not many folks are going to see those flags, let alone pay them any attention. (Those that do see them will be the 100 or so dedicated users, and either they won't think it applies to them, or don't need the admonition in the first place.) I doubt that flagging cards is going to effectuate a useful change in the quality of work done here.

What I do think will work is one on one discussions with folks over very specific matters. Much like I've noticed Beth doing in recent weeks. Pointing to cards where she finds problems, adding suggestions, etc., but focusing on cards created by people actively working the site. I think doing something like this is far more likely to have a positive effect on the quality of work done on the site than flagging every card that offends, but which was created by folks no longer active on the site. If the idea is that you want THEM to make the changes, you should probably pack a lunch, because they probably will never see the flag. And if they did, they probably wouldn't understand what they are being told. Q 09:03, 1 May 2009 (EDT)

Thank you! I have to agree with you!

Good summarization. Many people are busy cleaning up sources and merging duplicate pages to get WeRelate ready. But another activity that other people can use to better WeRelate is to take some of these under-documented pages and "donate" some research. Given society's current antipathy towards reading instructions (my father is turning over in his grave), the best way to teach people is by example. If they come to WeRelate, and with a high probability see good pages, hopefully good things will happen. --Jrich 10:56, 1 May 2009 (EDT)

Bill, I accept that the majority of the active users on WeRelate wish to allow the unsourced pages to remain on WeRelate and to continue to allow the addition of such pages. However, after working with duplicate pages for several weeks I request one limitation on gedcom uploads. The pages uploaded should include an actual or estimated birth date. It is extremely difficult to evaluate possible duplicate pages that have no dates whatsoever, but some users have uploaded gedcoms that contain no dates.

I have also suggested a new project to Dallan and he seemed to like the idea. This is a future project; after volunteers complete the duplicate pages project, source pages project, and updating the tutorials. Volunteers on the project would select a family group on WeRelate in need of some tender loving care. The project would be highlighted on the main page or the community portal. We would establish research goals and a research plan. We would utilize a research log to list tasks and the volunteer who signed up for the particular task with an estimated completion date. After the evidence is collected, we would evaluate the evidence and write a proof or conclusion. My hopes are that this will be a positive example of good research and an example of how to work on WeRelate with others in collaboration on a family project. Opinions?--Beth 11:15, 1 May 2009 (EDT)

Just a thought about requiring birth or birth estimated dates. I went from reading that to inputting data sent to me which gave me dates on a person I had listed with no dates. And I wondered why did I have this person with no dates? My source had been a recognized biography in a book which only mentioned that so-and-so had a child named Joseph Gathings. And I had no other info on Joseph. But I had put in 'unknown' as the date of death. Now someone has sent me info to fill that in. When there is no d/o/b there should be a requirement that something, even 'unknown' is in that d/o/d or there is no way of knowing if the person uploaded is living or not. So perhaps the requirement could be amended to read either a date of birth or date of death. Either could be estimated but something in one of those two places. I would hesitate to disallow uploads without dates because the info may well be correct, just incomplete. But we do need to have some indication they are not living. --Janiejac 12:19, 1 May 2009 (EDT)

Hard to evaluate something that has no data. Also hard to pick it out in a search to see what it might match. Including at least one date would indeed make your task easier. However, if you engage with the specific lineage, and do your comparisons, say with Ancestry family trees, you can usually figure things like this out. Excluding someone's GedCom because a page lacks any vita at all seems draconian. I would guess that almost every GedCom has at least one card that lacks vita.

Bill, I am not suggesting excluding the gedcom; only the pages that have no estimated birth date; the user can choose to either examine their research and add an estimated date or not add the page.--Beth 12:38, 1 May 2009 (EDT)

An important distinction, but likely to lead to holes right in the middle of a lineage! Q 12:52, 1 May 2009 (EDT)

On the second point, yes, a very desirable project. I believe that ultimately, the success or failure of WeRelate will be driven by the willingness of the user community's cooperative activity in building better pages. It's quite easy to tear things down. Building up is much harder to do, though in the end it produces a better product. Q 12:27, 1 May 2009 (EDT)

I don't really like the idea of forcing people to guess dates. You force someone to put dates in 30 people before they upload, and they're going to be not much better than nothing, and if it's just "unknown" that's useless. But if we do such a thing, christening dates should count. --Amelia 12:37, 1 May 2009 (EDT)

I agree. Genealogy is always a process of learning more. For some ancestors we know less than others, and may leave some information blank--simply because we don't yet know what the answer is. That doesn't mean we aren't still working on it. But if you exclude cards without such information, you also exclude the other information that comes along with it.---like the identity of the person's parents, or of someone's siblings. Q 12:52, 1 May 2009 (EDT)

Well, you don't have to insert "fake" or made-up dates. I agree, that's worse than useless. But you must know something about the person, right? Or you wouldn't have their name in your database in the first place. If it's a woman, do you have info on one of her children? Or info on when a parent married or died? You can probably come up with a "flourished" date of some kind. "This person appears to have been an active, child-producing adult in 1860," or whatever. It's a start. Anyway, it would be nice to know whether the guy whose page I'm looking at might have been a veteran of the Revolution -- or of World War I. --Mike (mksmith) 12:57, 1 May 2009 (EDT)

One could always use the WFT type estimates, e.g. "born sometime between 1703 and 1803". I don't know how they generate those estimates, but it's probably based on the reasoning you suggest. I usually don't find those estimates that useful, but they would meet the purpose you describe. In theory, they could also be generated by computer, starting with known data on associated cards in the lineage, and working backwards (or forwards as the case may be) to get an estimated DOB or life range. That's probably a fairly complex solution, but in theory this could be done for EVERY missing date -- an automatically calculated plug-in. Lots of work, though. Q 13:09, 1 May 2009 (EDT)
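As a rough illustration of the computer-generated estimate described above, here is a minimal Python sketch. It is not how WFT or WeRelate actually computes anything; the function name `estimate_birth_range` and the assumed parent-age limits (15 to 50 at the birth of a child) are hypothetical choices made only for this example.

```python
from typing import Optional, Sequence, Tuple

# Assumed limits on a parent's age at the birth of a child (illustrative only).
MIN_PARENT_AGE = 15
MAX_PARENT_AGE = 50


def estimate_birth_range(
    parent_births: Sequence[int] = (),
    child_births: Sequence[int] = (),
) -> Optional[Tuple[int, int]]:
    """Return an (earliest, latest) birth-year estimate, or None.

    Each known relative narrows the window: a parent born in year P implies
    a birth between P + MIN_PARENT_AGE and P + MAX_PARENT_AGE; a child born
    in year C implies a birth between C - MAX_PARENT_AGE and C - MIN_PARENT_AGE.
    """
    lows, highs = [], []
    for year in parent_births:
        lows.append(year + MIN_PARENT_AGE)
        highs.append(year + MAX_PARENT_AGE)
    for year in child_births:
        lows.append(year - MAX_PARENT_AGE)
        highs.append(year - MIN_PARENT_AGE)
    if not lows:
        return None  # nothing known about any relative
    earliest, latest = max(lows), min(highs)
    if earliest > latest:
        return None  # the relatives' dates contradict each other
    return earliest, latest


# One parent born 1700, children born 1745 and 1760.
print(estimate_birth_range(parent_births=[1700], child_births=[1745, 1760]))
```

With one parent and two children known, the window is already much narrower than a full century, which is the point of working from associated cards. A real implementation would also have to handle spouses, marriage dates, and conflicting data.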

I would hope to do a little better than "sometime in the last century." :) But I have a number of people in my database, frankly, where the only thing I know is that a woman named "Sarah" married the grandson of the guy I'm actually interested in. If that came from a census, then I can make an educated guess at year of birth, but sometimes there's not even that -- like if the woman's name came from a correspondent of my grandmother. And since those folks are mostly outliers in my research, I'm not in a hurry to pursue them in any detail -- not until I get the 5,000 more important gaps in my research filled, anyway. So I could simply omit those "name only" people in my GEDCOM imports, . . . but knowing the wife's given name is something, which is better than nothing. I think.

Okay, I give up. As far as I am concerned I will no longer discuss this topic on WeRelate. I will deal with the junk as best I can and continue to volunteer on WeRelate; but if another Wiki is created that at least requires the basic documentation of one's research I will probably leave and go to the new one. I suppose this entire subject depends on how one envisioned WeRelate when they joined. --Beth 14:18, 1 May 2009 (EDT)

By the way: As an aid to keeping track of pages in my imports that need work more urgently -- mostly good sources -- I've created a tree called "Sources," to which I attach those pages. That way, there will be a sorted work list in Family Tree Explorer. (This idea just occurred to me the other day, but I'm sure some of you are doing something similar.) But this method could also be used to keep track of pages uploaded by other people that you simply want to keep track of, or that you intend to work on or improve yourself. --Mike (mksmith) 13:40, 1 May 2009 (EDT)

Interesting idea. Since I don't usually work more than a couple of generations up or down of 1770, the tree function is not something I've looked at in detail. However, I have noticed that other things can appear in the tree, and your idea of tracking sources might be a helpful mini tool. Might work for images as well. Thanks! Q 13:59, 1 May 2009 (EDT)

Yeah, anything for which a page exists can be attached to a tree, including image pages -- which is another reason I don't really think of them as "trees." I've also been attaching to the relevant tree the Source pages I create in the process of creating or re-editing a Person page, just so I'll remember they're there and so I can check to see what's been happening to them lately (for my own education). --Mike (mksmith) 09:28, 2 May 2009 (EDT)


How about a "death row" status? [15 July 2009]

Ok, my topic is intentionally inflammatory, but it captures the essence of my idea.

It's all well and good to have folks donate research, improving weak or problematic pages and so forth. I've tried to do that when I can, but there's a whole lot more space to work on than there is me. It's also more than a little unfair - just because I care about the content of WeRelate, why should I be stuck trying to research someone else's areas of interest?

jrich, why do you feel you have to work on researching someone else's areas of interest? jillaine 15:26, 14 July 2009 (EDT)

Also, I've recently been working through some merges that are really outside my area of interest. Some of the entanglements are just hopeless, and only serious research is going to help. I don't always want to do it, and if it's not forthcoming from the folks that uploaded those pages, maybe they just need to die.

When I first started merging, I focused only on those pages that I was likely to have interest in -- mostly early colonial New Englanders. That made it less horrendous to me. Later, I joined others in working more methodically through the alphabet, with the sole focus of de-duping obvious duplicates, and in the process getting rid of the kaka like that which was initially on that Mary Jones page before someone cleaned it up. (Ref #s, repeated mentions of GEDCOMs and FTW files, etc.) If you focus only on de-duping and cleaning out the kaka (which can be done on the 2nd merge comparison screen, mostly), that will be sufficient for now. Then those who are more interested in that time period, location or surname can come through later and work on what they're interested in. I think you may be taking too much on your shoulders, jrich. jillaine 15:26, 14 July 2009 (EDT)

Attempting to engage users is great, but that's a tough one too. So you send the mail, then wait...how long? Do you try for every page associated with a particular user, or just once for that user and assume that they've lost their chance? Even then, you have to try to remember what you were doing and where, and then get back to it.

I don't worry about this. I'll encourage people to do their own merging, but I keep working my way through the (alpha) list. I've found that it's simpler to work from the list than up and down a given family line, which often makes me, like you, go nutso with frustration. jillaine 15:26, 14 July 2009 (EDT)

So what if we have a selective status of "condemned"? Or "fatally weak"? It would be a little like marking something for speedy delete, but if done by an administrator (perhaps it can only be done by administrators) it will cause the page to automatically be deleted if it remains untouched for some reasonable time period. Perhaps it could also send mail to any live users watching the page?--Jrm03063 12:22, 1 May 2009 (EDT)
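To make the proposal above concrete, here is a minimal Python sketch of the "delete if untouched" rule. Everything in it is an assumption for illustration: WeRelate has no such status, the function name `should_delete` is hypothetical, and the 30-day grace period merely stands in for "some reasonable time period."

```python
from datetime import datetime, timedelta

# Assumed stand-in for "some reasonable time period".
GRACE_PERIOD = timedelta(days=30)


def should_delete(
    condemned_at: datetime,
    last_edited_at: datetime,
    now: datetime,
    grace: timedelta = GRACE_PERIOD,
) -> bool:
    """Return True when a condemned page has sat untouched past the grace period."""
    if last_edited_at > condemned_at:
        return False  # someone touched the page after it was condemned
    return now - condemned_at >= grace


condemned = datetime(2009, 7, 1)
# Last edit predates the condemnation and the grace period has run out.
print(should_delete(condemned, datetime(2009, 5, 1), datetime(2009, 8, 15)))
# Someone edited the page after it was condemned, so it is spared.
print(should_delete(condemned, datetime(2009, 7, 10), datetime(2009, 8, 15)))
```

Whether any edit at all should be enough to lift the status, or only an edit that adds a source, is a policy question the sketch does not try to settle.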

Here is perhaps the worst page I have ever seen, and it just so happens it is Mary Jones. It is so filled with crap, I'm sure the submitter has never even looked at it.--Scot 14:36, 14 July 2009 (EDT)--Scot 14:40, 14 July 2009 (EDT)

But it's not crap now. It's been cleaned up and looks not all that bad. I.e., when you clear the crap out, it's okay. So I would NOT recommend this for speedy delete. We do have a lot of crap from the past, but hopefully the new upload process will result in stronger contributions. God, I hope so! jillaine 15:26, 14 July 2009 (EDT)

BTW, Jillaine, I didn't post the above. Were you asking me those questions or JRM? I am not complaining about working on other people's pages, but as to why I do sometimes, it is:

I like genealogy and have access to certain sources
to increase the number of pages having reasonable sources so people coming here will be more likely to see what is, hopefully, serious genealogy
to make valid changes causing people to be notified and hopefully learn from seeing good sources, and seeing their errors corrected, or if I make an error, hoping they might come back with a better source to show that
to get accurate data so the next person is more likely to match the right person, thereby protecting close matches from damage. (You know, those pages that name the mother, father, and one daughter named Mary with nary a date in sight, nor any of the other children's names that might give you a clue if it is the same page you are looking for or not.)
I got tired of mindless merging like you, plus I was worried I was taking bad data and making it worse by my arbitrary decisions during the merge process, so feel this is more useful. (Personally, I think Dallan should automate all the merges in the duplicate list, and then let's clean up afterwards. Less work in the long run, I think. Those names have been sitting on people's duplicate lists for how long now?)


Mary Jones [16 July 2009]

Regarding the Mary Jones page, are we looking at the same page? It looks like crap to me, with one source being a will in 1718 and another source being the 1910 census, with a birth in Virginia but the narrative talking about Indiana, with birth in 1690 of Mary Jones but the narrative talking about the death of Melvinia Jones in 1923, birth in Virginia, christening in England, burial before the death, lifespan well past a century long. --Jrich 16:25, 14 July 2009 (EDT)
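Some of the problems listed above (burial before death, a lifespan well past a century) are the kind of thing a simple consistency check could flag. The following Python sketch is only an illustration; the function name `find_date_problems` and the 110-year lifespan limit are assumptions, and it is not part of any WeRelate upload process.

```python
from typing import List, Optional


def find_date_problems(
    birth: Optional[int] = None,
    death: Optional[int] = None,
    burial: Optional[int] = None,
    max_lifespan: int = 110,  # assumed upper limit, in years
) -> List[str]:
    """Return a list of human-readable problems found in a person's event years."""
    problems = []
    if birth is not None and death is not None:
        if death < birth:
            problems.append("death before birth")
        elif death - birth > max_lifespan:
            problems.append(f"lifespan exceeds {max_lifespan} years")
    if death is not None and burial is not None and burial < death:
        problems.append("burial before death")
    return problems


# Birth 1690 and death 1923 are the years mentioned above;
# the burial year 1920 is made up for the example.
print(find_date_problems(birth=1690, death=1923, burial=1920))
```

A check like this only catches internal contradictions; it says nothing about whether a page mixes up two different people, which still takes a human looking at the sources.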

Take a look at the parent page Family:Richard Jones and Mary Farr (1); this page seems impossible to fix. I don't believe that this user entered the data this way; must have been a merge gone terribly wrong or major problem with the gedcom. I have not looked at any of the other pages. If his other pages are all like this, I suggest notifying the user about the problems with his pages and that they will be deleted and suggest that he upload his pages again using the new gedcom uploader. --Beth 17:13, 14 July 2009 (EDT)

It's not a result of anything that happened here. If you bring up the History of the page, you can bring up the oldest entry from the GEDCOM upload and see that that is the way he submitted it. His upload has dozens of internal duplicates, and all sorts of medieval people with no source documentation. Look at his talk page: his first try was to upload a zipped file, then he uploaded his GEDCOM twice. Subsequently he has never once performed an edit. Bring up Mary's page and check for duplicates and you'll see he submitted 26 copies of it or so. I have seen enough to say delete it all.--Scot 18:31, 14 July 2009 (EDT)

Okay Scot, but if his pages have been merged with any others, I suppose you need to unmerge them first. Anyway you need to make the request for deletion on the duplicate review page.--Beth 19:08, 14 July 2009 (EDT)

I've done that here. jillaine 07:23, 15 July 2009 (EDT)

But is there some way to determine if any of the pages have been merged?--Scot 10:58, 15 July 2009 (EDT)

I believe that any page that has been merged with a page belonging to someone else (other than the submitter of the gedcom being deleted) will NOT be deleted. I.e., if one of those delightfully awful pages was merged with one of your pages, Scot, that page would not get deleted when the GEDCOM went away. jillaine 17:23, 15 July 2009 (EDT)

I note that the submitter of this Mary Jones page has begun to make some modifications to it, presumably in response to the flagging of the page for speedy delete. After re-looking at the information content there, while it's still a mess, and despite my "objectifying" of the speedy delete rationale, there's probably some value to retaining this and related materials. While some of the problems undoubtedly persist beyond this page, there appears to be a core of information that's consistent with a family line of an early Virginia settler, Joshua Wynne. Data for this Joshua presented on Ancestry is fairly consistent. Though it disagrees in some of the details with what's here, I think this can be cleaned up without too much problem. Whether there are similar problems downstream in this line I don't know, but since the original author is in fact working on this, I think it premature to jettison the work. In any case, I'll work to sort some of this out, at least in the early Virginia period. Q 14:11, 16 July 2009 (EDT)

I took a look at his contributions; looks like he's doing a LOT of merging. But I'm not sure that merging alone is going to fix this problem. He's got a bigger problem than dupes. -- jillaine 15:33, 16 July 2009 (EDT)

Retrieved from "https://www.werelate.org/wiki/WeRelate_talk:Junk_Genealogy"
