Source talk:Ancestral File

Ancestral File isn't even a secondary source, since a great deal of the information in it (possibly a majority) is taken from other secondary sources. AF is, at best, a very poor tertiary source. Given the very serious and very extensive shortcomings of Ancestral File, I really do think we (at WeRelate) should make a proscriptive value judgment here. We should consider including a statement on the page that AF is to be avoided as a source, for all the reasons noted in the text. For myself, whenever I add a source to a page that includes an AF number (almost always as part of the original GEDCOM upload), I delete the AF number as being worse than useless. Any thoughts? --MikeTalk 08:52, 25 December 2011 (EST)

While I agree that AF should be avoided as a source, and I only track the AFN in my own records in rare situations, I tend to leave it on WeRelate records where I find it, even (and sometimes especially) where the AF data is quite corrupted. I do this because I suspect (although I may be overly optimistic) that associating the AFN with a WeRelate record whose quality has been improved will reduce the likelihood that the bad data from AF will be posted to WeRelate as a new record. I'm well aware that 9 times out of 10, people will just post the bad data as a new record, but maybe sometimes, when someone does a search and finds that a WeRelate record has the same AFN as their record, they will look at it, realize that it is better data than their own, and decide not to post their bad data. Without the AFN, they might be uncertain whether it is the same person, since improving the quality of data can make a record significantly different from what is in AF. Does this make sense?
Additionally, if Dallan has not already done this (I haven't checked lately), he could write an automated program to identify duplicates based on the AFN, and we could work on cleaning them up. Of course, he could also write such a program against older page versions, which would let us remove the AFN from the current version of a page, but then the reason I give above would not be served.
Someday AFNs might be so out of date that they should be removed from WeRelate pages, but in the meantime they might serve a purpose. Thoughts? --DataAnalyst 12:59, 25 December 2011 (EST)
Agree with both of you. I think it's mostly awful, I think we should say so, and I leave it for the tracking/merging purposes mentioned.--Amelia 23:16, 25 December 2011 (EST)
Checking for duplicate AFNs is a good idea, and would be pretty quick to implement. Let me look into this.--Dallan 21:35, 27 December 2011 (EST)

In light of how poor a source AF is, is there a plan to delete records and facts whose only source is AF? I am in the process of cleaning up 16th- and 17th-century records, and while I was originally cautious about removing data and people just because they went beyond my own information, I am becoming less patient with data whose only source appears to be AF, and less apologetic about simply removing it. So far, knowing that some AF data is based on journal articles and published books, I have been making a concerted effort to see if there is an article or online book that can back up unsourced info before I remove it, but this is very time-consuming. It would be nice to have carte blanche to simply remove dates, marriages, etc. for which no source is cited, for records prior to some cutoff year (maybe 1750). I can understand why records for more recent generations might not be sourced (to protect privacy, or because of someone's personal research done before they knew enough to document sources), but most of the older records without sources come from AF or a similarly unreliable source (OneWorldTree, etc.).

One of the issues with AF is that qualifiers ("bef", "aft") appear to have been dropped from many dates, making any date that does not have an alternate source useless and misleading.

If we agree that AF is such a bad source and we are serious about not wanting to simply replicate it in WeRelate, I would suggest we do more than just comment on this source page, since many people add records without adding sources and so would never look at this page. The following might be appropriate:

Add a box on the home page to emphasize that WeRelate was designed as a wiki collaborative tool to avoid the problems of previous collaborative efforts, in which (I assume, anyway) automation was used to attempt to consolidate duplicate records and sources were not made generally available. The approach used by these previous attempts has been found to be lacking / have issues (some tactful way of expressing how off-track these efforts ended up being). Therefore, members are asked to avoid adding any data whose only source is one of these previous collaborative efforts (AF, OneWorldTree, there are probably others).
Does there need to be a formal policy on this? Is there one already (other than encouragement to add sources)?
Send a link to the home page to every member's talk page to alert them to this notice.
Run a query to identify all individuals and families (before 1750) whose only cited source is AF or OneWorldTree, or which have no sources, and post this list somewhere, with a notification that all these individuals and families will be removed from WeRelate by a certain date unless "rescued" by members before then. Members can rescue data by adding an alternate source or citation (even if it is just "my records, and I will try to find the source later"). Actually removing these records might be a bit tricky, as some records might be links between other properly sourced records. It would be good if volunteers helped confirm which records to delete and the impact on related records, rather than simply automating the deletion. Knowing the volume of such records would be helpful; it might be too high to handle in any way other than automated.
Run a query to identify potential duplicates based on AFNs and post it with other potential duplicates for volunteers to merge or declare to be different records. If possible, use history pages as well as the current page - that way, we can eventually remove AFNs from the current page and still use the query.
Create a survey (perhaps using SurveyMonkey, to minimize effort) asking members:
a) if they track AFNs in their personal records, and if so
b) if they ever use existing AFNs to determine whether or not an existing record represents the same person they are attempting to add.
If this survey shows that hardly anyone uses AFNs to avoid creating duplicates, then my main argument for keeping them will be shown to be invalid, and it will be time to remove AFNs from current pages of individuals.
Give sufficient time for all the above. People contribute to WeRelate in their spare time, and might do so only every few months. Those of us who hold down full-time jobs in addition to our hobbies sometimes need lots of time to become aware of changes and mull them over.

Dallan - if you get to the point where you want help with queries, and can publish the data formats you are using, let me know. I can write SQL and have some experience with nosing around in data to look for patterns. I start a new job on Monday, so might not have the energy in the short term, but I'd be willing to help at some point. --DataAnalyst 15:09, 7 January 2012 (EST)

User:DataAnalyst, if you're interested, I could provide an XML file containing all of the content at WeRelate (it's all open content), along with an example Java program showing how to parse the XML. Then you could parse the data looking for duplicate AFNs and create a wiki page listing all of the possible duplicates for people to compare, with lines like: http://www.werelate.org/wiki/Special:Compare?ns=Person&compare=John_Doe_(1)|John_Doe_(2) (I'd also be interested if someone wanted to parse the data to generate a list of warnings for every page, like we do during GEDCOM upload.) Let me know.--Dallan 11:22, 9 February 2012 (EST)
Sure, I'd be interested in giving it a shot. Thanks --DataAnalyst 20:18, 9 February 2012 (EST)
Thank you! I've created a project on GitHub. Please let me know if you have any questions or need any help.--Dallan 12:10, 14 February 2012 (EST)
I'm just going to comment generally that the idea of deleting "bad" data has been raised numerous times, both in general and among admins, and the answer is nearly always no. There are significant collateral consequences, and a feeling that at least some of the data is good enough to provide clues, which is better than wholesale deletion. That's not to say I don't share your frustration with the data and agree with deleting it where it is clearly wrong. I submit, however, that deleting it just because it is sourced by AF or OWT goes too far. Under that criterion, we'd have to delete probably 90% of all pages being created, even by hand, because none of them cite any sources, despite it being clear six ways to Sunday on this site that sources are valued and considered essential to a good page.--Amelia 19:46, 7 January 2012 (EST)
I'm not sure if it's needed, but I'm certainly open to more emphasis on sourcing on the main page if someone wants to take a stab at it.--Dallan 11:22, 9 February 2012 (EST)
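The duplicate-AFN search discussed in this thread could be sketched roughly as follows. This is a hypothetical Java sketch, not the GitHub project's code: it assumes the XML dump has already been parsed into a map from Person page title to AFN (the parsing step, the class name AfnDuplicates, and the method compareLinks are invented for illustration), groups the titles that share an AFN, and emits one Special:Compare link, in the format of Dallan's example, for every pair within each group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: flag Person pages that share an Ancestral File Number (AFN).
class AfnDuplicates {

    // Link prefix copied from the example Special:Compare URL in the discussion.
    static final String COMPARE_URL =
        "http://www.werelate.org/wiki/Special:Compare?ns=Person&compare=";

    // Input maps page title (e.g. "John_Doe_(1)") to its AFN.
    // Assumes pages without an AFN were left out when the dump was parsed.
    static List<String> compareLinks(Map<String, String> afnByTitle) {
        // TreeMap orders groups by AFN so the generated wiki page is stable between runs.
        Map<String, List<String>> titlesByAfn = new TreeMap<>();
        for (Map.Entry<String, String> e : afnByTitle.entrySet()) {
            String afn = e.getValue().trim();
            titlesByAfn.computeIfAbsent(afn, k -> new ArrayList<>()).add(e.getKey());
        }
        List<String> links = new ArrayList<>();
        for (List<String> titles : titlesByAfn.values()) {
            // Each pair in a group is a candidate for volunteers to merge or mark as different.
            for (int i = 0; i < titles.size(); i++) {
                for (int j = i + 1; j < titles.size(); j++) {
                    links.add(COMPARE_URL + titles.get(i) + "|" + titles.get(j));
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample data; real input would come from the parsed XML dump.
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("John_Doe_(1)", "AAAA-1111");
        sample.put("John_Doe_(2)", "AAAA-1111");
        sample.put("Jane_Roe_(1)", "BBBB-2222");
        for (String link : AfnDuplicates.compareLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

Generating every pair keeps each link in the two-page form of the example URL; a group of three pages sharing one AFN would produce three links.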

Do we keep AFNs or remove them? [9 February 2012]

A quick search shows roughly 44,000 people with Ancestral File numbers. If we vote to keep them (for at least another year), I'm willing to do a potential-duplicate search based on matching AFNs at some point if there's enough interest. If we vote to remove them, I'll remove them from the GEDCOM upload process as well.--Dallan 21:37, 5 January 2012 (EST)

Add your vote below. +1 to keep for at least another year or -1 to start deleting right away, followed by your signature.


+1 --DataAnalyst 19:12, 5 January 2012 (EST)

+1--GayelKnott 21:04, 9 January 2012 (EST)

It looks like we'll keep AFNs.--Dallan 11:22, 9 February 2012 (EST)

Why I wasn't harsher in my edits [10 January 2012]

I just expanded the Usage Tips. After some thought, I indicated only that AF should not be affirmatively added to any page if it can possibly be avoided. I did not say never, and I did not say it should always be removed, mainly because I think the result would be that the fact is unsourced rather than sourced with AF, and I don't think that's helpful. I leave AF in my own files (imported almost 20 years ago, forgive me) because 1) it's not obviously wrong, and in some cases probably right; 2) it tells me that I got it from AF and that I or a family member didn't find it somewhere; and 3) when that date pops up 17 times in other online trees, it answers the question of where it probably came from. Ultimately, in a user-collaborative setting, I was uncomfortable saying that a major effort at user collaboration is per se not just unreliable, but worse than nothing. Thus, unless and until the data can be affirmatively identified as wrong or sourced in a better way, I'd just as soon people leave the reference to AF as the source.--Amelia 00:03, 8 January 2012 (EST)

As a researcher, I'd much rather see Ancestral File as a source citation than not have any source citation, because it gives me some basis for evaluating the information, especially if I can't find an original source; and because for the older Ancestral File submissions, the originals were filmed and do allow you to see what the original submitter used as a source. And the problems with derivative sources aren't limited to Ancestral Files.--GayelKnott 21:03, 9 January 2012 (EST)