What Are False Positive and False Negative Results

Southwest Virginia Project

False Positive Result: When a source such as Source:Summers, 1929 is searched using either the work's index or the search function in an online version of the work, "false positive" results can be obtained. That is, the search may generate a "hit" for the search term on a particular page even though the search term does not actually appear on that page. This can happen either through an error on the part of the original index maker or because of peculiarities of the search engine used to examine an electronic version of the work. In the latter case, it may happen because the search engine focuses on adjacent words rather than the actual name. Here is an example.

An electronic text may contain the following (totally fictitious) passage:
"John Smith, Walker Jones, Paul White were ordered to view the way from Smith's place to the gap in the mountains"

Some search engines will recognize that this is a "comma delimited list" and retrieve hits for "John Smith", "Walker Jones", and "Paul White". Other search engines will not recognize this and, in addition to hits for "John Smith", "Walker Jones", and "Paul White", will also return hits for "Smith Walker" and "Jones Paul". Some really aggressive search engines do not even require that the names be adjacent, only that they occur within "n" words of each other. Those search engines might also return "John Walker", "John Jones", and even "John White".
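The three behaviors described above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not a description of how any particular search engine works; the function names (comma_aware_hit, comma_blind_hit, proximity_hit) are invented for illustration, and the proximity version assumes a two-word name.

```python
import re

PASSAGE = ("John Smith, Walker Jones, Paul White were ordered to view the way "
           "from Smith's place to the gap in the mountains")

def words(text):
    """Lowercase words; commas and other punctuation are discarded."""
    return re.findall(r"[A-Za-z']+", text.lower())

def adjacent_hit(word_list, name):
    """True when the words of the name appear side by side, in order."""
    target = words(name)
    n = len(target)
    return any(word_list[i:i + n] == target
               for i in range(len(word_list) - n + 1))

def comma_aware_hit(text, name):
    """Treats each comma-separated item on its own, so names cannot straddle a comma."""
    return any(adjacent_hit(words(segment), name) for segment in text.split(","))

def comma_blind_hit(text, name):
    """Ignores commas entirely, so 'Smith, Walker' reads as 'Smith Walker'."""
    return adjacent_hit(words(text), name)

def proximity_hit(text, name, n):
    """True when the second word of a two-word name follows the first within n words."""
    first, last = words(name)
    w = words(text)
    firsts = [i for i, x in enumerate(w) if x == first]
    lasts = [i for i, x in enumerate(w) if x == last]
    return any(0 < j - i <= n for i in firsts for j in lasts)

for name in ("John Smith", "Smith Walker", "Jones Paul", "John White"):
    print(name,
          comma_aware_hit(PASSAGE, name),
          comma_blind_hit(PASSAGE, name),
          proximity_hit(PASSAGE, name, n=5))
```

In this sketch only the comma-aware version rejects "Smith Walker" and "Jones Paul", and only the proximity version (with n set to 5) reports "John White".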

"False Positives" may also arise if the search engine uses "metaphone" or "soundex" matching, which only requires that the found item have the same "sound" as the search target. If one were searching for "John Smith" with such a search engine, "John Smythe" would probably return a hit. That might be a desirable outcome since "Smith" and "Smythe" might be easily interchanged in the original records, the same person appearing under both spelling variants. Sometimes "the same sound" has quite a bit of leeway, and radically and obviously different names may be returned. For example, "John Smithers" or even "John Smothers" might be returned in a search for "John Smith".


False Negatives: "False negative" results may occur when a search engine fails to identify a target name even though it appears in the document. For example, consider the following list, which appears in a table of persons in a militia roster:

Smith, John
Jones, Walker
White, Paul

If the target search term were "Walker Jones", some search engines would not return a hit from the above list because the "word" order does not match the target.
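The word-order problem can be shown with the roster above. The sketch below is only an illustration: exact_phrase_hits stands in for a search that insists on the target's word order, and normalize and order_aware_hits are hypothetical helpers showing one way a search could allow for "Surname, Given" entries.

```python
ROSTER = ["Smith, John", "Jones, Walker", "White, Paul"]

def exact_phrase_hits(entries, target):
    """Hits only when the target appears in exactly the order it was typed."""
    t = target.lower()
    return [e for e in entries if t in e.lower()]

def normalize(entry):
    """Turn 'Surname, Given' into 'Given Surname'; leave other forms unchanged."""
    if "," in entry:
        surname, given = (part.strip() for part in entry.split(",", 1))
        return f"{given} {surname}"
    return entry.strip()

def order_aware_hits(entries, target):
    """Compares the target against the re-ordered form of each entry."""
    t = target.lower()
    return [e for e in entries if t in normalize(e).lower()]

print(exact_phrase_hits(ROSTER, "Walker Jones"))   # the false negative
print(order_aware_hits(ROSTER, "Walker Jones"))
```

The first search returns nothing even though Walker Jones is on the roster; the second finds the "Jones, Walker" entry.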


Example. As a "real life example", a search of Source:Summers, 1929 was recently conducted using the target "Robert Edmiston". The search engine used was that provided by Ancestry.com, searching its online version of this work. Summers' index to this work lists three entries under this spelling of the surname.

The Ancestry search results picked up each of the three items listed in Summers' index. It also picked up 25 other hits! Of the 25 other hits, exactly one was a legitimate hit for the target. The other 24 "hits" were "False Positives": that is, the search engine found things that it thought were the target but that were not in fact what was being sought. Most of the hits arose because of entries such as "William Edmondson, Robert Craig...." In one memorable example, a hit appears to have been returned for a passage that read "William Edmiston, James Armstrong, Robert Craig". It is worth noting that the single valid entry picked up by Ancestry, but missing from Summers' own index, is an example of a "False Negative" result when using Summers' index. That is, you would miss an entry for the target that is actually present in the work, even though Summers' index indicates it is not.
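The counts in this example can be tallied as follows. The variable names are invented for the tally, and the last figure assumes that the four valid entries found are the only ones in the work, which the search itself cannot confirm.

```python
index_entries = 3        # entries listed in Summers' index, all found by Ancestry
other_hits = 25          # additional hits returned by the Ancestry search
valid_other_hits = 1     # additional hits that really were the target

total_hits = index_entries + other_hits
valid_hits = index_entries + valid_other_hits
false_positives = other_hits - valid_other_hits

# Share of the Ancestry hits that were genuine
valid_share = valid_hits / total_hits

# Share of the known valid entries that Summers' index lists
# (assumes no further valid entries were missed by both)
index_coverage = index_entries / valid_hits

print(total_hits, valid_hits, false_positives)
print(f"{valid_share:.1%} of Ancestry hits were valid")
print(f"{index_coverage:.0%} of known valid entries appear in the index")
```

On these numbers about one Ancestry hit in seven was genuine, while the printed index covered three of the four known valid entries.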