YDNA Similarity Analysis

This is one of a series of articles on Genealogical Methods, prepared in association with The Tapestry. See Index for a list of related articles.

Similarity Analysis

In YDNA similarity analysis, multiple haplotypes (one per test kit) are compared with each other on the basis of the number of markers each kit has in common with the other kits. For example, if two kits are tested at 37 markers and differ from each other on 2 of those markers, their similarity would be expressed as:

similarity = 35/37 ≈ 94.6%

The above is equivalent to saying "a 35-marker match out of 37 markers", or simply a "35 out of 37 match".

This can also be expressed as dissimilarity:

dissimilarity = 2/37 ≈ 5.4%

This is commonly referred to as a "2 off of 37 marker match", or simply a 2-off match.

Results of similarity analysis are almost always presented in terms of dissimilarity (the number or percentage of markers that differ), even though they are commonly referred to simply as "similarity".
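The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming each kit is stored as a dictionary mapping a marker name to its allele value; the function names, the marker names (M1 to M37) and the allele values are hypothetical placeholders, not real test results. Each differing marker counts once, as in the definition above; some tools instead weight a difference by its size in mutation steps.

```python
def compare_kits(kit_a, kit_b):
    """Return (markers_compared, markers_differing) for two kits.

    Each kit is a dict mapping a marker name to its allele value.
    Only markers tested in both kits are compared.
    """
    shared = [marker for marker in kit_a if marker in kit_b]
    differing = sum(1 for marker in shared if kit_a[marker] != kit_b[marker])
    return len(shared), differing


def similarity_percent(compared, differing):
    """Markers in common as a percentage of markers compared."""
    return 100.0 * (compared - differing) / compared


def dissimilarity_percent(compared, differing):
    """Markers that differ as a percentage of markers compared."""
    return 100.0 * differing / compared


# Hypothetical 37-marker kits: names and values are placeholders only.
kit_a = {f"M{i}": 13 for i in range(1, 38)}
kit_b = dict(kit_a)
kit_b["M5"] = 14   # first difference
kit_b["M20"] = 12  # second difference

compared, differing = compare_kits(kit_a, kit_b)
print(f"{compared - differing} out of {compared} match, {differing} off")
print(f"similarity {similarity_percent(compared, differing):.1f}%")
print(f"dissimilarity {dissimilarity_percent(compared, differing):.1f}%")
```

Run as written, this reports a 35 out of 37 match (2 off), 94.6% similarity and 5.4% dissimilarity, the same figures as the worked example.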


One Kit Comparison

Results of similarity analysis are commonly displayed in one of two ways:

One kit compared with all other kits in a set of kits
Each kit compared with every other kit

In the first case, the focus of the analysis is how one particular kit compares with every other kit being considered. Here is an example:

Index | Lineage | Haplogroup | Markers (N) | Steps off | Dissimilarity
45027 | David Cowan - Sevier Co. TN | R1b1b2 | 67 | 0 | 0%
23147 | Andrew Matthew Cowan, b. c. 1759, VA. ? | R1b1b2 | 67 | 0 | 0%
58823 | | R1b1b2 | 67 | 0 | 0%
7384 | William Andrew Cowan, b-1853 TN m. Martha James | R1b1b2 | 67 | 0 | 0%
7376 | Samuel & Sarah Keith Cowan | R1b1b2 | 37 | 0 | 0%
54325 | Mathew Cowan b. 1777 & Catherine Trousdale | R1b1b2 | 25 | 0 | 0%
72499 | John Cowan 1811-1845 Knockaldie, m Margaret McNeil | R1b1b2 | 12 | 0 | 0%
10883 | J.L. Cowan b. 1822 TN & Almira S. Mahal Vance (KY) | R1b1b2 | 37 | 1 | 3%
11178 | Samuel Franklin Cowan, b. Missouri?, d Texas 1883 | R1b1b2 | 37 | 1 | 3%
34656 | Joseph Cowan b.c. 1820 married in Dandridge, TN | R1b1b2 | 37 | 1 | 3%
7381 | Jonathan Cowan, b 1803, Jefferson County, TN | R1b1b2 | 37 | 1 | 3%
83345 | John Cowan m Susannah Glover 96th District SC | R1b1b2 | 67 | 2 | 3%
118488 | John Coen (1687-1749) Baltimore MD | R1b1b2 | 25 | 1 | 4%
11680 | William | R1b1b2a1b5b | 25 | 1 | 4%
7386 | James | R1b1b2a1b5b | 25 | 1 | 4%
35133 | Robert & Susannah Woods Cowan, son John | R1b1 | 67 | 3 | 4%
49023 | James Alvis Cowan | R1b1b2 | 67 | 3 | 4%
11081 | Andrew Cowan b 1812 TN and Matilda Driskel b KY | R1b1b2 | 37 | 2 | 5%
89028 | James Cowan-KY, Cumberland Co | R1b1b2 | 67 | 3 | 4%
58024 | Climal | R1b1b2a1b5b | 67 | 3 | 4%
23925 | William & Jane Walker Cowan - Blount Co. TN | R1b1b2 | 25 | 2 | 8%
28890 | William & Jane Walker Cowan - Blount Co. TN | R1b1b2 | 25 | 2 | 8%
66884 | Andrew Cowan m Martha Evans, Sevier Co, TN | R1b1b2 | 25 | 2 | 8%
13547 | James & Hannah Woods Cowan, Sr., son Hiram | R1b1b2 | 25 | 2 | 8%
19612/54325 | Mathew Cowan b. 1777 & Catherine Trousdale | R1b1b2 | 37 | 2 | 5%
7377 | Robert & Susannah Woods Cowan, son John | R1b1b2 | 37 | 3 | 8%
87101 | | R1b1b2 | 12 | 1 | 8%
124273 | James | R1b1b2a1b5b | 12 | 1 | 8%
26038 | John Alexander Cowan (1775-1821) m Rosannah Gillispie | R1b1b2 | 37 | 3 | 8%
14523 | County Down Ireland 1808 | R1b1b2 | 37 | 4 | 11%
132630 | | R1b1b2 | 37 | 4 | 11%

The original data set used to generate the above table included over 130 separate kits, only 32 of which are shown. The principal advantage of this approach is that the calculations are relatively fast (one kit compared with n other kits takes only n comparisons) and the displays of the data are easier to create. This type of analysis is best used when focusing on a particular kit and trying to identify which other kits share a common ancestor with that kit.
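A one-kit comparison like the table above can be sketched by comparing a reference kit with each other kit and sorting the results by percent dissimilarity. This is a minimal Python illustration, assuming each kit is a dictionary mapping marker names to allele values; the function names, the optional `max_percent` cutoff and the sample kits are hypothetical. It counts differing markers rather than reproducing any particular lab's genetic-distance calculation, so its "steps off" values may not match tables produced by other tools.

```python
def compare_kits(kit_a, kit_b):
    """Return (markers_compared, markers_differing) over shared markers."""
    shared = [m for m in kit_a if m in kit_b]
    differing = sum(1 for m in shared if kit_a[m] != kit_b[m])
    return len(shared), differing


def one_kit_comparison(reference_id, kits, max_percent=None):
    """Compare one reference kit with every other kit in `kits`.

    Returns (kit_id, markers_compared, steps_off, percent) rows sorted by
    percent dissimilarity, then by markers compared (more markers first).
    Kits above `max_percent` (if given) are left out.
    """
    reference = kits[reference_id]
    rows = []
    for kit_id, kit in kits.items():
        if kit_id == reference_id:
            continue  # skip comparing the reference with itself
        compared, differing = compare_kits(reference, kit)
        if compared == 0:
            continue  # no markers in common, nothing to compare
        percent = 100.0 * differing / compared
        if max_percent is None or percent <= max_percent:
            rows.append((kit_id, compared, differing, percent))
    rows.sort(key=lambda row: (row[3], -row[1]))
    return rows


# Hypothetical 12-marker kits with placeholder marker names and values.
base = {f"M{i}": 14 for i in range(1, 13)}
kits = {
    "A": base,
    "B": dict(base),                     # identical to A
    "C": {**base, "M3": 15},             # 1 marker differs from A
    "D": {**base, "M3": 15, "M7": 13},   # 2 markers differ from A
}

for kit_id, n, steps, pct in one_kit_comparison("A", kits):
    print(f"{kit_id}  N={n}  {steps} off  {pct:.0f}%")
```

Only one pass over the other kits is needed, which is why this display is quick to produce even for a large project.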

Multi-Kit Comparison

The alternative approach is to compare every kit in a set with every other kit. This approach is better suited to questions of project grouping, where the goal is to isolate all kits that share a relatively recent common ancestor. Its disadvantage is that the calculations take longer and longer as the number of kits increases: comparing n kits pairwise requires n(n-1)/2 comparisons, so doubling the number of kits roughly quadruples the work. In addition, when the number of kits gets very large (say, 1,000 kits, or 499,500 pairwise comparisons), the displays become very cumbersome. A "matrix approach" is typically used for these displays. Here is an example based on a sample of kits from the Stockton YDNA project.
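The all-pairs comparison can be sketched in Python as follows, assuming each kit is a dictionary mapping marker names to allele values (the function names and sample kits are hypothetical). `itertools.combinations` yields each unordered pair of kits exactly once, so n kits produce n(n-1)/2 comparisons; each result is stored under both orderings of the pair so it can be read as a symmetric matrix.

```python
from itertools import combinations


def compare_kits(kit_a, kit_b):
    """Return (markers_compared, markers_differing) over shared markers."""
    shared = [m for m in kit_a if m in kit_b]
    differing = sum(1 for m in shared if kit_a[m] != kit_b[m])
    return len(shared), differing


def distance_matrix(kits):
    """Compare every kit with every other kit.

    Returns a dict keyed by (kit_id, kit_id) holding the number of
    differing markers (filled in both directions), plus the number of
    pairwise comparisons actually performed.
    """
    matrix = {}
    comparisons = 0
    for id_a, id_b in combinations(kits, 2):
        _, differing = compare_kits(kits[id_a], kits[id_b])
        matrix[(id_a, id_b)] = matrix[(id_b, id_a)] = differing
        comparisons += 1
    return matrix, comparisons


# Hypothetical 12-marker kits with placeholder marker names and values.
base = {f"M{i}": 14 for i in range(1, 13)}
kits = {
    "A": base,
    "B": dict(base),
    "C": {**base, "M3": 15},
    "D": {**base, "M3": 15, "M7": 13},
}

matrix, comparisons = distance_matrix(kits)
print(comparisons)          # 4 kits -> 4 * 3 // 2 = 6 comparisons
print(1000 * 999 // 2)      # 1,000 kits -> 499500 comparisons
```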

Currently, the kits shown have been placed in a single group, the Stockton Cheshire G, based on their YDNA similarity. (Note: Empty cells in the matrix display represent kit comparisons that exceed the acceptance criteria for the "run".)
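The empty cells described above can be reproduced by leaving blank any cell whose value exceeds a cutoff. In this Python sketch the acceptance criterion is assumed to be a maximum number of differing markers, passed as the hypothetical parameter `max_off`; the criteria actually used for the Stockton run are not stated here, and the sample kits and function names are invented for illustration. Diagonal cells (a kit compared with itself) are shown as "-" so they are not confused with blanked-out comparisons.

```python
from itertools import combinations


def compare_kits(kit_a, kit_b):
    """Return (markers_compared, markers_differing) over shared markers."""
    shared = [m for m in kit_a if m in kit_b]
    differing = sum(1 for m in shared if kit_a[m] != kit_b[m])
    return len(shared), differing


def render_matrix(kits, max_off):
    """Return text lines of a kit-by-kit matrix of differing markers.

    Cells whose value exceeds `max_off` (an assumed acceptance
    criterion) are left empty; diagonal cells are shown as "-".
    """
    ids = list(kits)
    diffs = {}
    for id_a, id_b in combinations(ids, 2):  # each pair computed once
        _, differing = compare_kits(kits[id_a], kits[id_b])
        diffs[(id_a, id_b)] = diffs[(id_b, id_a)] = differing

    width = max(len(kit_id) for kit_id in ids) + 2
    lines = ["".rjust(width) + "".join(k.rjust(width) for k in ids)]
    for id_a in ids:
        cells = []
        for id_b in ids:
            if id_a == id_b:
                text = "-"
            elif diffs[(id_a, id_b)] <= max_off:
                text = str(diffs[(id_a, id_b)])
            else:
                text = ""  # exceeds the acceptance criterion
            cells.append(text.rjust(width))
        lines.append(id_a.rjust(width) + "".join(cells))
    return lines


# Hypothetical 12-marker kits; E is deliberately distant from the rest.
base = {f"M{i}": 14 for i in range(1, 13)}
kits = {
    "A": base,
    "B": dict(base),
    "C": {**base, "M3": 15},
    "D": {**base, "M3": 15, "M7": 13},
    "E": {**base, "M1": 10, "M2": 10, "M3": 10, "M4": 10},
}

for line in render_matrix(kits, max_off=2):
    print(line)
```

With `max_off=2`, kit E's row is empty apart from its diagonal, which is how a kit falling outside a group would show up in such a display.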