Using a Similarity Index to Group YDNA Project Results

1 Related
2 Data Source
3 Background
4 Calculations
5 Applications
6 Footnotes

This is one of a series of articles on Genealogical Methods, prepared in association with The Tapestry. See Index for a list of related articles.

__________________________

by William M. Willis©

This page is currently under development.

Data Source

Background

From Wikipedia (YDNA):

A man's patrilineal ancestry, or male-line ancestry, can be traced using the DNA on his Y chromosome (Y-DNA) through Y-STR testing. This is useful because the Y chromosome passes down almost unchanged from father to son, i.e., the non-recombining and sex-determining regions of the Y chromosome do not change. A man's test results are compared to another man's results to determine the time frame in which the two individuals shared a most recent common ancestor or MRCA. If their test results are a perfect, or nearly perfect match, they are related within genealogy's time frame.

The key to this idea is that while the Y chromosome changes very little, it does change slightly, through mutation, from generation to generation. The changes are relatively uncommon (an estimated 0.02% chance of a change on each genetic marker in each generation), but they do occur, and they accumulate over time. When the YDNA of two men is compared, the larger the number of accumulated changes, the longer it has been since the two men shared a common male-line ancestor.
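To make the "accumulated changes" idea concrete, the sketch below estimates how many differing markers two men would be expected to show, using the per-marker rate quoted above. It is a minimal sketch under simplifying assumptions that are not part of the original text: both male lines accumulate changes independently (hence the factor of 2), each change lands on a different marker, and back mutations are ignored. The function name `expected_mismatches` is hypothetical.

```python
def expected_mismatches(markers: int, generations: int, rate: float = 0.0002) -> float:
    """Rough expected number of differing markers between two men whose
    common male-line ancestor lived `generations` generations ago.

    Assumptions (illustrative only): both lines of descent mutate
    independently, so 2 * generations * markers marker-generations are
    "at risk"; every change hits a different marker; no back mutations.
    `rate` defaults to the 0.02% per marker per generation quoted above.
    """
    return 2 * generations * markers * rate


# Expected differences on a 37-marker test for a few ancestor depths.
for gens in (5, 10, 20):
    print(gens, round(expected_mismatches(37, gens), 3))
```

Under these assumptions, the expected count grows in direct proportion to both the number of generations and the number of markers compared, which is why a given number of mismatches means less on a long test than on a short one.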

FTDNA (Family Tree DNA) provides a YDNA testing service for clients who want to use YDNA results to further their understanding of their family genealogy and history. FTDNA also provides a "project page" that displays the test results for the kits belonging to a surname project. Several display formats are available, but the following (a hypothetical example) is typical:

Kit Number	Paternal Ancestor Name/data	Haplogroup	DYS1	DYS2	DYS3	DYS4	DYS5	DYS6	DYS7	DYS8	DYS9	DYS10	DYS11	DYS12
Descendants of David Smith of Derry, Ireland
H99990	David Smith 1802-1878	R1b1	13	25	14	11	11	13	12	12	11	13	14	29
H19922	Peter Smith 1824-1898	R1b1	13	25	14	11	11	13	12	12	11	13	14	29
H19922	Paul Smith 1828-1888	R1b1	13	25	14	11	11	13	12	12	11	13	14	29
Descendants of Phillip Smith of Cornwall, England
H19922	John Smith b. 1754 d. Boston Mass	I2	14	22	14	10	13	14	11	14	11	12	11	28
H12032	John Smith d1815 Ohio	I2	14	22	14	10	13	14	11	14	11	12	11	28
H7887	Paul Smith b1852 Iowa	I2	14	22	14	10	13	14	11	14	11	12	11	28
H25431	Benjamin Smith	I2	14	22	14	10	13	14	11	14	11	12	11	28
H212121	John Smith	I2	14	22	14	10	13	14	11	14	11	12	11	28
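Because a table like this is essentially tab-separated text with group headings interleaved, it can be read into per-kit records with a few lines of code. This is a minimal sketch assuming a tab-separated copy of the table (part of the hypothetical example above) in which every marker value is a single integer; multicopy markers (see footnote 2) or untested markers would need extra handling. The function `parse_project_table` and its record layout are hypothetical, not an FTDNA format or API.

```python
import csv

# Hypothetical tab-separated copy of part of the example table above.
LINES = [
    "Kit Number\tPaternal Ancestor Name/data\tHaplogroup\t"
    + "\t".join(f"DYS{i}" for i in range(1, 13)),
    "Descendants of David Smith of Derry, Ireland",
    "H99990\tDavid Smith 1802-1878\tR1b1\t13\t25\t14\t11\t11\t13\t12\t12\t11\t13\t14\t29",
    "Descendants of Phillip Smith of Cornwall, England",
    "H12032\tJohn Smith d1815 Ohio\tI2\t14\t22\t14\t10\t13\t14\t11\t14\t11\t12\t11\t28",
    "H7887\tPaul Smith b1852 Iowa\tI2\t14\t22\t14\t10\t13\t14\t11\t14\t11\t12\t11\t28",
]


def parse_project_table(lines):
    """Return (marker_names, kits); each kit record carries its group heading."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    marker_names = header[3:]  # columns after kit, ancestor, haplogroup
    kits, group = [], None
    for row in reader:
        if len(row) == 1:  # a group heading such as "Descendants of ..."
            group = row[0]
            continue
        kit_id, ancestor, haplogroup, *values = row
        kits.append({
            "kit": kit_id,
            "ancestor": ancestor,
            "haplogroup": haplogroup,
            "group": group,
            "markers": dict(zip(marker_names, (int(v) for v in values))),
        })
    return marker_names, kits


marker_names, kits = parse_project_table(LINES)
for kit in kits:
    print(kit["group"], kit["kit"], kit["markers"]["DYS1"])
```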

In some versions of the table the data are presented "as is"; in others, the minimum, modal, and maximum values for each group are presented. The modal value for a group of kits is sometimes referred to as the group's "signature". In some versions of the table, marker values are color-coded to indicate whether a kit's value for a marker differs from the group mode. This highlights the differences between each kit in a group and the group's YDNA signature. [See: YDNA. Examples of YDNA Data Tables for further explanation.]
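The minimum, modal ("signature"), and maximum rows, and the per-kit flags behind the color coding, can be computed directly from a group's marker values. A minimal sketch, assuming each kit's values are stored as a list in a fixed marker order; the function names are hypothetical, and the third kit's DYS12 value of 29 is an invented variant added to the example data so that one flag appears.

```python
from collections import Counter

# Hypothetical group based on the Cornwall kits in the example table; the
# last kit's DYS12 value (29) is an invented variant.
group = {
    "H12032": [14, 22, 14, 10, 13, 14, 11, 14, 11, 12, 11, 28],
    "H7887": [14, 22, 14, 10, 13, 14, 11, 14, 11, 12, 11, 28],
    "H25431": [14, 22, 14, 10, 13, 14, 11, 14, 11, 12, 11, 29],
}


def group_summary(kits):
    """Minimum, modal ("signature") and maximum value of each marker in a group.

    `kits` maps kit number -> list of marker values in a fixed marker order.
    A tie for the mode goes to the value encountered first.
    """
    columns = list(zip(*kits.values()))  # one tuple of values per marker
    minimum = [min(c) for c in columns]
    modal = [Counter(c).most_common(1)[0][0] for c in columns]
    maximum = [max(c) for c in columns]
    return minimum, modal, maximum


def off_mode(kits, modal):
    """Positions (0-based) where each kit differs from the group signature."""
    return {
        kit: [i for i, (v, m) in enumerate(zip(values, modal)) if v != m]
        for kit, values in kits.items()
    }


minimum, modal, maximum = group_summary(group)
print(modal)
print(off_mode(group, modal))
```

A kit with an empty list matches the signature on every marker; position 11 here corresponds to DYS12.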

The total number of differences (mismatches) between two kits is then used to evaluate how closely the YDNA of each kit owner matches that of other kit owners. In general, the fewer the differences between two kits, the closer the genealogical relationship. While there are a number of different approaches to grouping kits, in most YDNA projects groups are defined largely by a small number of mismatches among their members, who are taken to share a relatively recent common ancestor. Kits with fewer than "X" mismatches out of "N" markers are considered to share a relatively recent common ancestor (RCA) and are grouped together. The value of X used as the criterion for an RCA depends in part on the number of markers (N) tested. A single mismatch between two 12-marker kits is usually sufficient to rule out an RCA, while a mismatch of 1 marker between two 111-marker kits is commonly accepted as indicating a relatively recent common ancestor.^[1]
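The "X mismatches out of N markers" comparison can be sketched as below. Assumptions: kits are dictionaries of marker name to value, only markers tested on both kits are compared, and the threshold `x` is supplied by the caller (values such as X = 1 at 12 markers are guidelines, not fixed rules). The function names are hypothetical. This simple count of differing markers is not necessarily identical to FTDNA's "Genetic Distance", which can also take into account how far apart two values are.

```python
MARKERS = [f"DYS{i}" for i in range(1, 13)]
# Group signatures from the hypothetical example table.
derry = dict(zip(MARKERS, [13, 25, 14, 11, 11, 13, 12, 12, 11, 13, 14, 29]))
cornwall = dict(zip(MARKERS, [14, 22, 14, 10, 13, 14, 11, 14, 11, 12, 11, 28]))


def count_mismatches(a, b):
    """Return (mismatches, markers_compared) for two kits.

    Only markers tested on both kits are compared, so a 12-marker kit can be
    set against a 37-marker kit on their 12 common markers.
    """
    shared = [m for m in a if m in b]
    mismatches = sum(1 for m in shared if a[m] != b[m])
    return mismatches, len(shared)


def likely_rca(a, b, x):
    """True if the kits differ on fewer than `x` markers (the 'X of N' rule).

    `x` is chosen by the caller; as the text notes, a suitable value depends
    on how many markers were compared.
    """
    mismatches, _ = count_mismatches(a, b)
    return mismatches < x


print(count_mismatches(derry, cornwall))  # (10, 12)
print(likely_rca(derry, cornwall, x=1))   # False
```

The two example lineages differ on 10 of their 12 markers, far more than any threshold described above would accept.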

Data for a surname YDNA project are presented on FTDNA Surname Project sites in tabular form, as described above. Projects range in size from small (a half dozen or so kits) to very large (1500 or more kits, such as the Clan Fraser Project). The marker data for each kit are displayed horizontally, in a single row of the FTDNA table, so the table must be wide enough to display results for the maximum number of markers currently tested (111).^[2] In practice, such a wide table, or even all of the markers for a few kits, cannot be viewed in a single display on many computer monitors; examining the table typically requires the user to scroll to the right to reveal the values for each marker tested.
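One way around the scrolling problem is to turn the table on its side for review: markers become rows and kits become columns, so the display grows downward rather than sideways as more markers are tested. A minimal sketch producing plain-text output; the function name and dictionary layout are assumptions for illustration, and only a few markers from the hypothetical example are shown.

```python
def transposed_view(kits):
    """Render kits as columns and markers as rows in fixed-width plain text.

    `kits` maps kit number -> {marker name: value}. Markers missing from a kit
    (for example, a 12-marker kit shown beside a 37-marker kit) are left blank.
    """
    kit_ids = list(kits)
    # Every marker seen in any kit, in first-seen order.
    markers = list(dict.fromkeys(m for values in kits.values() for m in values))
    width = max(len(s) for s in ["Marker"] + kit_ids + markers) + 2
    rows = [["Marker"] + kit_ids]
    rows += [[m] + [str(kits[k].get(m, "")) for k in kit_ids] for m in markers]
    return "\n".join("".join(cell.ljust(width) for cell in row).rstrip() for row in rows)


kits = {
    "H99990": {"DYS1": 13, "DYS2": 25, "DYS3": 14},
    "H12032": {"DYS1": 14, "DYS2": 22, "DYS3": 14, "DYS4": 10},
}
print(transposed_view(kits))
```

With 111 markers the result is 112 lines long but only as wide as the number of kits being compared, which suits side-by-side review of a handful of kits.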

There is a lot of data in even the simplest of these tables, and it is still something of a challenge to evaluate their content visually and to calculate the number of "mismatches/markers tested" for each pair of kits. Once a project starts accumulating a significant number of kits, the problem can get out of hand very quickly, because the number of pairs grows roughly with the square of the number of kits. In very large projects (hundreds of kits), a pairwise comparison of each and every kit becomes a real challenge. As a result, projects with large numbers of kits sometimes resort to simply grouping their kits by haplogroup, which at least reduces the task to a more manageable size. Even so, it is not uncommon to find that a "group" defined by haplogroup contains no kits with a relatively recent common ancestor. This is commonly the case in especially large projects, where there may be hundreds of kits belonging to the same haplogroup. In such instances it can be challenging for the group administrator to identify legitimate subgroups within the haplogroup.
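A computer handles the pairwise bookkeeping easily, even though n kits produce n(n-1)/2 pairs (124,750 pairs for 500 kits). The sketch below counts mismatches for every pair, then links any two kits that differ on fewer than `x` markers and merges linked kits into one group (single-linkage grouping using a union-find structure). This grouping rule is an illustrative assumption, not the similarity index referred to in this article's title; note that single linkage can chain together kits that are not directly close to one another. Function names are hypothetical, and kit H25431's DYS12 value is an invented variant.

```python
from itertools import combinations

MARKERS = [f"DYS{i}" for i in range(1, 13)]
DERRY = [13, 25, 14, 11, 11, 13, 12, 12, 11, 13, 14, 29]
CORNWALL = [14, 22, 14, 10, 13, 14, 11, 14, 11, 12, 11, 28]

# Hypothetical project: one Derry-line kit and three Cornwall-line kits,
# one of which carries an invented one-marker variant at DYS12.
kits = {
    "H99990": dict(zip(MARKERS, DERRY)),
    "H12032": dict(zip(MARKERS, CORNWALL)),
    "H7887": dict(zip(MARKERS, CORNWALL)),
    "H25431": dict(zip(MARKERS, CORNWALL[:-1] + [29])),
}


def pairwise_mismatches(kits):
    """Mismatch count for every pair of kits, over markers both have tested."""
    result = {}
    for a, b in combinations(kits, 2):
        shared = kits[a].keys() & kits[b].keys()
        result[(a, b)] = sum(kits[a][m] != kits[b][m] for m in shared)
    return result


def group_kits(kits, x):
    """Link kits that differ on fewer than `x` markers; return the linked groups."""
    parent = {k: k for k in kits}

    def find(k):
        # Follow parent pointers to the group's representative kit.
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving keeps the trees shallow
            k = parent[k]
        return k

    for (a, b), mismatches in pairwise_mismatches(kits).items():
        if mismatches < x:
            parent[find(a)] = find(b)

    groups = {}
    for k in kits:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())


print(group_kits(kits, x=2))
```

Raising or lowering `x` changes how finely a large haplogroup is split, so in practice the threshold would be tied to the number of markers compared, as discussed above.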

Calculations

Applications

Footnotes

[1] The number of markers (N) tested for a kit varies from user to user; typically 12, 37, 67, or 111 markers are tested. Ideally, the more markers tested the better the results, but since the cost of the test depends on the number of markers tested, the choice of N is often limited by the kit owner's economic considerations. In any case, an exact match between two kits tested at, say, 67 markers (a 67/67 result) is usually taken to mean that the two kit owners share a relatively recent common ancestor. A less exact match suggests that their common ancestor lies somewhat further back in time. An exact match at 12 markers may or may not indicate a relatively recent common ancestor. FTDNA guidance on interpreting "Genetic Distance" can be found at FAQ 919.
[2] Since some of the markers are multicopy, the table is currently designed to accommodate more than 111 marker columns. Some kits listed as having been tested for 111 markers actually show 113 or more markers, the two "extra" markers being the result of a relatively recent increase in the number of copies potentially present in one of the multicopy markers.

Retrieved from "https://www.werelate.org/wiki/Using_a_Similarity_Index_to_Group_YDNA_Project_Results"

Categories: Genealogical Methods | Under Construction