Saturday, 7 May 2011

A little DNA genealogy analysis

Warning: this posting is on the technical side. If you're math/science averse you may want to skip this one.

In preparation for a presentation at the Roots 2011 conference, and a dress rehearsal at the Ottawa DNA interest group on 28 May, I've been looking at some autosomal DNA results from 23andMe.

I've found a mother-son pair who have almost 150 identifiable partial matches with other people (DNA cousins) in the company's (23andMe) database. Each point in the chart below represents a DNA cousin, more often a male. The position on the horizontal axis is the percent of DNA shared with the mother, and the position on the vertical axis is the percent shared with the son.

I expected that at most the son would share all of the DNA the mother shares with the cousin, and at least none of it. Whether the son inherits all, most, a little, or none of the DNA the mother has in common with the "cousin" is supposedly random. It depends on which sections of chromosomes the son inherits.

Surprisingly, there were 10 cases where the son had more DNA in common with the "cousin" than the mother did. Did he inherit the excess serendipitously from the father, or is there another explanation?

Notice on the plot that there are no points below 0.1% shared DNA for the son. The company does not make results available where the percent shared is that small. Also, there are only three results where the mother's shared DNA is less than about 0.28%, likely because in such cases the son's shared DNA usually falls below 0.1% and so the match is not reported.

A best-fit trend line through the data (in red) and a best-fit line forced through the origin (in black) are shown. The latter, which goes some way toward compensating for the lack of data below 0.1%, has a slope a bit less than 70%. On the face of it, the son inherits on average a bit less than 70% of the mother's shared DNA, not the 50% expected from theory.
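
For anyone who wants to try the two fits on their own match list, here is a minimal sketch in Python. It assumes ordinary least squares for both lines, which may not be exactly what my charting software does. The function names and the five sample points are made up for illustration and are not the data plotted above.

```python
def fit_with_intercept(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_through_origin(xs, ys):
    """Least-squares line y = slope * x, forced through (0, 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


if __name__ == "__main__":
    # Hypothetical percentages, NOT the real results from the chart.
    mother_pct = [0.30, 0.40, 0.50, 0.80, 1.00]
    son_pct = [0.20, 0.30, 0.30, 0.60, 0.70]

    slope, intercept = fit_with_intercept(mother_pct, son_pct)
    print("free fit: slope", round(slope, 3), "intercept", round(intercept, 3))
    print("through origin: slope",
          round(fit_through_origin(mother_pct, son_pct), 3))
```

The slope of the line through the origin is simply the sum of the products (mother times son) divided by the sum of the squared mother values, so points with a larger percent shared carry more weight in that fit.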

