21 June 2015

Researching the wrong family tree? - refining documentary relationships with an autosomal DNA test

Following on from the post "Forlorn hope DNA matches", I received the following email, which I've made anonymous.
"I have done work on my T***d line to the late 1790’s where my 5 x great grandfather was baptized as the illegitimate son of Mary T***d. Her brother Thomas also has descendants."
Can a DNA test confirm/refine/refute the relationship?

I've been struggling with this post. Hopefully someone with a clearer head can help refine the analysis. Here goes.

Assuming the two living people, descendants of Mary and Thomas respectively, are of the same generation, they would be half sixth cousins.

Let's consider two mutually exclusive hypotheses: that the two living people are related as half sixth cousins through a parent of Mary and Thomas T***d (H), or that they are not (-H).

The documentary evidence isn't conclusive. It never is.  There could have been bad information, misunderstandings in recording or interpreting the information, or non-paternity events. You can undoubtedly think of more.

Say fourteen generations separate the two living people, with all but two of the links being inheritance through a paternal line. If the chance of bad information is 2% per generation, a figure typical of NPE rates, then on the conservative assumption that uncertainty in the documentation is dominated by the 12 paternal links, the chance of at least one piece of false information somewhere in the chain is 1 - (1 - 0.02)^12, a little over 21%; erring on the side of caution, call it 23%.
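The compounding step can be checked with a few lines of Python. This is a minimal sketch that assumes, as the calculation above implicitly does, that each documented link is wrong independently and with the same probability; `prob_any_false` is a hypothetical helper name. Evaluating the expression with a 2% per-link rate and 12 links gives about 21.5%, slightly below the 23% carried forward into P(-H); using the larger figure makes the documentary prior a little more cautious.

```python
# Sketch: chance that at least one link in a documented chain is wrong,
# ASSUMING every link is independently wrong with the same probability.
# The 2% per-link rate and the 12 paternal links are the post's working figures.

def prob_any_false(per_link_error: float, links: int) -> float:
    """Return P(at least one false link) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - per_link_error) ** links


p_false = prob_any_false(0.02, 12)
print(f"P(at least one false link)       = {p_false:.1%}")      # 21.5%
print(f"P(documented relationship holds) = {1 - p_false:.1%}")  # 78.5%
```

Because the links are treated as independent, each extra generation multiplies the chance of an unbroken chain by another factor of 0.98.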

Based on the documentary evidence, then, the probability that the two living people are related through a parent of Mary and Thomas T***d is P(H) = 77%, and the probability that they are not is P(-H) = 23%.

Both living suspected T***d descendants took an AncestryDNA autosomal test and matched at the 5th-8th cousin level. We would like to know the probability of hypothesis H being true given the DNA evidence, P(H|E), which may be calculated from Bayes' theorem.

P(H|E) = P(E|H)*P(H) / (P(E|H)*P(H) + P(E|-H)*P(-H))
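The formula translates directly into code. The sketch below uses hypothetical helper names, `posterior` and `posterior_via_odds`, and assumes H and -H are exhaustive, so P(-H) = 1 - P(H). The second function is the algebraically equivalent odds form (posterior odds = likelihood ratio × prior odds), which makes it plain that the DNA evidence acts only through the ratio P(E|H)/P(E|-H).

```python
# Hypothetical helpers: Bayes' theorem for two mutually exclusive,
# exhaustive hypotheses H and -H, so that P(-H) = 1 - P(H).

def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|-H)P(-H))."""
    numerator = p_e_given_h * prior_h
    denominator = numerator + p_e_given_not_h * (1.0 - prior_h)
    if denominator == 0:
        raise ValueError("the evidence is impossible under both hypotheses")
    return numerator / denominator


def posterior_via_odds(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Equivalent odds form: posterior odds = likelihood ratio * prior odds.

    Requires 0 < prior_h < 1 and p_e_given_not_h > 0.
    """
    prior_odds = prior_h / (1.0 - prior_h)
    likelihood_ratio = p_e_given_h / p_e_given_not_h
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative numbers only (not from the post): an even prior and evidence
# twice as likely under H as under -H.
print(f"{posterior(0.5, 0.2, 0.1):.4f}")           # 0.6667
print(f"{posterior_via_odds(0.5, 0.2, 0.1):.4f}")  # 0.6667
```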

As we already have estimates for P(H) and P(-H), we next need an estimate for P(E|H), the probability of finding an autosomal DNA match given that the two people tested are related at about the 6th cousin level.

According to Family Tree DNA's figures, and assuming AncestryDNA's are no different, there is typically less than a 2% chance that an autosomal DNA test will detect a match between 6th or more distant cousins. Taking that upper bound, the estimate is P(E|H) = 2%.

We also need an estimate for P(E|-H), the probability of finding an autosomal DNA match given that the two people tested are not related at about the 6th cousin level or closer.

If P(E|H) = P(E|-H), then P(H|E) = P(H) and the DNA evidence provides no additional information. This would be the case for highly endogamous populations, such as Ashkenazi Jews, where most people's DNA tests show a relationship at about the sixth cousin level or closer.

Only if P(E|H) is greater than P(E|-H) does the DNA evidence improve the confidence in the relationship.

At the extreme, in the unlikely event that P(E|-H) = 0, the confidence in the relationship becomes 100%. This would imply there is no other way the two people tested could share DNA at the 6th cousin level or closer.

The actual value of P(E|-H) should be somewhere in between.
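The two limiting cases, and a value between them, can be checked numerically. This is a sketch using a hypothetical `posterior` helper with the post's P(H) = 77% and P(E|H) = 2%; the middle value of 0.01 for P(E|-H) is purely illustrative and not an estimate.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for H versus -H, with P(-H) = 1 - P(H)."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior_h))


PRIOR_H = 0.77      # P(H) from the documentary evidence
P_E_GIVEN_H = 0.02  # P(E|H) for roughly 6th cousins

cases = [
    ("P(E|-H) equal to P(E|H)", P_E_GIVEN_H),  # DNA adds no information
    ("P(E|-H) = 0", 0.0),                      # no other way to match
    ("P(E|-H) = 0.01", 0.01),                  # hypothetical middle value
]
for label, p_e_given_not_h in cases:
    p = posterior(PRIOR_H, P_E_GIVEN_H, p_e_given_not_h)
    print(f"{label:26s} -> P(H|E) = {p:.1%}")
```

This prints 77.0%, 100.0% and 87.0% respectively: the posterior stays at the prior, reaches certainty, or lands somewhere in between.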

It's not the most credible source, but according to an article in the UK Sun:
"Analysis by AncestryDNA - part of the online family history resource - of birth rates and population figures for the past two centuries suggests that the typical Brit has 193,000 living cousins. These relatives are sixth cousins or closer and share a traceable ancestor born in the last 200 years."
That's a "one in 300 chance that a total stranger is a relative."

[Note: Thanks to Ancestry's Bryony Partridge who subsequently provided a link to their press release, http://goo.gl/Ewhn1n. Also acknowledgements to Jamie at Ancestry email support who independently provided the type of non-answer we all hope never to receive.]

A one in 300 chance can be interpreted as P(E|-H) = 0.3333%, which makes no allowance for the limited number of people who've been tested. It should arguably be lower, as not all genuine relatives will share detectable DNA, and arguably higher, owing to endogamy. Accepting the one in 300 figure for the time being, Bayes' equation above gives P(H|E) = 95%.
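Plugging the three estimates into Bayes' equation is a one-line calculation. The sketch below uses a hypothetical `posterior` helper and the post's figures. It also reports the likelihood ratio P(E|H)/P(E|-H): here 6, which turns prior odds of 77:23 (about 3.3 to 1) into posterior odds of roughly 20 to 1.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for H versus -H, with P(-H) = 1 - P(H)."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1 - prior_h))


p_h = 0.77                 # P(H): documentary prior
p_e_given_h = 0.02         # P(E|H): match chance for ~6th cousins
p_e_given_not_h = 1 / 300  # P(E|-H): "one in 300" stranger-is-a-relative figure

p = posterior(p_h, p_e_given_h, p_e_given_not_h)
print(f"P(H|E) = {p:.1%}")                                          # 95.3%
print(f"Likelihood ratio = {p_e_given_h / p_e_given_not_h:.0f}")    # 6
```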

So the DNA test ups the confidence in the relationship from 77% to 95%. That's very tentative; the estimates may be well off.
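Because the estimates may be well off, a quick sensitivity check shows how far the conclusion moves when P(E|-H) changes. In this sketch the scale factors 0.5, 1, 2 and 4 applied to the one in 300 figure are arbitrary illustrations, not estimates, and `posterior` is again a hypothetical helper.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for H versus -H, with P(-H) = 1 - P(H)."""
    numerator = p_e_given_h * prior_h
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior_h))


P_H = 0.77          # documentary prior
P_E_GIVEN_H = 0.02  # match probability for ~6th cousins

# Hypothetical scale factors on the "one in 300" figure:
# below 1 for relatives who share no detectable DNA, above 1 for endogamy.
results = {}
for factor in (0.5, 1, 2, 4):
    p_e_given_not_h = factor / 300
    results[factor] = posterior(P_H, P_E_GIVEN_H, p_e_given_not_h)
    print(f"P(E|-H) = {factor:g}/300 -> P(H|E) = {results[factor]:.1%}")
```

This gives about 97.6%, 95.3%, 90.9% and 83.4%. Since only the ratio P(E|H)/P(E|-H) enters the result, halving P(E|H) has exactly the same effect as doubling P(E|-H).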

In his book Proving History, Richard Carrier suggests 95% as a benchmark for "very probable", whereas 77% falls below his 80% benchmark for "probable".

Is 95% good enough? That's a judgement call. How many people would buy a plane ticket if told there was a 5% chance of the plane crashing and burning? Consequences colour our perception of probabilities. How important is it to you to avoid a 5% chance of going on to research the wrong family tree?
