Friday, 16 May 2008

How good is census indexing?

An email arrived from findmypast.com announcing their new transcription of the 1901 census records, starting with Gloucestershire and Somersetshire. The blurb read "The accuracy of our transcriptions, along with our high quality images, is what sets our censuses apart from our competitors'. After searching our transcriptions you may find people you haven't found on other sites."

I've also heard a respected genealogist (not an employee) at the Society of Genealogists make the claim that Findmypast's transcriptions are better than Ancestry's. How well does the claim stand up? I decided to rate them on a point basis, one point for each person correctly identified in a search for Reid in Somerset. Incidentally, Wikipedia states that Somersetshire is not the correct name for the county, but an archaic form that went out of fashion in the late 19th century.

A search on the exact name Reid yields 24 hits on Findmypast (FMP) and 22 on Ancestry (ANC). Of these, 18 are identical. Whichever service you choose, you stand a good chance of finding the person you seek. The glass is about 80% full. ANC and FMP each receive 18 points.

The first difference is a simple transcription problem. FMP finds Adam A Reid, ANC finds Adara A Reid. Looking at the original you can see how the Chinese indexers employed by Ancestry would read m as ra, but I'm surprised this was not caught in checking the work. One point to Findmypast.

The second difference is that FMP lists an Alex Reid living in Keynsham. However, the original shows that he, his wife and infant son are residing in Bitton, Gloucestershire. FMP has them in the wrong county. Penalize FMP three points, one per person, for false identification.

ANC finds 14-year-old Alvan Reid, a nephew in the home of Frances E Hancock. No amount of searching found this person on FMP, nor the Hancocks, nor several other people on the page. Unfortunately FMP, unlike ANC, has no way to search by piece, folio and page number. Perhaps a page was missed. One point to Ancestry.

ANC finds Charles Tassela Reid and his wife Emily. FMP finds Charles Tassel Reed. Both are reasonable attempts at what investigation in other sources shows should be Charles Tassell-Reed. No points awarded.

FMP finds Martha Reid which ANC transcribes as Martha Rud. Rud is an unusual name and several of those found in the 1901 census look to be doubtful transcriptions. FMP appears correct; the entry is consistent with a marriage registered in late 1900. One point to FMP.

Finally, FMP finds Reginold J Reid, ANC Reginald J Reid. On the original it certainly looks like --old, but which would you rather see it indexed under? Some argue you should transcribe what you see, others that you should normalize to a controlled vocabulary. In practice a worthwhile search engine should find the entry despite the difference of a single vowel. No points awarded.
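For the technically inclined, here is a minimal sketch of how a search could tolerate a one-vowel difference like Reginold and Reginald. It uses Python's standard difflib module. The function name and the 0.8 threshold are my own assumptions for illustration, and nothing here reflects how either FMP or ANC actually searches.

```python
from difflib import SequenceMatcher


def is_close_match(indexed_name, searched_name, threshold=0.8):
    """Return True when two names are similar enough to count as a hit.

    The threshold of 0.8 is an arbitrary assumption, not a value
    taken from any real census search engine.
    """
    a = indexed_name.lower()
    b = searched_name.lower()
    # ratio() is 1.0 for identical strings and falls towards 0.0
    # as the strings have less in common.
    return SequenceMatcher(None, a, b).ratio() >= threshold


# The two spellings differ by one vowel, so they are treated as a match.
print(is_close_match("Reginold", "Reginald"))
```

A looser threshold catches more variant spellings but also returns more false hits, which is the same trade-off a researcher faces when choosing between exact and fuzzy search.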

Bottom line, 17 points to Findmypast, 19 points to Ancestry. Not a conclusive win, but also not a confirmation of Findmypast's claim. While the statement "after searching our transcriptions you may find people you haven't found on other sites" is true, it could equally apply in reverse.
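For anyone who wants to check the arithmetic, the tally can be written out as a short Python sketch. The labels are my own shorthand for the cases above; the point values are the ones awarded in the text.

```python
# Each entry: (case, points to FMP, points to ANC), as awarded above.
adjustments = [
    ("18 identical hits", 18, 18),
    ("Adam vs Adara", 1, 0),
    ("Alex Reid family placed in wrong county", -3, 0),
    ("Alvan Reid not found on FMP", 0, 1),
    ("Charles Tassell-Reed", 0, 0),
    ("Martha Reid vs Rud", 1, 0),
    ("Reginold vs Reginald", 0, 0),
]

# Sum each service's column to get its final score.
fmp_total = sum(fmp for _, fmp, _ in adjustments)
anc_total = sum(anc for _, _, anc in adjustments)

print("Findmypast:", fmp_total)
print("Ancestry:", anc_total)
```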

One thing I did appreciate about FMP is the ability to zoom in the image more than you can with ANC. That may reflect a better quality image as FMP claims.

What the evaluation does show is the kind of variations to look for when an ancestor eludes you in the census.
