Saturday, 3 March 2007

More on digitized and OCRd newspapers

I was pleased to see a comment posted by Bob Huggins, a pioneer in OCRd newspapers online, who found my blog entry on Paper of Record a few days ago.

Every so often I find myself using quotes like "half a loaf is better than none" and "seeing the glass as half full" when people complain about OCR inaccuracy, or errors in census transcriptions. You don't hear folks singing the praises after success nearly as loudly as they grumble after failing to find something.

Two of my best genealogy finds came thanks to Paper of Record: I located a wedding in PEI where a professional had hit a brick wall, and I found a long report of a robbery committed by a person I was researching in Perth, Ontario.

OCR problems are a reality for old newspapers. Experienced searchers know of alternative approaches that may find the item sought when a straightforward search, which should be tried first, fails.

Assuming you've selected the most appropriate paper, it's always good practice to limit the period searched to as narrow a range as reasonable.

Think of alternate ways in which the entry might be unique. Search for the bride's name if you can't find the groom, or the name of the place where one of the fathers lives, or any other aspect, such as the name of his regiment.

It can help to understand the format typically used in that paper for the period. For example, does a newspaper report of a marriage give the last names of the groom and bride as a header in a larger typeface? If so, a search for adjacent names (Smith Jones) stands a better chance of success than one for John Smith or Cathy Jones.

Wildcard and proximity searches can be helpful if available. With Paper of Record you can search for a word, phrase or Boolean combination. The lack of segmentation of the page into separate stories means that you may get hits because words occur in different unrelated articles on the page.
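
For anyone who keeps their own copy of OCR text, here is a rough sketch in Python of why a proximity search cuts down on those unrelated hits. It is not how Paper of Record or any other site works internally; the function names (tokenize, proximity_hits) and the sample page are made up for illustration. It treats the whole page as one run of words, as unsegmented OCR does, and only reports two terms when they fall within a few words of each other. The terms may include the wildcards ? (one character) and * (any run of characters).

```python
import re
from fnmatch import fnmatchcase


def tokenize(page_text):
    """Split OCR page text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", page_text.lower())


def proximity_hits(page_text, pattern_a, pattern_b, max_gap=5):
    """Return (i, j) word positions where a word matching pattern_a and a
    word matching pattern_b lie within max_gap words of each other,
    in either order. Patterns may use the wildcards ? and *."""
    tokens = tokenize(page_text)
    pos_a = [i for i, t in enumerate(tokens) if fnmatchcase(t, pattern_a.lower())]
    pos_b = [j for j, t in enumerate(tokens) if fnmatchcase(t, pattern_b.lower())]
    return [(i, j) for i in pos_a for j in pos_b
            if i != j and abs(i - j) <= max_gap]


# Hypothetical page: a marriage notice followed by an unrelated auction notice.
page = ("SMITH JONES At the residence of the bride's father, "
        "Mr. John Smith to Miss Cathy Jones. "
        "AUCTION SALE of farm stock and implements will be held "
        "on Tuesday next at the farm of Mr. Jones.")

# A plain Boolean AND matches because both words occur somewhere on the page.
boolean_and = {"smith", "farm"} <= set(tokenize(page))

# A proximity search ignores the pair because the words are far apart.
near = proximity_hits(page, "smith", "farm", max_gap=5)

# Adjacent surnames, as in a marriage header set in larger type.
header = proximity_hits(page, "smith", "jones", max_gap=1)
```

Here the Boolean search for Smith and farm matches the page even though the two words belong to different notices, while the proximity search returns nothing for that pair. The adjacent-name search picks up only the header. A wildcard such as sm?th would also catch a misread like "Smlth".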
