Monday, 8 June 2015

BAnQ to image more newspapers online.

Under the heading "BAnQ speeds up newspaper digitization project" I learned that the Bibliothèque et Archives nationales du Québec (BAnQ) will be making available all of the 5,873 Quebec newspapers published since the 17th century that are not already available. Read about it at http://genealogyalacarte.ca/?p=9569.
This is a significant addition to the online availability of Canadian newspapers. What it does not mean is that the papers are searchable. You need to select the paper, year, month, and date to bring an image up on the screen.
I'd hoped it might mean the papers would be OCR'd to make them searchable online. Alas. At least they're available to all and some day maybe BAnQ will be able to arrange the next step.

1 comment:

Rick Roberts said...

The developers possibly did not attempt an OCR search feature because their newspaper database suffers from the same problem as all of those sites that rely on low-quality scans or microfilm as the source for their digital images of newspapers. Those microfilm images are usually the only available source to scan. Low-quality images are impossible to OCR properly or completely.

I downloaded the July 23, 1818 issue of the Quebec Mercury (page by page), then combined those pages into a single PDF document. Adobe Acrobat Professional was the software I used to OCR the issue. Less than 10% of the issue OCR'd accurately: mostly the masthead and the odd word came through correctly, and the rest of the OCR index was a scramble of letters and symbols. If they provided a search tool on the site that did not include an obvious caveat that searches are far from complete, most researchers would take a zero search result to mean that the information they seek is not in the collection, when it is quite possible that it is.

We often encounter this type of misunderstanding in our business. More than one person told us (Global Genealogy) at the OGS Conference in Barrie that they had no need to check the Perth Courier newspaper BMD transcriptions that we published last year because the newspaper is online, they've checked it, and their family information was not published in the Perth Courier. In fact the Perth Courier newspapers that are online suffer from the exact same problem as the Quebec Mercury does in today's experiment.

All this to suggest that if you hope to find information in online scanned newspapers, whether they are OCR searchable or not, seek out any/all published genealogical transcriptions/indexes of that newspaper before you accept that the info you seek is not in the original newspaper. Those publications usually identify the date of the issue and page where the information came from. Use that information to find the original material in the online scanned newspaper. Many genealogical societies have published newspaper genealogical extractions as have some private genealogical publishers.

OCR results from modern newspapers printed in the last 20-30 years tend to be much more accurate than those from historic newspapers, due to the use of modern fonts and digital preparation.