24 February 2008

Newspapers Digitisation Project: British Newspapers 1800-1900

While in London earlier in the month I had the opportunity to try out the result of this digitization project at the British Library. First made available in October last year, it comprises 1,000,000 pages of content drawn from a geographically diverse range of British newspapers.

Searches completed very quickly, and scans of the pages found downloaded almost as fast. Each page is sub-divided into articles, or segments. A search on a combination of terms returns only items where the terms appear within the same article, not anywhere on the page as with other digitized newspapers I use more frequently.

I started with simple searches on surnames in my family tree across the whole range of content and found an overwhelming number of hits. To reduce them to a manageable number you need to be quite selective in the search by specifying a single publication, location and/or date range.

The quality of the OCR was exceptional. I checked a few pages found with a search on cholera. The words the search identifies are highlighted, so I looked through each article for occurrences of the same word that were not highlighted. I didn't have time for an exhaustive test but found very few cases where the word was not highlighted.

Plans for the project in 2008 call for digitising 3,000,000 pages of British newspapers and offering worldwide access to that collection via a sophisticated searching and browsing interface on the web. I hope some way will be found to provide affordable access for those of us outside the UK.

1 comment:

Anonymous said...

Actually, the British Library 19th C Newspaper project has about 2 million pages of historic newspapers - from 48 papers specially selected to cover London, regional and country papers as well as papers with a specific interest (Chartism, Home Rule, Reform etc.).

19th C British Library is available through over 225 university and college libraries worldwide. Later in 2008 a 'pay per view' community access portal is planned.

Brandon Nordin
Gale Digital Collections