What do Australia, Finland, Russia, Vietnam and the United States have that Canada doesn't? The answer, according to this blog post, is historic digitised newspaper sites that let the public correct or transcribe the text. Pioneered by the National Library of Australia on its Trove site, crowd-sourced OCR correction is catching on.
Meanwhile, OCR capabilities for old newspaper digitization are improving slowly. A paper from UC Berkeley claims to achieve a word error rate of 25.6, compared to 49.2 for ABBYY FineReader, a widely used commercial product.
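For readers unfamiliar with the metric, word error rate is usually defined as the number of word-level substitutions, insertions and deletions needed to turn the OCR output into the correct text, divided by the number of words in the correct text. The sketch below illustrates that common definition only. The function name `word_error_rate`, the whitespace tokenisation and the choice to report a percentage are assumptions made for illustration, not details taken from the Berkeley paper, whose evaluation may treat punctuation and case differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word error rate as a percentage of the reference length.

    Assumption: words are split on whitespace, with no case folding
    or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # Word-level edit distance, keeping only the previous row of the table.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the OCR output
                curr[j - 1] + 1,     # extra word in the OCR output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One misread word out of four gives a word error rate of 25.
print(word_error_rate("the quick brown fox", "the qnick brown fox"))
```

Because the count is per word, a single wrong character spoils the whole word, which is one reason the figures for old newsprint look so high.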