Digital libraries as a test bed for evaluating the effectiveness of information searching in OCR-processed texts

Authors

  • John Catlow, Adam Mickiewicz University, Department of Information Systems
  • Mirosław Górny, Adam Mickiewicz University, Department of Information Systems
  • Rafał Lewandowski, Adam Mickiewicz University, Department of Information Systems

Keywords:

OCR, Optical Character Recognition, Digital Library, Searching, Effectiveness of Information Searching, Historical Publications

Abstract

The aim of this paper is to point out certain weaknesses of OCR with regard to information searching and to describe the mechanisms behind them. A methodology is presented for evaluating the effectiveness of information searching in OCR-processed texts. The paper also shows to what extent relying exclusively on OCR limits the possibility of retrieving information from texts, and indicates how, and at what cost, these limitations can be overcome by means of keywords entered by a cataloguer. The research was based on the resources and users of the Digital Library of Wielkopolska.
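The abstract does not spell out the measure used, so the following is only a hypothetical illustration and not the authors' methodology. One simple way to quantify how OCR errors hurt searching is word-level search recall: of the query words that really occur in a page (according to a correct transcription), what fraction would a search over the OCR output find? The sketch also shows how cataloguer-entered keywords, added to the searchable text, can recover words the OCR garbled. The function names, the example sentence and the recall measure itself are assumptions made for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Return lowercase word tokens; \\w also matches Polish letters such as ń or ł."""
    return set(re.findall(r"\w+", text.lower()))


def search_recall(queries, ground_truth, ocr_text, keywords=()):
    """Hypothetical measure: share of query words present in the correct
    transcription that a search over the OCR text (plus any cataloguer
    keywords) would also find."""
    truth = tokenize(ground_truth)
    # Searchable index = OCR tokens plus tokens from cataloguer keywords.
    searchable = tokenize(ocr_text) | tokenize(" ".join(keywords))
    # Only queries that genuinely occur in the page count as relevant.
    relevant = [q.lower() for q in queries if q.lower() in truth]
    if not relevant:
        return 0.0
    found = sum(1 for q in relevant if q in searchable)
    return found / len(relevant)


# Invented example: OCR misreads "capital" and "Duchy".
truth = "Poznań was the capital of the Grand Duchy of Posen"
ocr = "Poznań was the capita1 of the Grand Dnchy of Posen"
queries = ["Poznań", "capital", "Duchy"]

print(search_recall(queries, truth, ocr))
print(search_recall(queries, truth, ocr, keywords=["capital", "Duchy of Posen"]))
```

In this invented example, plain OCR search finds only one of the three query words, while adding the two keywords lets all three be found; the cost side (cataloguer time per document) is not modelled here.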

Published

2017-06-06