Digital libraries as a test bed for evaluating the effectiveness of information searching in OCR-processed texts
Keywords:
OCR, Optical Character Recognition, Digital Library, Searching, Effectiveness of Information Searching, Historical Publications
Abstract
This paper points out certain weaknesses of OCR with regard to information searching and describes the mechanisms involved. A methodology is presented for evaluating the effectiveness of information searching in OCR-processed texts. The paper also shows to what extent relying exclusively on OCR limits the possibilities of obtaining information from texts, and indicates how, and at what cost, these limitations can be overcome by the use of keywords entered by a cataloguer. The research was based on the resources and users of the Digital Library of Wielkopolska.