IBM Systems Journal - 2002 Copyright

IBM Journal of Research and Development  
Volume 41, Number 3, Page 494 (2002)
Artificial Intelligence


Machine learning in a multimedia document retrieval framework

by M. P. Perrone, G. F. Russell, A. Ziq

The Pen Technologies group at IBM Research has recently been investigating methods for retrieving handwritten documents based on user queries. This paper investigates the use of typed and handwritten queries to retrieve relevant handwritten documents. The IBM handwriting recognition engine was used to generate N-best lists for the words in each of 108 short documents. These N-best lists are concise statistical representations of the handwritten words. These statistical representations enable the retrieval methods to be robust when there are machine transcription errors, allowing retrieval of documents that would be missed by a traditional transcription-based retrieval system. Our experimental results demonstrate that significant improvements in retrieval performance can be achieved compared to standard keyword text searching of machine-transcribed documents. We have developed a software architecture for a multimedia document retrieval framework into which machine learning algorithms for feature extraction and matching may be easily integrated. The framework provides a “plug-and-play” mechanism for the integration of new media types, new feature extraction methods, and new document types.
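
To illustrate why N-best lists can make retrieval robust to transcription errors, the following is a minimal sketch, not the authors' method. It assumes each handwritten word is represented as a list of (candidate, score) pairs and that a typed query is scored by summing, for each query word, the best candidate score found anywhere in the document. The abstract does not specify the scoring function, so the names `word_match_score`, `nbest_score`, and `top1_score`, the scoring rule, and the sample data are all hypothetical.

```python
from typing import List, Tuple

# Assumption: an N-best list is a list of (candidate word, recognizer score)
# pairs for a single handwritten word; a document is a list of N-best lists.
NBest = List[Tuple[str, float]]
Document = List[NBest]


def word_match_score(query_word: str, nbest: NBest) -> float:
    """Score of query_word within one N-best list, or 0.0 if it is absent."""
    target = query_word.lower()
    return max((s for cand, s in nbest if cand.lower() == target), default=0.0)


def nbest_score(query: List[str], doc: Document) -> float:
    """Hypothetical scoring rule: for each query word, take its best score
    over all N-best lists in the document, then sum over the query words."""
    return sum(
        max((word_match_score(q, nb) for nb in doc), default=0.0)
        for q in query
    )


def top1_score(query: List[str], doc: Document) -> float:
    """Baseline: keyword search over the top-1 machine transcription only."""
    transcript = {max(nb, key=lambda c: c[1])[0].lower() for nb in doc if nb}
    return float(sum(q.lower() in transcript for q in query))


# Made-up example: the writer wrote "budget", but the recognizer's top
# choice is "bucket"; the correct word survives lower in the N-best list.
doc: Document = [
    [("meeting", 0.9), ("melting", 0.1)],
    [("bucket", 0.5), ("budget", 0.4), ("budgie", 0.1)],
]

print(top1_score(["budget"], doc))   # keyword search on the transcript misses
print(nbest_score(["budget"], doc))  # the N-best representation still matches
```

In this sketch the keyword baseline gives the document no credit for the query "budget" because the top-1 transcription is wrong, while the N-best score remains nonzero. A handwritten query would itself be an N-best list and would need a list-to-list comparison, which this sketch does not cover.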