Digital Library


 
Title:      EFFICIENT PAGE-LEVEL INFORMATION RETRIEVAL FOR COMPRESSED READABLE DOCUMENTS
Author(s):      Mohsen Madi, Abdelaziz Fellah
ISBN:      978-972-8924-56-0
Editors:      Nuno Guimarães and Pedro Isaías
Year:      2008
Edition:      Single
Keywords:      Information retrieval, Huffman compression, page-level retrieval.
Type:      Full Paper
First Page:      156
Last Page:      163
Language:      English
Paper Abstract:      Electronic paged documents stored on computers keep growing in size. A paged document is a human-readable file containing any number of pages of text and image content. Such a file may be a short document of a few pages or a large 32-volume encyclopedia reaching tens of thousands of pages of text and images. In this paper, novel schemes are introduced for the efficient storage of Huffman-compressed paged documents in a way that enables page-level retrieval. The approach consists of the following steps: preprocessing paged documents for compression and storage, processing the user's search string, locating occurrences of the processed search string within the sought documents, retrieving the hit pages that contain the sought string, and decompressing and displaying those hit pages in their original form. The paper is illustrated with a set of experiments showing high performance with respect to both storage and retrieval.
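
The steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' scheme, which this record does not detail. It assumes one Huffman code shared by all pages, pages encoded independently so that each can be decoded on its own, bit strings held as Python text rather than packed bits, and a search that decodes pages one at a time instead of matching within the compressed data. All function names (build_code, compress_pages, decompress_page, search_pages) are hypothetical.

```python
import heapq
from collections import Counter


def build_code(text):
    """Build a Huffman code table (symbol -> bit string) from character counts."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs a one-bit code
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def compress_pages(pages):
    """Encode every page separately with one code shared by the whole document."""
    code = build_code("".join(pages))
    return code, ["".join(code[ch] for ch in page) for page in pages]


def decompress_page(bits, code):
    """Decode one page; works because a Huffman code is prefix-free."""
    decode = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in decode:
            out.append(decode[current])
            current = ""
    return "".join(out)


def search_pages(query, code, compressed):
    """Return (page number, page text) for each page that contains the query.

    Assumption: pages are decoded one by one and searched as plain text.
    """
    hits = []
    for number, bits in enumerate(compressed, start=1):
        page = decompress_page(bits, code)
        if query in page:
            hits.append((number, page))
    return hits


pages = ["huffman coding", "page level retrieval", "compressed pages"]
code, compressed = compress_pages(pages)
hits = search_pages("page", code, compressed)
```

Because each page is encoded on its own, a hit page can be decompressed and displayed without touching the rest of the document, which is the property page-level retrieval relies on.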
   
