Title: EFFICIENT PAGE-LEVEL INFORMATION RETRIEVAL FOR COMPRESSED READABLE DOCUMENTS
Author(s): Mohsen Madi, Abdelaziz Fellah
ISBN: 978-972-8924-56-0
Editors: Nuno Guimarães and Pedro Isaías
Year: 2008
Edition: Single
Keywords: Information retrieval, Huffman compression, page-level retrieval.
Type: Full Paper
First Page: 156
Last Page: 163
Language: English
Paper Abstract:
The size of electronic paged documents stored on computers is growing without apparent bound. A
paged document is a human-readable file containing any number of pages of text and image content. Such a file may be
a small document of a few pages or a large 32-volume encyclopedia reaching tens of thousands of pages of text and images.
In this paper, novel schemes are introduced that store Huffman-compressed paged documents efficiently while enabling
page-level retrieval. The approach goes through the following steps: preprocessing paged documents
for compression and storage, processing the user's search string, locating the positions of the processed search string within
the sought documents, retrieving the hit pages that contain the sought string, and decompressing and displaying the hit pages in their
original form. A set of experiments illustrates the approach and shows high performance with respect to both storage
and retrieval.
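
The abstract does not spell out the paper's actual schemes, so the following is only a minimal sketch of the general idea of page-level retrieval over Huffman-compressed pages, not the authors' method. It assumes a single character-level Huffman code shared by all pages, a per-page bit-offset table, and a simple word-to-page index built during preprocessing. All names (build_huffman_codes, compress_pages, decompress_page, search) are hypothetical, and the bit stream is kept as a string of '0' and '1' characters for readability rather than packed into bytes.

```python
import heapq
from collections import Counter, defaultdict

PUNCT = ".,;:!?"


def build_huffman_codes(text):
    """Return a {symbol: bitstring} Huffman code for the characters in text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tiebreak, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def compress_pages(pages):
    """Preprocess pages: encode each one and record where it starts (assumed layout)."""
    codes = build_huffman_codes("".join(pages))
    chunks, offsets, index = [], [], defaultdict(set)
    pos = 0
    for page_no, page in enumerate(pages):
        encoded = "".join(codes[ch] for ch in page)
        offsets.append((pos, len(encoded)))  # (start bit, length in bits)
        chunks.append(encoded)
        pos += len(encoded)
        for word in page.lower().split():
            index[word.strip(PUNCT)].add(page_no)
    return {"codes": codes, "bits": "".join(chunks),
            "offsets": offsets, "index": index}


def decompress_page(store, page_no):
    """Decode a single page without touching the bits of any other page."""
    decode = {code: sym for sym, code in store["codes"].items()}
    start, length = store["offsets"][page_no]
    out, current = [], ""
    for bit in store["bits"][start:start + length]:
        current += bit
        if current in decode:  # Huffman codes are prefix-free
            out.append(decode[current])
            current = ""
    return "".join(out)


def search(store, query):
    """Return (page number, page text) for pages containing the query string."""
    words = [w.strip(PUNCT) for w in query.lower().split()]
    if not words:
        return []
    # Candidate pages must contain every query word.
    candidates = set.intersection(
        *(store["index"].get(w, set()) for w in words))
    hits = []
    for page_no in sorted(candidates):
        text = decompress_page(store, page_no)  # only candidates are decoded
        if query.lower() in text.lower():
            hits.append((page_no, text))
    return hits


pages = [
    "Huffman coding compresses text.",
    "Pages are stored separately.",
    "Retrieval decompresses only hit pages.",
]
store = compress_pages(pages)
hits = search(store, "hit pages")
```

In this sketch, the offset table is what makes retrieval page-level: a query decodes only the candidate pages named by the index, and each hit page is restored to its original text. A real system would pack the bits into bytes and would also have to handle image content, which this example ignores.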