Title:
|
RUN-LENGTH ENCODING APPLIED TO GRID SPACE STORING AND INDEXING |
Author(s):
|
Frank Wang , Na Helian , Yau Jim Yip |
ISBN:
|
972-9027-53-6 |
Editors:
|
Pedro Isaías |
Year:
|
2002 |
Edition:
|
Single |
Keywords:
|
Data warehousing, databases, index, grid space, run-length encoding . |
Type:
|
Full Paper |
First Page:
|
461 |
Last Page:
|
471 |
Language:
|
English |
Cover:
|
|
Full Contents:
|
click to dowload
|
Paper Abstract:
|
In data warehousing it is common for users to want 20-40 columns of data to be searchable. In database marketing, it can be 100's of attributes. Each query to mine the data may use a different combination of criteria. In this article,
run-length encoding transforms the record space of all k key attributes into a one-dimensional run-length array. Such a compression scheme is proven to be compatible with common database access operations. As a result, in a bitmap space of k attributes, just k one-dimensional attribute arrays plus a
one-dimensional run-length array need to be maintained. Physical database records do not exist as a whole because they are resolved after being input to the system and absorbed by those k+1
one-dimensional arrays. The storage overhead has been greatly reduced due to the limited size of those
one-dimensional arrays: the size of the attribute array is equal to the cardinality of its attribute domain; the size of the run-length array is equal to, in the worst case, or very likely much smaller than the total number of the database records due to the probability of 1s appearing next to each other in the bitmap space. A fully specified query retrieves a single record in at most two disk-block accesses and potentially at much faster speed as all those k+1 one-dimensional arrays can easily reside in the main memory. The experimental simulation confirms that the search, insertion and deletion operations can be executed efficiently in a compressed bitmap. |
|
|
|
|