Title:
|
PROPOSAL AND EVALUATION OF A TECHNIQUE OF DISCOVERING XML STRUCTURES FOR EFFICIENT RETRIEVAL |
Author(s):
|
Hiroshi Ishikawa, Hajime Takekawa, Kaoru Katayama |
ISBN:
|
972-8924-19-4 |
Editors:
|
Pedro Isaías, Miguel Baptista Nunes and Inmaculada J. Martínez |
Year:
|
2006 |
Edition:
|
V I, 2 |
Keywords:
|
XML, schema discovery, database, data mining, query |
Type:
|
Full Paper |
First Page:
|
142 |
Last Page:
|
153 |
Language:
|
English |
Paper Abstract:
|
We propose an adaptable approach to discovering database schemas for well-formed XML data, such as EDI messages, news, and digital library content, which we interchange, filter, or download for later retrieval and analysis. The generated schemas usually consist of more than one table. Our approach uses statistics of the XML data to control the number of tables into which a schema is divided, so that the total cost of processing queries is reduced. To reduce the number of tables, we generate schemas suited to complex data such as text formatting tags and child elements with a small maximum number of occurrences. To this end, we introduce three functions, NULL expectation, Large Leaf Fields, and Large Child Fields, for controlling the number of tables. To validate our approach, we evaluated typical XML queries over both the generated schemas and normalized schemas, which represent an alternative approach, and measured and compared their costs. We describe the method for discovering appropriate schemas and its evaluation in detail. |