Title:
|
XML DOCUMENTS CLUSTERING BASED ON STRUCTURAL SIMILARITY |
Author(s):
|
Ali Aïtelhadj, Fatiha Souam, Mohamed Mezghiche |
ISBN:
|
978-972-8924-93-5 |
Editors:
|
Pedro Isaías, Bebo White and Miguel Baptista Nunes |
Year:
|
2009 |
Edition:
|
1 |
Keywords:
|
Clustering, structurally similar, hierarchical context, tree, threshold. |
Type:
|
Full Paper |
First Page:
|
559 |
Last Page:
|
566 |
Language:
|
English |
Paper Abstract:
|
In this paper we develop a clustering method for XML documents. Our approach has two steps. We first automatically
extract the structure of each XML document to be classified. The extracted structure is then used as the representation
model for classifying the corresponding XML document. Our methodology groups similarly structured XML documents
into clusters in order to reduce the response time and improve the accuracy of the search engine. It is based on
the idea that XML documents sharing similar structures are more likely to match the structural part of
the same query. Note that in XML applications, queries may have both a content part and a structure part. The matching of
XML document tree structures is based on computing their similarities. Finally, for experimental evaluation,
we tested our clustering algorithm on both real and synthetic data. The results demonstrate the relevance of our
approach. |
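The two steps described in the abstract (structure extraction, then similarity-based grouping against a threshold) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the clustering procedure, so everything below is an assumption made for illustration: the structure of a document is taken to be its set of root-to-element tag paths, similarity is the Jaccard index over those sets, and clustering is a greedy single pass. The function names `structure_paths`, `similarity`, and `cluster`, and the default threshold of 0.5, are hypothetical and not taken from the paper.

```python
import xml.etree.ElementTree as ET


def structure_paths(xml_text):
    """Return the set of root-to-element tag paths; text and attributes are ignored."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def similarity(a, b):
    """Jaccard index of two path sets (an assumed measure, not the paper's)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Greedy single-pass clustering (assumed procedure).

    Each document joins the first cluster whose representative (the
    structure of the cluster's first member) is at least `threshold`
    similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative_paths, member_indices)
    for i, text in enumerate(docs):
        paths = structure_paths(text)
        for rep, members in clusters:
            if similarity(paths, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((paths, [i]))
    return [members for _, members in clusters]


docs = [
    "<book><title>A</title><author>X</author></book>",
    "<book><title>B</title><author>Y</author><year>2009</year></book>",
    "<order><item><sku>1</sku></item></order>",
]
print(cluster(docs))  # [[0, 1], [2]]
```

In this toy input the two `book` documents share three of four distinct paths (similarity 0.75) and fall into one cluster, while the `order` document shares none and forms its own. Raising the threshold above 0.75 would split the two `book` documents apart, which is the role the "threshold" keyword suggests.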