Title:
|
DESIGN OF AN AUTOMATED SYSTEM FOR CLUSTERING HETEROGENEOUS DATA |
Author(s):
|
Dorin Carstoiu , Alexandra Cernian , Adriana Olteanu , Tudor Ionescu |
ISBN:
|
978-972-8924-63-8 |
Editors:
|
Hans Weghorn and Ajith P. Abraham |
Year:
|
2008 |
Edition:
|
Single |
Keywords:
|
Clustering, classification, heterogeneous data, informational content. |
Type:
|
Short Paper |
First Page:
|
82 |
Last Page:
|
86 |
Language:
|
English |
Paper Abstract:
|
The goal of this work is to study the feasibility of a Heterogeneous Data Classification and Search (HDCS) system and to
provide a possible design for its implementation. To design an HDCS system, we propose an actor-oriented modeling
technique, for which we show the information flow. We have identified six actors (subsystems) that collaborate
to construct a file sheet and produce the final search result. The first five actors add information to the file sheet, which is
then used by the final actor to produce the desired result.
Given the vast quantity of data and the variety of formats and encodings in which it exists, a semantic approach based on
metadata has been chosen. Instead of digging into the actual data to extract information, we use the context of the
file to collect its metadata. The metadata is then used in the classification process. The rationale for this approach is
that data are made available by people who want others to understand what the data are
about. This observation provided the confidence needed to pursue the presented approach.
The HDCS system we propose combines techniques from conventional search systems, classification systems, and search
results clustering systems, while also providing original solutions, such as an innovative data sampling method. |
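
The information flow described in the abstract (several actors enriching a shared file sheet, then a final actor producing the search result) can be illustrated with a minimal sketch. The abstract does not name the six actors or the fields of the file sheet, so every actor name, field name, and the extension-to-category table below is a hypothetical placeholder chosen only for illustration. In the spirit of the metadata-based approach, the sketch reads only the file's location (its context) and never opens the file itself.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FileSheet:
    """Record accumulated for one file as it passes through the actors."""
    location: str
    entries: Dict[str, object] = field(default_factory=dict)


# Hypothetical mapping; the paper's actual classification scheme is not given.
CATEGORY_BY_EXTENSION = {"csv": "table", "pdf": "document", "mp3": "audio"}


# --- Five hypothetical enriching actors: each adds one entry to the sheet ---
def collect_name(sheet: FileSheet) -> None:
    sheet.entries["name"] = sheet.location.rsplit("/", 1)[-1]


def collect_extension(sheet: FileSheet) -> None:
    name = sheet.location.rsplit("/", 1)[-1]
    sheet.entries["extension"] = name.rsplit(".", 1)[-1].lower() if "." in name else ""


def collect_context_terms(sheet: FileSheet) -> None:
    # Terms come from the path (the file's context), not from its content.
    sheet.entries["terms"] = {t for t in re.split(r"[^a-z0-9]+", sheet.location.lower()) if t}


def assign_category(sheet: FileSheet) -> None:
    extension = sheet.entries.get("extension", "")
    sheet.entries["category"] = CATEGORY_BY_EXTENSION.get(extension, "other")


def record_source(sheet: FileSheet) -> None:
    parts = [p for p in sheet.location.split("/") if p]
    sheet.entries["source"] = parts[0] if parts else ""


ENRICHING_ACTORS: List[Callable[[FileSheet], None]] = [
    collect_name,
    collect_extension,
    collect_context_terms,
    assign_category,
    record_source,
]


def build_sheet(location: str) -> FileSheet:
    """Pass one file through the five enriching actors in order."""
    sheet = FileSheet(location)
    for actor in ENRICHING_ACTORS:
        actor(sheet)
    return sheet


# --- Sixth (final) actor: uses the completed sheets to produce the result ---
def search(sheets: List[FileSheet], query: str) -> Dict[str, List[str]]:
    """Return matching file locations grouped by their assigned category."""
    result: Dict[str, List[str]] = {}
    for sheet in sheets:
        if query.lower() in sheet.entries["terms"]:
            result.setdefault(sheet.entries["category"], []).append(sheet.location)
    return result


if __name__ == "__main__":
    locations = [
        "/data/climate/rainfall_2007.csv",
        "/data/climate/report.pdf",
        "/music/live-set.mp3",
    ]
    sheets = [build_sheet(location) for location in locations]
    print(search(sheets, "climate"))
```

Keeping each actor as a small function that writes a single entry mirrors the described design, in which the file sheet is the only channel between subsystems; the ordering matters only where one actor reads an entry written by an earlier one, as `assign_category` does here.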