Title: CLUSTER OF REUTERS 21578 COLLECTIONS USING GENETIC ALGORITHMS AND NZIPF METHOD
Author(s): José Luis Castillo Sequera, José R. Fernández Del Castillo, León González Sotos
ISBN: 978-972-8924-88-1
Editors: Ajith P. Abraham
Year: 2009
Edition: Single
Keywords: Clustering, Information Management, Information Search and Retrieval, Data Mining
Type: Poster/Demonstration
First Page: 174
Last Page: 176
Language: English

Paper Abstract:
In this paper, we discuss a feature reduction technique and its application to document clustering, showing that
feature reduction improves efficiency as well as accuracy. Using Zipf's law, we select terms starting from the
Goffman point, choosing a suitable transition zone around it; we call this method NZIPF. Finally, we show
experimentally that, when documents are clustered with an unsupervised genetic algorithm, the transition zone that
gives the best results is obtained by taking 40 terms starting from the Goffman point. The experiments use the
Reuters-21578 collection, and the grouping is performed by new genetic operators designed to find the affinity and
similarity of the documents without prior knowledge of their other characteristics.