Digital Library

Title:      CLUSTER OF REUTERS 21578 COLLECTIONS USING GENETIC ALGORITHMS AND NZIPF METHOD
Author(s):      José Luis Castillo Sequera, José R. Fernández Del Castillo, León González Sotos
ISBN:      978-972-8924-88-1
Editors:      Ajith P. Abraham
Year:      2009
Edition:      Single
Keywords:      Clustering, Information Management, Information Search and Retrieval, Data Mining.
Type:      Poster/Demonstration
First Page:      174
Last Page:      176
Language:      English
Paper Abstract:      In this paper, we discuss a feature reduction technique and its application to document clustering, showing that feature reduction improves efficiency as well as accuracy. Starting from the Goffman point, we use Zipf's law to select the terms in a suitable transition zone (we call this method NZIPF). Finally, we show experimentally that, for unsupervised clustering of documents with a genetic algorithm, the transition zone that gives the best results is the one formed by the 40 terms starting from the Goffman point. The experiments are carried out on the Reuters 21578 collection, and the documents are grouped by new genetic operators designed to find the affinity and similarity between documents without prior knowledge of their other characteristics.
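A minimal Python sketch of term selection around the Goffman point follows. It assumes the classic transition-point formula T = (-1 + sqrt(1 + 8·I1)) / 2, where I1 is the number of distinct terms that occur exactly once. The abstract does not say how the 40-term window is placed relative to the point, how ties are ordered, or how text is tokenized. As an assumption, this sketch starts the window at the first frequency-ranked term whose count is at or below T and moves toward lower frequencies. The names `goffman_transition_frequency` and `select_transition_terms` are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def goffman_transition_frequency(freqs):
    """Transition frequency T = (-1 + sqrt(1 + 8*I1)) / 2,
    where I1 is the number of distinct terms that occur exactly once."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def select_transition_terms(tokens, window=40):
    """Hypothetical helper: rank terms by frequency, locate the Goffman point,
    and return `window` terms starting there, together with T."""
    freqs = Counter(tokens)
    t = goffman_transition_frequency(freqs)
    # Rank by descending frequency; break ties alphabetically so output is deterministic.
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    # Assumption: the window begins at the first term whose count is <= T.
    start = next((i for i, (_, f) in enumerate(ranked) if f <= t), len(ranked))
    return [term for term, _ in ranked[start:start + window]], t


if __name__ == "__main__":
    toy = "a a a a b b b c c d e f".split()
    print(select_transition_terms(toy, window=2))  # prints (['c', 'd'], 2.0)
```

In the toy input, three terms occur once (I1 = 3), so T = (-1 + 5) / 2 = 2.0. The window therefore starts at `c`, the first term with a count of 2 or less. With the paper's setting, `window=40` would be passed instead.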