Title: FEATURE REDUCTION FOR DOCUMENT CLUSTERING WITH NZIPF METHOD
Author(s): José Luis Castillo Sequera, José R. Fernández Del Castillo, León González Sotos
ISBN: 978-972-8924-78-2
Editors: Piet Kommers and Pedro Isaías
Year: 2009
Edition: 2
Keywords: Clustering, Information Management, Information Search and Retrieval, Data Mining
Type: Short Paper
First Page: 205
Last Page: 209
Language: English
Paper Abstract:

In this paper, we discuss a feature reduction technique and its application to document clustering, and we show that feature reduction improves both efficiency and accuracy. We select terms starting from the Goffman point and use Zipf's law to delimit a suitable transition zone. We call this method NZIPF. The experiments are carried out on the Reuters-21578 collection, and the results are compared with those of other methods to validate its efficiency. Finally, we show experimentally that, for a supervised clustering algorithm, the transition zone that gives the best results consists of the 40 terms starting from the Goffman point.
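The abstract does not spell out NZIPF's exact selection rule, so the following is only a minimal sketch under stated assumptions. It uses the commonly cited Goffman transition-point formula derived from Zipf's law, n = (-1 + sqrt(1 + 8·I1)) / 2, where I1 is the number of distinct terms that occur exactly once. It also assumes that the "transition zone" is the k terms whose collection frequency is closest to n. The function names `goffman_point` and `select_transition_terms` and the tie-breaking rule are hypothetical. Only the zone size of 40 terms comes from the abstract.

```python
import math
from collections import Counter
from typing import List, Tuple


def goffman_point(freqs: Counter) -> float:
    """Goffman transition frequency n = (-1 + sqrt(1 + 8 * I1)) / 2,
    where I1 is the number of distinct terms that occur exactly once."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def select_transition_terms(tokens: List[str], k: int = 40) -> Tuple[float, List[str]]:
    """Hypothetical NZIPF-style selection: return the Goffman point and the
    k terms whose frequency is closest to it (an assumed reading of
    'taking k terms starting from the Goffman point')."""
    freqs = Counter(tokens)
    n = goffman_point(freqs)
    # Sort by distance to n; on ties, prefer the more frequent term, then
    # alphabetical order (assumed tie-breaking, chosen only for determinism).
    ranked = sorted(freqs, key=lambda t: (abs(freqs[t] - n), -freqs[t], t))
    return n, ranked[:k]


if __name__ == "__main__":
    tokens = "a a a a b b b c c d e f".split()
    n, terms = select_transition_terms(tokens, k=2)
    print(n, terms)  # prints: 2.0 ['c', 'b']
```

With the default `k=40` the zone size matches the one reported in the abstract. Tokenization and any text normalization are left to the caller.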