Digital Library

Title:      DOCUMENTS CLUSTERING FOR NOISE REMOVAL
Author(s):      Abdulmohsen Algarni, Nasser Tairan
ISBN:      978-989-8704-10-8
Editors:      Ajith P. Abraham, Antonio Palma dos Reis and Jörg Roth
Year:      2014
Edition:      Single
Keywords:      Text mining, Clustering, Pre-processing, Dimension reduction
Type:      Short Paper
First Page:      185
Last Page:      189
Language:      English
Paper Abstract:      Term-based approaches can extract many features from text documents, but most of these features include noise. Many popular text mining strategies have been adapted to reduce noisy information in the extracted features, yet the results still contain some noise features. These noise features are extracted from the same training documents as the good features. The main problem, therefore, is that some training documents contain a large amount of noise data. Reducing the noise data in the training documents would help to reduce noise in the extracted features. Moreover, we believe that removing some training documents (those that contain more noise data than useful data) can help to improve the effectiveness of a classifier. Clustering can help to reduce the effect of noise data. The main problem of clustering is defined as finding groups of similar objects in the data. In this paper, we introduce a methodology that uses a clustering algorithm to group training data before it is used. We also test our theory that not all training documents are useful in training the classifier.
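Illustrative sketch:      The abstract does not name the clustering algorithm or the rule used to discard training documents, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes bag-of-words term-frequency vectors, cosine similarity, a simple single-pass "leader" clustering, and a rule that treats documents in very small clusters as noise. The function names and the `threshold` and `min_size` parameters are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector for one document (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leader_cluster(docs, threshold=0.3):
    """Single-pass clustering: each document joins the most similar existing
    cluster if the similarity reaches `threshold`, else it starts a new one.
    The summed vector is used in place of the centroid, which gives the same
    cosine similarity because cosine ignores vector length."""
    clusters = []  # each cluster: {"sum": Counter, "members": [doc indices]}
    for i, text in enumerate(docs):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["sum"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            best = {"sum": Counter(), "members": []}
            clusters.append(best)
        best["sum"].update(vec)
        best["members"].append(i)
    return clusters


def filter_noisy(docs, threshold=0.3, min_size=2):
    """Keep only documents in clusters with at least `min_size` members.
    Treating small clusters as noise is an assumption of this sketch."""
    clusters = leader_cluster(docs, threshold)
    keep = sorted(
        i for c in clusters if len(c["members"]) >= min_size for i in c["members"]
    )
    return [docs[i] for i in keep]
```

The filtered list would then be passed to whatever classifier is being trained. In practice the similarity threshold and the minimum cluster size would need tuning on held-out data, and a different clustering algorithm could be substituted without changing the overall pipeline.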
   
