DOCUMENTS CLUSTERING FOR NOISE REMOVAL

Abdulmohsen Algarni; Nasser Tairan

IADIS International Conference Intelligent Systems and Agents - ISA

IADIS International Conference Intelligent Systems and Agents 2014 (part of MCCSIS 2014)

Document Info

Title:	DOCUMENTS CLUSTERING FOR NOISE REMOVAL
Author(s):	Abdulmohsen Algarni, Nasser Tairan
ISBN:	978-989-8704-10-8
Editors:	Ajith P. Abraham, Antonio Palma dos Reis and Jörg Roth
Year:	2014
Edition:	Single
Keywords:	Text mining, Clustering, Pre-processing, Dimension reduction
Type:	Short Paper
First Page:	185
Last Page:	189
Language:	English
Paper Abstract:	Term-based approaches can extract many features from text documents, but most of these features include noise. Many popular text mining strategies have been adapted to reduce the noise in extracted features, yet some noise features remain. These noise features are extracted from the same training documents as the good features, so the underlying problem is that some training documents contain a large amount of noise data. Reducing the noise data in the training documents would therefore help to reduce noise in the extracted features. Moreover, we believe that removing some training documents (those that contain more noise data than useful data) can help to improve the effectiveness of a classifier. Clustering can help to reduce the effect of noise data, since the clustering problem is defined as finding groups of similar objects in the data. In this paper, we introduce a methodology that uses a clustering algorithm to group the training data before it is used. We also test our hypothesis that not all training documents are useful for training the classifier.
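
The abstract does not say which clustering algorithm or removal rule the paper uses, so the following is only a minimal sketch of the general idea: cluster the training documents, then drop those that sit far from their own cluster centre. Everything in it is an assumption made for illustration, not the authors' procedure: the bag-of-words weighting, the small cosine-based k-means, the hand-picked `seeds`, the `min_similarity` threshold, and all function names.

```python
import math
from collections import Counter


def tf_vector(text):
    """Unit-length bag-of-words vector (assumed weighting, not from the paper)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity; inputs are unit length, so this is a dot product."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def centroid(vectors):
    """Unit-length mean direction of a group of vectors."""
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values()))
    return {t: w / norm for t, w in total.items()} if norm else {}


def cluster(vectors, seeds, iterations=10):
    """Small cosine-based k-means; `seeds` are indices of the initial centres."""
    centroids = [vectors[i] for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centre.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for vec in vectors
        ]
        # Recompute each centre from its current members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids


def filter_noisy(docs, seeds, min_similarity=0.7):
    """Keep documents close to their cluster centre (threshold is arbitrary)."""
    vectors = [tf_vector(d) for d in docs]
    labels, centroids = cluster(vectors, seeds)
    return [
        d
        for d, vec, lab in zip(docs, vectors, labels)
        if cosine(vec, centroids[lab]) >= min_similarity
    ]
```

A call such as `filter_noisy(training_docs, seeds=[0, 3])` would return the subset of documents to pass on to classifier training. In practice the threshold and the number of clusters would need tuning on held-out data, and a library implementation of clustering would normally replace the hand-written loop.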
