Title:
|
A NEW APPROACH FOR DOCUMENT CLUSTERING USING MAPREDUCE (VAR-SECTING CLUSTERING) |
Author(s):
|
Abdelrahman Elsayed, Osama Ismail, Hoda M. O. Mokhtar |
ISBN:
|
978-989-8533-39-5 |
Editors:
|
Ajith P. Abraham, Antonio Palma dos Reis and Jörg Roth |
Year:
|
2015 |
Edition:
|
Single |
Keywords:
|
Clustering; MapReduce; K-means algorithm; Distributed computing |
Type:
|
Full Paper |
First Page:
|
57 |
Last Page:
|
64 |
Language:
|
English |
Paper Abstract:
|
Document clustering is the process of grouping related documents together. It facilitates the organization of search results and document management. The k-means algorithm and its variant, bisecting k-means, have been applied to document clustering and have produced good clustering results. The growing number of available documents requires distributed computing and the large pool of computing resources available through cloud computing. This paper introduces the Var-secting k-means algorithm. In addition to generating a binary tree, as the bisecting k-means algorithm does, it can generate a hierarchy tree with a variable number of nodes per tree level. The experimental results show that the Var-secting k-means algorithm utilizes distributed computing nodes better than bisecting k-means, especially when using the MapReduce programming model. |