Title: INVESTIGATION OF TERM WEIGHTING SCHEMES IN CLASSIFICATION OF IMBALANCED TEXTS
Author(s): Behzad Naderalvojoud, Ahmet Selman Bozkir, Ebru Akcapinar Sezer
ISBN: 978-989-8704-10-8
Editors: Ajith P. Abraham, Antonio Palma dos Reis and Jörg Roth
Year: 2014
Edition: Single
Keywords: Class imbalance problem, machine learning, text classification, term weighting, feature selection
Type: Full Paper
First Page: 39
Last Page: 46
Language: English
Paper Abstract:
The class imbalance problem plays a critical role when machine learning methods are applied to text classification. Feature selection methods, like the machine learning methods themselves, expect a homogeneous class distribution. This study investigates two kinds of feature selection metrics, one-sided and two-sided, used as the global component of term weighting schemes (called tffs). It covers scenarios with different complexities and imbalance ratios. The traditional term weighting approach (tfidf) serves as a baseline for evaluating the effect of tffs weighting. The study thus aims to show which kinds of weighting schemes suit which machine learning algorithms in different imbalanced cases. Four classification algorithms are used to show the effects of the term weighting schemes on the imbalanced datasets. Our findings set tfidf aside for this comparison. Term weighting methods based on one-sided feature selection metrics are the better approaches for the SVM and k-NN algorithms. Term weighting methods based on two-sided metrics are the best choice for MultiNB and C4.5 on imbalanced texts. Consequently, we recommend term weighting methods based on one-sided feature selection metrics for SVM, and tfidf is a suitable weighting method for the k-NN algorithm in text classification tasks.
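The abstract describes tffs as a scheme in which a feature selection score, instead of idf, serves as the global factor multiplied by term frequency. The record does not name the paper's metrics. The sketch below is a minimal illustration under stated assumptions. It uses the correlation coefficient (CC) as an example one-sided metric and chi-square as an example two-sided metric. Both are computed from a term's 2×2 document counts for one target class.

Several details are hypothetical and not taken from the paper: the function names, raw term counts as tf, idf as log(N/df), and the toy corpus. Chi-square equals CC squared, so it drops the sign. That sign is what tells a one-sided metric whether a term indicates membership or non-membership of the class.

```python
import math

# Assumption: CC and chi-square stand in for the paper's (unnamed) one-sided
# and two-sided feature selection metrics; all names here are hypothetical.


def contingency(docs, labels, term, positive):
    """Count documents for `term` with respect to class `positive`.

    A: positive docs containing the term, B: other docs containing it,
    C: positive docs without it,        D: other docs without it.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if label == positive:
            if has_term:
                a += 1
            else:
                c += 1
        elif has_term:
            b += 1
        else:
            d += 1
    return a, b, c, d


def cc(a, b, c, d):
    """One-sided correlation coefficient: negative when the term points
    away from the positive class."""
    n = a + b + c + d
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else math.sqrt(n) * (a * d - b * c) / denom


def chi2(a, b, c, d):
    """Two-sided chi-square statistic; equals cc squared, so never negative."""
    return cc(a, b, c, d) ** 2


def idf(docs, term):
    """Plain idf, log(N / df), as the traditional global factor."""
    df = sum(term in tokens for tokens in docs)
    return math.log(len(docs) / df) if df else 0.0


def weight(tokens, term, global_factor):
    """Raw term frequency times a global factor (idf for tfidf,
    a feature selection score for tffs)."""
    return tokens.count(term) * global_factor


if __name__ == "__main__":
    # Hypothetical toy corpus with an imbalanced split (2 "sport" vs 3 "econ").
    docs = [
        ["goal", "match", "team"],
        ["team", "win", "goal"],
        ["market", "stock", "team"],
        ["stock", "price", "market"],
        ["price", "stock", "win"],
    ]
    labels = ["sport", "sport", "econ", "econ", "econ"]
    for term in ("goal", "stock", "team"):
        counts = contingency(docs, labels, term, "sport")
        print(term, counts,
              "cc=%.3f" % cc(*counts),
              "chi2=%.3f" % chi2(*counts),
              "idf=%.3f" % idf(docs, term))
```

In this toy corpus, "goal" appears only in "sport" documents and "stock" appears only in "econ" documents. Both terms get a chi-square score of about 5. Their CC scores have opposite signs, so only the one-sided metric separates evidence for the class from evidence against it. This sketch leaves negative CC weights as-is. A practical tffs scheme may clip, rescale, or aggregate scores over classes differently.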