Digital Library

Title:      INVESTIGATION OF TERM WEIGHTING SCHEMES IN CLASSIFICATION OF IMBALANCED TEXTS
Author(s):      Behzad Naderalvojoud, Ahmet Selman Bozkir, Ebru Akcapinar Sezer
ISBN:      978-989-8704-10-8
Editors:      Ajith P. Abraham, Antonio Palma dos Reis and Jörg Roth
Year:      2014
Edition:      Single
Keywords:      Class imbalance problem, machine learning, text classification, term weighting, feature selection
Type:      Full Paper
First Page:      39
Last Page:      46
Language:      English
Paper Abstract:      The class imbalance problem plays a critical role in applying machine learning methods to text classification, since feature selection methods, like machine learning methods, expect a homogeneous class distribution. This study investigates two kinds of feature selection metrics (one-sided and two-sided) as the global component of term weighting schemes (referred to as tffs) in scenarios with different complexities and imbalance ratios. The traditional term weighting approach (tfidf) is employed as a baseline to evaluate the effects of tffs weighting. The study aims to show which kinds of weighting schemes suit which machine learning algorithms in different imbalanced cases. Four classification algorithms are used to show the effects of the term weighting schemes on the imbalanced datasets. According to our findings, leaving tfidf aside, term weighting methods based on one-sided feature selection metrics are better approaches for SVM and k-NN, while term weighting methods based on two-sided metrics are the best choice for MultiNB and C4.5 on imbalanced texts. As a result, term weighting methods based on one-sided feature selection metrics are recommended for SVM, and tfidf is a suitable weighting method for k-NN in text classification tasks.
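To make the difference between tfidf and tffs weighting concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the metrics used, so chi-square is assumed here as an example of a two-sided metric and the correlation coefficient (the signed square root of chi-square) as an example of a one-sided metric; clipping negative one-sided scores to zero, the toy corpus, and all function names are likewise assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical imbalanced toy corpus: 2 "sport" documents vs 6 "econ" documents.
DOCS = [
    ["goal", "match", "team"],
    ["goal", "team", "win"],
    ["market", "stock", "team"],
    ["stock", "price", "market"],
    ["market", "bank", "price"],
    ["bank", "stock", "win"],
    ["price", "market", "bank"],
    ["stock", "bank", "market"],
]
LABELS = ["sport", "sport", "econ", "econ", "econ", "econ", "econ", "econ"]


def contingency(docs, labels, term, positive):
    """Document counts: a = positive with term, b = negative with term,
    c = positive without term, d = negative without term."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if label == positive:
            a, c = (a + 1, c) if present else (a, c + 1)
        else:
            b, d = (b + 1, d) if present else (b, d + 1)
    return a, b, c, d


def idf(a, b, c, d):
    """Inverse document frequency: ignores class labels entirely."""
    n, df = a + b + c + d, a + b
    return math.log(n / df) if df else 0.0


def chi_square(a, b, c, d):
    """Two-sided (assumed example): high for terms tied to either class."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def one_sided_cc(a, b, c, d):
    """One-sided (assumed example): correlation coefficient, clipped at 0
    so that only terms indicating the positive class receive weight."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if not denom:
        return 0.0
    return max(0.0, math.sqrt(n) * (a * d - b * c) / math.sqrt(denom))


def global_weights(docs, labels, positive, metric):
    """Compute the global (collection-level) factor for every term."""
    vocab = sorted({t for tokens in docs for t in tokens})
    return {t: metric(*contingency(docs, labels, t, positive)) for t in vocab}


def weight_document(tokens, weights):
    """Local factor (raw term frequency) times the global factor."""
    tf = Counter(tokens)
    return {t: tf[t] * weights.get(t, 0.0) for t in tf}


TFIDF = global_weights(DOCS, LABELS, "sport", idf)
TF_CHI2 = global_weights(DOCS, LABELS, "sport", chi_square)
TF_CC = global_weights(DOCS, LABELS, "sport", one_sided_cc)

if __name__ == "__main__":
    for name, weights in [("tfidf", TFIDF), ("tf.chi2", TF_CHI2), ("tf.cc", TF_CC)]:
        print(name, weight_document(DOCS[2], weights))
```

In this sketch, a term such as "market", which occurs only in the majority class, keeps a positive weight under the two-sided metric but is zeroed under the one-sided metric when "sport" is treated as the positive class. That is the distinction the study evaluates across classifiers.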
   
