Digital Library

Title:      A GENETIC RULE LEARNING APPROACH TO DEAL WITH IMBALANCED DATASETS
Author(s):      Aouatef Mahani, Sadjia Benkhider, Ahmed Riadh Baba-Ali
ISBN:      978-989-8533-39-5
Editors:      Ajith P. Abraham, Antonio Palma dos Reis and Jörg Roth
Year:      2015
Edition:      Single
Keywords:      Data Mining, Supervised Classification, Imbalanced Datasets, GA sampling.
Type:      Short Paper
First Page:      151
Last Page:      156
Language:      English
Paper Abstract:      The emergence of imbalanced datasets in many domains has given rise to a new kind of classification called partial classification. The difficulty of such datasets and their prevalence have made this an active research area, and classification is now divided into two groups: complete classification, used for balanced datasets, and partial classification, known as "Nugget Discovery" and used for imbalanced datasets. Our approach provides a new type of undersampling based on a classifier, which is itself constructed using a genetic learning algorithm. Imbalanced datasets are composed of two sorts of instances: majority instances (the most frequent) and minority instances, which belong to a less frequent class. Learning algorithms generally produce rules with high precision and coverage. Consequently, these rules are specific to majority instances, whereas only a small number of exact rules represent the minority instances. We conclude that these algorithms and the measures they use cannot meet our needs. Our main goal is to produce an accurate classification model. To this end, we first try to transform the imbalanced dataset into a balanced one. Two phases are used. In the first phase, we construct a classifier (a set of rules) using a genetic algorithm (GA), then delete the majority instances that are well classified by the rules of this classifier. Our approach thus uses a new kind of undersampling technique, which we call the “undersampling GA based approach”. In the second phase, we first divide the balanced dataset BDS into two datasets, a training dataset (TBDS) and a test dataset (BDtest). We then apply the learning algorithm to TBDS to produce a new classifier C2, whose performance is tested on the test dataset.
We also analyze its advantages using measures appropriate for imbalanced datasets, such as precision and AUC.
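
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the genetic algorithm is not reproduced, and the rule set is a hand-written stand-in for the rules a GA would evolve. The names `undersample`, `split`, the threshold rule, and the toy data are all hypothetical, chosen only to show how majority instances covered by the rules are removed before the train/test split.

```python
import random
from typing import Callable, List, Tuple

Features = Tuple[float, ...]
Instance = Tuple[Features, int]      # (features, class label)
Rule = Callable[[Features], bool]    # True if the rule covers the instance

def undersample(dataset: List[Instance], rules: List[Rule],
                majority_label: int) -> List[Instance]:
    """Phase 1 (sketch): drop majority instances covered by at least one rule.

    The rules are assumed to predict the majority class, so a covered
    majority instance counts as "well classified" and is removed.
    """
    kept = []
    for features, label in dataset:
        well_classified = (label == majority_label
                           and any(rule(features) for rule in rules))
        if not well_classified:
            kept.append((features, label))
    return kept

def split(dataset: List[Instance], test_fraction: float,
          seed: int) -> Tuple[List[Instance], List[Instance]]:
    """Phase 2 (sketch): shuffle and split into training and test sets."""
    rng = random.Random(seed)
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
dataset = [((6.0,), 0), ((7.0,), 0), ((8.0,), 0), ((9.0,), 0),
           ((1.0,), 0), ((2.0,), 0), ((1.5,), 1), ((6.5,), 1)]

# Stand-in for GA-evolved rules: a single threshold rule for the majority class.
rules = [lambda f: f[0] > 5.0]

balanced = undersample(dataset, rules, majority_label=0)
train, test = split(balanced, test_fraction=0.25, seed=0)
```

On this toy data the four majority instances covered by the rule are removed, leaving two instances of each class. A second classifier, C2 in the abstract, would then be learned on `train` and evaluated on `test`.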
   
