The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling
Wacharasak Siriseriwan and Krung Sinapiromsaran*
*Corresponding author; e-mail addresses: wacharasak.s@gmail.com; krung.s@chula.ac.th
Volume: 43, No. 1 (January 2016)
Research Article
Received: 10 June 2013; Accepted: 12 November 2014
Citation: Siriseriwan W. and Sinapiromsaran K., The Effective Redistribution for Imbalance Dataset : Relocating Safe-Level SMOTE with Minority Outcast Handling, Chiang Mai Journal of Science, 2016; 43(1): 234-246.
Abstract
The redistribution of the target class by oversampling synthetic minority instances is one effective approach to the class imbalance problem. Safe-Level SMOTE generates synthetic minority instances around original minority instances while avoiding nearby majority ones. Despite this intention, however, some synthetic instances can still be placed too close to nearby majority instances, which may confuse some classifiers. Moreover, Safe-Level SMOTE by design avoids using minority outcast instances to generate synthetic instances, so the generated dataset may lose valuable information about the minority class. This paper remedies these two drawbacks of Safe-Level SMOTE by combining two processes. The first checks synthetic instances and moves them away from surrounding majority instances. The second handles minority outcasts with a 1-nearest neighbor model. Empirical results on UCI and PROMISE datasets show improvements in F-measure, a performance measure commonly used for the class imbalance problem, for various classifiers such as decision tree, naïve Bayes, multilayer perceptron, support vector machine, and k-nearest neighbor. The significance of these improvements is assessed with the Wilcoxon signed-rank test.
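The abstract describes Safe-Level SMOTE only in outline. The Python sketch below may help make it concrete. It implements one pass of the usual safe-level generation rule. The safe level of an instance is the number of minority instances among its k nearest neighbours. The ratio of the seed's safe level to its neighbour's safe level decides where along the segment the synthetic point is placed. Several choices here are assumptions, not taken from this paper: label 1 marks the minority class, distances are Euclidean, the neighbour n is drawn from the seed's k nearest minority neighbours, and a minority outcast is taken to be a minority instance with safe level 0. All function names are hypothetical. `flag_near_majority` is only a simple proxy that flags synthetic points whose nearest original instance is a majority one. It is not the authors' relocation rule, and the 1-nearest-neighbor outcast handling is not implemented here; the sketch only detects outcasts.

```python
import math
import random


def nearest(point, X, exclude=None):
    """Indices of rows of X sorted by Euclidean distance to `point`, skipping `exclude`."""
    idx = [i for i in range(len(X)) if i != exclude]
    return sorted(idx, key=lambda i: math.dist(point, X[i]))


def safe_level(i, X, y, k):
    """Safe level: number of minority (label 1) instances among the k nearest neighbours of X[i]."""
    return sum(y[j] == 1 for j in nearest(X[i], X, exclude=i)[:k])


def safe_level_smote(X, y, k=5, rng=None):
    """One Safe-Level SMOTE pass, producing at most one synthetic instance per minority seed.

    Returns (synthetic, outcasts). Outcasts are minority indices whose safe level is 0
    (assumed definition of a minority outcast).
    """
    rng = rng or random.Random(0)
    synthetic, outcasts = [], []
    for p in (i for i, label in enumerate(y) if label == 1):
        slp = safe_level(p, X, y, k)
        if slp == 0:
            outcasts.append(p)  # all of p's k nearest neighbours are majority instances
        # Assumption: n is drawn from p's k nearest *minority* neighbours.
        min_nbrs = [j for j in nearest(X[p], X, exclude=p) if y[j] == 1][:k]
        if not min_nbrs:
            continue
        n = rng.choice(min_nbrs)
        sln = safe_level(n, X, y, k)
        ratio = slp / sln if sln != 0 else math.inf
        if math.isinf(ratio) and slp == 0:
            continue  # both p and n have safe level 0: generate nothing
        s = []
        for a in range(len(X[p])):
            if math.isinf(ratio):
                gap = 0.0  # n has safe level 0: duplicate p
            elif ratio == 1:
                gap = rng.uniform(0, 1)  # equally safe: anywhere on the segment
            elif ratio > 1:
                gap = rng.uniform(0, 1 / ratio)  # p is safer: stay closer to p
            else:
                gap = rng.uniform(1 - ratio, 1)  # n is safer: stay closer to n
            s.append(X[p][a] + gap * (X[n][a] - X[p][a]))
        synthetic.append(tuple(s))
    return synthetic, outcasts


def flag_near_majority(synthetic, X, y):
    """Hypothetical proxy check: synthetic points whose nearest original instance is majority."""
    return [s for s in synthetic if y[nearest(s, X)[0]] == 0]


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
    y = [1, 1, 1, 0, 0, 0, 0, 0, 1]
    synthetic, outcasts = safe_level_smote(X, y, k=3)
    print(outcasts)  # [8]: (5.5, 5.5) is surrounded by majority instances
    print(len(synthetic))  # 4
    print(flag_near_majority(synthetic, X, y))
```

In this toy data the minority point at (5.5, 5.5) has safe level 0. Under the original rule it contributes only a copy of its neighbour n, so its own position is never used. This is the information loss that the outcast-handling step is meant to address.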