A Hybrid Simplified Swarm Optimization Method for Imbalanced Data Feature Selection
Abstract
In recent years, feature selection has become an important field in data mining and is used heavily in numerous areas. The purpose of feature selection is to search the existing data for a subset of features that maximizes classification accuracy. However, only a few studies have investigated how data imbalance (the existence of underrepresented categories of data) affects the feature selection problem. The aim of this study is therefore to provide a feature selection method that increases the accuracy of classifying high-dimensional imbalanced data. We propose a hybrid method that can identify a better feature subset. In the proposed method, information gain acts as a filter that selects the most informative features from the original dataset. The class imbalance of the dataset with the selected features is then adjusted using the synthetic minority over-sampling technique (SMOTE). Simplified swarm optimization is then used as the feature search engine to guide the search for an optimal feature subset. Finally, a support vector machine serves as the classifier that evaluates the performance of the proposed method. To evaluate the proposed algorithm, we apply it to four benchmark datasets and compare the results with those of an existing algorithm. The results show that our algorithm performs better than its competitor.
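The search stage described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited simplified swarm optimization update, in which each bit of a particle is kept, copied from the particle's own best, copied from the global best, or redrawn at random, depending on where a uniform random number falls among three thresholds. The function names, the threshold values `cw`, `cp`, `cg`, the swarm size, and `toy_fitness` are all illustrative assumptions and are not taken from the article. In the proposed method the fitness would presumably be the support vector machine accuracy on the filtered, SMOTE-balanced data; a toy score stands in for it here.

```python
import random

def sso_feature_search(n_features, fitness, n_particles=10, n_iters=30,
                       cw=0.1, cp=0.4, cg=0.9, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness`.

    cw < cp < cg are assumed thresholds, not values from the article.
    """
    rng = random.Random(seed)
    # Each particle is a binary mask: 1 keeps the feature, 0 drops it.
    swarm = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                rho = rng.random()
                if rho < cw:
                    pass                              # keep the current bit
                elif rho < cp:
                    swarm[i][j] = pbest[i][j]         # copy from personal best
                elif rho < cg:
                    swarm[i][j] = gbest[j]            # copy from global best
                else:
                    swarm[i][j] = rng.randint(0, 1)   # random exploration
            fit = fitness(swarm[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = swarm[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = swarm[i][:], fit
    return gbest, gbest_fit

def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy.

    Rewards three 'informative' features and penalizes every other one.
    """
    informative = {0, 2, 5}
    hits = sum(mask[j] for j in informative)
    noise = sum(mask) - hits
    return hits - 0.5 * noise

best_mask, best_score = sso_feature_search(8, toy_fitness)
print(best_mask, best_score)
```

Replacing `toy_fitness` with a function that trains and scores a classifier on the columns selected by the mask would connect this search to the filter, over-sampling, and classification stages named in the abstract.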
Keywords: Data Mining; Feature Selection; Imbalanced Data; Soft Computing; Simplified Swarm Optimization; Support Vector Machine
Australian Academy of Business and Economics Review, vol 2, issue 3, July 2016, pp 263-275

This work is licensed under a Creative Commons Attribution 4.0 International License.