Improving Floating Search Feature Selection using Genetic Algorithm

Kanyanut Homsapaya, Ohm Sornil

Abstract


Classification, the task of predicting the class of a given input, is one of the most fundamental tasks in data mining. Classification performance is degraded by noisy data, so selecting features relevant to the problem is a critical step, especially for large datasets. In this article, a novel filter-based floating search technique is proposed that selects an optimal set of features for classification. A genetic algorithm is employed to improve the quality of the features selected by the floating search method in each iteration, and a criterion function is applied to retain relevant, high-quality features that improve classification accuracy. The proposed method was evaluated on 20 standard machine learning datasets of various sizes and complexities. The results show that it is effective across different classifiers and compares well with recently reported techniques. Moreover, combining the proposed method with a support vector machine yields the best performance among the classifiers studied and outperforms previously reported methods on the majority of the datasets.
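The core idea of the abstract, a floating (forward/backward) search whose intermediate feature subsets are refined by a genetic algorithm against a criterion function, can be sketched as follows. This is a minimal illustrative sketch only: the criterion function, relevance weights, redundancy penalty, and GA parameters below are invented for the example and are not the authors' actual formulation.

```python
import random

# Toy criterion (an assumption, not the paper's): reward per-feature
# relevance and penalize one redundant pair of features.
RELEVANCE = [0.9, 0.8, 0.75, 0.1, 0.05, 0.6, 0.02, 0.3]
REDUNDANT_PAIRS = [(0, 1)]  # features 0 and 1 carry overlapping information

def criterion(subset):
    score = sum(RELEVANCE[f] for f in subset)
    for a, b in REDUNDANT_PAIRS:
        if a in subset and b in subset:
            score -= 0.7  # redundancy penalty
    return score

def sffs_ga(n_features, criterion, target_size, pop_size=12, generations=15, seed=0):
    """Sequential floating forward selection with a GA refinement step."""
    rng = random.Random(seed)

    def mutate(subset):
        # Swap one selected feature for an unselected one (size-preserving).
        cand = set(subset)
        if cand and len(cand) < n_features:
            cand.remove(rng.choice(sorted(cand)))
            cand.add(rng.choice([f for f in range(n_features) if f not in cand]))
        return cand

    def ga_refine(subset):
        # Evolve same-size variants of the current subset.
        population = [set(subset)] + [mutate(subset) for _ in range(pop_size - 1)]
        for _ in range(generations):
            population.sort(key=criterion, reverse=True)
            parents = population[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                pool = sorted(a | b)  # crossover: draw the child from the union
                child = set(rng.sample(pool, min(len(subset), len(pool))))
                if rng.random() < 0.3:
                    child = mutate(child)
                children.append(child)
            population = parents + children
        return max(population, key=criterion)

    selected = set()
    while len(selected) < target_size:
        # Forward step: add the single feature that maximizes the criterion.
        best_f = max((f for f in range(n_features) if f not in selected),
                     key=lambda f: criterion(selected | {f}))
        selected.add(best_f)
        # GA step: accept the evolved subset only if it strictly improves.
        refined = ga_refine(selected)
        if criterion(refined) > criterion(selected):
            selected = refined
        # Backward (floating) step: drop features while doing so helps.
        while len(selected) > 2:
            weakest = max(selected, key=lambda f: criterion(selected - {f}))
            if criterion(selected - {weakest}) > criterion(selected):
                selected.remove(weakest)
            else:
                break
    return sorted(selected)

print(sffs_ga(8, criterion, target_size=3))  # → [0, 2, 5]: skips the redundant pair
```

The GA step is what distinguishes the proposed method from plain floating search: after each forward addition, the current subset competes against evolved variants of itself, which can escape the local choices a purely greedy search commits to.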

Keywords


classification; evaluation; feature selection; floating search; genetic algorithm.


DOI: http://dx.doi.org/10.5614/itbj.ict.res.appl.2017.11.3.6



Contact Information:

ITB Journal Publisher, LPPM – ITB, 

Center for Research and Community Services (CRCS) Building Floor 7th, 
Jl. Ganesha No. 10 Bandung 40132, Indonesia,

Tel. +62-22-86010080,

Fax.: +62-22-86010051;

e-mail: jictra@lppm.itb.ac.id.