Machine-Learning Classifiers for Malware Detection Using Data Features
DOI:
https://doi.org/10.5614/itbj.ict.res.appl.2021.15.3.5
Keywords:
artificial intelligence, cyber-attacks, machine learning, malware, ransomware
Abstract
The spread of ransomware has risen sharply over the past decade, causing substantial financial damage to many organizations. Various anti-ransomware firms have suggested methods for preventing malware threats, but the growing pace, scale and sophistication of malware continue to challenge the anti-malware industry. Recent literature indicates that academics and anti-virus organizations have begun to use machine learning as well as fundamental modeling techniques for the analysis and identification of malware. Conventional signature-based anti-virus programs struggle to identify unfamiliar malware and to track new forms of malware. In this study, a machine-learning-based malware evaluation framework was adopted that consists of several modules: compilation of a dataset with two separate classes (malicious and benign software), file disassembly, data processing, decision making, and updated malware identification. The data processing module uses grey-scale images, import functions and opcode n-grams to extract malware features. The decision making module detects malware and flags suspected malware. Several classifiers were considered in the research methodology for the detection and classification of malware. The framework's effectiveness was validated on the basis of the accuracy of the complete process.
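As a rough illustration of two of the feature types named in the abstract, the sketch below counts opcode n-grams and lays raw file bytes out as grey-level pixel rows. It is not the authors' implementation: the function names `opcode_ngrams` and `bytes_to_grey_rows`, the default n-gram size, and the image width are all assumptions chosen for the example, and the opcode list is assumed to come from a separate disassembly step.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def opcode_ngrams(opcodes: Sequence[str], n: int = 2) -> "Counter[Tuple[str, ...]]":
    """Count contiguous opcode n-grams in a disassembled instruction stream.

    Hypothetical helper: `opcodes` is assumed to be the mnemonic sequence
    produced by a disassembler, in program order.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def bytes_to_grey_rows(data: bytes, width: int = 16) -> List[List[int]]:
    """Arrange raw bytes as rows of 0-255 grey levels, zero-padding the last row.

    Hypothetical helper: the fixed row width is an assumption, not a value
    taken from the paper.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    padded = data + bytes(-len(data) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    # Toy inputs, invented for the example.
    counts = opcode_ngrams(["push", "mov", "push", "mov", "call"], n=2)
    print(counts.most_common(1))
    print(bytes_to_grey_rows(b"\x00\x7f\xff\x10\x20", width=4))
```

The n-gram counts and the flattened pixel rows can each serve as a fixed-length feature vector (after choosing a vocabulary or resizing) for whichever classifier is being evaluated.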