A Classifier to Detect Profit and Non Profit Websites Upon Textual Metrics for Security Purposes
Keywords: classifier, cyber-attacks, defense system, network traffic, nonprofit, profit, security policies, textual metrics, website
Most organizations now run a defense system to protect their digital communication network against cyberattacks. However, these defense systems treat all network traffic the same way, regardless of whether it comes from profit or non-profit websites. This leads to more security policies being enforced, which negatively affects network speed. Since the most dangerous cyberattacks are aimed at commercial websites, because they hold more critical data such as credit card numbers, it is better to prioritize the defense system towards actual attacks that come from profit websites. This study evaluated how well textual website metrics determine whether a website is profit or non-profit, for security purposes. Classifiers were built to predict the website type by applying machine learning techniques to a corpus of profit and non-profit websites. Both traditional machine learning and deep learning techniques were applied. The results showed that J48 achieved the best accuracy in all cases. The newly built models can be a significant tool for organizations' defense systems, helping them implement the necessary security policies for attacks that come from both profit and non-profit websites. This will have a positive impact on the security and efficiency of the network.
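To make the idea of classifying on textual metrics concrete, the following is a minimal Python sketch rather than the method used in the study. J48 is Weka's implementation of the C4.5 decision tree, which grows a tree by repeatedly choosing the most informative split on a feature. The sketch computes a handful of simple textual metrics and then finds the single best threshold on one metric by information gain. The metric set, the function names `textual_metrics`, `entropy` and `best_threshold`, and the toy values are all assumptions made for illustration. C4.5 itself uses gain ratio and builds a full tree, so this shows only one split step in simplified form.

```python
import math
import re
from collections import Counter


def textual_metrics(text: str) -> dict:
    """Compute a few simple textual metrics for a page's visible text.

    The metric set is an illustrative assumption, not the paper's feature list.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_sentences = len(sentences)
    return {
        "word_count": n_words,
        "sentence_count": n_sentences,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sentences if n_sentences else 0.0,
    }


def entropy(labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best split `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Plain information gain is used here; C4.5 uses gain ratio instead.
    """
    base = entropy(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best[1]:
            best = (t, gain)
    return best


# Toy, made-up values of one metric for four hypothetical sites.
toy_values = [10, 12, 30, 35]
toy_labels = ["nonprofit", "nonprofit", "profit", "profit"]
threshold, gain = best_threshold(toy_values, toy_labels)
```

In practice each website would contribute one row of such metrics, and a full decision tree learner (or any other classifier) would be trained on those rows with the profit or non-profit label as the target.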