Scalable and Efficient Student Behavior Prediction using Parallelized Clustering and AHP-weighted KNN

Authors

  • Li Guozhang, College of Information Engineering, Hainan Vocational University of Science and Technology, Haikou 571126, Hainan, China
  • Rayner Alfred, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia
  • Rayner Pailus, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Jalan UMS, 88400 Kota Kinabalu, Sabah, Malaysia
  • Xu Fengchang, Shandong Light Industry Vocational College, Zibo City, Shandong Province, China
  • Haviluddin Haviluddin, Faculty of Computer Science and Information Technology, Universitas Mulawarman, Jalan Sambaliung, Kampus Gunung Kelua, Samarinda, East Kalimantan, Indonesia

DOI:

https://doi.org/10.5614/itbj.ict.res.appl.2025.19.2.3

Keywords:

analytic hierarchy process, density-based optimized k-means, educational data mining, feature weighting, hybrid model, parallelized feature selection, parallelized model training, student behavior prediction

Abstract

This study proposes a scalable and efficient approach for predicting student behavior in large-scale educational environments. It introduces a parallelized hybrid model that combines Density-Based Optimized K-Means clustering, Analytic Hierarchy Process (AHP) feature weighting, and Hierarchical K-Nearest Neighbors (KNN), implemented on Apache Spark. The main research question is how to improve the scalability, accuracy, and computational efficiency of student behavior prediction on large, complex datasets. The model addresses key limitations of traditional methods, such as poor handling of heterogeneous data, equal treatment of all features, and high computational cost. Two main innovations are presented. First, AHP assigns structured importance weights to features, allowing critical factors such as attendance and study time to exert greater influence on prediction accuracy. Second, clustering and prediction are parallelized with Spark, enabling efficient real-time processing of large datasets. The approach was evaluated on 18,586 student records and more than 20 million behavioral entries. Results show that Hierarchical KNN consistently outperforms standard KNN as dataset size increases: while traditional KNN shows unstable error rates, peaking at 9.4%, Hierarchical KNN maintains lower and more stable errors between 5.16% and 6.08%. Execution time was also significantly reduced through parallel processing, although gains were limited by communication overhead. Overall, the proposed model offers a robust framework for real-time behavior analysis, academic risk detection, and targeted educational intervention.
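
To illustrate the AHP-weighted KNN idea summarized above, the following is a minimal single-machine sketch (NumPy only), not the authors' Spark implementation: it derives feature weights from an AHP pairwise comparison matrix and folds them into the KNN distance. The feature names, the pairwise judgements, the value of k, and the toy data are illustrative assumptions rather than values taken from the paper.

import numpy as np

def ahp_weights(pairwise):
    """Priority weights from an AHP pairwise comparison matrix (principal eigenvector)."""
    eigvals, eigvecs = np.linalg.eig(pairwise)
    idx = np.argmax(eigvals.real)
    weights = eigvecs[:, idx].real
    weights = weights / weights.sum()
    # Saaty consistency check: accept the judgements only if CR < 0.1.
    n = pairwise.shape[0]
    ci = (eigvals.real[idx] - n) / (n - 1)
    ri = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}.get(n, 1.32)  # Saaty's random index
    if ci / ri >= 0.1:
        raise ValueError("pairwise judgements are too inconsistent (CR >= 0.1)")
    return weights

def ahp_weighted_knn(X_train, y_train, x_query, weights, k=5):
    """Majority-vote KNN using an AHP-weighted Euclidean distance."""
    diffs = X_train - x_query
    dists = np.sqrt((weights * diffs ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical features (order assumed): attendance, study time, library visits.
# The matrix encodes "attendance is twice as important as study time and four
# times as important as library visits"; these judgements are purely illustrative.
pairwise = np.array([[1.0, 2.0, 4.0],
                     [0.5, 1.0, 2.0],
                     [0.25, 0.5, 1.0]])
w = ahp_weights(pairwise)

rng = np.random.default_rng(0)
X = rng.random((200, 3))               # toy behavioral records scaled to [0, 1]
y = (X @ w > 0.5).astype(int)          # toy at-risk / not-at-risk labels
print("AHP weights:", np.round(w, 3))
print("Predicted class:", ahp_weighted_knn(X, y, np.array([0.8, 0.6, 0.3]), w))

In the paper itself, clustering and the neighbour search are distributed across Spark executors; this sketch keeps everything in memory on one machine purely to isolate the weighting logic.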

Published

2025-12-31

How to Cite

Guozhang, L., Alfred, R., Pailus, R., Fengchang, X., & Haviluddin, H. (2025). Scalable and Efficient Student Behavior Prediction using Parallelized Clustering and AHP-weighted KNN. Journal of ICT Research and Applications, 19(2), 142-165. https://doi.org/10.5614/itbj.ict.res.appl.2025.19.2.3