Partitional Clustering of Underdeveloped Area Infrastructure with Unsupervised Learning Approach: A Case Study in the Island of Java, Indonesia

Bambang Widjanarko Otok; Agus Suharsono; Purhadi Purhadi; Rahmawati Erma Standsyah; Harun Al Azies

doi:10.5614/jpwk.2022.33.2.3

Authors

Bambang Widjanarko Otok Department of Statistics, Sepuluh Nopember Institute of Technology, Surabaya,
Agus Suharsono Institut Teknologi Sepuluh Nopember
Purhadi Purhadi Institut Teknologi Sepuluh Nopember
Rahmawati Erma Standsyah Institut Teknologi Sepuluh Nopember
Harun Al Azies Institut Teknologi Sepuluh Nopember

Keywords:

CLARA clustering, infrastructure, underdeveloped areas, unsupervised learning

Abstract

This study attempted to identify underdeveloped areas among the regencies/cities on the island of Java, Indonesia, based on a number of infrastructure indicators. An unsupervised learning approach was used to perform partitional clustering with the K-Means, K-Medoids, and CLARA methods. Besides obtaining clustering results and comparing the performance of the three methods, the research aimed to map the clustering results so that the characteristics of the regions indicated as underdeveloped, which should be absolute priorities for infrastructure development, are easier to recognize. The best clustering method was CLARA, with a connectivity coefficient of 7.4794 and a Dunn's index value of 0.1042. Partitional clustering of the regencies/cities on Java Island with the CLARA method, based on the infrastructure indicators, placed 99 regencies/cities in the cluster of areas with underdeveloped infrastructure, 12 in the cluster of areas with developing infrastructure, and 8 in the cluster of areas with developed infrastructure.
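The two validation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes Euclidean distance and the common definitions of the measures: Dunn's index as the smallest between-cluster distance divided by the largest within-cluster diameter (higher is better), and connectivity as a penalty of 1/j for every j-th nearest neighbour that falls in a different cluster (lower is better). The function names, the `n_neighbors` parameter, and the toy data are hypothetical and are not taken from the study, whose own computations may differ in detail.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster diameter.

    Assumes at least two clusters and at least one cluster with two
    distinct points (otherwise the denominator would be zero).
    """
    min_between = float("inf")
    max_within = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            if labels[i] == labels[j]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)
    return min_between / max_within


def connectivity(points, labels, n_neighbors=2):
    """Add 1/j whenever the j-th nearest neighbour is in another cluster."""
    total = 0.0
    for i, p in enumerate(points):
        # Indices of all other points, ordered from nearest to farthest.
        others = sorted(
            (k for k in range(len(points)) if k != i),
            key=lambda k: dist(p, points[k]),
        )
        for rank, k in enumerate(others[:n_neighbors], start=1):
            if labels[k] != labels[i]:
                total += 1.0 / rank
    return total


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 0, 1, 1, 1, 1]  # one point assigned to the wrong group

dunn_good = dunn_index(points, good_labels)
conn_good = connectivity(points, good_labels)
dunn_bad = dunn_index(points, bad_labels)
conn_bad = connectivity(points, bad_labels)
```

On this toy data the correct labelling gives a higher Dunn's index and a lower connectivity than the labelling with one misassigned point, which is the direction in which the two measures are read when methods such as K-Means, K-Medoids, and CLARA are compared.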
