Sentiment Classification for Film Reviews in Gujarati Text Using Machine Learning and Sentiment Lexicons


  • Parita Shah Department of Computer Engineering, Sarva Vidyalaya Kelavani Mandal managed Vidush Somany Institute of Technology and Research, Kadi, India
  • Priya Swaminarayan Faculty of Information Technology and Computer Science, Parul University, Vadodara, India
  • Maitri Patel Department of Computer Engineering, Gandhinagar University, India



Gujarati Text, lexicon, machine classifier, movie reviews, sentiment analysis


This paper proposes two techniques for sentiment classification of Gujarati film reviews: Gujarati Lexicon Sentiment Analysis (GLSA) and Gujarati Machine Learning Sentiment Analysis (GMLSA). Five different datasets were produced to validate the accuracy of the lexicon-based and machine learning-based methods. The lexicon-based approach employs a sentiment lexicon known as GujSentiWordNet, which assigns sentiment scores to words for feature generation. The machine learning-based approach uses five classifiers, namely logistic regression (LR), random forest (RF), k-nearest neighbors (KNN), support vector machine (SVM), and naive Bayes (NB), with TF-IDF and count vectorizer used to generate features. Experiments were carried out and the results were compared using accuracy, precision, recall, and F-score as performance evaluation criteria. According to the test results, the machine learning-based technique improved accuracy by 3 to 10% on average compared to the lexicon-based approach.
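As a rough illustration of the lexicon-based idea and of the four evaluation criteria named in the abstract, the following minimal sketch sums per-word polarity scores and labels a review positive when the total is above zero. It is an assumption-laden toy, not the GLSA method itself: the lexicon entries, their scores, the whitespace tokenization, the zero threshold, and the names `TOY_LEXICON`, `lexicon_label`, and `evaluate` are all invented here and are not taken from GujSentiWordNet or from the paper.

```python
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity score in [-1, 1].
# Entries and scores are invented for illustration only;
# they are NOT taken from GujSentiWordNet.
TOY_LEXICON: Dict[str, float] = {
    "સરસ": 0.75,
    "સારી": 0.5,
    "ખરાબ": -0.75,
    "કંટાળાજનક": -0.5,
}


def lexicon_label(review: str, lexicon: Dict[str, float]) -> str:
    """Sum the scores of known tokens; a total above zero means 'pos'."""
    # Assumption: plain whitespace tokenization, no stemming or negation handling.
    total = sum(lexicon.get(token, 0.0) for token in review.split())
    return "pos" if total > 0 else "neg"


def evaluate(gold: List[str], pred: List[str], positive: str = "pos") -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F-score for the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Made-up example reviews with made-up gold labels.
reviews = ["ફિલ્મ સરસ છે", "ફિલ્મ ખરાબ છે", "વાર્તા કંટાળાજનક છે", "ફિલ્મ છે"]
gold = ["pos", "neg", "neg", "pos"]
pred = [lexicon_label(r, TOY_LEXICON) for r in reviews]
metrics = evaluate(gold, pred)
```

The last review contains no lexicon word, so its total is zero and the sketch labels it negative. This shows one known weakness of purely lexicon-based scoring: coverage gaps translate directly into lost recall.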








How to Cite

Shah, P., Swaminarayan, P., & Patel, M. (2023). Sentiment Classification for Film Reviews in Gujarati Text Using Machine Learning and Sentiment Lexicons. Journal of ICT Research and Applications, 17(1), 1-16.