A New Term Frequency with Gaussian Technique for Text Classification and Sentiment Analysis


  • Vuttichai Vichianchai, Hardware-Human Interface and Communications (H2I-Comm) Laboratory, Department of Computer Science, Faculty of Science, Khon Kaen University, 123 Mittraphap Road, Nai Mueang, Mueang Khon Kaen 40002, Thailand
  • Sumonta Kasemvilas, Hardware-Human Interface and Communications (H2I-Comm) Laboratory, Department of Computer Science, Faculty of Science, Khon Kaen University, 123 Mittraphap Road, Nai Mueang, Mueang Khon Kaen 40002, Thailand




This paper proposes a new term frequency with a Gaussian technique (TF-G) to classify the risk of suicide from Thai clinical notes and to perform sentiment analysis on Thai customer reviews and on English tweets from travelers who use US airline services. TF-G was compared with term weighting techniques used in previous research on Thai text classification: bag-of-words (BoW), term frequency (TF), term frequency-inverse document frequency (TF-IDF), and term frequency-inverse corpus document frequency (TF-ICF). Suicide risk classification and sentiment analysis were performed with the decision tree (DT), naïve Bayes (NB), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) techniques. The experimental results showed that TF-G is appropriate for feature extraction in all three tasks. As a term weighting technique, TF-G was more accurate than BoW, TF, TF-IDF, and TF-ICF in Thai suicide risk classification, in sentiment analysis of Thai customer reviews for Burger King, Pizza Hut, and Sizzler restaurants, and in sentiment analysis of English tweets from travelers using US airline services.
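For readers unfamiliar with the baseline weightings named above, the following is a minimal Python sketch of BoW, TF, and TF-IDF on a toy tokenized corpus. It is an illustration under stated assumptions, not the authors' implementation: BoW is taken here as binary presence, TF as the count divided by document length, and IDF as the unsmoothed log(N / df); other variants of each are in common use. The function names and the toy corpus are hypothetical. TF-G and TF-ICF are not shown because the abstract does not give their formulas.

```python
import math
from collections import Counter

def bow(doc_tokens, vocab):
    """Binary bag-of-words (assumed variant): 1 if the term occurs, else 0."""
    present = set(doc_tokens)
    return [1 if t in present else 0 for t in vocab]

def tf(doc_tokens, vocab):
    """Relative term frequency (assumed variant): count / document length."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return [counts[t] / n if n else 0.0 for t in vocab]

def idf(corpus, vocab):
    """Unsmoothed inverse document frequency: log(N / df)."""
    n_docs = len(corpus)
    # Count each term once per document that contains it.
    df = Counter(t for doc in corpus for t in set(doc))
    return [math.log(n_docs / df[t]) if df[t] else 0.0 for t in vocab]

def tf_idf(doc_tokens, corpus, vocab):
    """TF-IDF weight per vocabulary term for one document."""
    return [w * i for w, i in zip(tf(doc_tokens, vocab), idf(corpus, vocab))]

# Hypothetical toy corpus of already-tokenized documents.
corpus = [
    ["flight", "late", "late"],
    ["flight", "great", "crew"],
    ["crew", "rude"],
]
vocab = sorted({t for doc in corpus for t in doc})

for doc in corpus:
    print(bow(doc, vocab), [round(w, 3) for w in tf_idf(doc, corpus, vocab)])
```

Each resulting vector could then be passed as a feature row to any of the classifiers listed in the abstract. For Thai text, a word segmentation step would be needed before this one, since the sketch assumes tokens are already available.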

