Hate Speech Classification in Indonesian Language Tweets by Using Convolutional Neural Network
Keywords: convolutional neural network, deep learning, hate speech, Indonesian language, text classification
Abstract: The rapid development of social media, combined with the freedom of its users to express their opinions, has contributed to the spread of hate speech aimed at certain groups. Online hate speech can be identified by the use of derogatory words in social media posts. Various studies on hate speech classification have been carried out; however, very little research has addressed hate speech classification in the Indonesian language. This paper proposes a convolutional neural network method for classifying hate speech in Indonesian-language tweets. The datasets for both the training and testing stages were collected from Twitter. The collected tweets were categorized as hate speech or non-hate speech. We used TF-IDF as the term weighting method for feature extraction. The best training and validation accuracies obtained were 90.85% and 88.34%, respectively, at 45 epochs. For the testing stage, experiments were conducted with different amounts of testing data. The highest testing accuracy was 82.5%, achieved on the dataset with 50 tweets in each category.
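As an illustration of the term weighting step, the following is a minimal sketch of TF-IDF in plain Python. It assumes the common textbook form, raw term frequency multiplied by log(N / df); the abstract does not state which TF-IDF variant, tokenizer, or preprocessing the authors used, so the function name `tfidf` and the toy tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Weight each term of each document by tf * log(N / df).

    docs: list of token lists. Returns one {term: weight} dict per document.
    Assumed variant: raw term counts and an unsmoothed logarithmic idf.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy tweets, whitespace-tokenized (not from the paper's dataset).
tweets = [
    "kamu bodoh sekali".split(),
    "kamu baik sekali".split(),
    "selamat pagi semua".split(),
]
vectors = tfidf(tweets)
```

In this toy corpus, a term that occurs in only one tweet (such as "bodoh") receives a higher weight than a term shared by two tweets (such as "kamu"), which is the property that makes TF-IDF useful for highlighting distinctive words before the vectors are passed to a classifier.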