Social Media Text Classification by Enhancing Well-Formed Text Trained Model

Phat Jotikabukkana, Virach Sornlertlamvanich, Okumura Manabu, Choochart Haruechaiyasak

Abstract


Social media is a powerful communication tool in the era of digital information. The large volume of user-generated data is a useful new source of information, although extracting value from such a vast and noisy collection is not easy. Since classification is an important part of text mining, many techniques have been proposed to classify this kind of data. We developed an effective technique for social media text classification based on semi-supervised learning, using an online news source consisting of well-formed text. The system first automatically extracts the news categories, as assigned by the publishers, to serve as classes for topic classification. A bag of words taken from the news articles provides the initial keywords for each category in the form of word vectors. The principal task is then to retrieve a set of new productive keywords. Term Frequency-Inverse Document Frequency (TF-IDF) weighting and the Word Article Matrix (WAM) are the main methods used. A modified WAM is recomputed repeatedly until it becomes the most effective model for social media text classification. The key success factor was enhancing the model with effective keywords from social media. After updating the model three times, a promising accuracy of 99.50% was achieved, with precision, recall, and F-measure each above 98.5%.
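The general idea of weighting news keywords with TF-IDF and matching a short text against categorized articles can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names (`build_tfidf`, `cosine`, `classify`), the smoothed IDF formula, and the choice to sum cosine similarities per category are all assumptions made for illustration, and the iterative model update with social media keywords is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (category, tokenized article) pairs.
# Real input would be word-segmented news text; these tokens are invented.
ARTICLES = [
    ("sports", ["football", "match", "goal", "team"]),
    ("sports", ["team", "coach", "match", "win"]),
    ("politics", ["election", "vote", "party", "minister"]),
    ("politics", ["minister", "policy", "vote", "parliament"]),
]


def build_tfidf(articles):
    """Return one TF-IDF vector (dict) per article, plus the IDF table."""
    n = len(articles)
    df = Counter()
    for _, tokens in articles:
        df.update(set(tokens))  # document frequency counts each term once per article
    # Assumed IDF variant: log(N / df) + 1, so shared terms keep a nonzero weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = []
    for _, tokens in articles:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(tokens, articles, vectors, idf):
    """Pick the category whose articles are most similar to the given tokens.

    Returns None when no token is a known keyword.
    """
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    if total == 0:
        return None
    query = {t: (c / total) * idf[t] for t, c in tf.items()}
    scores = defaultdict(float)
    for (category, _), vec in zip(articles, vectors):
        scores[category] += cosine(query, vec)
    return max(scores, key=scores.get)


VECTORS, IDF = build_tfidf(ARTICLES)
print(classify(["goal", "team", "fans"], ARTICLES, VECTORS, IDF))
```

In this sketch a short text that contains no known keyword is left unclassified, which is one way to see why adding productive keywords from social media to the model could matter for noisy posts.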


DOI: http://dx.doi.org/10.5614/itbj.ict.res.appl.2016.10.2.6



Contact Information:

ITB Journal Publisher, LPPM – ITB, 

Center for Research and Community Services (CRCS) Building, 7th Floor,
Jl. Ganesha No. 10 Bandung 40132, Indonesia,

Tel. +62-22-86010080,

Fax: +62-22-86010051;

e-mail: jictra@lppm.itb.ac.id.