Document Grouping by Using Meronyms and Type-2 Fuzzy Association Rule Mining
The number of textual documents in the digital world, especially on the World Wide Web, is growing extremely fast. This leads to an accumulation of information, so efficient organization is needed to manage textual documents. One way to group documents accurately is to use fuzzy association rules. The quality of document clustering is affected by the key term extraction phase and by the type of fuzzy logic system (FLS) used for clustering. Using meronyms in the extraction of key terms helps to obtain meaningful cluster labels. In addition, the ambiguities and uncertainties that occur in the rules of type-1 fuzzy logic systems can be overcome by using type-2 fuzzy sets. This study proposes a method of key term extraction based on meronyms, with cluster initialization using fuzzy association rule mining, for document clustering. The method consists of four stages: document preprocessing, extraction of key terms with meronyms, extraction of candidate clusters, and cluster tree construction. The method was tested on three datasets: classic, Reuters, and 20 Newsgroup. Testing compared the overall F-measure of the method without meronyms against the method with meronyms. With meronyms in the key term extraction, the method produced an overall F-measure of 0.5753 for the classic dataset, 0.3984 for the Reuters dataset, and 0.6285 for the 20 Newsgroup dataset.
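The abstract reports results as an overall F-measure but does not define it. The sketch below assumes the definition commonly used for hierarchical document clustering: each reference class is matched with the cluster that gives it the highest F-measure, and those best scores are averaged with weights proportional to class size. The function names and the toy document IDs are hypothetical, and the exact formula used in the study may differ.

```python
from typing import Dict, Set


def f_measure(class_docs: Set[str], cluster_docs: Set[str]) -> float:
    """Harmonic mean of precision and recall of one cluster against one class."""
    overlap = len(class_docs & cluster_docs)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster_docs)
    recall = overlap / len(class_docs)
    return 2 * precision * recall / (precision + recall)


def overall_f_measure(
    classes: Dict[str, Set[str]], clusters: Dict[str, Set[str]]
) -> float:
    """Class-size-weighted average of each class's best F-measure (assumed definition)."""
    total = sum(len(docs) for docs in classes.values())
    if total == 0 or not clusters:
        return 0.0
    score = 0.0
    for class_docs in classes.values():
        # Pick the cluster that matches this class best.
        best = max(f_measure(class_docs, c) for c in clusters.values())
        score += len(class_docs) / total * best
    return score


# Toy data (hypothetical): two reference classes, three possibly overlapping clusters.
classes = {"a": {"d1", "d2"}, "b": {"d3", "d4"}}
clusters = {"c1": {"d1", "d2"}, "c2": {"d3"}, "c3": {"d1", "d4"}}
result = overall_f_measure(classes, clusters)
```

Because clusters are plain sets, a document may appear in more than one cluster, which suits the overlapping clusters that fuzzy association rule mining can produce.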
ITB Journal Publisher, LPPM – ITB,
Center for Research and Community Services (CRCS) Building Floor 7th,
Jl. Ganesha No. 10 Bandung 40132, Indonesia,