Mining High Utility Itemsets with Regular Occurrence

Komate Amphawan; Philippe Lenca; Anuchit Jitpattanakul; Athasit Surarerks

doi:10.5614/itbj.ict.res.appl.2016.10.2.5

Authors

Komate Amphawan, Burapha University, Computational Innovation Laboratory, 20131 Chonburi
Philippe Lenca, Institut Telecom, Telecom Bretagne, UMR CNRS 3192 Lab-STICC
Anuchit Jitpattanakul, Faculty of Applied Science, KNUTNB, 10800 Bangkok
Athasit Surarerks, Chulalongkorn University, ELITE Laboratory, 10330 Bangkok


Abstract

High utility itemset mining (HUIM) plays an important role in the data mining community and in a wide range of applications. In retail, for example, it is used to find sets of sold products that yield high profit or low cost. Such itemsets can help improve marketing strategies and guide promotions and advertisements. However, since HUIM considers only the utility values of items and itemsets, it may not be sufficient for observing customers' buying behavior, such as "regular purchases of sets of products having a high profit margin". To address this issue, the occurrence behavior of itemsets (in terms of regularity) was investigated together with their utility values. The problem of mining high utility itemsets with regular occurrence (MHUIR) was then considered, which aims to find sets of co-occurring items that have high utility values and occur regularly in a database. An efficient single-pass algorithm, called MHUIRA, was introduced. A new modified utility-list structure, called NUL, was designed to efficiently maintain utility values and occurrence information and to speed up the computation of itemset utilities. Experimental studies on real and synthetic datasets, together with complexity analyses, are provided to show the efficiency of MHUIRA combined with NUL in terms of time and space usage for mining interesting itemsets under both regularity and utility constraints.
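The abstract does not spell out the two measures, so the following is a minimal Python sketch under stated assumptions, not the MHUIRA algorithm or the NUL structure themselves. It assumes the usual HUIM definitions (the utility of an item in a transaction is its purchased quantity times its unit profit, and an itemset's utility is summed over the transactions containing it) and the regularity measure from periodic-frequent pattern mining (the largest gap between consecutive occurrences, counting from the start of the database and up to its end). The lists follow the HUI-Miner style of (tid, itemset utility, remaining utility) entries; NUL may store different fields. The toy data and all names (`DB`, `PROFIT`, `UtilityList`, `build_lists`, `join`, `mine`, `min_util`, `max_reg`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy database: each transaction maps item -> purchased quantity.
# Transaction ids (tids) are the 1-based positions in this list.
DB = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
    {"b": 4},
    {"a": 3, "b": 1},
]
PROFIT = {"a": 5, "b": 2, "c": 1}  # assumed unit profit per item


@dataclass
class UtilityList:
    items: tuple
    entries: list  # (tid, utility of `items` in tid, remaining utility in tid)

    def utility(self):
        return sum(iu for _, iu, _ in self.entries)

    def remaining(self):
        return sum(ru for _, _, ru in self.entries)

    def regularity(self, n_trans):
        # Largest gap between consecutive occurrences, counting from the start
        # of the database (tid 0) and up to its end (tid n_trans).
        tids = [0] + [tid for tid, _, _ in self.entries] + [n_trans]
        return max(b - a for a, b in zip(tids, tids[1:]))


def build_lists(db, profit):
    """Single scan: one utility-list per item, items kept in alphabetical order."""
    lists = {}
    for tid, trans in enumerate(db, start=1):
        order = sorted(trans)
        utils = [trans[i] * profit[i] for i in order]
        rest = sum(utils)
        for item, u in zip(order, utils):
            rest -= u  # utility of the items after `item` in this transaction
            if item not in lists:
                lists[item] = UtilityList((item,), [])
            lists[item].entries.append((tid, u, rest))
    return [lists[i] for i in sorted(lists)]


def join(p, px, py):
    """Build the list of Pxy from Px and Py (same prefix P, or P=None)."""
    y_by_tid = {tid: (iu, ru) for tid, iu, ru in py.entries}
    p_by_tid = {tid: iu for tid, iu, _ in p.entries} if p is not None else {}
    entries = []
    for tid, iu_x, _ in px.entries:
        if tid in y_by_tid:
            iu_y, ru_y = y_by_tid[tid]
            # P's utility is counted in both Px and Py, so subtract it once.
            entries.append((tid, iu_x + iu_y - p_by_tid.get(tid, 0), ru_y))
    return UtilityList(px.items + py.items[-1:], entries)


def mine(db, profit, min_util, max_reg):
    """Return {itemset: (utility, regularity)} meeting both thresholds."""
    n = len(db)
    results = {}

    def search(prefix, ext_lists):
        for i, px in enumerate(ext_lists):
            if px.regularity(n) > max_reg:
                # Supersets occur in a subset of these tids, so their largest
                # gap cannot be smaller: prune the whole branch.
                continue
            if px.utility() >= min_util:
                results[px.items] = (px.utility(), px.regularity(n))
            if px.utility() + px.remaining() >= min_util:
                exts = [join(prefix, px, py) for py in ext_lists[i + 1:]]
                search(px, exts)

    search(None, build_lists(db, profit))
    return results


print(mine(DB, PROFIT, min_util=20, max_reg=2))
# → {('a',): (35, 2), ('a', 'b'): (33, 2)}
```

Here {a, c} has regularity 2 but utility 19, so it misses `min_util`, while {a, b, c} occurs only in transaction 3 and is pruned by `max_reg`. Both pruning rules are safe for any fixed item order: regularity can only grow as an itemset is extended, and `utility + remaining` bounds the utility of every extension. That is why this sketch can build its lists in one scan with an alphabetical order instead of first ordering items by transaction-weighted utility, which would need an extra pass.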

