Vol: 49(63) No: 3 / September 2004 Finding the Size-Restricted Frequent Itemsets in Market Basket Data Renata Ivancsy Department of Automation and Applied Informatics, Budapest University of Technology and Economics, Faculty of Electrical Engineering and Informatics, H-1111 Budapest, Goldmann Gy. ter 3, Hungary, phone: +3614632870, e-mail: renata.ivancsy@aut.bme.hu Istvan Vajk Department of Automation and Applied Informatics, Budapest University of Technology and Economics, Faculty of Electrical Engineering and Informatics, H-1111 Budapest, Goldmann Gy. ter 3, Hungary, phone: +3614632870, e-mail: vajk@aut.bme.hu Keywords: association rule, frequent itemset, constraint, size-restricted rule, Apriori, FP-growth Abstract Association rule mining techniques search for items appearing frequent together in market basket data. Most of the recommended algorithms find all the frequent itemsets however users are mainly interested only on a subset of it. By means of the constraint-based association rule mining approach users have the possibility to more specify the rules to be found. Users can give constraints, which are to be fulfilled by the found frequent itemsets. An essential request could be to mine only the frequent itemsets, which size is smaller than a given threshold. These itemsets are called size-restricted itemsets. In this paper necessary modifications are shown to the well known Apriori and FP-growth algorithms assuming that only the size-restricted itemsets are to be found. A novel method, the CodeMatrix algorithm is contributed as well, to find the short patterns quickly. Combining the Apriori and the CodeMatrix method leads to an algorithm which finds the size-restricted frequent itemsets quickly. Using the new method is the time of finding all frequent itemsets are enhanced as well. Experimental results show the time saving when using the modified algorithms and using the suggested new method. References [1] R. Agrawal, T. Imielinski and A.Swami: Mining association rules between sets of items of large databases, Proc. of the ACM SIGMOD Int. Conf. On Management of Data, Washington, D.C.,USA, 1993, 207-216 [2] R. Srikant, Q. Vu, and R. Agrawal: Mining association rules with item constraints, Proc. 3rd Int. Conf. Knowledge Discovery and Data Mining, Newport Beach, California, USA ,august 14-17. 1997, pp. 67-73. [3] B. Liu, W. Hsu and Y. Ma: Mining Association Rules with Multiple Minimum Supports, Knowledge Discovery and Data Mining, 1999, pp. 337-341 [4] K. Wang, Y. He and J. Han: Mining Frequent Itemsets Using Support Constraints, The VLDB Journal, pp. 43-52, 2000 [5] R. Ng, L. V. S. Lakshmanan, J. Han and A. Pang. Exploratory mining and pruning optimaizations of constrained associations rules, SIGMOD, pages 13-24, May 1998 [6] R. Agrawal and R. Srikant: Fast algorithms for mining association rules, Proc. 20th Very Large Databases Conference, Santiago, Chile, 1994, pp. 487-499 [7] J. S. Park, M. Chen, and P. S. Yu: An effective hash based algorithm for mining association rules Proc. of the 1995 ACM Int. Conf. on Management of Data, San Jose, California, USA, 1995, pp. 175-186 [8] S.Brin, R. Motawani, J.D. Ullman and S. Tsur: Dynamic Item set counting and implication rules for market basket data, Proc of the ACM SIGMOD Intl’l Conf. On Management of Data, Tucson, Arizona, USA, 1997, pp. 255-264 [9] J.Han, J. Pei and Y. Yin: Mining frequent patterns without candidate generation, Proc. of the 2000 ACM-SIGMOD Int. Conf. On Management of Data, Dallas, Texas, USA, 2000, pp. 1-12 |