Vol: 49(63) No: 3 / September 2004 A New PC Cluster Based Distributed Association Rule Mining Sandor Juhasz Department of Automation and Applied Informatics, Budapest University of Technology and Economics, Faculty of Electrical Engineering and Informatics, 1111 Budapest, Goldmann Gyorgy ter 3. IV. em., Hungary, phone: (36) 1 463-1648, e-mail: juhasz.sandor@aut.bme.hu Ferenc Kovacs Department of Automation and Applied Informatics, Budapest University of Technology and Economics, Faculty of Electrical Engineering and Informatics, 1111 Budapest, Goldmann Gyorgy ter 3. IV. em., Hungary, phone: (36) 1 463-1648, e-mail: ferenc.kovacs@aut.bme.hu Keywords: data mining, association rule mining, distributed data mining Abstract One of the most important problems in data mining is association rule mining. It requires very large computation and I/O traffic capacity. For that reason there are several parallel mining algorithms which can take advantage of the performance of the cluster systems. These algorithms are optimised and developed on supercomputer platforms, but nowadays the capacity of PC preserves the possibility to build cluster systems cheaper. Usage of PC cluster systems raises some issues about the optimisation of the distributed mining algorithms, especially the cost of the node to node communication and data distribution. The current association rule mining algorithms do not consider the cost of data distribution and they do not make any data processing during the data distribution phase. The main part of the distributed association rule mining algorithms is based on Apriori algorithm therefore these algorithms suffer the drawbacks of the Apriori algorithm. In this paper a new distributed association rule mining algorithm is introduced, which is based on modified version of the Apriori algorithm, which gives a solution of the Apriori algorithm bottlenecks. Other advantage of this new distributed algorithm is that it can make significant and fundamental data processing during the data distribution phase. References [1] R. Agrawal, T. Imielinski, and A. Swami, Mining association rules between sets of items in large databases, Proc. ACM-SIGMOD Conference, 1993 [2] R. Agrawal and R. Srikant, Fast algorithms for mining association rules, Proc. 20th Very Large Databases Conference, 1994 [3] J. Han J. Pei and Y. Yin, Mining Frequent Patterns without Candidate Generation, Proc. ACM SIGMOD International. Conference on Management of Data, 2000 [4] R. Agrwal and J.C. Schafer, Parallel mining of association rules, IEEE Trans. Knowledge and Data Engineering, vol. 8, no 6, 1996 [5] O. R. Zaïne, M. El-Hajj and P. Lu, Fast Parallel Association Rule Mining Without Candidacy Generation, Proc. IEEE International Conference on Data Mining , 2001 [6] Renáta Iváncsy, Ferenc Kovács and István Vajk, An Analysis of Association Rule Mining Algorithms, Proc. of International Conference of ICSC EIS, 2004 [7] Ferenc Kovács, Renáta Iváncsy and István Vajk, Evaluation of the Serial Association Rule Mining Algorithms, Proc. of International Conference of IASTED DBA, 2004 [8] M. Tamura and M. Kitsuregawa, Dynamic load balancing for parallel association rule mining on heterogeneous PC cluster system, Proc. 25th Very Large Databases Conference, 1999 [9] T. Shintani and M. Kitsuregawa, Hash based parallel algorithms for mining association rules, Proc. Parallel and Distributed Information Systems Conference, 1996 [10] T. Shimomura and S. Shibusawa, Performance Evaluation of Distributed Algorithms for Mining Association Rules on Workstation Cluster, Proc. IEEE International Workshops on Parallel Processing (ICPP’00- Workshops), 2000 [11] J. Zhang, H. Shi and L. Zheng, A Method and Algorithm of Distributed Mining Association Rules in Synchronism, Proc. IEEE International Conference on Machine Learning and Cybernetics, 2002 [12] Ferenc Kovács, Renáta Iváncsy and István Vajk, Dynamic Itemset Counting in PC Cluster Based Association Rule Mining, Proc. of International Conference of ICSC EIS, 2004 [13] S.Brin, R. Motawani, J.D. Ullman and S. Tsur, Dynamic Item set counting and implication rules for market basket data, Proc. ACM-SIGMOD Conference, 1997 [14] T. Shintani and M. Kitsuregawa: Hash based parallel algorithms for mining association rules, Proc. Parallel and Distributed Information Systems Conference, 1996 [15] S. Juhász et al., “The Pyramid Project”, Budapest University of Technology and Economics, Department of Automation and Applied Informatics, http://avalon.aut.bme.hu/~sanyo/piramis, 2002 |