Vol: 58(72) No: 2 / June 2013 Dimension Reduction by Exploring Conditional IndependencesTamás SzántaiInstitute of Mathematics, Budapest University of Technology and Economics, Műegyetem rkp. 3-9, 1111 Budapest, Hungary, phone: (+36-1) 463-1359, e-mail: szantai@math.bme.hu, web: http://www.math.bme.hu/~szantaiEdith KovácsDepartment of Methodology, Budapest College of Management, Villányi út 11-13, 1114 Budapest, Hungary, phone: (+36-1) 381-8139, e-mail: kovacs.edith@avf.hu dimension reduction, conditional independence, cherry-tree probability distribution, probabilistic machine learningKeywords:AbstractOne of the main problems of data mining is to find the dependence structure of a dataset. Using simply a dependency graph, in which the dependent features (random variables) are connected, often causes that we have to take into account too many features as the dependency graph usually is an extremely dense graph. Exploring the conditional independences yield a more efficient graph with lower number of edges. This led us to the idea of discovering the Markov network underlying the random variables. For this purpose we developed a greedy algorithm which is based on information theoretical concepts. By using this algorithm we obtain a graph structure which helps us to find those important features (a few features only) which can produce efficient clustering of the data set. We also give a method for determining the probability that a new data point belongs to a given cluster. We test the effectiveness of our method on real datasets and compare it with those of other existing methods. References[1] M. Charytanowicz, J. Niewczas, P. Kulczycki, P. A. Kowalski, S. Lukasik and S. Zak, “A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images”, in Information Technologies in Biomedicine, Ewa Pietka, Jacek Kawa (eds.), Springer-Verlag, Berlin, Heidelberg, 2010, pp. 15–24. [2] C. K. Chow and C. N. Liu, “Approximating Discrete Probability Distribution with Dependence Tree”, IEEE Transactions on Informational Theory, vol. 14, pp. 462–467, 1968. [3] L. Faivishevsky and J. Goldberger, “A Nonparametric Information Theoretic Clustering Algorithm”, in Proceedings of 27th International Conference on Machine Learning (ICML 2010), 2010, pp. 351–358. [4] P. Kulczycki and M. Charytanowicz, “A Complete Gradient Clustering Algorithm Formed with Kernel Estimators”, Int. J. Appl. Math. Comput Sci, vol. 20, pp. 123–134, 2010. [5] S. L. Lauritzen and D. J. Spiegelhalter, “Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems”, J. R. Statist. Soc. B, vol. 50, pp. 157–227, 1988. [6] T. Szántai and E. Kovács, “Discovering a Junction Tree behind a Markov Network by a Greedy Algorithm”, Optimization and Engineering, to appear, http://arxiv.org/abs/1104.2762, 2011. [7] T. Szántai and E. Kovács, “Hypergraphs as Means of Discovering the Dependence Structure of a Discrete Multivariate Probability Distribution”, Annals of Operations Research, vol. 193, pp. 71–90, 2012. [8] N. X. Vinh, J. Epps and J. Bailey, “Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance”, Journal of Machine Learning Research, vol. 11, pp. 2837–2854, 2010. |