SB-UPT TACCS - View Article

Home | Issues | Profile | History | Submission | Review

Vol: 57(71) No: 4 / December 2012

Relations Between Approximate Join Cardinalities in Random Database Queries
Letiţia Velcescu
University of Bucharest/ Department of Computer Science, Bucharest, Romania, e-mail: letitia@fmi.unibuc.ro
Laurenţiu Vasile
University of Bucharest/ Department of Computer Science, Bucharest, Romania, e-mail: vsl@fmi.unibuc.ro

Keywords: Random database, Probability distribution, Approximate Join, Cardinality

Abstract
This paper introduces an analysis of the cardinality of the approximate join operation in the queries performed on random databases. This type of database is important in the context of the information characterized by uncertainty; such data appear in a variety of domains, like medicine, physics, biology etc. The volume of this type of data is usually large, so the optimization of the queries performed on them represents an important topic. As in classic databases, the most costly relational operation in a query is the join, as it might generate a cartesian product between two large record sets. Because of the uncertain character of data stored in a random database, the relational operations are modified in order to support approximations which are specified in terms of ε-close tuples. In this paper, the approximate join operation is studied for certain important types of probability distributions applied to the relations’ attributes. During our research, we have noticed the existence of several mathematical relations between the cardinalities values in the mentioned cases. In this article, we present an introduction to the domain and formulate the problem. Then, we explain the results of our analysis and we state and prove the inequalities that were found for the standard uniform, exponential and normal probability distributions of the relation’s columns.

References
[1] Seleznjev, O., Thalheim, B., Random Databases with Approximate Record Matching, Methodol. Comput. Appl. Probab., Springer Verlag, 2008.
[2] Talagrand, M., The Glivenko-Cantelli Problem, Annals of Probability, Volume 15, Number 3 (1987), 837-870.
[3] Johnson, N., Kemp, A., Kotz, S., Univariate Discrete Distributions, 3rd edition, John Wiley & Sons, 2005.
[4] Whitmore, G. A., Chapman Findlay, M., Stochastic dominance: an approach to decision-making under risk, Lexington Books, 1978.
[5] Abiteboul, S., Hull, R., Vianu, V., Foundations of Databases, Addison-Wesley, 1995.
[6] Martinez, W., Martinez, A., Computational Statistics Handbook with Matlab, Chapman&Hall, 2002.
[7] Mihoc Gh., Craiu V., Tratat de statistică matematică, vol. 2, Editura Academiei RPR, Bucuresti, 1977.
[8] Velcescu, L., Vasile, L., Relational operators in heterogeneous random databases, IEEE Proceedings of 11th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing Timişoara (SYNASC), România, 26 – 29 septembrie 2009, IEEE Computer Press, 2009.