SB-UPT TACCS


Vol: 61(75) No: 1 / March 2016

Compositional Distributional Semantics Using a Graph Digital Signal Processing Method
Mircea Trifan
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada, phone: (613)562-5800-6234, e-mail: mircea@ncct.uottawa.ca, web: http://ncct.uottawa.ca/
Bogdan Ionescu
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada, e-mail: bogdan@ncct.uottawa.ca
Cristian Gadea
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada, e-mail: cgadea@ncct.uottawa.ca
Dan Ionescu
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada, e-mail: dan@ncct.uottawa.ca

Keywords: compositional distributional semantics, Hadamard matrix, CDMA, NLP, similarity

Abstract
This paper addresses the problem of devising a computationally tractable procedure for natural language understanding (NLU). It approaches this goal by using distributional models of meaning together with a method from graph-based digital signal processing (DSP), which has only recently attracted the attention of researchers in natural language processing (NLP) working on big data analysis. The novelty of our approach lies in the combination of three domains: advances in deep learning algorithms for word representation, dependency parsing for modeling inter-word relations, and convolution using orthogonal Hadamard codes for composing the first two, generating a unique representation for the sentence. Two types of problems are solved in a new, unified way: sentence similarity, given by the cosine of the angle between the corresponding vectors, and question answering, where the query is matched to possible answers. This technique resembles the spread-spectrum methods from telecommunication theory, where multiple users share a common channel and are able to communicate without interference. In the context of this paper, individual words play the role of users sharing the same sentence. Examples of applying the method to a standard set of sentences, used for benchmarking accuracy and execution time, are also given.
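
The spread-spectrum analogy in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions: the word vectors are made-up toy values instead of learned embeddings, each word receives a Hadamard code by its position in the sentence (the paper ties composition to dependency relations, which this sketch does not reproduce), and the helper names hadamard, spread, despread, and cosine are hypothetical. Each word vector is multiplied chip by chip by its own orthogonal code, the results are summed into one sentence signal, and sentences are compared with the cosine measure.

```python
import math


def hadamard(n):
    """Build an n x n Hadamard matrix (Sylvester construction); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def spread(vectors, codes):
    """Superpose word vectors, each multiplied chip by chip by its own code."""
    dim = len(vectors[0])
    signal = [0.0] * (len(codes[0]) * dim)
    for vec, code in zip(vectors, codes):
        for c, chip in enumerate(code):
            for d, x in enumerate(vec):
                signal[c * dim + d] += chip * x
    return signal


def despread(signal, code, dim):
    """Recover one word vector by correlating the sentence signal with that word's code."""
    n = len(code)
    return [sum(code[c] * signal[c * dim + d] for c in range(n)) / n
            for d in range(dim)]


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (illustrative values only, not trained vectors).
dog = [1.0, 0.0, 0.5]
chases = [0.0, 1.0, 0.0]
cat = [0.5, 0.0, 1.0]

# Assumption: one code per word position; three of the four rows of H_4 are used.
codes = hadamard(4)[:3]

s1 = spread([dog, chases, cat], codes)  # "dog chases cat"
s2 = spread([cat, chases, dog], codes)  # "cat chases dog"

print(despread(s1, codes[0], 3))  # the vector stored under the first code
print(cosine(s1, s1))
print(cosine(s1, s2))
```

Because the codes are mutually orthogonal, correlating the sentence signal with one code cancels the contributions of all other words, just as a CDMA receiver isolates one user on a shared channel. The two toy sentences contain the same words but produce different signals, so their cosine falls below 1, which is something a plain sum of word vectors cannot express.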
