Home | Issues | Profile | History | Submission | Review
Vol: 49(63) No: 3 / September 2004

The Lexicon Semantics Specification
Emil M. Popa
Department of Computer Science , University Lucian Blaga of Sibiu , Faculty of Sciences, str. Dr.I.Ratiu 5-7, 550012 Sibiu, ROMANIA, Sibiu, phone: 0269/214216, e-mail: emilm.popa@ulbsibiu.ro, web: http://www.ulbsibiu.ro


Keywords: finite automata, language, language specification, lexicon, lexicon specification, lexical analyzer, scanner, scanner generator.

Abstract
This paper discusses a new methodology for scanner generation that supports language independent lexicon specification and automatic generation of stand alone lexical analyzers from specifications. The mechanism that sits at the basis of this methodology is the layering of the lexicon specification on two levels: in the first level, \"universal\" lexical constructs which are used as building blocks by most programming languages are defined, and in the second level, customized lexical constructs of specific programming languages are specified in terms of universal lexical constructs. The universal lexicon is specified by regular expressions over a global alphabet used by most programming languages, such as the character set of a keyboard, and is efficiently implemented by deterministic finite automat. The customized lexicon is conveniently specified by regular expressions of properties of universal lexical constructs and is implemented by nondeterministic automat whose transition function is determined by the true value of properties of universal lexemes rather than by the letters of some alphabet. Tools that transform a lexicon specification into a stand alone lexical analyzer are developed and reported. The methodology discussed in this paper is using convenient because the user operates with meaningful constructs and no programming as usual is required. The lexical analyzers those specified are efficient and their correctness is mathematically guaranteed.

References
[1] A.V.Aho, R.Sethi, and J.D.Ullman. COMPILERS Principles, Techniques , and Tools. Addison-Wesley Publishing Company, Reading, Massachusetts, 1986.
[2] J.E.Hopcroft and J.D.Ullman. Introduction to Automata Theory, Languages, and Computations. Addison-Wesley Publishing Company, Reading, Massachusetts, 1979.
[3] J.Knaack and T.Rus. Twolev: A two level scanning algoritms. In Proceedings of the Second International Conference on Algebraic Methodology and Software Technology, AMAST’91, pages 175-179, Iowa City, IA, 52242, 22-25 May, 1991.
[4] J.P.LePeau and T.Rus. Interactive parser construction. Technical Report 88-02, The University of Iowa, Department of Computer Science, Iowa City, IA 52242,1988.
[5] E.Visser. Multi-level specifications. In Language Prototyping – an algebraic specification approach, AMAST Series in Computing, Vol.5, pages 105-197. World Scientific, 1996.