Vol. 54(68), No. 4 / December 2009

Extending Fuzzy Q-learning with Fuzzy Rule Interpolation Method "FIVE"

Dávid Vincze
Department of Information Technology, Faculty of Mechanical Engineering and Information Science, University of Miskolc, 3515 Miskolc, Hungary
phone: (+36) 46-565-333, e-mail: david.vincze@iit.uni-miskolc.hu, web: www.iit.uni-miskolc.hu/~vinczed

Szilveszter Kovács
Department of Information Technology, Faculty of Mechanical Engineering and Information Science, University of Miskolc, 3515 Miskolc, Hungary
e-mail: szkovacs@iit.uni-miskolc.hu, web: www.iit.uni-miskolc.hu/~szkovacs

Keywords: reinforcement learning, fuzzy Q-learning, fuzzy rule interpolation

Abstract

Fuzzy Q-learning, the fuzzy extension of Reinforcement Learning (RL), is a well-known topic in computational intelligence. It can be used to solve control problems in continuous, unknown environments without defining an exact method for how to act in the various situations that may arise. In the RL concept, the problem to be solved is hidden in the feedback of the environment, called reward or punishment (positive or negative reward). From these rewards the system can learn which action is the best choice in a given state. One of the most frequently applied RL methods is Q-learning. The goal of the Q-learning method is to find an optimal policy for the system by building the state-action-value function. The state-action-value function gives the expected return (the cumulative reinforcement) obtained by taking a given action in a given state and following the optimal policy thereafter. The original Q-learning method was introduced for discrete states and actions. With the application of fuzzy reasoning the method can be adapted to continuous environments; the result is called Fuzzy Q-learning (FQ-learning). Traditional Fuzzy Q-learning embeds zero-order Takagi-Sugeno fuzzy inference and hence requires the state-action-value function to be represented as a complete fuzzy rule base. The main goal of this paper is to introduce an extension of the traditional Fuzzy Q-learning method that can handle sparse fuzzy rule bases. To achieve this, the paper suggests applying Fuzzy Rule Interpolation (FRI), namely FIVE (Fuzzy rule Interpolation based on Vague Environment), as the model used with Q-learning (FRIQ-learning). The paper also includes an application example, the well-known cart-pole (inverted pendulum) problem, demonstrating the applicability of the suggested FRIQ-learning.
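For reference, the "building of the state-action-value function" described above is conventionally done with the standard temporal-difference update of Q-learning. The formula below uses the usual textbook notation (learning rate \alpha, discount factor \gamma) and is added here for orientation; it is not quoted from the paper itself:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]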
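To make the sparse-rule-base idea concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation: the Q-function is stored only at a few rule points and evaluated elsewhere by Shepard-style inverse-distance interpolation, which stands in here for the vague-environment-scaled distance the actual FIVE method uses. All names and constants (rule_states, ALPHA, GAMMA, P) are illustrative assumptions.

import numpy as np

# Illustrative sketch of the FRIQ-learning idea: Q-values are kept only at a
# sparse set of rule points in state space (one consequent per discrete
# action) and interpolated everywhere else. Plain Euclidean distance is used
# as a simplification of FIVE's vague-environment distance.

ALPHA, GAMMA, P = 0.3, 0.95, 2.0  # learning rate, discount, Shepard exponent

rule_states = np.array([[-1.0], [0.0], [1.0]])  # sparse antecedent points
q_values = np.zeros((len(rule_states), 2))      # consequents: 2 actions

def weights(state):
    """Inverse-distance weights of the observed state w.r.t. the rule points."""
    d = np.linalg.norm(rule_states - state, axis=1)
    if np.any(d < 1e-9):                 # state coincides with a rule point
        w = (d < 1e-9).astype(float)
    else:
        w = 1.0 / d**P
    return w / w.sum()

def q(state):
    """Interpolated Q-values for every action at the observed state."""
    return weights(state) @ q_values

def update(state, action, reward, next_state):
    """Distribute the temporal-difference error among the nearby rule
    consequents in proportion to their interpolation weights."""
    td_error = reward + GAMMA * q(next_state).max() - q(state)[action]
    q_values[:, action] += ALPHA * td_error * weights(state)

A single learning step would then be, e.g., update(np.array([0.3]), action=1, reward=1.0, next_state=np.array([0.5])), after which q(np.array([0.5])) returns the interpolated action values. The design point mirrors the paper's goal: only the sparse rule consequents are stored and updated, while interpolation covers the rest of the continuous state space.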