Online Learning Algorithms For Differential Dynamic Games And Optimal Control

Date

2011-07-14

Department

Electrical Engineering

Abstract

Optimal control deals with the problem of finding a control law for a given system such that a certain optimality criterion is achieved. The solution can be derived using Pontryagin's maximum principle (a necessary condition) or by solving the Hamilton-Jacobi-Bellman (HJB) equation (a sufficient condition). A major drawback of optimal control is that it is computed offline. Adaptive control modifies the control law used by a controller to cope with the fact that the system is unknown or uncertain, but adaptive controllers are not optimal. Adaptive optimal controllers have been proposed by adding optimality criteria to an adaptive controller, or by adding adaptive characteristics to an optimal controller.

In this work, online adaptive learning algorithms are developed for optimal control and differential dynamic games using measurements along the trajectory or input/output data. These algorithms are based on actor/critic schemes, involve simultaneous tuning of the actor/critic neural networks, and provide online solutions to complex Hamilton-Jacobi equations, along with convergence and Lyapunov stability proofs.

The research begins with the development of an online algorithm, based on policy iteration, for learning the continuous-time (CT) optimal control solution with infinite-horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the Hamilton-Jacobi (HJ) equation of optimal control design. This is called 'synchronous' policy iteration.

An online learning algorithm is then developed to solve the continuous-time two-player zero-sum game with infinite-horizon cost for nonlinear systems. The algorithm learns online, in real time, the solution to the Hamilton-Jacobi-Isaacs (HJI) equation of the game design. This algorithm is called 'synchronous' zero-sum game policy iteration.

One of the major outcomes of this work is an online learning algorithm that solves continuous-time multi-player non-zero-sum games with infinite-horizon cost for linear and nonlinear systems. The adaptive algorithm learns online the solution of the coupled Riccati equations and coupled Hamilton-Jacobi equations for linear and nonlinear systems, respectively. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player, and involves simultaneous continuous-time adaptation of the actor/critic networks.

The next result shows how to implement Approximate Dynamic Programming (ADP) methods using only measured input/output data from the system. Policy iteration and value iteration algorithms are developed that converge to an optimal controller requiring only output feedback.

The notion of graphical games is developed for dynamical systems, where the dynamics and performance index of each node depend only on local neighbor information. A cooperative policy iteration algorithm is given for graphical games that converges to the best response when the neighbors of each agent do not update their policies, and to the cooperative Nash equilibrium when all agents update their policies simultaneously.

Finally, a synchronous policy iteration algorithm based on integral reinforcement learning is given. This algorithm does not need knowledge of the drift dynamics.
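For concreteness, the notes below sketch the main objects the abstract refers to. All specific system forms, notation, gains, and numerical values in these notes are illustrative assumptions, not material taken from the thesis itself. For the optimal control problem, assuming the input-affine form standard in this literature, the HJB equation and the optimal control take the form:

    \dot{x} = f(x) + g(x)\,u, \qquad
    V(x(t)) = \int_t^{\infty} \big( Q(x) + u^\top R u \big)\, d\tau

    0 = Q(x) + \nabla V^{*\top} f(x)
        - \tfrac{1}{4}\, \nabla V^{*\top} g(x) R^{-1} g(x)^\top \nabla V^{*}, \qquad
    u^{*} = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V^{*}

Policy iteration alternates policy evaluation (solving for V given u) and policy improvement (updating u from V); a synchronous algorithm performs both adaptations at once, in real time.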
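A minimal runnable sketch of the synchronous actor/critic idea on a scalar LQR problem follows. The basis choice, adaptation gains, probing signal, and the simplified actor law (the actor tuning law in this line of work carries additional stabilizing terms) are all assumptions made for illustration:

    import numpy as np

    # Sketch of 'synchronous' policy iteration on a scalar LQR problem.
    # Gains, basis, probing signal, and the simplified actor law are
    # illustrative assumptions, not the thesis's algorithm verbatim.
    a, b, q, r = -1.0, 1.0, 1.0, 1.0
    p_true = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2  # scalar ARE solution

    alpha_c, alpha_a = 20.0, 5.0   # critic / actor adaptation gains (assumed)
    wc, wa = 0.0, 0.0              # weights of critic V(x) = wc*x^2 and actor
    x, dt = 1.0, 1e-3

    for k in range(int(50.0 / dt)):
        t = k * dt
        u = -(b / r) * wa * x + 0.1 * np.sin(5.0 * t)  # actor + probing noise (PE)
        dx = a * x + b * u
        sigma = 2.0 * x * dx                       # dphi/dx * dx/dt, phi(x) = x^2
        delta = q * x**2 + r * u**2 + wc * sigma   # continuous-time Bellman error
        wc += dt * (-alpha_c * sigma * delta / (1.0 + sigma**2) ** 2)  # normalized gradient
        wa += dt * (-alpha_a * (wa - wc))          # simplified actor: track the critic
        x += dt * dx
        if abs(x) < 1e-2:                          # re-excite the state (illustration only)
            x = 1.0

    print(f"learned critic weight {wc:.4f} vs ARE solution {p_true:.4f}")

With basis phi(x) = x^2, the critic weight should settle near the ARE solution p, so that V(x) = wc*x^2 approximates the optimal value p*x^2.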
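For the two-player zero-sum game, again assuming an input-affine form in which the control u minimizes and the disturbance d maximizes the cost, the HJI equation mentioned above reads:

    \dot{x} = f(x) + g(x)\,u + k(x)\,d, \qquad
    J = \int_0^{\infty} \big( Q(x) + u^\top R u - \gamma^2 \|d\|^2 \big)\, dt

    0 = Q(x) + \nabla V^\top f(x)
        - \tfrac{1}{4}\, \nabla V^\top g R^{-1} g^\top \nabla V
        + \tfrac{1}{4\gamma^2}\, \nabla V^\top k k^\top \nabla V

with saddle-point policies u^* = -\tfrac{1}{2} R^{-1} g^\top \nabla V and d^* = \tfrac{1}{2\gamma^2} k^\top \nabla V.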
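For the multi-player non-zero-sum case with linear dynamics, the coupled Riccati equations take the following standard form (notation assumed; N players):

    \dot{x} = Ax + \sum_{j=1}^{N} B_j u_j, \qquad
    J_i = \int_0^{\infty} \Big( x^\top Q_i x + \sum_{j=1}^{N} u_j^\top R_{ij} u_j \Big)\, dt

    0 = A_c^\top P_i + P_i A_c + Q_i + \sum_{j=1}^{N} K_j^\top R_{ij} K_j, \qquad
    K_i = R_{ii}^{-1} B_i^\top P_i, \quad A_c = A - \sum_{j=1}^{N} B_j K_j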
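As an offline cross-check of what the online algorithm learns, the coupled equations above can be solved by a Lyapunov-iteration fixed point. This sketch, with made-up matrices, illustrates the solution being learned; it is not the thesis's online method:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    # Offline Lyapunov iteration for a two-player non-zero-sum LQ game.
    # Matrices here are illustrative examples only.
    A = np.array([[0.0, 1.0], [-1.0, -0.5]])
    B = [np.array([[0.0], [1.0]]), np.array([[0.0], [0.5]])]
    Q = [np.eye(2), 2.0 * np.eye(2)]
    R = [[np.eye(1), np.eye(1)],   # R[i][j]: weight on player j's input in J_i
         [np.eye(1), np.eye(1)]]

    K = [np.zeros((1, 2)), np.zeros((1, 2))]   # K = 0 is admissible since A is stable
    for _ in range(200):
        Ac = A - sum(B[j] @ K[j] for j in range(2))
        for i in range(2):
            Qbar = Q[i] + sum(K[j].T @ R[i][j] @ K[j] for j in range(2))
            P = solve_continuous_lyapunov(Ac.T, -Qbar)   # Ac' P + P Ac + Qbar = 0
            K[i] = np.linalg.solve(R[i][i], B[i].T @ P)  # K_i = R_ii^{-1} B_i' P_i

    print("Nash feedback gains K1, K2:")
    print(K[0]); print(K[1])

The online algorithm in the thesis reaches the same Nash solution adaptively, with one actor/critic pair per player, rather than by this offline fixed-point computation.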
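For the input/output-data result, the usual device (stated here as an assumption about the construction, in discrete time) is that the state of an observable system can be written in terms of inputs and outputs over a past window of length N, so the value function can be parameterized directly in measured data:

    x_k = M_u \bar{u}_{k-1,k-N} + M_y \bar{y}_{k-1,k-N}, \qquad
    V(x_k) = \bar{z}_k^\top \bar{P}\, \bar{z}_k, \quad
    \bar{z}_k = \begin{bmatrix} \bar{u}_{k-1,k-N} \\ \bar{y}_{k-1,k-N} \end{bmatrix}

where \bar{u} and \bar{y} stack the last N inputs and outputs. Policy and value iteration can then be run on \bar{P} without ever measuring the internal state.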
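In the graphical-game setting, a performance index of the following local form (notation assumed) captures the requirement that each node i depends only on its own local error \delta_i, its own control, and the controls of its graph neighbors N_i:

    J_i = \int_0^{\infty} \Big( \delta_i^\top Q_{ii} \delta_i + u_i^\top R_{ii} u_i
          + \sum_{j \in N_i} u_j^\top R_{ij} u_j \Big)\, dt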
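Finally, integral reinforcement learning rests on the integral form of the Bellman equation over a reinforcement interval T > 0:

    V(x(t)) = \int_t^{t+T} \big( Q(x(\tau)) + u^\top R u \big)\, d\tau + V(x(t+T))

Because the dynamics enter only through the measured trajectory x(\cdot) and the running cost, this relation can be evaluated without knowledge of the drift term f(x), which is why the final algorithm does not need it.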
