Novel single-loop policy iteration for linear zero-sum games (2024)

Rapid communication

Authors: Jianguo Zhao, Chunyu Yang, Weinan Gao, and Ju H. Park

Published: 09 July 2024


    Abstract

The infinite-horizon zero-sum game of a linear system can be reduced to solving a game algebraic Riccati equation (GARE) with an indefinite quadratic term. Double-loop policy iteration algorithms are commonly used to solve such a GARE, but their computation is usually time-consuming. In this work, we propose a novel model-based single-loop policy iteration algorithm for solving the GARE; its convergence is guaranteed by the boundedness of the iterative sequence and a comparison result. Furthermore, we devise a data-driven single-loop policy iteration algorithm for solving linear zero-sum games that does not require knowledge of the system dynamics. Compared with existing Newton-type single-loop methods, the initialization of our algorithms is significantly relaxed and easier to implement. Two numerical examples illustrate the proposed algorithms.
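To make the setting concrete, the following sketch implements a generic Newton-type single-loop (simultaneous policy update) iteration for the H∞-style GARE A'P + PA + Q − P·B1·R⁻¹·B1'·P + γ⁻²·P·B2·B2'·P = 0: both players' gains are updated in the same loop, and each iteration solves a single Lyapunov equation. This is an illustration of the general scheme under assumed names (`single_loop_pi`, `gamma`), not the authors' relaxed-initialization algorithm; it requires a stabilizing initial matrix P0 (e.g., P0 = 0 when A is Hurwitz).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def single_loop_pi(A, B1, B2, Q, R, gamma, P0, iters=100, tol=1e-10):
    """Newton-type single-loop policy iteration for the GARE
        A'P + PA + Q - P B1 R^{-1} B1' P + gamma^{-2} P B2 B2' P = 0.
    Assumes A - (B1 R^{-1} B1' - gamma^{-2} B2 B2') P0 is Hurwitz."""
    P = P0
    Rinv = np.linalg.inv(R)
    for _ in range(iters):
        K = Rinv @ B1.T @ P               # minimizing player's gain
        L = (1.0 / gamma**2) * B2.T @ P   # maximizing player's gain
        Ac = A - B1 @ K + B2 @ L          # closed loop under both policies
        W = Q + K.T @ R @ K - gamma**2 * (L.T @ L)
        # One Lyapunov equation per iteration: Ac' P+ + P+ Ac + W = 0
        P_next = solve_continuous_lyapunov(Ac.T, -W)
        if np.linalg.norm(P_next - P) < tol:
            return P_next
        P = P_next
    return P
```

With B2 = 0 (or γ → ∞) the update collapses to Kleinman's classical policy iteration for the standard ARE, which is why a stabilizing initialization is needed here; relaxing that requirement is the contribution the abstract claims.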



Published In

    Automatica (Journal of IFAC) Volume 163, Issue C

    May 2024

    520 pages

    ISSN:0005-1098


    Copyright © 2024.

    Publisher

    Pergamon Press, Inc.

    United States


    Author Tags

    1. Policy iteration
    2. Zero-sum games
    3. Game algebraic Riccati equations
    4. Adaptive dynamic programming

