Yiming Yang, Dengpeng Xing, Wannian Xia, Peng Wang. Guided Proximal Policy Optimization with Structured Action Graph for Complex Decision-making[J]. Machine Intelligence Research. DOI: 10.1007/s11633-024-1503-7

Guided Proximal Policy Optimization with Structured Action Graph for Complex Decision-making

Reinforcement learning struggles in complex decision-making scenarios, largely because parameterized action spaces, and the corresponding policy spaces, are vast. To address these difficulties, we devise a practical structured action graph model augmented with guiding policies that incorporate trust region constraints. Based on this, we propose guided proximal policy optimization with structured action graph (GPPO-SAG), which markedly improves policy learning and performance on complex tasks with parameterized action spaces. We evaluate our model on comprehensive gaming platforms, including the full StarCraft II suite and Hearthstone, with highly favorable results. Our source code is at https://github.com/sachiel321/GPPO-SAG.
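To make the idea of combining PPO's trust-region-style update with a guiding policy concrete, the sketch below shows a generic clipped PPO surrogate augmented with a penalty that keeps the learned policy close to a guide policy. This is a minimal illustration, not the paper's exact GPPO-SAG objective: the function name `gppo_loss`, the `guide_coef` weight, and the sample-based KL estimate are all assumptions made for this example.

```python
import numpy as np

def gppo_loss(logp_new, logp_old, logp_guide, advantages,
              clip_eps=0.2, guide_coef=0.1):
    """Clipped PPO surrogate plus a penalty toward a guide policy (sketch).

    All log-probabilities are for the actions actually taken, so the
    guidance term is a simple per-sample estimate of KL(pi_new || pi_guide).
    """
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Standard PPO pessimistic (clipped) surrogate objective.
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    # Sample-based estimate of the divergence from the guide policy.
    guide_kl = logp_new - logp_guide
    # Negate because optimizers minimize; we want to maximize the surrogate.
    return -(surrogate - guide_coef * guide_kl).mean()
```

With identical new, old, and guide log-probabilities the ratio is 1 and the guidance penalty vanishes, so the loss reduces to the negated mean advantage; a larger ratio gets clipped at `1 + clip_eps` before multiplying the advantage.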
