A Metaheuristic-Based Weight Optimization for Robust Deep Reinforcement Learning in Continuous Control
Published in Swarm and Evolutionary Computation, 2025
In recent studies, policy-based deep reinforcement learning (DRL) algorithms have exhibited strong performance on continuous control problems such as robotic arm control and robot gait learning. However, these algorithms frequently face challenges inherent to gradient-descent-based weight optimization, including susceptibility to local optima, slow learning near saddle points, approximation errors, and suboptimal hyperparameters. The resulting instability leads to significant performance discrepancies among agent instances trained under identical settings, which complicates the practical application of reinforcement learning. To address this, we propose a metaheuristic-based weight optimization framework designed to mitigate learning instability in DRL for continuous control tasks. The framework introduces a two-phase optimization process in which the DRL learning phase is followed by an additional search phase that refines the policy weights using swarm intelligence algorithms. In numerical experiments on robot locomotion tasks, the proposed framework demonstrated higher and more stable performance than conventional DRL algorithms.
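As a rough illustration of the second phase, the sketch below assumes particle swarm optimization (PSO) as the swarm intelligence algorithm: a swarm is initialized around the DRL-trained weight vector and moved to maximize episodic return. This is a minimal sketch under stated assumptions, not the paper's actual design; the function `evaluate_return`, the initialization spread `sigma`, and all PSO hyperparameters are hypothetical placeholders.

```python
# Hypothetical sketch of a post-DRL search phase using PSO over a
# flattened policy weight vector. Not the paper's implementation.
import numpy as np

def pso_refine(theta0, evaluate_return, n_particles=16, n_iters=50,
               sigma=0.05, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Refine a trained policy's flat weight vector theta0 by
    maximizing episodic return with a basic PSO loop."""
    rng = np.random.default_rng(seed)
    dim = theta0.size
    # Initialize the swarm around the DRL solution (assumed spread sigma).
    X = theta0 + sigma * rng.standard_normal((n_particles, dim))
    X[0] = theta0                      # keep the original weights in the swarm
    V = np.zeros_like(X)
    pbest = X.copy()
    pbest_f = np.array([evaluate_return(x) for x in X])
    g = int(np.argmax(pbest_f))
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Standard PSO velocity and position updates.
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        X = X + V
        f = np.array([evaluate_return(x) for x in X])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = X[improved], f[improved]
        g = int(np.argmax(pbest_f))
        if pbest_f[g] > gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]
    return gbest, gbest_f

# Stand-in fitness: in practice, evaluate_return would load weights x
# into the policy network, roll it out in the environment, and return
# the mean episodic reward.
theta0 = np.zeros(10)
refined, score = pso_refine(theta0, lambda x: -np.sum((x - 0.3) ** 2))
```

In such a setup the search phase only fine-tunes the weights found by gradient descent rather than searching from scratch, which is why the swarm is seeded near `theta0` and the original weights are retained as one particle.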
Recommended citation: Ko, G., & Huh, J.* (2025), A Metaheuristic-Based Weight Optimization for Robust Deep Reinforcement Learning in Continuous Control, Swarm and Evolutionary Computation, 95, 101920. (SCIE)
Download Paper
