Research Article
Exploration Entropy for Reinforcement Learning
Figure 11
Learning control performance of the RL algorithms under the bang-bang control method. For each algorithm, the step convergence, the Exploration Entropy, the state transition path of the quantum system, and the learned control sequence (−1 for a negative pulse, +1 for a positive pulse) are shown separately. (a) Q-learning. (b) PQL.