Exploration Entropy for Reinforcement Learning

<div>Learning control performance of the RL algorithms under the three-switch control method. For each algorithm, the step convergence, the <i>Exploration Entropy</i>, the quantum system state transition path, and the learned control sequence (0 for no pulse, −1 for a negative pulse, and +1 for a positive pulse) are shown separately. (a) Q-learning. (b) PQL.</div>

Mathematical Problems in Engineering

Figure 10