Research Article

Exploration Entropy for Reinforcement Learning

Algorithm 1: Q-learning.
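Initialize $Q(s, a)$ arbitrarily
Initialize the policy $\pi$
repeat
 Initialize $s$, $t = 1$, $\tau$
 repeat
  $a \leftarrow$ action $a_i$ with probability $p(s, a_i)$ for $s$
  Take action $a$, observe reward $r$, and next state $s'$
  $Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$
  Update $\pi$ with Softmax strategy
  $p(s, a_i) = \dfrac{e^{Q(s, a_i)/\tau}}{\sum_{j} e^{Q(s, a_j)/\tau}}$
  $s \leftarrow s'$, $t \leftarrow t + 1$
 until $s$ is destination
until the learning process ends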
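
As an illustration of the loop in Algorithm 1, the following is a minimal Python sketch of tabular Q-learning with a Softmax (Boltzmann) policy update. It is not the authors' implementation. The toy environment `chain_step` (a one-dimensional chain whose last state is the destination), the reward scheme, and all parameter values (`alpha`, `gamma`, `tau`, the number of episodes) are assumptions chosen only to make the example runnable. The temperature is held fixed here, which is also an assumption.

```python
import math
import random


def softmax_probs(q_values, tau):
    """Boltzmann distribution over actions: p_i proportional to exp(Q_i / tau)."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def chain_step(state, action, n_states):
    """Hypothetical toy environment: action 0 moves left, action 1 moves right.

    The last state is the destination; reaching it yields reward 1, else 0.
    """
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward


def q_learning(n_states=5, n_actions=2, episodes=200,
               alpha=0.1, gamma=0.9, tau=0.5, seed=0):
    rng = random.Random(seed)
    # Initialize Q(s, a); zeros are one arbitrary choice.
    q = [[0.0] * n_actions for _ in range(n_states)]
    # Initialize the policy; with equal Q-values this is uniform.
    policy = [softmax_probs(row, tau) for row in q]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0  # assumed start state for every episode
        while s != goal:
            # Sample an action according to the current policy for state s.
            a = rng.choices(range(n_actions), weights=policy[s])[0]
            s_next, r = chain_step(s, a, n_states)
            # One-step Q-learning update.
            td_target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (td_target - q[s][a])
            # Refresh the policy for state s with the Softmax strategy.
            policy[s] = softmax_probs(q[s], tau)
            s = s_next
    return q, policy


if __name__ == "__main__":
    q, policy = q_learning()
    for s, probs in enumerate(policy):
        print(s, [round(p, 3) for p in probs])
```

In this sketch a smaller `tau` makes the policy concentrate on the action with the largest Q-value, while a larger `tau` keeps the action probabilities closer to uniform and therefore encourages more exploration.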