Research Article
Exploration Entropy for Reinforcement Learning
Algorithm 2: Probabilistic Q-learning.
Initialize $Q(s, a)$ arbitrarily
Initialize the policy $\pi(s, a)$
repeat
  Initialize $s$
  repeat
    $a \leftarrow$ an action drawn with probability $\pi(s, a)$
    Take action $a$, observe reward $r$ and next state $s'$
    …
    Normalize $\pi(s, \cdot)$
    $s \leftarrow s'$
  until $s$ is destination
until the learning process ends
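The loop structure of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the update steps between observing $s'$ and normalizing $\pi(s, \cdot)$ are not given in this excerpt, so the code assumes the standard one-step Q-learning update and, as a hypothetical stand-in for the policy update, a pursuit-style step that adds a constant `beta` to the probability of the current greedy action before the Normalize step. The corridor environment, the function name `probabilistic_q_learning`, and the parameters `alpha`, `gamma`, `beta`, and `max_steps` are all illustrative assumptions.

```python
import random


def probabilistic_q_learning(n_states=6, episodes=1000, alpha=0.5, gamma=0.9,
                             beta=0.1, max_steps=1000, seed=0):
    """Sketch of Algorithm 2 on a toy corridor (all specifics are assumptions)."""
    rng = random.Random(seed)
    n_actions = 2                  # hypothetical actions: 0 = left, 1 = right
    goal = n_states - 1            # hypothetical destination state
    # Initialize Q(s, a) (here: zeros) and the policy pi(s, a) (here: uniform).
    q = [[0.0] * n_actions for _ in range(n_states)]
    pi = [[1.0 / n_actions] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Deterministic corridor with walls at both ends; reward -1 per move.
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, goal)
        return -1.0, s_next

    for _ in range(episodes):                  # outer "repeat"
        s = 0                                  # Initialize s
        for _ in range(max_steps):             # inner "repeat", capped for safety
            # a <- an action drawn with probability pi(s, a)
            a = rng.choices(range(n_actions), weights=pi[s])[0]
            r, s_next = step(s, a)             # observe r and s'
            # Assumed: standard one-step Q-learning update (destination has value 0).
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            # Hypothetical policy update: add beta to the greedy action's probability...
            a_star = max(range(n_actions), key=lambda b: q[s][b])
            pi[s][a_star] += beta
            # ...then Normalize pi(s, .) so it is again a probability distribution.
            total = sum(pi[s])
            pi[s] = [p / total for p in pi[s]]
            s = s_next                         # s <- s'
            if s == goal:                      # until s is destination
                break
    return q, pi
```

Calling `probabilistic_q_learning()` returns the learned `q` and `pi` tables; on this toy corridor the greedy action $\arg\max_a Q(s, a)$ should point right, toward the destination, from every non-terminal state. The pursuit-style update was chosen only because it keeps the explicit Normalize step meaningful; the paper's own update rule would slot into the same place in the loop.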