Mathematical Problems in Engineering

Research Article

Exploration Entropy for Reinforcement Learning

Probabilistic Q-learning.

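	Initialize $Q(s, a)$ arbitrarily
	Initialize the policy $\pi$: the action probabilities $p^s(a_i)$ for each state $s$
	repeat
		Initialize $s$
		repeat
			$a \leftarrow$ action $a_i$ with probability $p^s(a_i)$ for state $s$
			Take action $a$, observe reward $r$ and next state $s'$
			$Q(s, a) \leftarrow Q(s, a) + \alpha [r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$
			Update the probability $p^s(a)$ of the selected action
			Normalize the probabilities $p^s(a_i)$
			$s \leftarrow s'$
		until $s$ is the destination
	until the learning process ends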
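
The loop structure of probabilistic Q-learning can be illustrated with a minimal Python sketch. It is not the authors' implementation. The chain environment, the function name `probabilistic_q_learning`, the parameter values, and in particular the probability update `P[s][a] += k * (r + max(Q[s_next]))` are assumptions made for illustration, since the listing does not preserve the exact update rule. Rewards are kept nonnegative so that the assumed update never drives a probability below zero.

```python
import random


def probabilistic_q_learning(n_states=5, n_actions=2, episodes=200,
                             alpha=0.1, gamma=0.9, k=0.01,
                             max_steps=1000, seed=0):
    """Sketch of probabilistic Q-learning on a hypothetical 1-D chain."""
    rng = random.Random(seed)
    destination = n_states - 1

    # Q(s, a) initialized to zero; policy initialized to uniform probabilities.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    P = [[1.0 / n_actions] * n_actions for _ in range(n_states)]

    def step(s, a):
        # Assumed toy dynamics: action 1 moves right, any other action moves left.
        s_next = min(s + 1, destination) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == destination else 0.0
        return r, s_next

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Draw an action according to the current probabilities for state s.
            a = rng.choices(range(n_actions), weights=P[s])[0]
            r, s_next = step(s, a)
            best_next = max(Q[s_next])

            # Standard Q-learning update.
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

            # Assumed probability update (illustrative, not taken from the source).
            P[s][a] = max(P[s][a] + k * (r + best_next), 1e-12)

            # Normalize so the probabilities for state s sum to one.
            total = sum(P[s])
            P[s] = [p / total for p in P[s]]

            s = s_next
            if s == destination:
                break
    return Q, P


if __name__ == "__main__":
    Q, P = probabilistic_q_learning()
    for s, row in enumerate(P):
        print(s, [round(p, 3) for p in row])
```

The policy is stored explicitly as one probability vector per state, so exploration is governed by the action probabilities themselves rather than by a separate parameter such as $\epsilon$ or a temperature.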