Mathematical Problems in Engineering

Research Article

Exploration Entropy for Reinforcement Learning

Q-learning.

	Initialize $Q(s, a)$ arbitrarily
	Initialize the policy $\pi$
	repeat
	Initialize $s$, $\tau$
	repeat
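	$a \leftarrow$ action $a_i$ with probability $\pi(s, a_i)$ for $s$
	Take action $a$, observe reward $r$, and next state $s'$
	$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$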
	Update $\pi$ with Softmax strategy
	$\pi(s, a_i) = \dfrac{e^{Q(s, a_i)/\tau}}{\sum_{j} e^{Q(s, a_j)/\tau}}$
	$s \leftarrow s'$
	until $s$ is destination

	until the learning process ends
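
The loop above can be sketched in Python as tabular Q-learning with a Softmax (Boltzmann) policy. This is only an illustrative sketch, not the authors' implementation: the five-state chain environment, the reward of 1 at the destination, the function names `softmax_policy` and `q_learning_softmax`, and the values of `alpha`, `gamma`, and `tau` are all assumptions chosen for the example.

```python
import math
import random


def softmax_policy(q_row, tau):
    """Return Boltzmann action probabilities for one state's Q-values."""
    m = max(q_row)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / tau) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def q_learning_softmax(n_states=5, episodes=200, alpha=0.1, gamma=0.9,
                       tau=0.5, max_steps=1000, seed=0):
    """Tabular Q-learning on a hypothetical chain; the last state is the destination."""
    rng = random.Random(seed)
    goal = n_states - 1
    n_actions = 2  # assumed actions: 0 = move left, 1 = move right
    # Initialize Q(s, a) (here to zero) and the policy pi
    Q = [[0.0] * n_actions for _ in range(n_states)]
    policy = [softmax_policy(row, tau) for row in Q]
    for _ in range(episodes):
        s = 0  # initialize s; tau is kept fixed in this sketch
        for _ in range(max_steps):
            # a <- action a_i with probability pi(s, a_i)
            a = rng.choices(range(n_actions), weights=policy[s])[0]
            # Take action a, observe reward r and next state s'
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # Q-learning update; Q[goal] is never updated, so it stays zero
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            # Update pi for the visited state with the Softmax strategy
            policy[s] = softmax_policy(Q[s], tau)
            s = s_next
            if s == goal:  # until s is destination
                break
    return Q, policy


if __name__ == "__main__":
    Q, policy = q_learning_softmax()
    for state, (q_row, p_row) in enumerate(zip(Q, policy)):
        print(state, [round(q, 3) for q in q_row], [round(p, 3) for p in p_row])
```

A smaller `tau` makes the policy greedier with respect to the current Q-values, while a larger `tau` moves it toward uniform random exploration.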