Research Article
Reinforcement Learning for Interference Coordination Stackelberg Games in Heterogeneous Cellular Networks
Algorithm 1: Two-stage Q-learning algorithm.
Initialize S = S0 and Q(s, a) = 0.
Set the parameter values to 0.1.
Loop % start an update episode t
    If t < 50 % pre-training stage
        The agent selects an action randomly from the action set;
        Update the Q value according to the Q-learning update rule;
    Else
        Generate a random number num;
        If num < ε % 'exploration' is selected
            The agent selects an action randomly from the action set;
        Else % 'exploitation' is selected
            The agent selects the action with the largest Q value from the action set;
        End if
        Update the Q value according to the Q-learning update rule;
        Update the state;
    End if
Until t is terminal
End Loop
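The two stages of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The update equation is not reproduced in the listing, so the sketch substitutes the standard tabular Q-learning update. The learning rate `alpha`, the discount factor `gamma`, the fixed update budget `n_updates` (standing in for the terminal condition), and the toy environment `toy_step` are all hypothetical. The sketch also advances the state in both stages, which the listing states explicitly only for the second stage.

```python
import random


def two_stage_q_learning(step, n_states, n_actions, s0=0, n_updates=2000,
                         pretrain_updates=50, epsilon=0.1,
                         alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a random pre-training stage.

    `step(state, action)` must return `(reward, next_state)`.
    `alpha`, `gamma`, and `n_updates` are assumptions of this sketch.
    """
    rng = random.Random(seed)
    # Q(s, a) = 0 for every state-action pair.
    q = [[0.0] * n_actions for _ in range(n_states)]
    s = s0
    for t in range(n_updates):
        if t < pretrain_updates:
            # Stage 1 (pre-training): always act randomly.
            a = rng.randrange(n_actions)
        elif rng.random() < epsilon:
            # Stage 2, exploration: random action with probability epsilon.
            a = rng.randrange(n_actions)
        else:
            # Stage 2, exploitation: greedy action, ties broken at random.
            best = max(q[s])
            a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
        reward, s_next = step(s, a)
        # Standard Q-learning update (assumed; not taken from the source).
        q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
        s = s_next
    return q


def toy_step(state, action):
    """Hypothetical two-state environment: action 1 pays 1, action 0 pays 0."""
    reward = 1.0 if action == 1 else 0.0
    next_state = action  # the chosen action determines the next state
    return reward, next_state


if __name__ == "__main__":
    q_table = two_stage_q_learning(toy_step, n_states=2, n_actions=2)
    for s, row in enumerate(q_table):
        print(s, row)
```

The pre-training stage populates the Q-table with some experience before the greedy choice is ever used, so early exploitation does not lock onto an arbitrary action while all Q values are still zero.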