Research Article

Reinforcement Learning for Interference Coordination Stackelberg Games in Heterogeneous Cellular Networks

Algorithm 1

Two-stage Q-learning algorithm.
Initialize S = S0 and Q(s, a) = 0.
Set the value ε = 0.1.
Loop % start an update episode t
If t<50 % In the pre-training stage
   The agent selects an action randomly from the action set;
   Update Q value according to
   
Else
   Generate a random number num;
    If num<ε % ‘exploration’ is selected
      The agent selects an action randomly from the action set;
    Else % ‘exploitation’ is selected
      The agent selects the action with the largest Q value from the action set;
    End if
    Update the Q value according to
      
    Update the state;
End if
Until t is terminal
End Loop
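
The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. The update equation did not survive in the listing above, so the sketch assumes the standard tabular Q-learning update. The learning rate `alpha`, the discount factor `gamma`, the environment callback `step`, and the two-state `toy_step` environment are all hypothetical and introduced only for this example. The sketch also assumes that the state is updated in both stages.

```python
import random


def select_action(q_row, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # 'exploration': pick any action uniformly at random
        return rng.randrange(len(q_row))
    # 'exploitation': pick an action with the largest Q value (ties broken randomly)
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def two_stage_q_learning(step, n_states, n_actions, episodes=200, pretrain=50,
                         epsilon=0.1, alpha=0.5, gamma=0.9, s0=0, seed=0):
    """Two-stage Q-learning: random pre-training, then epsilon-greedy learning.

    `step(s, a)` is a hypothetical environment callback returning
    (reward, next_state). `alpha` and `gamma` are assumed values.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]  # Q(s, a) = 0
    s = s0                                            # S = S0
    for t in range(episodes):
        if t < pretrain:
            # pre-training stage: purely random actions
            a = rng.randrange(n_actions)
        else:
            a = select_action(Q[s], epsilon, rng)
        r, s_next = step(s, a)
        # assumed standard Q-learning update (the source equation is missing)
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
    return Q


def toy_step(s, a):
    """Hypothetical two-state environment: action 1 pays reward 1, action 0 pays 0."""
    return (1.0 if a == 1 else 0.0), (s + 1) % 2


Q = two_stage_q_learning(toy_step, n_states=2, n_actions=2)
```

The pre-training stage fills the Q table with unbiased samples before the greedy choice is allowed to steer the agent, which is the apparent motivation for splitting the loop at t = 50.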