Research Article
Modeling the Peak-Period Bus Commuting Behavior with Staggered Work Hours Using a Regret-Minimizing Learning Method
Algorithm 1
Algorithm overview (for one agent only).
(1) Initialize the Q-table;
(2) Initialize the history of estimates;
(3) Initialize the learning and exploration rates;
(4) For each iteration do
(5)     Receive app recommendations;
(6)     Update the learning and exploration rates;
(7)     Choose an action using the policy derived from the Q-table;
(8)     Take the action and observe the commuting cost using equation (1);
(9)     Update the estimate using equation (5);
(10)    Update the regret of the action using equation (6);
(11)    Update the Q value of the action using equation (8);
(12) End
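The loop in Algorithm 1 can be illustrated with a minimal single-agent sketch. Equations (1), (5), (6), and (8) are not reproduced in this section, so every formula below is a stand-in assumption rather than the authors' definition: the commuting cost comes from a caller-supplied function, the estimate is an exponential moving average of experienced costs with app recommendations filling in the unchosen actions, the regret is the gap between the chosen action's estimate and the lowest estimate, and the Q value moves toward the negative regret. The names `run_agent`, `cost_fn`, `recommend_fn`, and `decay`, the epsilon-greedy policy, and the multiplicative rate decay are all hypothetical.

```python
import random


def run_agent(actions, cost_fn, recommend_fn, n_days=200, decay=0.99, seed=0):
    """Sketch of Algorithm 1 for one agent (all update rules are assumptions)."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in actions}           # step 1: Q-table
    estimates = {a: 0.0 for a in actions}   # step 2: cost estimate per action
    alpha, epsilon = 1.0, 1.0               # step 3: learning / exploration rates

    for day in range(n_days):               # step 4
        recs = recommend_fn(day)            # step 5: dict action -> suggested cost

        # Step 6: assumed multiplicative decay of both rates.
        alpha *= decay
        epsilon *= decay

        # Step 7: assumed epsilon-greedy policy over the Q-table.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=q.get)

        # Step 8: observe the commuting cost (stand-in for equation (1)).
        cost = cost_fn(action, day)

        # Step 9: stand-in for equation (5). The chosen action is updated from
        # experience; the other actions take the app's recommended costs.
        estimates[action] += alpha * (cost - estimates[action])
        for other, suggested in recs.items():
            if other != action:
                estimates[other] = suggested

        # Step 10: stand-in for equation (6). Regret is the estimated cost of
        # the chosen action minus the lowest estimated cost.
        regret = estimates[action] - min(estimates.values())

        # Step 11: stand-in for equation (8). Q moves toward the negative regret.
        q[action] += alpha * (-regret - q[action])

    return q


if __name__ == "__main__":
    # Hypothetical departure slots with fixed costs, for illustration only.
    costs = {"early": 5.0, "peak": 9.0, "late": 7.0}
    q_table = run_agent(
        actions=list(costs),
        cost_fn=lambda action, day: costs[action],
        recommend_fn=lambda day: dict(costs),
    )
    print(q_table)
```

Under these assumptions, an action with the lowest estimated cost accumulates zero regret, so its Q value stays at the top of the table while higher-cost actions are pushed below zero whenever they are tried.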