A robust exploration strategy in reinforcement learning based on temporal difference error.

Hajar, Muhammad Shadi; Kalutarage, Harsha; Al-Kadri, M. Omar

Authors

M. Omar Al-Kadri



Contributors

Haris Aziz
Editor

Débora Corrêa
Editor

Tim French
Editor

Abstract

Exploration is a critical component of reinforcement learning algorithms, and the exploration-exploitation trade-off remains a fundamental dilemma. The learning agent needs to learn how to deal with a stochastic environment in order to maximize the accumulated long-term reward. This paper proposes a robust exploration strategy (RES) based on the temporal difference error. In RES, the exploration problem is modeled using a Beta probability distribution to control the exploration rate. Moreover, the most promising action is selected during exploration, with a view to maximizing the accumulated reward and avoiding un-rewardable wrong actions. RES has been evaluated on the k-armed bandit problem. The simulation results show superior performance without the need to tune parameters.
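The abstract does not give the RES update rules, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It runs a k-armed bandit in which the exploration rate is drawn from a Beta distribution at each step and the Beta parameters are nudged by the size of the temporal difference error (for a bandit, the reward minus the current value estimate). The function name `run_bandit`, the `threshold` parameter, the Beta update rule, and the uniform-random choice of exploratory action are all assumptions made for illustration.

```python
import random


def run_bandit(k=10, steps=1000, seed=0, threshold=1.0):
    """Hypothetical Beta-controlled exploration on a k-armed bandit.

    NOT the RES algorithm from the paper: the update of the Beta
    parameters below is an assumption chosen for illustration.
    """
    rng = random.Random(seed)
    # True mean reward of each arm, unknown to the agent.
    true_means = [rng.gauss(0.0, 1.0) for _ in range(k)]
    q = [0.0] * k          # value estimate per arm
    counts = [0] * k       # number of pulls per arm
    a, b = 1.0, 1.0        # Beta(1, 1) is uniform on [0, 1]
    total_reward = 0.0

    for _ in range(steps):
        # Draw this step's exploration rate from the Beta distribution.
        epsilon = rng.betavariate(a, b)
        if rng.random() < epsilon:
            action = rng.randrange(k)                  # explore (uniform, assumed)
        else:
            action = max(range(k), key=lambda i: q[i])  # exploit

        reward = rng.gauss(true_means[action], 1.0)

        # Temporal difference error for a bandit: reward minus estimate.
        td_error = reward - q[action]
        counts[action] += 1
        q[action] += td_error / counts[action]          # sample-average update

        # Assumed rule: large errors push the Beta mass toward exploring,
        # small errors push it toward exploiting.
        if abs(td_error) > threshold:
            a += 1.0
        else:
            b += 1.0

        total_reward += reward

    return total_reward / steps, q, true_means


if __name__ == "__main__":
    avg, q, means = run_bandit()
    print(f"average reward per step: {avg:.3f}")
```

Because the exploration rate is resampled from a distribution that adapts to the observed errors, a sketch like this has no fixed ε to tune, which is the kind of property the abstract claims for RES; how RES itself achieves it is described in the paper.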

Citation

HAJAR, M.S., KALUTARAGE, H. and AL-KADRI, M.O. 2022. A robust exploration strategy in reinforcement learning based on temporal difference error. In Aziz, H., Corrêa, D. and French, T. (eds.) AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia. Lecture notes in computer science (LNCS), 13728. Cham: Springer [online], pages 789-799. Available from: https://doi.org/10.1007/978-3-031-22695-3_55

Conference Name: 35th Australasian joint conference 2022 (AI 2022)
Conference Location: Perth, Australia
Start Date: Dec 5, 2022
End Date: Dec 8, 2022
Acceptance Date: Sep 12, 2022
Online Publication Date: Dec 3, 2022
Publication Date: Dec 31, 2022
Deposit Date: Jan 12, 2023
Publicly Available Date: Dec 4, 2023
Publisher: Springer
Pages: 789-799
Series Title: Lecture notes in computer science (LNCS)
Series Number: 13728
Series ISSN: 0302-9743
Book Title: AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia
ISBN: 9783031226946
DOI: https://doi.org/10.1007/978-3-031-22695-3_55
Keywords: Reinforcement learning; Exploration; Exploitation; Q-learning; k-armed bandit; ε-greedy; Softmax
Public URL: https://rgu-repository.worktribe.com/output/1823882
