A robust exploration strategy in reinforcement learning based on temporal difference error.

Hajar, Muhammad Shadi; Kalutarage, Harsha; Al-Kadri, M. Omar

doi:10.1007/978-3-031-22695-3_55

Authors

Dr Muhammad Shadi Hajar m.hajar1@rgu.ac.uk
Lecturer

Dr Harsha Kalutarage h.kalutarage@rgu.ac.uk
Senior Lecturer

M. Omar Al-Kadri

Contributors

Haris Aziz
Editor

Débora Corrêa
Editor

Tim French
Editor

Abstract

Exploration is a critical component of reinforcement learning algorithms, and the exploration-exploitation trade-off remains a fundamental dilemma. A learning agent must learn to cope with a stochastic environment in order to maximize its accumulated long-term reward. This paper proposes a robust exploration strategy (RES) based on the temporal difference error. In RES, the exploration problem is modeled using a Beta probability distribution that controls the exploration rate. Moreover, the most promising action is selected during exploration, with a view to maximizing the accumulated reward and avoiding un-rewardable wrong actions. RES has been evaluated on the k-armed bandit problem. The simulation results show superior performance without the need to tune parameters.
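
As a rough illustration of the idea described in the abstract, the following sketch runs a k-armed bandit in which the exploration rate is drawn from a Beta distribution whose parameters are nudged by the temporal difference error. It is not the authors' RES algorithm: the abstract does not give the update rule, so the function name run_bandit, the Beta update (add the clipped absolute TD error to one parameter and its complement to the other), the Gaussian reward model, and the choice of "most promising" exploratory arm (the best-valued non-greedy arm) are all assumptions made for this example.

```python
import random


def run_bandit(true_means, steps=2000, step_size=0.1, seed=0):
    """Toy k-armed bandit with a Beta-distributed exploration rate.

    Hypothetical sketch only; the update rule below is an assumption,
    not the RES rule from the paper.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k          # action-value estimates
    a, b = 1.0, 1.0        # Beta(a, b) parameters, start uniform
    eps_history = []

    for _ in range(steps):
        # Draw the exploration rate for this step from Beta(a, b).
        epsilon = rng.betavariate(a, b)
        eps_history.append(epsilon)

        greedy = max(range(k), key=lambda i: q[i])
        if k > 1 and rng.random() < epsilon:
            # Explore, but pick the best-valued non-greedy arm
            # instead of a uniformly random one (assumed reading of
            # "most promising action").
            action = max((i for i in range(k) if i != greedy),
                         key=lambda i: q[i])
        else:
            action = greedy

        # Assumed reward model: Gaussian noise around the arm's mean.
        reward = rng.gauss(true_means[action], 1.0)

        # For a bandit, the TD error reduces to reward minus estimate.
        td_error = reward - q[action]
        q[action] += step_size * td_error

        # Assumed Beta update: a large |TD error| counts as evidence
        # for exploring, a small one as evidence for exploiting.
        surprise = min(abs(td_error), 1.0)
        a += surprise
        b += 1.0 - surprise

    return q, eps_history


if __name__ == "__main__":
    estimates, eps = run_bandit([0.1, 0.5, 0.9])
    print("value estimates:", estimates)
    print("final exploration rate:", eps[-1])
```

The sketch needs no hand-tuned exploration schedule, which mirrors the motivation stated in the abstract, but its behaviour should not be taken as representative of the results reported in the paper.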

Citation

HAJAR, M.S., KALUTARAGE, H. and AL-KADRI, M.O. 2022. A robust exploration strategy in reinforcement learning based on temporal difference error. In Aziz, H., Corrêa, D. and French, T. (eds.) AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia. Lecture notes in computer science (LNCS), 13728. Cham: Springer [online], pages 789-799. Available from: https://doi.org/10.1007/978-3-031-22695-3_55

Conference Name	35th Australasian joint conference 2022 (AI 2022)
Conference Location	Perth, Australia
Start Date	Dec 5, 2022
End Date	Dec 8, 2022
Acceptance Date	Sep 12, 2022
Online Publication Date	Dec 3, 2022
Publication Date	Dec 31, 2022
Deposit Date	Jan 12, 2023
Publicly Available Date	Dec 4, 2023
Publisher	Springer
Pages	789-799
Series Title	Lecture notes in computer science (LNCS)
Series Number	13728
Series ISSN	0302-9743
Book Title	AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia
ISBN	9783031226946
DOI	https://doi.org/10.1007/978-3-031-22695-3_55
Keywords	Reinforcement learning; Exploration; Exploitation; Q-learning; k-armed bandit; ε-greedy; Softmax
Public URL	https://rgu-repository.worktribe.com/output/1823882