Dr Muhammad Shadi Hajar m.hajar1@rgu.ac.uk
Lecturer
A robust exploration strategy in reinforcement learning based on temporal difference error.
Hajar, Muhammad Shadi; Kalutarage, Harsha; Al-Kadri, M. Omar
Authors
Dr Harsha Kalutarage h.kalutarage@rgu.ac.uk
Associate Professor
M. Omar Al-Kadri
Contributors
Haris Aziz
Editor
Débora Corrêa
Editor
Tim French
Editor
Abstract
Exploration is a critical component of reinforcement learning algorithms, and the exploration-exploitation trade-off remains a fundamental dilemma. The learning agent must learn how to deal with a stochastic environment in order to maximize the accumulated long-term reward. This paper proposes a robust exploration strategy (RES) based on the temporal difference error. In RES, the exploration problem is modeled using a Beta probability distribution to control the exploration rate. Moreover, the most promising action is selected during exploration, with a view to maximizing the accumulated reward and avoiding unrewarding wrong actions. RES has been evaluated on the k-armed bandit problem. The simulation results show superior performance without the need to tune parameters.
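The abstract gives no update equations, so the following Python sketch only illustrates the general idea it names: a k-armed bandit agent whose exploration rate is drawn from a Beta distribution that is adjusted using the temporal difference error. It is not the RES algorithm from the paper. The function name `run_bandit`, the parameter `td_threshold`, the rule for updating the Beta parameters, and the softmax choice among non-greedy arms are all assumptions made for illustration.

```python
import math
import random


def run_bandit(true_means, steps=2000, td_threshold=1.0, seed=0):
    """Illustrative k-armed bandit with a Beta-distributed exploration rate.

    Everything here is an assumption for illustration; it is not the
    update rule published in the paper.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k          # action-value estimates
    counts = [0] * k       # number of pulls per arm
    alpha, beta = 1.0, 1.0  # Beta(1, 1): uniform prior over the exploration rate
    total_reward = 0.0

    for _ in range(steps):
        # Draw this step's exploration probability from the Beta distribution.
        explore_rate = rng.betavariate(alpha, beta)
        greedy = max(range(k), key=lambda i: q[i])
        others = [i for i in range(k) if i != greedy]

        if others and rng.random() < explore_rate:
            # Assumed reading of "most promising action": favour non-greedy
            # arms with higher estimates via softmax weights.
            top = max(q[i] for i in others)
            weights = [math.exp(q[i] - top) for i in others]
            action = rng.choices(others, weights=weights)[0]
        else:
            action = greedy

        # Stochastic reward: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[action], 1.0)
        total_reward += reward

        # For a bandit (no next state) the TD error is reward minus estimate.
        td_error = reward - q[action]
        counts[action] += 1
        q[action] += td_error / counts[action]  # sample-average update

        # Assumed rule: a surprising outcome is evidence for more exploration,
        # an unsurprising one is evidence for less.
        if abs(td_error) > td_threshold:
            alpha += 1.0
        else:
            beta += 1.0

    return q, counts, total_reward


if __name__ == "__main__":
    estimates, pulls, total = run_bandit([0.1, 0.5, 0.9])
    print("estimates:", [round(v, 2) for v in estimates])
    print("pulls:", pulls)
```

In this sketch the Beta parameters act as running counts of surprising and unsurprising outcomes, so the expected exploration rate settles at the observed fraction of surprising steps. Whether the paper uses a comparable mechanism cannot be determined from the abstract alone.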
Citation
HAJAR, M.S., KALUTARAGE, H. and AL-KADRI, M.O. 2022. A robust exploration strategy in reinforcement learning based on temporal difference error. In Aziz, H., Corrêa, D. and French, T. (eds.) AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia. Lecture notes in computer science (LNCS), 13728. Cham: Springer [online], pages 789-799. Available from: https://doi.org/10.1007/978-3-031-22695-3_55
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | 35th Australasian joint conference 2022 (AI 2022) |
Start Date | Dec 5, 2022 |
End Date | Dec 8, 2022 |
Acceptance Date | Sep 12, 2022 |
Online Publication Date | Dec 3, 2022 |
Publication Date | Dec 31, 2022 |
Deposit Date | Jan 12, 2023 |
Publicly Available Date | Dec 4, 2023 |
Publisher | Springer |
Peer Reviewed | Peer Reviewed |
Pages | 789-799 |
Series Title | Lecture notes in computer science (LNCS) |
Series Number | 13728 |
Series ISSN | 0302-9743 |
Book Title | AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia |
ISBN | 9783031226946 |
DOI | https://doi.org/10.1007/978-3-031-22695-3_55 |
Keywords | Reinforcement learning; Exploration; Exploitation; Q-learning; k-armed bandit; ε-greedy; Softmax |
Public URL | https://rgu-repository.worktribe.com/output/1823882 |
Files
HAJAR 2022 A robust exploration strategy (AAM)
(514 Kb)
PDF