Muhammad Shadi Hajar
A robust exploration strategy in reinforcement learning based on temporal difference error.
Hajar, Muhammad Shadi; Kalutarage, Harsha; Al-Kadri, M. Omar
Authors
Contributors
Haris Aziz
Editor
Débora Corrêa
Editor
Tim French
Editor
Abstract
Exploration is a critical component of reinforcement learning algorithms, and the exploration-exploitation trade-off remains a fundamental dilemma. The learning agent needs to learn how to deal with a stochastic environment in order to maximize the accumulated long-term reward. This paper proposes a robust exploration strategy (RES) based on the temporal difference error. In RES, the exploration problem is modeled using a Beta probability distribution to control the exploration rate. Moreover, the most promising action is selected during exploration, with a view to maximizing the accumulated reward and avoiding un-rewardable wrong actions. RES has been evaluated on the k-armed bandit problem. The simulation results show superior performance without the need to tune parameters.
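The abstract does not give the RES update rules, so the following is only a rough Python sketch of the general idea it describes, not the authors' algorithm. It assumes a Gaussian k-armed bandit, where the temporal difference error reduces to the reward minus the current estimate for the chosen arm. The function name `run_bandit`, the threshold of 1.0 on the error magnitude, the rule for updating the Beta parameters, and the choice of "most promising" arm (highest estimate among the non-greedy arms) are all hypothetical choices made for illustration.

```python
import random


def run_bandit(k=10, steps=2000, seed=0):
    """Illustrative Beta-controlled exploration on a k-armed bandit.

    NOT the paper's RES algorithm: every update rule here is an assumption.
    """
    if k < 2:
        raise ValueError("need at least two arms to explore")
    rng = random.Random(seed)
    true_means = [rng.gauss(0.0, 1.0) for _ in range(k)]  # hidden arm values
    q = [0.0] * k            # action-value estimates
    n = [0] * k              # pull counts per arm
    alpha, beta = 1.0, 1.0   # Beta parameters, starting from a uniform prior
    total = 0.0

    for _ in range(steps):
        # Sample this step's exploration rate from Beta(alpha, beta).
        explore_prob = rng.betavariate(alpha, beta)
        greedy = max(range(k), key=lambda a: q[a])

        if rng.random() < explore_prob:
            # Assumed notion of "most promising": best estimate among
            # the non-greedy arms, rather than a uniformly random arm.
            others = [a for a in range(k) if a != greedy]
            action = max(others, key=lambda a: q[a])
        else:
            action = greedy

        reward = rng.gauss(true_means[action], 1.0)
        td_error = reward - q[action]      # bandit case: no next state
        n[action] += 1
        q[action] += td_error / n[action]  # sample-average update

        # Assumed heuristic: a large error counts as evidence for exploring,
        # a small error as evidence for exploiting.
        if abs(td_error) > 1.0:
            alpha += 1.0
        else:
            beta += 1.0
        total += reward

    return q, n, total / steps
```

Calling `run_bandit(k=5, steps=500, seed=1)` returns the value estimates, the pull counts, and the average reward per step. A fixed seed makes runs repeatable, which helps when comparing this sketch against ε-greedy or softmax baselines.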
Citation
HAJAR, M.S., KALUTARAGE, H. and AL-KADRI, M.O. 2022. A robust exploration strategy in reinforcement learning based on temporal difference error. In Aziz, H., Corrêa, D. and French, T. (eds.) AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia. Lecture notes in computer science (LNCS), 13728. Cham: Springer [online], pages 789-799. Available from: https://doi.org/10.1007/978-3-031-22695-3_55
Conference Name | 35th Australasian joint conference 2022 (AI 2022) |
---|---|
Conference Location | Perth, Australia |
Start Date | Dec 5, 2022 |
End Date | Dec 8, 2022 |
Acceptance Date | Sep 12, 2022 |
Online Publication Date | Dec 3, 2022 |
Publication Date | Dec 31, 2022 |
Deposit Date | Jan 12, 2023 |
Publicly Available Date | Dec 4, 2023 |
Publisher | Springer |
Pages | 789-799 |
Series Title | Lecture notes in computer science (LNCS) |
Series Number | 13728 |
Series ISSN | 0302-9743 |
Book Title | AI 2022: advances in artificial intelligence; proceedings of the 35th Australasian joint conference 2022 (AI 2022), 5-8 December 2022, Perth, Australia |
ISBN | 9783031226946 |
DOI | https://doi.org/10.1007/978-3-031-22695-3_55 |
Keywords | Reinforcement learning; Exploration; Exploitation; Q-learning; k-armed bandit; ε-greedy; Softmax |
Public URL | https://rgu-repository.worktribe.com/output/1823882 |
Files
This file is under embargo until Dec 4, 2023 due to copyright reasons.
Contact publications@rgu.ac.uk to request a copy for personal use.